Explainers
This section provides detailed documentation for all explainer classes in MLExplainer.
SHAP Explainers
Base Explainer
Base class for Machine Learning Explainers. This class is designed to be subclassed for specific machine learning models.
- class mlexplainer.core.base_explainer.BaseMLExplainer(x_train, y_train, features, model, global_explainer=True, local_explainer=True)[source]
Bases:
ABC
Base class for Machine Learning Explainers.
This class provides a structure for interpreting features in machine learning models and analyzing the correctness of the analysis for every feature.
- x_train
Training feature values.
- Type:
DataFrame
- y_train
Training target values.
- Type:
Series
- model
The machine learning model to explain.
- Type:
Callable
- __init__(x_train, y_train, features, model, global_explainer=True, local_explainer=True)[source]
Initialize the BaseMLExplainer with training data, features, and model. This class is designed to be subclassed for specific machine learning models, and subclasses should implement the explain and correctness_features methods. Its main purpose is to provide a structure for interpreting features in machine learning models and for checking whether the way a model understands each feature is correct. It also provides a way to analyze the correctness of the analysis for every feature.
- Parameters:
x_train (DataFrame) – Training feature values.
y_train (Series) – Training target values.
features (List[str]) – List of feature names to interpret.
model (Callable) – The machine learning model to explain.
global_explainer (bool) – Whether to use a global explainer. Defaults to True.
local_explainer (bool) – Whether to use a local explainer. Defaults to True.
- Raises:
ValueError – If x_train or y_train is None.
ValueError – If any feature in features is not present in x_train.
ValueError – If no features are provided.
- abstractmethod explain(**kwargs)[source]
Interpret features for the machine learning model. This method should be implemented in subclasses to provide specific interpretations.
- Parameters:
**kwargs (Any) – Additional keyword arguments for customization.
- Returns:
This method returns nothing; it modifies the state of the explainer.
- Return type:
None
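The subclassing contract described above can be sketched without any dependencies. This is a minimal illustration, not the library's implementation: `SketchBaseExplainer` and `RecordingExplainer` are hypothetical names, a plain list of column names stands in for `x_train.columns`, and only the documented `ValueError` cases and the abstract `explain` method are modeled.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Sequence


class SketchBaseExplainer(ABC):
    """Hypothetical stand-in for BaseMLExplainer (assumption, not library code)."""

    def __init__(self, columns: Sequence[str], features: List[str]) -> None:
        # `columns` plays the role of x_train.columns in this sketch.
        if columns is None:
            raise ValueError("x_train must not be None")
        if not features:
            raise ValueError("no features were provided")
        missing = [name for name in features if name not in columns]
        if missing:
            raise ValueError(f"features not present in x_train: {missing}")
        self.features = list(features)
        self.explained: List[str] = []

    @abstractmethod
    def explain(self, **kwargs: Any) -> None:
        """Interpret features; subclasses update the explainer's state."""


class RecordingExplainer(SketchBaseExplainer):
    """Toy subclass that only records which features it was asked to explain."""

    def explain(self, **kwargs: Any) -> None:
        self.explained = list(self.features)


explainer = RecordingExplainer(columns=["age", "income"], features=["age"])
explainer.explain()
print(explainer.explained)
```

Because `explain` is abstract, instantiating `SketchBaseExplainer` directly raises `TypeError`, which is the standard behavior of `abc.ABC` and the reason concrete explainers must override it.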
SHAP Wrapper
Shap Wrapper for Models.
- class mlexplainer.explainers.shap.wrapper.ShapWrapper(model, model_output='raw')[source]
Bases:
object
Shapley values wrapper for models, based on TreeExplainer. This class is designed to calculate SHAP values for a given model and features in a DataFrame. It uses the TreeExplainer from the SHAP library to compute the SHAP values based on the model’s predictions.
- model
The model to be wrapped for SHAP value calculation.
- Type:
Callable
- shap_margin_explainer
The SHAP explainer instance.
- Type:
TreeExplainer
- __init__(model, model_output='raw')[source]
Initialize the ShapWrapper with a model.
- Parameters:
model (Callable) – The model to be wrapped for SHAP value calculation.
model_output (str) – The type of output from the model, e.g., “raw”, “probability”.
Binary Classification Explainer
BinaryMLExplainer for binary classification tasks. This module provides an implementation of the BaseMLExplainer for binary classification tasks, including methods to explain numerical and categorical features using SHAP values.
- class mlexplainer.explainers.shap.binary.BinaryMLExplainer(x_train, y_train, features, model, global_explainer=True, local_explainer=True)[source]
Bases:
BaseMLExplainer
BinaryMLExplainer for binary classification tasks. This class extends BaseMLExplainer to provide specific methods for explaining features in binary classification tasks using SHAP values. It includes methods to interpret numerical and categorical features, validate feature interpretations, and visualize global feature importance.
- __init__(x_train, y_train, features, model, global_explainer=True, local_explainer=True)[source]
Initialize the BinaryMLExplainer with training data, features, and model.
- Parameters:
x_train (DataFrame) – Training feature values.
y_train (Series) – Training target values.
features (List[str]) – List of feature names to interpret.
model (Callable) – The machine learning model to explain.
global_explainer (bool) – Whether to use a global explainer. Defaults to True.
local_explainer (bool) – Whether to use a local explainer. Defaults to True.
- Raises:
ValueError – If x_train or y_train is None.
ValueError – If any feature in features is not present in x_train.
ValueError – If no features are provided.
- explain(features_to_explain=None, **kwargs)[source]
Explain the features for binary classification. This method interprets the features based on the training data and SHAP values. It visualizes global feature importance and interprets numerical and categorical features.
- Parameters:
features_to_explain (Union[list[str], None]) – List of feature names to explain.
**kwargs – Additional keyword arguments for customization, such as:
- figsize: Tuple for figure size (default: (15, 8))
- dpi: Dots per inch for the plot (default: 100)
- q: Number of quantiles for plotting (default: 20)
- threshold_nb_values: Threshold for number of unique values in numerical features (default: 15)
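One way to picture how the documented keyword arguments and their defaults interact is a simple merge of caller overrides onto a defaults table. The helper name `resolve_plot_options` is hypothetical and the library may resolve its options differently; only the option names and default values come from the list above.

```python
from typing import Any, Dict

# Defaults as listed in the documentation for explain(**kwargs).
DEFAULT_PLOT_OPTIONS: Dict[str, Any] = {
    "figsize": (15, 8),
    "dpi": 100,
    "q": 20,
    "threshold_nb_values": 15,
}


def resolve_plot_options(**kwargs: Any) -> Dict[str, Any]:
    """Return the documented defaults with any caller overrides applied.

    Hypothetical helper for illustration; not part of MLExplainer.
    """
    return {**DEFAULT_PLOT_OPTIONS, **kwargs}


options = resolve_plot_options(q=10, dpi=150)
print(options)
```

Under this assumption, a call such as `explain(q=10)` would change only the number of quantiles and leave the other options at their defaults.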
Multilabel Classification Explainer
MultilabelMLExplainer for multilabel classification tasks. This module provides an implementation of the BaseMLExplainer for multilabel classification tasks, including methods to explain numerical and categorical features using SHAP values.
- class mlexplainer.explainers.shap.multilabel.MultilabelMLExplainer(x_train, y_train, features, model, global_explainer=True, local_explainer=True)[source]
Bases:
BaseMLExplainer
MultilabelMLExplainer for multilabel classification tasks.
- __init__(x_train, y_train, features, model, global_explainer=True, local_explainer=True)[source]
Initialize the MultilabelMLExplainer with training data, features, and model.
- Parameters:
x_train (DataFrame) – Training feature values.
y_train (Series) – Training target values (multilabel).
features (List[str]) – List of feature names to interpret.
model (Callable) – The machine learning model to explain.
global_explainer (bool) – Whether to use a global explainer. Defaults to True.
local_explainer (bool) – Whether to use a local explainer. Defaults to True.
- Raises:
ValueError – If x_train or y_train is None.
ValueError – If any feature in features is not present in x_train.
ValueError – If no features are provided.
- explain(features_to_explain=None, **kwargs)[source]
Explain the features for multilabel classification. This method interprets the features based on the training data and SHAP values.
- Parameters:
**kwargs – Additional keyword arguments for customization, such as:
- figsize: Tuple for figure size (default: (15, 8))
- dpi: Dots per inch for the plot (default: 100)
- q: Number of quantiles for plotting (default: 20)
- threshold_nb_values: Threshold for number of values in categorical features (default: 15)
Utilities
Data Processing
Utility functions for data processing in ML Explainer.
- mlexplainer.utils.data_processing.calculate_min_max_value(dataframe, feature)[source]
Calculate the minimum and maximum values of a feature in a DataFrame.
- mlexplainer.utils.data_processing.get_index(column_name, dataframe)[source]
Extract the index of a column in a DataFrame.
- mlexplainer.utils.data_processing.get_index_of_features(dataframe, feature)[source]
Get the index of a feature in the DataFrame columns.
- mlexplainer.utils.data_processing.target_groupby_category(dataframe, feature, target_serie)[source]
Group by a categorical feature and calculate mean and volume of the target.
- Parameters:
dataframe (DataFrame) – Input DataFrame containing the feature and target.
feature (str) – The feature name to group by.
target_serie (Series) – The target series to calculate statistics for.
- Returns:
DataFrame with mean and volume of the target for each group.
- Return type:
DataFrame
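The "mean and volume per group" statistic can be illustrated in plain Python. This is a dependency-free sketch of the idea, not the library's code: `mean_and_volume_by_category` is a hypothetical name, and plain sequences stand in for the DataFrame column and the target Series.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple


def mean_and_volume_by_category(
    categories: Sequence[Hashable], targets: Sequence[float]
) -> Dict[Hashable, Tuple[float, int]]:
    """Map each category to (mean of target, number of observations)."""
    totals: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for category, target in zip(categories, targets):
        totals[category] += target
        counts[category] += 1
    return {c: (totals[c] / counts[c], counts[c]) for c in counts}


stats = mean_and_volume_by_category(["a", "b", "a", "b", "b"], [1, 0, 0, 1, 1])
print(stats)
```

For a binary target, the mean per group is simply the observed positive rate in that category, and the volume shows how much data supports it.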
Quantiles
Utility functions for quantile calculations in ML Explainer.
- mlexplainer.utils.quantiles.is_in_quantile(value, quantile_values)[source]
Return the quantile of a value, given a list of quantiles.
- mlexplainer.utils.quantiles.nb_min_quantiles(x, q=None)[source]
Calculate the number of quantiles to use for a feature.
- mlexplainer.utils.quantiles.group_values(x, y, q, threshold_nb_values=15)[source]
Create a new DataFrame of cut values. This function groups the values of a feature into quantiles and computes the mean of the target variable for each group. It also counts the number of observations in each group.
- Parameters:
x (Series) – Feature values.
y (Series) – Target values.
q (int) – Number of quantiles.
- Returns:
A tuple of the grouped values with statistics (DataFrame) and the number of quantiles actually used (int).
- Return type:
Tuple[DataFrame, int]
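The grouping idea can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's algorithm: `group_values_sketch` is a hypothetical name, the role given to `threshold_nb_values` (one group per distinct value when there are few of them, equal-frequency bins otherwise) is an assumption, and missing values are not handled.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def group_values_sketch(
    x: Sequence[float], y: Sequence[float], q: int, threshold_nb_values: int = 15
) -> Tuple[List[Tuple[float, float, int]], int]:
    """Return (rows, used_q); each row is (group upper edge, mean of y, volume)."""
    distinct = sorted(set(x))
    if len(distinct) <= threshold_nb_values:
        # Assumption: with few distinct values, each value is its own group.
        upper_edges = distinct
    else:
        ordered = sorted(x)
        n = len(ordered)
        # Upper edge of each equal-frequency bin; heavy ties collapse duplicates.
        upper_edges = sorted(
            {ordered[max(0, (k * n) // q - 1)] for k in range(1, q + 1)}
        )
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for xi, yi in zip(x, y):
        group = bisect_left(upper_edges, xi)  # index of the first edge >= xi
        sums[group] += yi
        counts[group] += 1
    rows = [(upper_edges[g], sums[g] / counts[g], counts[g]) for g in sorted(counts)]
    return rows, len(upper_edges)


rows, used_q = group_values_sketch(list(range(1, 101)), [0] * 50 + [1] * 50, q=4)
print(used_q, rows)
```

Returning the number of groups actually used matters because ties or a small number of distinct values can leave fewer groups than the requested `q`.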
Visualization
SHAP Plots
Module for plotting SHAP values in various formats. This module provides functions to visualize SHAP values for both numerical and categorical features, as well as for binary and multilabel classification tasks.
- mlexplainer.visualization.shap_plots.plot_shap_scatter(feature_values, shap_values, ax, color_positive=(1.0, 0.5, 0.34), color_negative=(0.12, 0.53, 0.9), marker='o', alpha=1.0, s=2.0, annotate=True)[source]
Plot a scatter plot of SHAP values. This function creates a scatter plot of SHAP values against feature values. The points are colored based on whether the SHAP value is positive or negative.
- Parameters:
feature_values (Series) – Values of the feature to plot.
shap_values (ndarray) – SHAP values to plot.
ax (Axes) – Matplotlib axis to plot on.
color_positive (tuple[float, float, float], optional) – Positive color. Defaults to (1.0, 0.5, 0.34).
color_negative (tuple[float, float, float], optional) – Negative color. Defaults to (0.12, 0.53, 0.9).
marker (str, optional) – Marker style for the scatter plot. Defaults to “o”.
alpha (float, optional) – Alpha transparency of the points. Defaults to 1.0.
s (float, optional) – Size of the points in the scatter plot. Defaults to 2.0.
annotate (bool, optional) – Whether to annotate the plot with text labels. Defaults to True.
- Returns:
Matplotlib axis with the scatter plot.
- Return type:
Axes
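The sign-based coloring described above can be sketched as a small pure-Python helper. `colors_by_shap_sign` is a hypothetical name, and treating a SHAP value of exactly zero as negative is an assumption; the documentation does not say how the library handles that case. The default colors are the ones in the documented signature.

```python
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]


def colors_by_shap_sign(
    feature_values: Sequence[float],
    shap_values: Sequence[float],
    color_positive: Color = (1.0, 0.5, 0.34),
    color_negative: Color = (0.12, 0.53, 0.9),
) -> List[Color]:
    """Return one RGB color per point, chosen by the sign of its SHAP value."""
    if len(feature_values) != len(shap_values):
        raise ValueError("feature_values and shap_values must have equal length")
    # Assumption: zero is grouped with the negative color.
    return [color_positive if s > 0 else color_negative for s in shap_values]


colors = colors_by_shap_sign([10, 20, 30], [0.2, -0.1, 0.0])
print(colors)
```

A list like this could be passed as the `c` argument of Matplotlib's `Axes.scatter`, together with the `marker`, `alpha`, and `s` parameters documented above.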
- mlexplainer.visualization.shap_plots.plot_shap_values_numerical_binary(x_train, feature, shap_values_train, delta, ymean_train, ax, color_positive=(1.0, 0.5, 0.34), color_negative=(0.12, 0.53, 0.9), marker='o', alpha=1.0, s=2.0, annotate=True)[source]
Plot SHAP values for a binary classification feature. This function creates a scatter plot of SHAP values against feature values for a binary classification task. It adjusts the y-axis limits to center around the mean of the target variable in the training set, and aligns the secondary y-axis (SHAP values) with the primary y-axis (mean target).
- Parameters:
x_train (DataFrame) – Training feature values.
feature (str) – The feature name to plot.
shap_values_train (ndarray) – SHAP values for the training features.
delta (float) – Delta value for adjusting plot limits.
ymean_train (float) – Mean of the target variable in the training set.
ax (Axes) – Matplotlib axis to plot on.
color_positive (tuple[float, float, float], optional) – Color for positive SHAP values. Defaults to (1.0, 0.5, 0.34).
color_negative (tuple[float, float, float], optional) – Color for negative SHAP values. Defaults to (0.12, 0.53, 0.9).
marker (str, optional) – Marker style for the scatter plot. Defaults to “o”.
alpha (float, optional) – Alpha transparency of the points. Defaults to 1.0.
s (float, optional) – Size of the points in the scatter plot. Defaults to 2.0.
annotate (bool, optional) – Whether to annotate the plot with text labels. Defaults to True.
- Returns:
Matplotlib axes for the main plot and SHAP plot.
- Return type:
- mlexplainer.visualization.shap_plots.plot_shap_values_categorical_binary(x_train, feature, shap_values_train, ax, color_positive=(1.0, 0.5, 0.34), color_negative=(0.12, 0.53, 0.9), marker='o', alpha=1.0, s=2.0, annotate=True)[source]
Plot SHAP values for a categorical feature in a binary classification task. This function creates a scatter plot of SHAP values against feature values for a binary classification task. It adjusts the y-axis limits to center around the mean of the target variable in the training set, and aligns the secondary y-axis (SHAP values) with the primary y-axis (mean target).
- Parameters:
x_train (DataFrame) – Training feature values.
feature (str) – The feature name to plot.
shap_values_train (ndarray) – SHAP values for the training features.
ax (Axes) – Matplotlib axis to plot on.
color_positive (tuple[float, float, float], optional) – Color for positive SHAP values. Defaults to (1.0, 0.5, 0.34).
color_negative (tuple[float, float, float], optional) – Color for negative SHAP values. Defaults to (0.12, 0.53, 0.9).
marker (str, optional) – Marker style for the scatter plot. Defaults to “o”.
alpha (float, optional) – Alpha transparency of the points. Defaults to 1.0.
s (float, optional) – Size of the points in the scatter plot. Defaults to 2.0.
annotate (bool, optional) – Whether to annotate the plot with text labels. Defaults to True.
- Returns:
Matplotlib axes for the main plot and SHAP plot.
- Return type:
- mlexplainer.visualization.shap_plots.plot_shap_values_numerical_multilabel(x_train, y_train, feature, shap_values_train, delta, axes, color_positive=(1.0, 0.5, 0.34), color_negative=(0.12, 0.53, 0.9), marker='o', alpha=1.0, s=2.0, annotate=True)[source]
Plot SHAP values for a numerical feature in a multilabel classification task. This function creates a scatter plot of SHAP values against feature values for each modality in the multilabel target. It adjusts the y-axis limits to center around the mean of the target variable in the training set for each modality, and aligns the secondary y-axis (SHAP values) with the primary y-axis (mean target).
- Parameters:
x_train (DataFrame) – Training feature values.
y_train (Series) – Training target values (multilabel).
feature (str) – The feature name to plot.
shap_values_train (ndarray) – SHAP values for the training features.
delta (float) – Delta value for adjusting plot limits.
axes (ndarray) – Array of Matplotlib axes to plot on, one for each modality.
color_positive (tuple[float, float, float], optional) – Color for positive SHAP values.
color_negative (tuple[float, float, float], optional) – Color for negative SHAP values.
marker (str, optional) – Marker style for the scatter plot. Defaults to “o”.
alpha (float, optional) – Alpha transparency of the points. Defaults to 1.0.
s (float, optional) – Size of the points in the scatter plot. Defaults to 2.0.
annotate (bool, optional) – Whether to annotate the plot with text labels.
- Returns:
Array of Matplotlib axes with the scatter plots for each modality.
- Return type:
ndarray
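The "mean of the target for each modality" that the plots are centered on can be illustrated as a one-vs-rest rate per label. This sketch assumes `y_train` holds exactly one label per row; `modality_rates` is a hypothetical helper, not a library function.

```python
from typing import Dict, Hashable, Sequence


def modality_rates(y_train: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Share of rows carrying each label, i.e. the mean of its 0/1 indicator."""
    n = len(y_train)
    if n == 0:
        raise ValueError("y_train must not be empty")
    counts: Dict[Hashable, int] = {}
    for label in y_train:
        counts[label] = counts.get(label, 0) + 1
    return {label: count / n for label, count in counts.items()}


rates = modality_rates(["a", "b", "a", "c"])
print(rates)
```

Under this reading, each modality's subplot would be centered on its own rate, so a modality that is rare in the training set is compared against a low baseline.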
- mlexplainer.visualization.shap_plots.plot_shap_values_categorical_multilabel(x_train, y_train, feature, shap_values_train, axes, color_positive=(1.0, 0.5, 0.34), color_negative=(0.12, 0.53, 0.9), marker='o', alpha=1.0, s=2.0, annotate=True)[source]
Plot SHAP values for a categorical feature in a multilabel classification task. This function creates a scatter plot of SHAP values against feature values for each modality in the multilabel target. It adjusts the y-axis limits to center around the mean of the target variable in the training set for each modality, and aligns the secondary y-axis (SHAP values) with the primary y-axis (mean target).
- Parameters:
x_train (DataFrame) – Training feature values.
y_train (Series) – Training target values (multilabel).
feature (str) – The feature name to plot.
shap_values_train (ndarray) – SHAP values for the training features.
axes (ndarray) – Array of Matplotlib axes to plot on, one for each modality.
color_positive (tuple[float, float, float], optional) – Color for positive SHAP values. Defaults to (1.0, 0.5, 0.34).
color_negative (tuple[float, float, float], optional) – Color for negative SHAP values. Defaults to (0.12, 0.53, 0.9).
marker (str, optional) – Marker style for the scatter plot. Defaults to “o”.
alpha (float, optional) – Alpha transparency of the points. Defaults to 1.0.
s (float, optional) – Size of the points in the scatter plot. Defaults to 2.0.
annotate (bool, optional) – Whether to annotate the plot with text labels.
- Returns:
Array of Matplotlib axes with the scatter plots for each modality.
- Return type:
ndarray
Target Plots
Plotting functions for SHAP explanations in binary classification tasks. This module provides functions to visualize the relationship between features and target variables, including numerical and categorical features, as well as handling missing values.
- mlexplainer.visualization.target_plots.creneau(x, xmin, xmax)[source]
Create a square wave with every group’s mean.
- Parameters:
x (pandas.DataFrame) – Input DataFrame with a ‘group’ column.
xmin (float) – Minimum value to replace NaNs after shifting.
xmax (float) – Maximum value to replace the max group value.
- Returns:
Transformed DataFrame with square wave pattern.
- Return type:
- mlexplainer.visualization.target_plots.add_nans(nan_observation, xmin, delta, ax, color)[source]
Plot missing values on the given axis.
- Parameters:
- Returns:
Matplotlib axis with missing values plotted.
- Return type:
Axes
- mlexplainer.visualization.target_plots.set_centered_ylim(ax, center)[source]
Set the y-axis limits centered around a specified value.
- Parameters:
ax (Axes) – Matplotlib axis to set limits for.
center (float) – Center value around which to set the limits.
- Returns:
Matplotlib axis with updated y-axis limits.
- Return type:
Axes
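Centering an axis on a value amounts to choosing limits that are symmetric around it while still covering the current range. The helper below is a sketch of that arithmetic only; `centered_limits` is a hypothetical name and the library may compute its limits differently.

```python
from typing import Tuple


def centered_limits(ymin: float, ymax: float, center: float) -> Tuple[float, float]:
    """Return limits symmetric around `center` that still cover [ymin, ymax]."""
    half_range = max(abs(center - ymin), abs(ymax - center))
    return center - half_range, center + half_range


print(centered_limits(0.0, 1.0, 0.25))
```

With Matplotlib, the result could be applied as `ax.set_ylim(*centered_limits(*ax.get_ylim(), center))`, which places `center` at the vertical midpoint of the plot.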
- mlexplainer.visualization.target_plots.reformat_y_axis(ax, color=(0.28, 0.18, 0.71))[source]
Refactor the y-axis of a plot.
- Parameters:
ax (Axes) – Matplotlib axis to refactor.
color (tuple[float, float, float]) – Color for the y-axis label and ticks.
- Returns:
Matplotlib axis with refactored y-axis.
- Return type:
Axes
- mlexplainer.visualization.target_plots.plot_feature_numerical_target(x, y, q, ax, delta, ymean, threshold_nb_values=15, color=(0.28, 0.18, 0.71))[source]
Plot the relationship between a feature and the target variable.
- mlexplainer.visualization.target_plots.plot_feature_target_numerical_binary(dataframe, target_serie, feature, q, ax, delta, threshold_nb_values=15, target_modality=None)[source]
Plot the relationship between a feature and the target variable for binary classification.
- Parameters:
dataframe (DataFrame) – DataFrame containing the feature and target variable.
target_serie (Series) – Series representing the target variable.
feature (str) – The feature name to plot.
q (int) – Number of quantiles.
ax (Axes) – Matplotlib axis to plot on.
delta (float) – Delta value for adjusting plot limits.
threshold_nb_values (float) – Threshold for number of unique values to decide grouping method.
- Returns:
Matplotlib axis with the feature-target plot.
- Return type:
Axes
- mlexplainer.visualization.target_plots.plot_feature_target_categorical_binary(dataframe, target, feature, ax, color=(0.28, 0.18, 0.71), target_modality=None)[source]
Plot the relationship between a categorical feature and the target variable for binary classification.
- Parameters:
- Returns:
Matplotlib axis with the feature-target plot.
- Return type:
Axes
- mlexplainer.visualization.target_plots.plot_feature_target_numerical_multilabel(dataframe, target_serie, feature, q=20, delta=0.1, figsize=(15, 8), dpi=100, threshold_nb_values=15)[source]
Plot the relationship between a numerical feature and all target modalities with SHAP values.
- Parameters:
dataframe (DataFrame) – DataFrame containing the feature and target variable.
target_serie (Series) – Series representing the target variable.
feature (str) – The feature name to plot.
modalities (list) – List of unique target modalities.
shap_values (ndarray, optional) – SHAP values for the training features.
q (int, optional) – Number of quantiles. Defaults to 20.
figsize (tuple, optional) – Figure size for the plot. Defaults to (15, 8).
dpi (int, optional) – Dots per inch for the plot. Defaults to 100.
- mlexplainer.visualization.target_plots.plot_feature_target_categorical_multilabel(dataframe, target_serie, feature, modalities, figsize=(15, 8), dpi=200, color=(0.28, 0.18, 0.71))[source]
Plot the relationship between a categorical feature and all target modalities with SHAP values.
- Parameters:
dataframe (DataFrame) – DataFrame containing the feature and target variable.
target_serie (Series) – Series representing the target variable.
feature (str) – The feature name to plot.
modalities (list) – List of unique target modalities.
shap_values (ndarray, optional) – SHAP values for the training features.
figsize (tuple, optional) – Figure size for the plot. Defaults to (15, 8).
dpi (int, optional) – Dots per inch for the plot. Defaults to 200.