Explainers

This section provides detailed documentation for all explainer classes in MLExplainer.

SHAP Explainers

Base Explainer

Base class for Machine Learning Explainers. This class is designed to be subclassed for specific machine learning models.

class mlexplainer.core.base_explainer.BaseMLExplainer(x_train, y_train, features, model, global_explainer=True, local_explainer=True)[source]

Bases: ABC

Base class for Machine Learning Explainers.

This class provides a structure for interpreting features in machine learning models and analyzing the correctness of the analysis for every feature.

x_train

Training feature values.

Type: DataFrame

y_train

Training target values.

Type: Series

features

List of feature names to interpret.

Type: List[str]

model

The machine learning model to explain.

Type: Callable

global_explainer

Whether to use a global explainer.

Type: bool

local_explainer

Whether to use a local explainer.

Type: bool

__init__(x_train, y_train, features, model, global_explainer=True, local_explainer=True)[source]

Initialize the BaseMLExplainer with training data, features, and model. This class is designed to be subclassed for specific machine learning models; subclasses should implement the explain and correctness_features methods. Its main purpose is to provide a structure for interpreting features in machine learning models and for checking whether the way a model understands each feature is correct. It also provides a way to analyze the correctness of the analysis for every feature.

Parameters:

x_train (DataFrame) – Training feature values.
y_train (Series) – Training target values.
features (List[str]) – List of feature names to interpret.
model (Callable) – The machine learning model to explain.
global_explainer (bool) – Whether to use a global explainer. Defaults to True.
local_explainer (bool) – Whether to use a local explainer. Defaults to True.

Raises:

ValueError – If x_train or y_train is None.
ValueError – If any feature in features is not present in x_train.
ValueError – If no features are provided.

abstractmethod explain(**kwargs)[source]

Interpret features for the machine learning model. This method should be implemented in subclasses to provide specific interpretations.

Parameters: **kwargs (Any) – Additional keyword arguments for customization.
Returns: Nothing; the method modifies the state of the explainer.
Return type: None

abstractmethod correctness_features(q=None)[source]

Analyze the correctness of the analysis for every feature. This method validates interpretation consistency between actual target rates and SHAP values for all features in the explainer.

Parameters: q (Optional[int]) – Number of quantiles for continuous features. If None, uses adaptive quantiles. Defaults to None.
Returns: Dictionary with feature names as keys and correctness results as values.
Return type: dict
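
The subclassing contract described above can be illustrated with a minimal sketch. This is not the library's implementation: SketchBaseExplainer and ConstantExplainer are hypothetical stand-ins, and a plain list of column names replaces the DataFrame so that the example needs only the standard library. It shows the two abstract methods and the documented ValueError checks on features.

```python
from abc import ABC, abstractmethod
from typing import Optional


class SketchBaseExplainer(ABC):
    """Hypothetical stand-in for BaseMLExplainer (not the library's code)."""

    def __init__(self, columns, features):
        # Mirror the documented checks: features must be given and known.
        if not features:
            raise ValueError("No features provided.")
        missing = [f for f in features if f not in columns]
        if missing:
            raise ValueError(f"Features not present in x_train: {missing}")
        self.features = list(features)

    @abstractmethod
    def explain(self, **kwargs) -> None:
        """Subclasses interpret the features and update their own state."""

    @abstractmethod
    def correctness_features(self, q: Optional[int] = None) -> dict:
        """Subclasses return one correctness result per feature."""


class ConstantExplainer(SketchBaseExplainer):
    """Toy subclass that marks every feature as correct."""

    def explain(self, **kwargs) -> None:
        self.explained = True

    def correctness_features(self, q: Optional[int] = None) -> dict:
        return {feature: True for feature in self.features}
```

Instantiating SketchBaseExplainer directly raises TypeError because its abstract methods are not implemented, which is the standard behavior of classes deriving from ABC.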

SHAP Wrapper

SHAP wrapper for models.

class mlexplainer.explainers.shap.wrapper.ShapWrapper(model, model_output='raw')[source]

Bases: object

Shapley values wrapper for models, based on TreeExplainer. This class calculates SHAP values for a given model and a set of features in a DataFrame. It uses the TreeExplainer from the SHAP library to compute the SHAP values from the model’s predictions.

model

The model to be wrapped for SHAP value calculation.

Type: Callable

model_output

The type of output from the model, e.g., “raw”, “probability”.

Type: str

shap_margin_explainer

The SHAP explainer instance.

Type: TreeExplainer

__init__(model, model_output='raw')[source]

Initialize the ShapWrapper with a model.

Parameters:

model (Callable) – The model to be wrapped for SHAP value calculation.
model_output (str) – The type of output from the model, e.g., “raw”, “probability”.

calculate(dataframe, features)[source]

Calculate SHAP values for the given model and dataframe.

Parameters:

dataframe (DataFrame) – The input DataFrame containing features.
features (list[str]) – List of feature names to calculate SHAP values for.

Returns:

A DataFrame containing SHAP values for the specified features.

Return type:

DataFrame
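
To make concrete what a single SHAP value represents, the sketch below computes exact Shapley values for a toy model by averaging marginal contributions over every feature ordering. This is deliberately not what ShapWrapper does (it delegates to TreeExplainer, which uses a tree-specific algorithm); exact_shapley, toy_model, and the all-zero baseline are hypothetical names and choices made for illustration only.

```python
from itertools import permutations


def exact_shapley(predict, x, baseline):
    """Brute-force Shapley values of predict at x, relative to baseline."""
    n = len(x)
    contributions = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            # Switch feature i from its baseline value to its actual value
            # and record how much the prediction moves.
            current[i] = x[i]
            value = predict(current)
            contributions[i] += value - previous
            previous = value
    return [c / len(orderings) for c in contributions]


def toy_model(row):
    # One additive term and one interaction term.
    return 2.0 * row[0] + row[1] * row[2]


values = exact_shapley(toy_model, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
```

The values sum to the difference between the prediction at x and the prediction at the baseline, and the interaction term is split evenly between the two features involved.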

Binary Classification Explainer

BinaryMLExplainer for binary classification tasks. This module provides an implementation of the BaseMLExplainer for binary classification tasks, including methods to explain numerical and categorical features using SHAP values.

class mlexplainer.explainers.shap.binary.BinaryMLExplainer(x_train, y_train, features, model, global_explainer=True, local_explainer=True)[source]

Bases: BaseMLExplainer

BinaryMLExplainer for binary classification tasks. This class extends BaseMLExplainer to provide specific methods for explaining features in binary classification tasks using SHAP values. It includes methods to interpret numerical and categorical features, validate feature interpretations, and visualize global feature importance.

__init__(x_train, y_train, features, model, global_explainer=True, local_explainer=True)[source]

Initialize the BinaryMLExplainer with training data, features, and model.

Parameters:

x_train (DataFrame) – Training feature values.
y_train (Series) – Training target values.
features (List[str]) – List of feature names to interpret.
model (Callable) – The machine learning model to explain.
global_explainer (bool) – Whether to use a global explainer. Defaults to True.
local_explainer (bool) – Whether to use a local explainer. Defaults to True.

Raises:

ValueError – If x_train or y_train is None.
ValueError – If any feature in features is not present in x_train.
ValueError – If no features are provided.

explain(features_to_explain=None, **kwargs)[source]

Explain the features for binary classification. This method interprets the features based on the training data and SHAP values. It visualizes global feature importance and interprets numerical and categorical features.

Parameters:

features_to_explain (Union[list[str], None]) – List of feature names to explain.
**kwargs – Additional keyword arguments for customization, such as:
- figsize: Tuple for figure size (default: (15, 8))
- dpi: Dots per inch for the plot (default: 100)
- q: Number of quantiles for plotting (default: 20)
- threshold_nb_values: Threshold for number of unique values in numerical features (default: 15)
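
One way such keyword arguments could be resolved against the documented defaults is sketched below. The helper resolve_plot_options is hypothetical and is not part of MLExplainer; only the option names and default values come from the list above.

```python
# Defaults as listed in the documentation for explain().
DEFAULT_PLOT_OPTIONS = {
    "figsize": (15, 8),
    "dpi": 100,
    "q": 20,
    "threshold_nb_values": 15,
}


def resolve_plot_options(**kwargs):
    """Hypothetical helper: overlay caller overrides on the defaults."""
    return {**DEFAULT_PLOT_OPTIONS, **kwargs}


# Override two options and keep the documented defaults for the rest.
options = resolve_plot_options(dpi=150, q=10)
```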

correctness_features(q=None)[source]

Analyze the correctness of the analysis for every feature.

This method validates interpretation consistency between actual target rates and SHAP values for all features in the explainer.

Parameters: q (Optional[int]) – Number of quantiles for continuous features. If None, uses adaptive quantiles. Defaults to None.
Returns: Dictionary with feature names as keys and correctness results as values.
Return type: dict
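
The documentation does not spell out how consistency between target rates and SHAP values is measured. One plausible reading, shown here purely as an assumption, is a per-group sign check: a group whose observed target rate lies above the overall rate should have a positive mean SHAP value, and vice versa. The function sign_consistency and its input layout are hypothetical.

```python
def sign_consistency(groups, overall_rate):
    """Check, per group, that the SHAP sign agrees with the target rate.

    groups: list of (target_rate, mean_shap) pairs, one per quantile or
    category. Assumption: "consistent" means both point the same way.
    """
    results = []
    for target_rate, mean_shap in groups:
        rate_is_above = target_rate > overall_rate
        shap_is_positive = mean_shap > 0
        results.append(rate_is_above == shap_is_positive)
    return results


# Three groups of one feature: (observed target rate, mean SHAP value).
checks = sign_consistency(
    [(0.10, -0.4), (0.30, 0.2), (0.35, -0.1)],
    overall_rate=0.25,
)
```

Here the third group is flagged as inconsistent: its target rate is above the overall rate, yet its mean SHAP value is negative.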

Multilabel Classification Explainer

MultilabelMLExplainer for multilabel classification tasks. This module provides an implementation of the BaseMLExplainer for multilabel classification tasks, including methods to explain numerical and categorical features using SHAP values.

class mlexplainer.explainers.shap.multilabel.MultilabelMLExplainer(x_train, y_train, features, model, global_explainer=True, local_explainer=True)[source]

Bases: BaseMLExplainer

MultilabelMLExplainer for multilabel classification tasks.

__init__(x_train, y_train, features, model, global_explainer=True, local_explainer=True)[source]

Initialize the MultilabelMLExplainer with training data, features, and model.

Parameters:

x_train (DataFrame) – Training feature values.
y_train (Series) – Training target values (multilabel).
features (List[str]) – List of feature names to interpret.
model (Callable) – The machine learning model to explain.
global_explainer (bool) – Whether to use a global explainer. Defaults to True.
local_explainer (bool) – Whether to use a local explainer. Defaults to True.

Raises:

ValueError – If x_train or y_train is None.
ValueError – If any feature in features is not present in x_train.
ValueError – If no features are provided.

explain(features_to_explain=None, **kwargs)[source]

Explain the features for multilabel classification. This method interprets the features based on the training data and SHAP values.

Parameters:

features_to_explain (Union[list[str], None]) – List of feature names to explain.
**kwargs – Additional keyword arguments for customization, such as:
- figsize: Tuple for figure size (default: (15, 8))
- dpi: Dots per inch for the plot (default: 100)
- q: Number of quantiles for plotting (default: 20)
- threshold_nb_values: Threshold for number of values in categorical features (default: 15)

correctness_features(q=None)[source]

Analyze the correctness of the analysis for every feature.

This method validates interpretation consistency between actual target rates and SHAP values for all features in the explainer for multilabel classification.

Parameters:

q (Optional[int]) – Number of quantiles for continuous features. If None, uses adaptive quantiles. Defaults to None.

Returns:

Dictionary with feature names as keys and modality-specific correctness results as values.

Return type:

dict

Utilities

Data Processing

Utility functions for data processing in ML Explainer.

mlexplainer.utils.data_processing.calculate_min_max_value(dataframe, feature)[source]

Calculate the minimum and maximum values of a feature in a DataFrame.

Parameters:

dataframe (DataFrame) – The DataFrame containing the feature.
feature (str) – The name of the feature to calculate min and max values.

Returns:

A tuple containing the minimum and maximum values of the feature.

Return type:

tuple

mlexplainer.utils.data_processing.get_index(column_name, dataframe)[source]

Extract the index of a column in a DataFrame.

Parameters:

column_name (str) – Column name to extract the index from.
dataframe (DataFrame) – DataFrame to extract the index from.

Returns:

Index of the column in the DataFrame.

Return type:

int

mlexplainer.utils.data_processing.get_index_of_features(dataframe, feature)[source]

Get the index of a feature in the DataFrame columns.

Parameters:

dataframe (DataFrame) – DataFrame containing the features.
feature (str) – The feature name to find the index of.

Returns:

Index of the feature in the DataFrame columns.

Return type:

int

mlexplainer.utils.data_processing.target_groupby_category(dataframe, feature, target_serie)[source]

Group by a categorical feature and calculate mean and volume of the target.

Parameters:

dataframe (DataFrame) – Input DataFrame containing the feature and target.
feature (str) – The feature name to group by.
target_serie (Series) – The target series to calculate statistics for.

Returns:

DataFrame with mean and volume of the target for each group.

Return type:

DataFrame
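
The aggregation described above (mean and volume of the target per category) can be sketched without pandas as follows. The function name and the dictionary output are hypothetical; the library itself returns a DataFrame, and in pandas the same figures would typically come from a groupby with mean and count aggregations.

```python
from collections import defaultdict


def mean_and_volume_by_category(categories, targets):
    """Return {category: {"mean": ..., "volume": ...}} for the target."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    return {
        category: {
            "mean": sums[category] / counts[category],
            "volume": counts[category],
        }
        for category in counts
    }


stats = mean_and_volume_by_category(
    categories=["a", "b", "a", "b", "b"],
    targets=[1, 0, 0, 1, 1],
)
```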

Quantiles

Utility functions for quantile calculations in ML Explainer.

mlexplainer.utils.quantiles.is_in_quantile(value, quantile_values)[source]

Return the quantile of a value, given a list of quantiles.

Parameters:

value (int) – Search value.
quantile_values (list[Union[int, float]]) – List of quantiles.

Returns:

Upper bound of the quantile.

Return type:

Union[int, float]
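
A lookup of this kind can be sketched with a binary search over the sorted quantile bounds. The function below is a hypothetical illustration, not the library's code; in particular, clamping values above the last bound to that bound is an assumption, since the documentation does not describe edge-case behavior.

```python
from bisect import bisect_left


def upper_quantile_bound(value, quantile_values):
    """Return the smallest quantile bound that is >= value."""
    bounds = sorted(quantile_values)
    position = bisect_left(bounds, value)
    if position == len(bounds):
        # Assumption: values beyond the last bound fall into the last quantile.
        return bounds[-1]
    return bounds[position]


bound = upper_quantile_bound(15, [10, 20, 30])
```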

mlexplainer.utils.quantiles.nb_min_quantiles(x, q=None)[source]

Calculate the number of quantiles to use for a feature.

Parameters:

x (DataFrame) – DataFrame to calculate the number of quantiles for.
q (int, optional) – Number of quantiles. Defaults to None.

Returns:

Final number of quantiles to use.

Return type:

int

mlexplainer.utils.quantiles.group_values(x, y, q, threshold_nb_values=15)[source]

Create a new DataFrame of cut values. This function groups the values of a feature into quantiles and computes the mean of the target variable for each group. It also counts the number of observations in each group.

Parameters:

x (Series) – Feature values.
y (Series) – Target values.
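q (int) – Number of quantiles.
threshold_nb_values (int) – Threshold for the number of unique values used to decide the grouping method. Defaults to 15.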

Returns:

A tuple containing the grouped values with statistics (DataFrame) and the number of quantiles used (int).

Return type:

tuple
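
The grouping step (cut the feature into quantiles, then compute the target mean and the number of observations per group) can be sketched with the standard library alone. This is an illustration under assumptions: group_by_quantile is a hypothetical name, the "inclusive" quantile method is an arbitrary choice, and the threshold_nb_values logic is left out.

```python
from bisect import bisect_left
from statistics import quantiles


def group_by_quantile(x, y, q):
    """Bin x into q quantile groups and summarize y in each group."""
    # statistics.quantiles returns the q - 1 cut points between groups.
    cuts = quantiles(x, n=q, method="inclusive")
    bins = [{"count": 0, "target_sum": 0.0} for _ in range(q)]
    for xi, yi in zip(x, y):
        # A value equal to a cut point goes into the lower group.
        group = bins[bisect_left(cuts, xi)]
        group["count"] += 1
        group["target_sum"] += yi
    return [
        {
            "group": index,
            "count": group["count"],
            "target_mean": group["target_sum"] / group["count"],
        }
        for index, group in enumerate(bins)
        if group["count"] > 0
    ]


grouped = group_by_quantile(
    x=[1, 2, 3, 4, 5, 6, 7, 8],
    y=[0, 0, 0, 1, 0, 1, 1, 1],
    q=2,
)
```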

Visualization

SHAP Plots

Module for plotting SHAP values in various formats. This module provides functions to visualize SHAP values for both numerical and categorical features, as well as for binary and multilabel classification tasks.

mlexplainer.visualization.shap_plots.plot_shap_scatter(feature_values, shap_values, ax, color_positive=(1.0, 0.5, 0.34), color_negative=(0.12, 0.53, 0.9), marker='o', alpha=1.0, s=2.0, annotate=True)[source]

Plot a scatter plot of SHAP values. This function creates a scatter plot of SHAP values against feature values. The points are colored based on whether the SHAP value is positive or negative.

Parameters:

feature_values (Series) – Values of the feature to plot.
shap_values (ndarray) – SHAP values to plot.
ax (Axes) – Matplotlib axis to plot on.
color_positive (tuple[float, float, float], optional) – Positive color. Defaults to (1.0, 0.5, 0.34).
color_negative (tuple[float, float, float], optional) – Negative color. Defaults to (0.12, 0.53, 0.9).
marker (str, optional) – Marker style for the scatter plot. Defaults to “o”.
alpha (float, optional) – Alpha transparency of the points. Defaults to 1.0.
s (float, optional) – Size of the points in the scatter plot. Defaults to 2.0.
annotate (bool, optional) – Whether to annotate the plot with text labels. Defaults to True.

Returns:

Matplotlib axis with the scatter plot.

Return type:

Axes
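
The sign-based coloring can be sketched as a plain list comprehension. The default colors are taken from the signature above; the helper colors_by_sign is hypothetical, and drawing a SHAP value of exactly zero in the negative color is an assumption, since the documentation does not say how zero is handled.

```python
COLOR_POSITIVE = (1.0, 0.5, 0.34)
COLOR_NEGATIVE = (0.12, 0.53, 0.9)


def colors_by_sign(shap_values,
                   color_positive=COLOR_POSITIVE,
                   color_negative=COLOR_NEGATIVE):
    """Pick one RGB color per point from the sign of its SHAP value."""
    # Assumption: a value of exactly zero uses the negative color.
    return [
        color_positive if value > 0 else color_negative
        for value in shap_values
    ]


point_colors = colors_by_sign([0.3, -0.1, 0.0])
```

A list like point_colors holds one RGB tuple per observation, which is the usual shape for per-point colors in a scatter plot.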

mlexplainer.visualization.shap_plots.plot_shap_values_numerical_binary(x_train, feature, shap_values_train, delta, ymean_train, ax, color_positive=(1.0, 0.5, 0.34), color_negative=(0.12, 0.53, 0.9), marker='o', alpha=1.0, s=2.0, annotate=True)[source]

Plot SHAP values for a binary classification feature. This function creates a scatter plot of SHAP values against feature values for a binary classification task. It adjusts the y-axis limits to center around the mean of the target variable in the training set, and aligns the secondary y-axis (SHAP values) with the primary y-axis (mean target).

Parameters:

x_train (DataFrame) – Training feature values.
feature (str) – The feature name to plot.
shap_values_train (ndarray) – SHAP values for the training features.
delta (float) – Delta value for adjusting plot limits.
ymean_train (float) – Mean of the target variable in the training set.
ax (Axes) – Matplotlib axis to plot on.
color_positive (tuple[float, float, float], optional) – Color for positive SHAP values. Defaults to (1.0, 0.5, 0.34).
color_negative (tuple[float, float, float], optional) – Color for negative SHAP values. Defaults to (0.12, 0.53, 0.9).
marker (str, optional) – Marker style for the scatter plot. Defaults to “o”.
alpha (float, optional) – Alpha transparency of the points. Defaults to 1.0.
s (float, optional) – Size of the points in the scatter plot. Defaults to 2.0.
annotate (bool, optional) – Whether to annotate the plot with text labels. Defaults to True.

Returns:

Matplotlib axes for the main plot and SHAP plot.

Return type:

tuple

mlexplainer.visualization.shap_plots.plot_shap_values_categorical_binary(x_train, feature, shap_values_train, ax, color_positive=(1.0, 0.5, 0.34), color_negative=(0.12, 0.53, 0.9), marker='o', alpha=1.0, s=2.0, annotate=True)[source]

Plot SHAP values for a categorical feature in a binary classification task. This function creates a scatter plot of SHAP values against feature values for a binary classification task. It adjusts the y-axis limits to center around the mean of the target variable in the training set, and aligns the secondary y-axis (SHAP values) with the primary y-axis (mean target).

Parameters:

x_train (DataFrame) – Training feature values.
feature (str) – The feature name to plot.
shap_values_train (ndarray) – SHAP values for the training features.
ax (Axes) – Matplotlib axis to plot on.
color_positive (tuple[float, float, float], optional) – Color for positive SHAP values. Defaults to (1.0, 0.5, 0.34).
color_negative (tuple[float, float, float], optional) – Color for negative SHAP values. Defaults to (0.12, 0.53, 0.9).
marker (str, optional) – Marker style for the scatter plot. Defaults to “o”.
alpha (float, optional) – Alpha transparency of the points. Defaults to 1.0.
s (float, optional) – Size of the points in the scatter plot. Defaults to 2.0.
annotate (bool, optional) – Whether to annotate the plot with text labels. Defaults to True.

Returns:

Matplotlib axes for the main plot and SHAP plot.

Return type:

tuple

mlexplainer.visualization.shap_plots.plot_shap_values_numerical_multilabel(x_train, y_train, feature, shap_values_train, delta, axes, color_positive=(1.0, 0.5, 0.34), color_negative=(0.12, 0.53, 0.9), marker='o', alpha=1.0, s=2.0, annotate=True)[source]

Plot SHAP values for a numerical feature in a multilabel classification task. This function creates a scatter plot of SHAP values against feature values for each modality in the multilabel target. It adjusts the y-axis limits to center around the mean of the target variable in the training set for each modality, and aligns the secondary y-axis (SHAP values) with the primary y-axis (mean target).

Parameters:

x_train (DataFrame) – Training feature values.
y_train (Series) – Training target values (multilabel).
feature (str) – The feature name to plot.
shap_values_train (ndarray) – SHAP values for the training features.
delta (float) – Delta value for adjusting plot limits.
axes (ndarray) – Array of Matplotlib axes to plot on, one for each modality.
color_positive (tuple[float, float, float], optional) – Color for positive SHAP values. Defaults to (1.0, 0.5, 0.34).
color_negative (tuple[float, float, float], optional) – Color for negative SHAP values. Defaults to (0.12, 0.53, 0.9).
marker (str, optional) – Marker style for the scatter plot. Defaults to “o”.
alpha (float, optional) – Alpha transparency of the points. Defaults to 1.0.
s (float, optional) – Size of the points in the scatter plot. Defaults to 2.0.
annotate (bool, optional) – Whether to annotate the plot with text labels. Defaults to True.

Returns:

Array of Matplotlib axes with the scatter plots for each modality.

Return type:

ndarray
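
The "mean of the target variable for each modality" can be read as a one-vs-rest rate: for each modality, the share of observations whose target equals it. The sketch below assumes that reading and a single label per observation; modality_rates is a hypothetical helper, not a library function.

```python
def modality_rates(targets):
    """Share of observations carrying each target modality (one-vs-rest)."""
    total = len(targets)
    return {
        modality: sum(1 for target in targets if target == modality) / total
        for modality in sorted(set(targets))
    }


rates = modality_rates(["a", "b", "a", "c"])
```

Under this assumption, each of the per-modality axes would be centered on its own rate rather than on one global mean.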

mlexplainer.visualization.shap_plots.plot_shap_values_categorical_multilabel(x_train, y_train, feature, shap_values_train, axes, color_positive=(1.0, 0.5, 0.34), color_negative=(0.12, 0.53, 0.9), marker='o', alpha=1.0, s=2.0, annotate=True)[source]

Plot SHAP values for a categorical feature in a multilabel classification task. This function creates a scatter plot of SHAP values against feature values for each modality in the multilabel target. It adjusts the y-axis limits to center around the mean of the target variable in the training set for each modality, and aligns the secondary y-axis (SHAP values) with the primary y-axis (mean target).

Parameters:

x_train (DataFrame) – Training feature values.
y_train (Series) – Training target values (multilabel).
feature (str) – The feature name to plot.
shap_values_train (ndarray) – SHAP values for the training features.
axes (ndarray) – Array of Matplotlib axes to plot on, one for each modality.
color_positive (tuple[float, float, float], optional) – Color for positive SHAP values. Defaults to (1.0, 0.5, 0.34).
color_negative (tuple[float, float, float], optional) – Color for negative SHAP values. Defaults to (0.12, 0.53, 0.9).
marker (str, optional) – Marker style for the scatter plot. Defaults to “o”.
alpha (float, optional) – Alpha transparency of the points. Defaults to 1.0.
s (float, optional) – Size of the points in the scatter plot. Defaults to 2.0.
annotate (bool, optional) – Whether to annotate the plot with text labels. Defaults to True.

Returns:

Array of Matplotlib axes with the scatter plots for each modality.

Return type:

ndarray

mlexplainer.visualization.shap_plots.colorize_yticklabel_shap(ax, color_positive=(1.0, 0.5, 0.34), color_negative=(0.12, 0.53, 0.9))[source]

Colorize the y-tick labels of a SHAP plot based on their values.

Parameters:

ax (Axes) – Matplotlib axis to modify.
color_positive (tuple[float, float, float], optional) – Color for positive SHAP values. Defaults to (1.0, 0.5, 0.34).
color_negative (tuple[float, float, float], optional) – Color for negative SHAP values. Defaults to (0.12, 0.53, 0.9).

Returns:

Matplotlib axis with colored y-tick labels.

Return type:

Axes

Target Plots

Plotting functions for SHAP explanations in binary classification tasks. This module provides functions to visualize the relationship between features and target variables, including numerical and categorical features, as well as handling missing values.

mlexplainer.visualization.target_plots.creneau(x, xmin, xmax)[source]

Create a square wave with every group’s mean.

Parameters:

x (pandas.DataFrame) – Input DataFrame with a ‘group’ column.
xmin (float) – Minimum value to replace NaNs after shifting.
xmax (float) – Maximum value to replace the max group value.

Returns:

Transformed DataFrame with square wave pattern.

Return type:

pandas.DataFrame
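
A square wave over group means can be pictured as a step curve that stays at each group's mean between the group's left and right edges. The sketch below builds the coordinates of such a curve from bin edges and means. It is a hypothetical illustration: the library's creneau operates on a DataFrame with a ‘group’ column, and its exact handling of xmin and xmax is not reproduced here.

```python
def square_wave(bin_edges, bin_means):
    """Return (xs, ys) tracing a step at each bin's mean value."""
    xs, ys = [], []
    for left, right, mean in zip(bin_edges[:-1], bin_edges[1:], bin_means):
        # Two points per bin keep the curve flat across the bin.
        xs.extend([left, right])
        ys.extend([mean, mean])
    return xs, ys


xs, ys = square_wave(bin_edges=[0, 10, 20], bin_means=[0.2, 0.6])
```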

mlexplainer.visualization.target_plots.add_nans(nan_observation, xmin, delta, ax, color)[source]

Plot missing values on the given axis.

Parameters:

nan_observation (DataFrame) – DataFrame containing missing values.
xmin (float) – Minimum value to replace NaNs after shifting.
delta (float) – Delta value for adjusting plot limits.
ax (Axes) – Matplotlib axis to plot on.
color (tuple) – Color for plotting missing values.

Returns:

Matplotlib axis with missing values plotted.

Return type:

Axes

mlexplainer.visualization.target_plots.set_centered_ylim(ax, center)[source]

Set the y-axis limits centered around a specified value.

Parameters:

ax (Axes) – Matplotlib axis to set limits for.
center (float) – Center value around which to set the limits.

Returns:

Matplotlib axis with updated y-axis limits.

Return type:

Axes
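
Centering can be sketched as choosing symmetric limits wide enough to keep the existing data range visible. The arithmetic below is an assumption about how such centering might work, not the library's code; centered_limits is a hypothetical helper that works on plain numbers instead of a Matplotlib axis.

```python
def centered_limits(ymin, ymax, center):
    """Return (low, high) symmetric around center and covering [ymin, ymax]."""
    half_range = max(center - ymin, ymax - center)
    return center - half_range, center + half_range


low, high = centered_limits(ymin=0.0, ymax=1.0, center=0.25)
```

With limits of this form, the center value sits exactly halfway up the axis, which is what lets a secondary SHAP axis centered on zero line up with it.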

mlexplainer.visualization.target_plots.reformat_y_axis(ax, color=(0.28, 0.18, 0.71))[source]

Refactor the y-axis of a plot.

Parameters:

ax (Axes) – Matplotlib axis to refactor.
color (tuple[float, float, float]) – Color for the y-axis label and ticks.

Returns:

Matplotlib axis with refactored y-axis.

Return type:

Axes

mlexplainer.visualization.target_plots.plot_feature_numerical_target(x, y, q, ax, delta, ymean, threshold_nb_values=15, color=(0.28, 0.18, 0.71))[source]

Plot the relationship between a feature and the target variable.

Parameters:

x (Series) – Feature values.
y (Series) – Target values.
q (int) – Number of quantiles.
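ax (Axes) – Matplotlib axis to plot on.
delta (float) – Delta value for adjusting plot limits.
ymean (float) – Mean of the target variable.
threshold_nb_values (int) – Threshold for the number of unique values used to decide the grouping method. Defaults to 15.
color (tuple[float, float, float]) – Color for the plot. Defaults to (0.28, 0.18, 0.71).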

Returns:

Matplotlib axis and used quantiles.

Return type:

tuple

mlexplainer.visualization.target_plots.plot_feature_target_numerical_binary(dataframe, target_serie, feature, q, ax, delta, threshold_nb_values=15, target_modality=None)[source]

Plot the relationship between a feature and the target variable for binary classification.

Parameters:

dataframe (DataFrame) – DataFrame containing the feature and target variable.
target_serie (Series) – Series representing the target variable.
feature (str) – The feature name to plot.
q (int) – Number of quantiles.
ax (Axes) – Matplotlib axis to plot on.
delta (float) – Delta value for adjusting plot limits.
threshold_nb_values (float) – Threshold for number of unique values to decide grouping method.

Returns:

Matplotlib axis with the feature-target plot.

Return type:

Axes

mlexplainer.visualization.target_plots.plot_feature_target_categorical_binary(dataframe, target, feature, ax, color=(0.28, 0.18, 0.71), target_modality=None)[source]

Plot the relationship between a categorical feature and the target variable for binary classification.

Parameters:

dataframe (DataFrame) – DataFrame containing the feature and target variable.
target (Series) – Series representing the target variable.
feature (str) – The feature name to plot.
ax (Axes) – Matplotlib axis to plot on.
color (tuple[float, float, float]) – Color for the plot.

Returns:

Matplotlib axis with the feature-target plot.

Return type:

Axes

mlexplainer.visualization.target_plots.plot_feature_target_numerical_multilabel(dataframe, target_serie, feature, q=20, delta=0.1, figsize=(15, 8), dpi=100, threshold_nb_values=15)[source]

Plot the relationship between a numerical feature and all target modalities with SHAP values.

Parameters:

dataframe (DataFrame) – DataFrame containing the feature and target variable.
target_serie (Series) – Series representing the target variable.
feature (str) – The feature name to plot.
modalities (list) – List of unique target modalities.
shap_values (ndarray, optional) – SHAP values for the training features.
q (int, optional) – Number of quantiles. Defaults to 20.
figsize (tuple, optional) – Figure size for the plot. Defaults to (15, 8).
dpi (int, optional) – Dots per inch for the plot. Defaults to 100.

mlexplainer.visualization.target_plots.plot_feature_target_categorical_multilabel(dataframe, target_serie, feature, modalities, figsize=(15, 8), dpi=200, color=(0.28, 0.18, 0.71))[source]

Plot the relationship between a categorical feature and all target modalities with SHAP values.

Parameters:

dataframe (DataFrame) – DataFrame containing the feature and target variable.
target_serie (Series) – Series representing the target variable.
feature (str) – The feature name to plot.
modalities (list) – List of unique target modalities.
shap_values (ndarray, optional) – SHAP values for the training features.
figsize (tuple, optional) – Figure size for the plot. Defaults to (15, 8).
dpi (int, optional) – Dots per inch for the plot. Defaults to 200.