deepof.post_hoc.explain_clusters

deepof.post_hoc.explain_clusters(chunk_stats: DataFrame, hard_counts: ndarray, full_cluster_clf: Pipeline, samples: int = 10000, n_jobs: int = -1)

Compute SHAP feature importance for models mapping chunk_stats to cluster assignments.

Parameters:
  • chunk_stats (pd.DataFrame) – matrix with statistics per chunk, sorted by experiment.

  • hard_counts (np.ndarray) – cluster assignment for each row of the corresponding chunk_stats table.

  • full_cluster_clf (imblearn.pipeline.Pipeline) – trained supervised model on the full dataset, mapping chunk stats to cluster assignments.

  • samples (int) – number of samples to draw from the original chunk_stats dataset. Defaults to 10000.

  • n_jobs (int) – number of parallel jobs to run. If -1 (default), all CPUs are used.
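
The relationship between chunk_stats, hard_counts and samples can be illustrated with a small sketch. It assumes that chunk_stats has one row per chunk, that hard_counts holds one cluster label per row, and that sampling means drawing rows at random without replacement. The sampling strategy deepof actually uses is not stated here, and the helper name subsample_chunks, the feature names and the toy data are all hypothetical.

```python
import numpy as np
import pandas as pd


def subsample_chunks(chunk_stats, hard_counts, samples=10000, seed=0):
    """Hypothetical helper: draw at most `samples` rows, keeping labels aligned."""
    rng = np.random.default_rng(seed)
    n = min(samples, len(chunk_stats))
    # Sort the drawn positions so the original row order is preserved
    idx = np.sort(rng.choice(len(chunk_stats), size=n, replace=False))
    return chunk_stats.iloc[idx], np.asarray(hard_counts)[idx]


# Toy inputs (made up): 50 chunks, 3 summary statistics, 4 clusters
rng = np.random.default_rng(42)
chunk_stats = pd.DataFrame(
    rng.normal(size=(50, 3)),
    columns=["speed_mean", "distance_mean", "area_mean"],
)
hard_counts = rng.integers(0, 4, size=50)

X_sub, y_sub = subsample_chunks(chunk_stats, hard_counts, samples=20)
```

Because the same positions index both objects, each sampled row keeps its original cluster label.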

Returns:
  • shap_values (list) – SHAP values per cluster.

  • explainer (shap.explainers._kernel.Kernel) – trained SHAP KernelExplainer.
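
One common way to summarize per-cluster SHAP values is the mean absolute value per feature. The sketch below assumes that shap_values is a list with one array per cluster, each of shape (n_samples, n_features). That layout is an assumption and may differ between SHAP versions. The helper name mean_abs_shap, the feature names and the numbers are hypothetical stand-ins, not output from deepof.

```python
import numpy as np
import pandas as pd


def mean_abs_shap(shap_values, feature_names):
    """Hypothetical helper: mean |SHAP| per feature, one row per cluster."""
    importance = np.stack(
        [np.abs(np.asarray(sv)).mean(axis=0) for sv in shap_values]
    )
    return pd.DataFrame(importance, columns=feature_names).rename_axis("cluster")


# Stand-in for the returned shap_values: 2 clusters, 3 samples, 2 features
shap_values = [
    np.array([[0.1, -0.3], [0.2, 0.3], [-0.3, 0.3]]),
    np.array([[-0.1, 0.3], [-0.2, -0.3], [0.3, -0.3]]),
]

summary = mean_abs_shap(shap_values, ["feat_a", "feat_b"])
top_feature = summary.idxmax(axis=1)  # most influential feature per cluster
```

Taking absolute values before averaging prevents positive and negative contributions from cancelling out, so features that push predictions strongly in either direction rank highly.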