deepof.post_hoc.condition_distance_binning

deepof.post_hoc.condition_distance_binning(embedding: deepof_table_dict, soft_counts: deepof_table_dict, breaks: deepof_table_dict, exp_conditions: dict, start_bin: int | None = None, end_bin: int | None = None, step_bin: int | None = None, scan_mode: str = 'growing_window', precomputed_bins: ndarray | None = None, agg: str = 'mean', metric: str = 'auc', n_jobs: int = 2)

Compute the distance between the embeddings of two experimental conditions across time bins, using the specified aggregation method and distance metric.

Parameters:
  • embedding (TableDict) – A dictionary of embeddings, where the keys are experiment names (matching the keys of exp_conditions) and the values are the embeddings for each experiment.

  • soft_counts (TableDict) – A dictionary of soft counts, where the keys are experiment names (matching the keys of exp_conditions) and the values are the soft counts for each experiment.

  • breaks (TableDict) – A dictionary of breaks, where the keys are experiment names (matching the keys of exp_conditions) and the values are the breaks for each experiment.

  • exp_conditions (dict) – A dictionary of experimental conditions, where the keys are the names of the experiments, and the values are the names of their corresponding experimental conditions.

  • start_bin (int) – The index of the first bin to compute the distance for.

  • end_bin (int) – The index of the last bin to compute the distance for.

  • step_bin (int) – The step size of the bins to compute the distance for.

  • scan_mode (str) – The binning strategy used to compute the distance. Can be one of “growing_window” (the default; used to select an optimal binning), “per-bin” (used to evaluate how discriminability evolves across consecutive bins of a specified size), or “precomputed”, which requires a NumPy ndarray of bin IDs to be passed to precomputed_bins.

  • precomputed_bins (np.ndarray) – A NumPy array of IDs mapping to different bins, which need not all be the same size. Only used when scan_mode is “precomputed”; the difference between conditions is reported for each of these bins.

  • agg (str) – The aggregation method to use. Can be one of “mean”, “median”, or “time_on_cluster”.

  • metric (str) – The distance metric to use. Can be either “auc” (where the reported ‘distance’ is based on how well a classifier separates the aggregated embeddings of the two conditions) or “wasserstein” (which computes distances based on optimal transport).

  • n_jobs (int) – The number of jobs to use for parallel processing.
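To make the scan modes and aggregation options more concrete, here is a minimal NumPy sketch of how one experiment's per-time-step embedding could be split into bins and collapsed into one vector per bin. It is not deepof's implementation. The helper names (growing_window_bins, per_bin_bins, precomputed_to_bins, aggregate_bin) are hypothetical, and two readings are assumptions: that start_bin, end_bin and step_bin are measured in time steps, and that “time_on_cluster” means the fraction of each bin spent in each cluster according to the soft counts.

```python
import numpy as np

# Hypothetical helpers illustrating the three scan modes; not deepof's code.
# Each returns a list of index arrays (one per bin) over n_steps time steps.

def growing_window_bins(n_steps, start_bin, end_bin, step_bin):
    # Assumption: every window starts at time 0 and grows by step_bin,
    # from start_bin up to (but excluding) end_bin time steps.
    return [np.arange(min(size, n_steps)) for size in range(start_bin, end_bin, step_bin)]


def per_bin_bins(n_steps, bin_size):
    # Assumption: consecutive, non-overlapping bins of a fixed size;
    # the last bin may be shorter.
    return [np.arange(s, min(s + bin_size, n_steps)) for s in range(0, n_steps, bin_size)]


def precomputed_to_bins(bin_ids):
    # One bin per distinct ID, so bins can differ in size.
    bin_ids = np.asarray(bin_ids)
    return [np.flatnonzero(bin_ids == b) for b in np.unique(bin_ids)]


def aggregate_bin(embedding, soft_counts, idx, agg="mean"):
    # Collapse the time steps of one bin into a single vector.
    if agg == "mean":
        return embedding[idx].mean(axis=0)
    if agg == "median":
        return np.median(embedding[idx], axis=0)
    if agg == "time_on_cluster":
        # Assumption: fraction of the bin spent in each cluster, taking the
        # most likely cluster at each time step from the soft counts.
        hard = soft_counts[idx].argmax(axis=1)
        return np.bincount(hard, minlength=soft_counts.shape[1]) / len(idx)
    raise ValueError(f"unknown agg: {agg}")


# Toy usage: 100 time steps, a 4-D embedding and soft counts over 3 clusters.
rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 4))
probs = rng.dirichlet(np.ones(3), size=100)
windows = growing_window_bins(100, 20, 101, 40)  # windows of 20, 60 and 100 steps
vectors = [aggregate_bin(emb, probs, idx, "time_on_cluster") for idx in windows]
```

Under these assumptions, a growing window always includes the start of the recording, which is why it suits choosing how much data is needed to separate conditions, whereas per-bin slices show how separability changes over time.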

Returns:
  An array with distances between conditions across the resulting time bins.
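As a rough illustration of what an “auc” value in the returned array measures, the sketch below scores how well two conditions can be told apart from one aggregated vector per experiment, and repeats this for every bin. This is an assumption-laden stand-in, not deepof's code: the function names are hypothetical, a leave-one-out nearest-centroid rule replaces whatever classifier deepof actually trains, condition labels are assumed to be binary and derived from exp_conditions, and the “wasserstein” metric is not covered.

```python
import numpy as np


def auc_from_scores(scores, labels):
    # ROC AUC via the Mann-Whitney U statistic: the probability that a random
    # positive example scores higher than a random negative one (ties count 1/2).
    scores, labels = np.asarray(scores, float), np.asarray(labels, bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def loo_centroid_auc(vectors, labels):
    # Hypothetical stand-in classifier: each experiment is scored by how much
    # closer it is to the positive-class centroid than to the negative one,
    # with both centroids computed without that experiment (leave-one-out).
    # Needs at least two experiments per condition.
    X, y = np.asarray(vectors, float), np.asarray(labels, bool)
    scores = np.empty(len(X))
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        c_pos = X[keep & y].mean(axis=0)
        c_neg = X[keep & ~y].mean(axis=0)
        scores[i] = np.linalg.norm(X[i] - c_neg) - np.linalg.norm(X[i] - c_pos)
    return auc_from_scores(scores, y)


def distances_across_bins(per_bin_vectors, labels):
    # per_bin_vectors[b] holds one aggregated vector per experiment for bin b;
    # the result has one value per bin (0.5 = indistinguishable, 1.0 = separable).
    return np.array([loo_centroid_auc(v, labels) for v in per_bin_vectors])


# Toy usage: labels come from a hypothetical exp_conditions mapping.
exp_conditions = {"m1": "WT", "m2": "WT", "m3": "WT", "m4": "KO", "m5": "KO", "m6": "KO"}
labels = [cond == "KO" for cond in exp_conditions.values()]
separable = np.array([[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]])
identical = np.zeros((6, 2))
per_bin = distances_across_bins([separable, identical], labels)
```

In this sketch the first bin yields an AUC of 1.0 because the two groups are cleanly separated, while the second yields 0.5 because every experiment has the same aggregated vector.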