deepof.post_hoc.condition_distance_binning
- deepof.post_hoc.condition_distance_binning(embedding: deepof_table_dict, soft_counts: deepof_table_dict, breaks: deepof_table_dict, exp_conditions: dict, start_bin: int | None = None, end_bin: int | None = None, step_bin: int | None = None, scan_mode: str = 'growing_window', precomputed_bins: ndarray | None = None, agg: str = 'mean', metric: str = 'auc', n_jobs: int = 2)
Compute the distance between the embeddings of two conditions, using the specified aggregation method.
- Parameters:
embedding (TableDict) – A dictionary of embeddings, where the keys are the names of the experimental conditions, and the values are the embeddings for each condition.
soft_counts (TableDict) – A dictionary of soft counts, where the keys are the names of the experimental conditions, and the values are the soft counts for each condition.
breaks (TableDict) – A dictionary of breaks, where the keys are the names of the experimental conditions, and the values are the breaks for each condition.
exp_conditions (dict) – A dictionary of experimental conditions, where the keys are the names of the experiments, and the values are the names of their corresponding experimental conditions.
start_bin (int) – The index of the first bin to compute the distance for.
end_bin (int) – The index of the last bin to compute the distance for.
step_bin (int) – The step size of the bins to compute the distance for.
scan_mode (str) – The mode to use for computing the distance. Can be one of “growing_window” (the default, used to select optimal binning), “per-bin” (used to evaluate how discriminability evolves in subsequent bins of a specified size), or “precomputed”, which requires a numpy ndarray with bin IDs to be passed to precomputed_bins.
precomputed_bins (np.ndarray) – A numpy array with IDs mapping to different bins, which need not all have the same size. The difference across conditions is reported for each of these bins.
agg (str) – The aggregation method to use. Can be one of “mean”, “median”, or “time_on_cluster”.
metric (str) – The distance metric to use. Can be either “auc” (where the reported ‘distance’ is based on performance of a classifier when separating aggregated embeddings), or “wasserstein” (which computes distances based on optimal transport).
n_jobs (int) – The number of jobs to use for parallel processing.
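The three scan modes differ in how frames are grouped before distances are computed. The following is a minimal, standard-library sketch of that grouping only. All three helper functions are hypothetical and not part of deepof, and the mapping of start_bin, end_bin and step_bin onto window lengths in frames is an assumption made for illustration.

```python
def growing_window_bins(n_frames, start_bin, end_bin, step_bin):
    """Hypothetical helper: windows that all start at frame 0 and grow in length."""
    sizes = range(start_bin, end_bin + 1, step_bin)
    return [(0, min(size, n_frames)) for size in sizes]


def per_bin_windows(n_frames, bin_size):
    """Hypothetical helper: consecutive, non-overlapping windows of a fixed size."""
    return [(s, min(s + bin_size, n_frames)) for s in range(0, n_frames, bin_size)]


def precomputed_windows(bin_ids):
    """Hypothetical helper: group frame indices by an arbitrary bin ID per frame."""
    groups = {}
    for frame, bin_id in enumerate(bin_ids):
        groups.setdefault(bin_id, []).append(frame)
    return groups


# Growing windows: each window extends the previous one (assumed semantics).
growing = growing_window_bins(n_frames=100, start_bin=25, end_bin=100, step_bin=25)

# Fixed-size bins: the last bin is truncated at the end of the recording.
fixed = per_bin_windows(n_frames=100, bin_size=40)

# Precomputed bins: bins may have different sizes.
custom = precomputed_windows([0, 0, 0, 1, 1, 2])
```

In this sketch a growing window accumulates data from the start of the recording, whereas fixed-size and precomputed bins partition the recording into disjoint segments.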
- Returns:
An array of distances between conditions across the resulting time bins.
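To make the shape of the result concrete, here is a toy sketch that produces one distance per bin using “mean” aggregation and a one-dimensional Wasserstein distance. It is not deepof's implementation: the data are invented, each frame carries a single scalar instead of a full embedding, and the helpers wasserstein_1d and aggregate are hypothetical names written for this example.

```python
from statistics import mean


def wasserstein_1d(a, b):
    """1-D Wasserstein-1 distance between two equal-sized samples.

    For equal-sized samples this is the mean absolute difference
    between the sorted values.
    """
    if len(a) != len(b):
        raise ValueError("this sketch only handles equal-sized samples")
    return mean(abs(x - y) for x, y in zip(sorted(a), sorted(b)))


def aggregate(series, bin_ids, bin_id):
    """Mean of the values of one animal that fall inside the given bin."""
    return mean(v for v, b in zip(series, bin_ids) if b == bin_id)


# Invented toy data: one scalar per frame, three animals per condition.
control = [[0.0, 0.1, 0.2, 0.3], [0.1, 0.1, 0.3, 0.3], [0.0, 0.2, 0.2, 0.4]]
treated = [[0.0, 0.1, 1.2, 1.3], [0.1, 0.1, 1.3, 1.3], [0.0, 0.2, 1.2, 1.4]]

# Bin ID of each frame, in the spirit of precomputed_bins.
bin_ids = [0, 0, 1, 1]

distances = []
for bin_id in sorted(set(bin_ids)):
    ctrl = [aggregate(s, bin_ids, bin_id) for s in control]
    trt = [aggregate(s, bin_ids, bin_id) for s in treated]
    distances.append(wasserstein_1d(ctrl, trt))

# distances holds one value per bin: the conditions coincide in the
# first bin and are separated by roughly 1.0 in the second.
```

The resulting list has one entry per bin, so a bin in which the conditions diverge shows up as a larger value at the corresponding position.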