deepof.utils.gmm_model_selection

deepof.utils.gmm_model_selection(x: DataFrame, n_components_range: range, part_size: int, n_runs: int = 100, n_cores: int = False, cv_types: Tuple = ('spherical', 'tied', 'diag', 'full')) → Tuple[List[list], List[ndarray], int | Any]

Run Gaussian mixture model (GMM) selection on the dataframe x.

Outputs the Bayesian information criterion (BIC) distribution per model, a vector with the median BICs, and an object with the overall best model.

Parameters:
  • x (pandas.DataFrame) – Data matrix to train the models

  • n_components_range (range) – Range of component counts to evaluate

  • n_runs (int) – Number of bootstraps for each model

  • part_size (int) – Size of bootstrap samples for each model

  • n_cores (int) – Number of cores to use for computation

  • cv_types (tuple) – Covariance matrix types to try. All four available types are used by default

Returns:
  • bic (list) – All recorded BIC values for all attempted parameter combinations (useful for plotting)

  • m_bic (list) – All minimum BIC values recorded throughout the process (useful for plotting)

  • best_bic_gmm (sklearn.GMM) – Unfitted version of the best model found

Return type:
  Tuple[List[list], List[ndarray], int | Any]
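The parameters above describe a bootstrap search over covariance types and component counts. The following is a minimal sketch of that idea, written directly against scikit-learn's `GaussianMixture`; it is not deepof's implementation. The helper name `bootstrap_bic_sketch`, the `seed` argument, the synthetic data, and the choice to summarise runs by their median BIC are all assumptions made for illustration.

```python
from itertools import product

import numpy as np
import pandas as pd
from sklearn.mixture import GaussianMixture


def bootstrap_bic_sketch(x, n_components_range, part_size, n_runs=5,
                         cv_types=("spherical", "tied", "diag", "full"), seed=0):
    """Hypothetical helper: bootstrap BIC scores per (cv_type, n_components) pair."""
    pars = list(product(cv_types, n_components_range))
    bic = []  # one inner list per bootstrap run
    for run in range(n_runs):
        # Draw part_size rows with replacement (one bootstrap sample per run)
        sample = x.sample(part_size, replace=True, random_state=seed + run).to_numpy()
        run_bic = []
        for cv_type, n_components in pars:
            gmm = GaussianMixture(n_components=n_components,
                                  covariance_type=cv_type,
                                  random_state=seed)
            gmm.fit(sample)
            run_bic.append(gmm.bic(sample))  # lower BIC is better
        bic.append(run_bic)

    bic = np.array(bic)                  # shape: (n_runs, len(pars))
    median_bic = np.median(bic, axis=0)  # assumed summary: median over runs
    best_cv, best_k = pars[int(np.argmin(median_bic))]
    # Return an unfitted model configured with the best-scoring settings
    best_model = GaussianMixture(n_components=best_k, covariance_type=best_cv)
    return bic, median_bic, best_model


# Synthetic data: two well-separated 2D clusters (illustrative only)
rng = np.random.default_rng(0)
data = pd.DataFrame(
    np.vstack([rng.normal(0, 1, (150, 2)), rng.normal(8, 1, (150, 2))]),
    columns=["a", "b"],
)
bic, median_bic, best_model = bootstrap_bic_sketch(
    data, range(1, 4), part_size=200, n_runs=3
)
print(best_model.covariance_type, best_model.n_components)
```

In this sketch the returned model is left unfitted, mirroring the description of `best_bic_gmm`, so it still has to be fitted on the full data before use.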