deepof package
Submodules
deepof.annotation_utils module
Functions and general utilities for supervised pose estimation. See documentation for details.
- deepof.annotation_utils.close_single_contact(pos_dframe: DataFrame, left: str, right: str, tol: float) array
Return a boolean array that’s True if the specified body parts are closer than tol.
- Parameters:
pos_dframe (pandas.DataFrame) – DLC output as pandas.DataFrame; only applicable to two-animal experiments.
left (string) – First member of the potential contact
right (string) – Second member of the potential contact
tol (float) – maximum distance for which a contact is reported
- Returns:
True if the distance between the two specified points is less than tol, False otherwise
- Return type:
contact_array (np.array)
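As a rough illustration of the distance check described above, the sketch below works on plain NumPy coordinate arrays rather than a DLC DataFrame. The helper name `contact_mask` and the `(n_frames, 2)` array layout are assumptions made for this example and are not part of deepof.

```python
import numpy as np


def contact_mask(left_xy: np.ndarray, right_xy: np.ndarray, tol: float) -> np.ndarray:
    """Return True for each frame where two tracked points are closer than tol.

    left_xy, right_xy: arrays of shape (n_frames, 2) holding x, y per frame
    (a hypothetical layout chosen for this sketch).
    """
    # Euclidean distance between the two points, computed frame by frame
    distances = np.linalg.norm(left_xy - right_xy, axis=1)
    return distances < tol


left = np.array([[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
right = np.array([[3.0, 4.0], [1.0, 0.0], [10.0, 0.0]])
mask = contact_mask(left, right, tol=5.5)
```

In this toy input the per-frame distances are 5, 1 and 10, so only the last frame falls outside the tolerance.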
- deepof.annotation_utils.close_double_contact(pos_dframe: DataFrame, left1: str, left2: str, right1: str, right2: str, rel_tol: float, rev: bool = False) array
Return a boolean array that’s True if the specified body parts are closer than tol.
- Parameters:
pos_dframe (pandas.DataFrame) – DLC output as pandas.DataFrame; only applicable to two-animal experiments.
#left_len (float) – Length of animal 1
left1 (string) – First contact point of animal 1
left2 (string) – Second contact point of animal 1
#right_len (float) – Length of animal 2
right1 (string) – First contact point of animal 2
right2 (string) – Second contact point of animal 2
rel_tol (float) – relative share, which affects the maximum distance for which a contact is reported
rev (bool) – reverses the default behaviour (nose2tail contact for both mice)
- Returns:
True if the distance between the two specified points is less than tol, False otherwise
- Return type:
double_contact (np.array)
- deepof.annotation_utils.rotate(origin, point, ang)
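Auxiliary function to climb_wall and sniff_object. Rotates x, y coordinates around a pivot.
- Parameters:
origin – pivot around which the point is rotated
point – x, y coordinates to rotate
ang – rotation angle
- Returns:
x and y coordinates of the rotated point
- Return type:
qx, qy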
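Rotating a point around a pivot can be sketched with the standard 2D rotation formula. The version below assumes the angle is given in radians and that positive angles rotate counterclockwise; the conventions used inside deepof may differ, and `rotate_point` is a hypothetical name.

```python
import math


def rotate_point(origin, point, ang):
    """Rotate `point` counterclockwise around `origin` by `ang` radians."""
    ox, oy = origin
    px, py = point
    # Translate to the pivot, apply the rotation matrix, translate back
    qx = ox + math.cos(ang) * (px - ox) - math.sin(ang) * (py - oy)
    qy = oy + math.sin(ang) * (px - ox) + math.cos(ang) * (py - oy)
    return qx, qy


# A quarter turn of (2, 1) around (1, 1) lands on (1, 2), up to rounding
qx, qy = rotate_point((1.0, 1.0), (2.0, 1.0), math.pi / 2)
```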
- deepof.annotation_utils.outside_ellipse(x, y, e_center, e_axes, e_angle, threshold=0.0)
Auxiliary function to climb_wall and sniff_object.
Returns True if the passed x, y coordinates are outside the ellipse denoted by e_center, e_axes and e_angle, with a certain threshold.
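One common way to test whether a point lies outside a rotated ellipse is to move the point into the ellipse's own coordinate frame and evaluate the canonical ellipse equation. The sketch below makes several assumptions that the reference does not confirm: the axes are semi-axis lengths, the angle is in radians, and the threshold is simply added to both semi-axes. The function name is hypothetical.

```python
import numpy as np


def is_outside_ellipse(x, y, center, axes, angle, threshold=0.0):
    """Return True where (x, y) lies outside a rotated ellipse.

    Assumptions for this sketch: `axes` holds the semi-axis lengths,
    `angle` is in radians, and `threshold` is added to both semi-axes.
    """
    x = np.asarray(x, dtype=float) - center[0]
    y = np.asarray(y, dtype=float) - center[1]
    # Rotate the points into the ellipse's own coordinate frame
    xr = x * np.cos(angle) + y * np.sin(angle)
    yr = -x * np.sin(angle) + y * np.cos(angle)
    a, b = axes[0] + threshold, axes[1] + threshold
    # Canonical ellipse equation: values above 1 are outside
    return (xr / a) ** 2 + (yr / b) ** 2 > 1.0


outside = is_outside_ellipse(
    [0.0, 3.0, 0.0], [0.0, 0.0, 1.5], center=(0.0, 0.0), axes=(2.0, 1.0), angle=0.0
)
```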
- deepof.annotation_utils.climb_arena(arena_type: str, arena: array, pos_dict: DataFrame, rel_tol: float, id: str, mouse_len: 50, centered_data: bool = False, run_numba: bool = False) array
Return True if the specified mouse is climbing the wall.
- Parameters:
arena_type (str) – arena type; must be one of [‘polygonal-manual’, ‘circular-autodetect’]
arena (np.array) – contains arena location and shape details
pos_dict (table_dict) – position over time for all videos in a project
rel_tol (float) – relative tolerance (to mouse length) to report a hit
id (str) – indicates the id + subcondition of the animal
centered_data (bool) – indicates whether the input data is centered
run_numba (bool) – Determines if numba versions of functions should be used (run faster but require initial compilation time on first run)
- Returns:
boolean array. True if selected animal is climbing the walls of the arena
- Return type:
climbing (np.array)
- deepof.annotation_utils.sniff_object(speed_dframe: DataFrame, arena: array, pos_dict: DataFrame, tol: float, tol_speed: float, nose: str, center_name: str = 'Center', centered_data: bool = False, s_object: str = 'arena', animal_id: str = '', run_numba: bool = False)
Return True if the specified mouse is sniffing an object.
- Parameters:
speed_dframe (pandas.DataFrame) – speed of body parts over time.
arena (np.array) – contains arena location and shape details.
pos_dict (table_dict) – position over time for all videos in a project.
tol (float) – minimum tolerance to report a hit.
tol_speed (float) – minimum speed to report a hit.
center_name (str) – Body part to center coordinates on. “Center” by default.
nose (str) – indicates the name of the body part representing the nose of the selected animal.
centered_data (bool) – indicates whether the input data is centered.
s_object (str) – indicates the object to sniff. Must be one of [‘arena’, ‘object’].
animal_id (str) – indicates the animal to sniff. Must be one of animal_ids.
run_numba (bool) – Determines if numba versions of functions should be used (run faster but require initial compilation time on first run)
- Returns:
boolean array. True if selected animal is sniffing the selected object
- Return type:
sniffing (np.array)
- deepof.annotation_utils.immobility(X_huddle: ndarray, huddle_estimator: Pipeline, animal_id: str = '', median_filter_width: int = 11, min_immobility: int = 25, max_immobility: int = 3000) array
Return true when the mouse is huddling, as predicted by a pretrained model.
- Parameters:
X_huddle (pandas.DataFrame) – mouse features over time.
huddle_estimator (sklearn.pipeline.Pipeline) – pre-trained model to predict feature occurrence.
animal_id (str) – indicates the animal to sniff. Must be one of animal_ids.
median_filter_width (int) – width of median filter for smoothing results
min_immobility (int) – minimum length of behavior to be considered immobility
max_immobility (int) – maximum length of behavior to be considered immobility (longer is labeled as “sleeping”)
- Returns:
1 if the animal is huddling, 0 otherwise
- Return type:
y_huddle (np.array)
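The min_immobility and max_immobility parameters suggest a filter on the length of consecutive detections. A minimal sketch of such a run-length filter is shown below; it simply discards runs outside the allowed range, whereas the description above indicates that deepof labels overly long runs differently. The function name and the toy input are assumptions.

```python
import numpy as np


def keep_runs_between(mask: np.ndarray, min_len: int, max_len: int) -> np.ndarray:
    """Keep only True runs whose length lies in [min_len, max_len]."""
    mask = np.asarray(mask, dtype=bool)
    # Pad with False so every run has a detectable start and end
    padded = np.concatenate(([False], mask, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = changes[::2], changes[1::2]
    filtered = np.zeros_like(mask)
    for start, end in zip(starts, ends):
        if min_len <= end - start <= max_len:
            filtered[start:end] = True
    return filtered


raw = np.array([1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1], dtype=bool)
filtered = keep_runs_between(raw, min_len=2, max_len=4)
```

Here the runs have lengths 1, 3 and 6, so only the middle run survives.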
- deepof.annotation_utils.augment_with_neighbors(X_huddle, window=5, step=1, window_out=11)
Expands a given set of features with leading and lagging features on the time axis. Will only return speed based features.
- Parameters:
X_huddle (pandas.DataFrame) – mouse features over time.
window (int) – steps to go forward and backward in time for each feature
step (int) – step size for the window
window_out (int) – total length of the output window
- Returns:
mouse features over time including leading and lagging features (only speed features) for each frame
- Return type:
X_augmented (pandas.DataFrame)
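Leading and lagging features can be built by shifting each column along the time axis. The following sketch uses `pandas.DataFrame.shift` for that purpose; it does not reproduce the speed-only filtering or the window_out parameter described above, and the function and column names are hypothetical.

```python
import pandas as pd


def add_neighbor_features(
    features: pd.DataFrame, window: int = 2, step: int = 1
) -> pd.DataFrame:
    """Append lagged and leading copies of every column.

    Shifts of step, 2*step, ... up to `window` frames are added in both
    directions; frames without a neighbor at the edges are left as NaN.
    """
    shifted = [features]
    for offset in range(step, window + 1, step):
        shifted.append(features.shift(offset).add_suffix(f"_lag{offset}"))
        shifted.append(features.shift(-offset).add_suffix(f"_lead{offset}"))
    return pd.concat(shifted, axis=1)


speeds = pd.DataFrame({"nose_speed": [0.0, 1.0, 2.0, 3.0]})
augmented = add_neighbor_features(speeds, window=1)
```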
- deepof.annotation_utils.digging(speed_dframe: DataFrame, dist_dframe: DataFrame, likelihood_dframe: DataFrame, mouse_identity: str, close_range: ndarray, tol_speed: float, tol_likelihood: float, min_length: int, center_name: str = 'Center', animal_id: str = '')
Return true when the mouse is digging. Experimental and currently not included.
- Parameters:
speed_dframe (pandas.DataFrame) – speed of body parts over time
dist_dframe (pandas.DataFrame) – distance between body parts over time
likelihood_dframe (pandas.DataFrame) – likelihood of body part tracker over time, as directly obtained from DeepLabCut
mouse_identity (str) – animal id without the _
close_range (np.ndarray) – boolean array that denotes if the nose of the current mouse is close to any other mouse for each frame.
tol_speed (float) – Maximum tolerated speed for the center of the mouse
tol_likelihood (float) – Maximum tolerated likelihood for the nose.
min_length (int) – minimum length that True segments need to have to not get filtered out.
center_name (str) – Body part to center coordinates on. “Center” by default.
animal_id (str) – ID of the current animal.
- Returns:
stationary_active (np.array): True if the animal is standing still and is active, False otherwise
stationary_passive (np.array): True if the animal is standing still and is passive, False otherwise
- deepof.annotation_utils.stationary_lookaround(speed_dframe: DataFrame, dist_dframe: DataFrame, likelihood_dframe: DataFrame, mouse_identity: str, close_range: ndarray, tol_speed: float, tol_likelihood: float, min_length: int, animal_id: str = '')
Return true when the mouse is standing still and looking around (moving nose without head being tilted too much).
- Design considerations:
Detecting immobility and activity is relatively straightforward, as it mostly involves checking speed thresholds on bodyparts. The main problem is the “flickering” of detections, as bodyparts may be just above or below the threshold from frame to frame. Accordingly, most of the detect_activity algorithm is a series of filtering steps that alternately smooth the predictions and sharpen the edges of predicted behavior.
- Parameters:
speed_dframe (pandas.DataFrame) – speed of body parts over time
dist_dframe (pandas.DataFrame) – distance between body parts over time
likelihood_dframe (pandas.DataFrame) – likelihood of body part tracker over time, as directly obtained from DeepLabCut
mouse_identity (str) – animal id without the _
close_range (np.ndarray) – boolean array that denotes if the nose of the current mouse is close to any other mouse for each frame.
tol_speed (float) – Maximum tolerated speed for the center of the mouse
tol_likelihood (float) – Maximum tolerated likelihood for the nose.
min_length (int) – minimum length that True segments need to have to not get filtered out.
animal_id (str) – ID of the current animal.
- Returns:
True if the animal is standing still and looking around (moving nose without head being tilted too much), False otherwise
- Return type:
stationary_lookaround (np.array)
- deepof.annotation_utils.detect_activity(speed_dframe: DataFrame, likelihood_dframe: DataFrame, tol_speed: float, tol_likelihood: float, min_length: int, center_name: str = 'Center', animal_id: str = '')
Return arrays indicating whether the mouse is moving (mobile), or standing still while either moving body parts (active) or not moving (passive).
- Design considerations:
Detecting immobility and activity is relatively straightforward, as it mostly involves checking speed thresholds on bodyparts. The main problem is the “flickering” of detections, as bodyparts may be just above or below the threshold from frame to frame. Accordingly, most of the detect_activity algorithm is a series of filtering steps that alternately smooth the predictions and sharpen the edges of predicted behavior.
- Parameters:
speed_dframe (pandas.DataFrame) – speed of body parts over time
likelihood_dframe (pandas.DataFrame) – likelihood of body part tracker over time, as directly obtained from DeepLabCut
tol_speed (float) – Maximum tolerated speed for the center of the mouse
tol_likelihood (float) – Maximum tolerated likelihood for the nose.
min_length (int) – minimum length that True segments need to have to not get filtered out.
center_name (str) – Body part to center coordinates on. “Center” by default.
animal_id (str) – ID of the current animal.
- Returns:
stationary_active (np.array): True if the animal is standing still and is active, False otherwise
stationary_passive (np.array): True if the animal is standing still and is passive, False otherwise
mobile (np.array): True if the animal is not standing still, False otherwise
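The flickering problem mentioned in the design considerations can be illustrated with a simple smoothing step. The sketch below thresholds a speed trace and applies a centered rolling median as a majority vote. It is not the filtering sequence used by detect_activity, whose exact steps are not listed here; the function name and `filter_width` parameter are assumptions.

```python
import numpy as np
import pandas as pd


def smooth_detections(
    speed: pd.Series, tol_speed: float, filter_width: int = 5
) -> np.ndarray:
    """Threshold a speed trace and suppress frame-to-frame flickering.

    A centered rolling median over the boolean trace acts as a majority
    vote within each window of `filter_width` frames.
    """
    below = (speed < tol_speed).astype(float)
    voted = below.rolling(filter_width, center=True, min_periods=1).median()
    return (voted > 0.5).to_numpy()


# Frame 3 briefly exceeds the threshold; the vote treats it as noise
speed = pd.Series([0.1, 0.2, 0.1, 1.5, 0.2, 0.1, 0.2, 3.0, 3.2, 3.1, 2.9])
stationary = smooth_detections(speed, tol_speed=1.0, filter_width=3)
```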
- deepof.annotation_utils.sniff_around(speed_dframe: DataFrame, likelihood_dframe: DataFrame, tol_speed: float, tol_likelihood: float, center_name: str = 'Center', animal_id: str = '')
Return true when the mouse is sniffing around using simple rules.
- Parameters:
speed_dframe (pandas.DataFrame) – speed of body parts over time
likelihood_dframe (pandas.DataFrame) – likelihood of body part tracker over time, as directly obtained from DeepLabCut
tol_speed (float) – Maximum tolerated speed for the center of the mouse
tol_likelihood (float) – Maximum tolerated likelihood for the nose.
center_name (str) – Body part to center coordinates on. “Center” by default.
animal_id (str) – ID of the current animal.
- Returns:
True if the animal is standing still and sniffing around, False otherwise
- Return type:
lookaround (np.array)
- deepof.annotation_utils.following_path(distance_dframe: DataFrame, position_dframe: DataFrame, speed_dframe: DataFrame, follower: str, followed: str, frames: int = 20, tol: float = 0, tol_speed: float = 0) array
Return True if ‘follower’ is closer than tol to the path that ‘followed’ has walked over the last specified number of frames.
For multi animal videos only.
- Parameters:
distance_dframe (pandas.DataFrame) – distances between bodyparts; generated by the preprocess module
position_dframe (pandas.DataFrame) – position of bodyparts; generated by the preprocess module
speed_dframe (pandas.DataFrame) – speed of body parts over time
follower (str) – identifier for the animal who’s following
followed (str) – identifier for the animal who’s followed
frames (int) – frames in which to track whether the process consistently occurs
tol (float) – maximum distance for which True is returned
tol_speed (float) – minimum speed for the following mouse
- Returns:
boolean sequence, True if conditions are fulfilled, False otherwise
- Return type:
follow (np.array)
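One possible reading of the following rule is sketched below: a frame counts as following when the follower is within tol of any position the followed animal occupied during the previous `frames` frames, and the follower is moving faster than tol_speed. This is an interpretation for illustration only. The function name and the plain-array inputs are assumptions, and deepof may combine its distance, position and speed tables differently.

```python
import numpy as np


def follows_path(follower_xy, followed_xy, follower_speed,
                 frames=20, tol=0.0, tol_speed=0.0):
    """Flag frames where the follower is near the followed animal's recent path."""
    follower_xy = np.asarray(follower_xy, dtype=float)
    followed_xy = np.asarray(followed_xy, dtype=float)
    follower_speed = np.asarray(follower_speed, dtype=float)
    follow = np.zeros(len(follower_xy), dtype=bool)
    for t in range(frames, len(follower_xy)):
        # Positions the followed animal occupied over the last `frames` frames
        past = followed_xy[t - frames:t]
        nearest = np.linalg.norm(past - follower_xy[t], axis=1).min()
        follow[t] = nearest < tol and follower_speed[t] > tol_speed
    return follow


# The followed animal walks along x; the follower trails two frames behind
steps = np.arange(6, dtype=float)
followed = np.column_stack([steps, np.zeros(6)])
follower = np.column_stack([steps - 2.0, np.zeros(6)])
follow = follows_path(follower, followed, np.ones(6),
                      frames=3, tol=0.5, tol_speed=0.5)
```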
- deepof.annotation_utils.max_behaviour(behaviour_dframe: DataFrame, window_size: int = 10, stepped: bool = False) array
Return the most frequent behaviour in a window of window_size frames.
- Parameters:
behaviour_dframe (pd.DataFrame) – boolean matrix containing occurrence of tagged behaviours per frame in the video
window_size (int) – size of the window to use when computing the maximum behaviour per time slot
stepped (bool) – sliding windows don’t overlap if True. False by default
- Returns:
string array with the most common behaviour per instance of the sliding window
- Return type:
max_array (np.array)
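Picking the most frequent behaviour per window can be sketched by counting detections per column and taking the column with the highest count. The example below handles both overlapping (sliding) and non-overlapping (stepped) windows; the function name, the centered window and the tie-breaking rule (first column wins) are assumptions of this sketch.

```python
import numpy as np
import pandas as pd


def most_frequent_behaviour(
    behaviours: pd.DataFrame, window_size: int = 10, stepped: bool = False
) -> np.ndarray:
    """Return the column name with the most True frames in each window."""
    counts = behaviours.astype(int)
    if stepped:
        # Non-overlapping blocks of window_size frames
        blocks = np.arange(len(counts)) // window_size
        counts = counts.groupby(blocks).sum()
    else:
        # Overlapping windows centered on each frame
        counts = counts.rolling(window_size, center=True, min_periods=1).sum()
    return counts.idxmax(axis=1).to_numpy()


tags = pd.DataFrame(
    {"climbing": [1, 1, 1, 0, 0, 0], "sniffing": [0, 0, 1, 1, 1, 1]}
).astype(bool)
per_block = most_frequent_behaviour(tags, window_size=3, stepped=True)
per_frame = most_frequent_behaviour(tags, window_size=3, stepped=False)
```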
- deepof.annotation_utils.frame_corners(w, h, corners: dict = {})
Return a dictionary with the corner positions of the video frame.
- Parameters:
w (int) – width of the frame in pixels
h (int) – height of the frame in pixels
corners (dict) – dictionary containing corners to overwrite
- Returns:
dictionary with overwritten parameters. Those not specified in the input retain their default values
- Return type:
defaults (dict)
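The overwrite behaviour described above follows a common pattern: build a dictionary of defaults, then update it with whatever the caller supplies. In the sketch below the key names and the image-style coordinate convention (origin at the top left) are hypothetical, since the reference does not list the keys deepof uses.

```python
from typing import Optional


def corner_positions(w: int, h: int, corners: Optional[dict] = None) -> dict:
    """Return frame corner positions, letting the caller override any of them."""
    # Hypothetical key names; origin assumed at the top-left of the frame
    defaults = {
        "top_left": (0, 0),
        "top_right": (w, 0),
        "bottom_left": (0, h),
        "bottom_right": (w, h),
    }
    # Entries supplied by the caller replace the matching defaults
    defaults.update(corners or {})
    return defaults


result = corner_positions(640, 480, {"top_left": (5, 5)})
```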
- deepof.annotation_utils.supervised_tagging(coord_object: deepof_coordinates, raw_coords: deepof_table_dict, coords: deepof_table_dict, dists: deepof_table_dict, angles: deepof_table_dict, speeds: deepof_table_dict, full_features: dict, key: str, immobility_estimator: str | None = None, center: str = 'Center', params: dict = {}, run_numba: bool = False) DataFrame
Output a dataframe with the registered motives per frame.
If specified, produces a labeled video displaying the information in real time
- Parameters:
coord_object (deepof.data.coordinates) – coordinates object containing the project information
raw_coords (deepof.data.table_dict) – table_dict with raw coordinates
coords (deepof.data.table_dict) – table_dict with already processed (centered and aligned) coordinates
dists (deepof.data.table_dict) – table_dict with already processed distances
angles (deepof.data.table_dict) – table_dict with already processed angles
speeds (deepof.data.table_dict) – table_dict with already processed speeds
full_features (dict) – A dictionary of aligned kinematics, where the keys are the names of the experimental conditions. The values are the aligned kinematics for each condition.
key (str) – key to the experiment to tag and current set of objects (videos, tables, distances etc.)
immobility_estimator (str) – classifier to determine if a mouse is immobile or not.
center (str) – Body part to center coordinates on. “Center” by default.
params (dict) – dictionary to overwrite the default values of the parameters of the functions that the rule-based pose estimation utilizes. See documentation for details.
run_numba (bool) – Determines if numba versions of functions should be used (run faster but require initial compilation time on first run)
- Returns:
table with traits as columns and frames as rows. Each value is a boolean indicating trait detection at a given time
- Return type:
tag_df (pandas.DataFrame)
- deepof.annotation_utils.calculate_close_range(df: DataFrame, mouse_id: str, bodypart: str, threshold: float)
Detects for a given set of mouse coordinates if the selected bodypart of the selected mouse is close to any bodypart of any other mouse for each frame.
- Parameters:
df (pd.DataFrame) – Dataframe containing coordinates of multiple mice
mouse_id (str) – Id of the target mouse
bodypart (str) – Bodypart of the target mouse that should be used for distance calculation
threshold (float) – Maximum distance that triggers “closeness”
- Returns:
Boolean numpy array set to True for each frame in which the selected bodypart of the selected mouse was closer than threshold to any other mouse, False otherwise.
- Return type:
proximity_mask (np.array)
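A proximity mask of this kind can be computed with NumPy broadcasting. The sketch below takes the target bodypart and the other animals' bodyparts as separate arrays instead of a single DataFrame; this layout and the function name are assumptions for illustration.

```python
import numpy as np


def proximity_mask(target_xy: np.ndarray, others_xy: np.ndarray,
                   threshold: float) -> np.ndarray:
    """Flag frames where the target point is near any other tracked point.

    target_xy: shape (n_frames, 2); others_xy: shape (n_frames, n_points, 2).
    """
    # Broadcast the target against every other point in the same frame
    distances = np.linalg.norm(others_xy - target_xy[:, None, :], axis=2)
    return (distances < threshold).any(axis=1)


nose = np.array([[0.0, 0.0], [0.0, 0.0]])
others = np.array([
    [[1.0, 0.0], [5.0, 5.0]],  # frame 0: one point is 1 unit away
    [[4.0, 0.0], [5.0, 5.0]],  # frame 1: nearest point is 4 units away
])
close = proximity_mask(nose, others, threshold=2.0)
```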
deepof.data module
Data structures for preprocessing and wrangling of motion tracking output data. This is the main module handled by the user.
There are three main data structures to pay attention to:
- Project, which serves as a configuration hub for the whole pipeline
- Coordinates, which acts as an intermediary between project configuration and data, and contains
a plethora of processing methods to apply, and
- TableDict, which is the main data structure to store the data, having experiment IDs as keys
and processed time-series as values in a dictionary-like object.
For a detailed tutorial on how to use this module, see the advanced tutorials in the main section.
- deepof.data.is_display_available()
- deepof.data.load_project(project_path: str, animal_ids: List | None = None, arena: str = 'polygonal-autodetect', bodypart_graph: str | dict = 'deepof_14', iterative_imputation: str = 'partial', exclude_bodyparts: List = ('',), exp_conditions: dict | None = None, remove_outliers: bool = True, interpolation_limit: int = 5, interpolation_std: int = 3, likelihood_tol: float = 0.75, model: str = 'mouse_topview', project_name: str = 'deepof_project', video_path: str | None = None, table_path: str | None = None, rename_bodyparts: list | None = None, sam_checkpoint_path: str | None = None, smooth_alpha: float = 1, table_format: str = 'autodetect', video_format: str = '.mp4', video_scale: int = 1, number_of_rois=0, fast_implementations_threshold: int = 50000) deepof_coordinates
Load a pre-saved pickled Coordinates object. Will update Coordinates objects from older versions of deepof (down to 0.7) to work with this version. Very old projects will be recreated during loading with the current version of deepof. For this purpose, input arguments can be set just as in a regular project definition.
- Parameters:
animal_ids (list) – list of animal ids.
arena (str) – arena type. Can be one of “circular-autodetect”, “circular-manual”, “polygonal-autodetect”, or “polygonal-manual”.
bodypart_graph (str) – body part scheme to use for the analysis. Defaults to None, in which case the program will attempt to select it automatically based on the available body parts.
iterative_imputation (str) – whether to use iterative imputation for occluded body parts, options are “full” and “partial”. if set to None, no imputation takes place.
exclude_bodyparts (list) – list of bodyparts to exclude from analysis.
exp_conditions (dict) – dictionary with experiment IDs as keys and experimental conditions as values.
remove_outliers (bool) – whether outliers should be removed during project creation.
interpolation_limit (int) – maximum number of missing frames to interpolate.
interpolation_std (int) – maximum number of standard deviations to interpolate.
likelihood_tol (float) – likelihood threshold for outlier detection.
model (str) – model to use for pose estimation. Defaults to ‘mouse_topview’ (as described in the documentation).
project_name (str) – name of the current project.
project_path (str) – path to the folder containing the motion tracking output data.
video_path (str) – path where to find the videos to use. If not specified, deepof assumes they are in your project path.
table_path (str) – path where to find the tracks to use. If not specified, deepof assumes they are in your project path.
rename_bodyparts (list) – list of names to use for the body parts in the provided tracking files. The order should match that of the columns in your DLC tables or the node dimensions on your (S)LEAP .npy files.
sam_checkpoint_path (str) – path to the checkpoint file for the SAM model. If not specified, the model will be saved in the installation folder.
smooth_alpha (float) – smoothing intensity. The higher the value, the more smoothing.
table_format (str) – format of the table. Defaults to ‘autodetect’, but can be set to “csv” or “h5” for DLC output, and “npy”, “slp” or “analysis.h5” for (S)LEAP.
video_format (str) – video format. Defaults to ‘.mp4’.
video_scale (int) – diameter of the arena in mm (if the arena is round) or length of the first specified arena side (if the arena is polygonal).
number_of_rois (int) – number of behavior ROIs to be drawn during project creation. Defaults to 0.
fast_implementations_threshold (int) – If the total number of frames in the project is larger than this, numba implementations of all functions with a numba option will be used.
- Returns:
Pre-run coordinates object.
- Return type:
coordinates (deepof_coordinates)
- class deepof.data.Project(animal_ids: List | None = None, arena: str = 'polygonal-autodetect', bodypart_graph: str | dict = 'deepof_14', iterative_imputation: str = 'partial', exclude_bodyparts: List = ('',), exp_conditions: str | dict | None = None, remove_outliers: bool = True, interpolation_limit: int = 5, interpolation_std: int = 3, likelihood_tol: float = 0.75, model: str = 'mouse_topview', project_name: str = 'deepof_project', project_path: str = '.', video_path: str | None = None, table_path: str | None = None, rename_bodyparts: list | None = None, sam_checkpoint_path: str | None = None, smooth_alpha: float = 1, table_format: str = 'autodetect', video_format: str = '.mp4', video_scale: str | None = None, number_of_rois: int = 0, fast_implementations_threshold: int = 50000)
Bases: object
Class for loading and preprocessing motion tracking data of individual and multiple animals.
All main computations are handled from here.
- __init__(animal_ids: List | None = None, arena: str = 'polygonal-autodetect', bodypart_graph: str | dict = 'deepof_14', iterative_imputation: str = 'partial', exclude_bodyparts: List = ('',), exp_conditions: str | dict | None = None, remove_outliers: bool = True, interpolation_limit: int = 5, interpolation_std: int = 3, likelihood_tol: float = 0.75, model: str = 'mouse_topview', project_name: str = 'deepof_project', project_path: str = '.', video_path: str | None = None, table_path: str | None = None, rename_bodyparts: list | None = None, sam_checkpoint_path: str | None = None, smooth_alpha: float = 1, table_format: str = 'autodetect', video_format: str = '.mp4', video_scale: str | None = None, number_of_rois: int = 0, fast_implementations_threshold: int = 50000)
Initialize a Project object.
- Parameters:
animal_ids (list) – list of animal ids.
arena (str) – arena type. Can be one of “circular-autodetect”, “circular-manual”, “polygonal-autodetect”, or “polygonal-manual”.
bodypart_graph (str) – body part scheme to use for the analysis. Defaults to None, in which case the program will attempt to select it automatically based on the available body parts.
iterative_imputation (str) – whether to use iterative imputation for occluded body parts, options are “full” and “partial”. if set to None, no imputation takes place.
exclude_bodyparts (list) – list of bodyparts to exclude from analysis.
exp_conditions (dict) – dictionary with experiment IDs as keys and experimental conditions as values.
remove_outliers (bool) – whether outliers should be removed during project creation.
interpolation_limit (int) – maximum number of missing frames to interpolate.
interpolation_std (int) – maximum number of standard deviations to interpolate.
likelihood_tol (float) – likelihood threshold for outlier detection.
model (str) – model to use for pose estimation. Defaults to ‘mouse_topview’ (as described in the documentation).
project_name (str) – name of the current project.
project_path (str) – path to the folder containing the motion tracking output data.
video_path (str) – path where to find the videos to use. If not specified, deepof assumes they are in your project path.
table_path (str) – path where to find the tracks to use. If not specified, deepof assumes they are in your project path.
rename_bodyparts (list) – list of names to use for the body parts in the provided tracking files. The order should match that of the columns in your DLC tables or the node dimensions on your (S)LEAP .npy files.
sam_checkpoint_path (str) – path to the checkpoint file for the SAM model. If not specified, the model will be saved in the installation folder.
smooth_alpha (float) – smoothing intensity. The higher the value, the more smoothing.
table_format (str) – format of the table. Defaults to ‘autodetect’, but can be set to “csv” or “h5” for DLC output, and “npy”, “slp” or “analysis.h5” for (S)LEAP.
video_format (str) – video format. Defaults to ‘.mp4’.
video_scale (int) – diameter of the arena in mm (if the arena is round) or length of the first specified arena side (if the arena is polygonal).
number_of_rois (int) – number of behavior ROIs to be drawn during project creation. Defaults to 0.
fast_implementations_threshold (int) – If the total number of frames in the project is larger than this, numba implementations of all functions with a numba option will be used.
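A typical call sequence might look like the sketch below. The constructor arguments and the create method are taken from the signatures in this reference; the animal IDs are placeholders, and the `create_coordinates` wrapper is a hypothetical helper that imports deepof lazily so the settings can be inspected without the package installed.

```python
# Settings drawn from the documented parameters; adjust them to your project.
project_settings = {
    "project_path": ".",
    "project_name": "deepof_project",
    "arena": "polygonal-autodetect",
    "animal_ids": ["A", "B"],  # placeholder IDs, not taken from the reference
    "table_format": "autodetect",
    "video_format": ".mp4",
    "likelihood_tol": 0.75,
}


def create_coordinates(settings: dict):
    """Build a Project from `settings` and run it (requires deepof)."""
    import deepof.data  # imported here so the module loads without deepof

    project = deepof.data.Project(**settings)
    # create() returns the Coordinates object described in this reference
    return project.create(verbose=True)
```

Calling `create_coordinates(project_settings)` would then run the preprocessing pipeline on the files found under the project path.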
- set_up_project_directory(debug=False)
Create a project directory where to save all produced results.
- load_exp_conditions(filepath)
Load experimental conditions from a wide-format csv table.
- Parameters:
filepath (str) – Path to the file containing the experimental conditions.
- get_arena(tables: dict, debug: str = False, test: bool = False) array
Return the arena as recognised from the videos.
- Parameters:
tables (dict) – dictionary containing coordinate tables
debug (str) – if True, saves intermediate results to disk
test (bool) – if True, runs the function in test mode
- Returns:
arena parameters, as recognised from the videos. The shape depends on the arena type
- Return type:
arena (np.ndarray)
- preprocess_tables() Tuple[deepof_table_dict, deepof_table_dict]
Loads and preprocesses tracking data through a series of modular steps, then saves the results and returns table dictionaries.
- scale_tables(tab_dict: deepof_table_dict) deepof_table_dict
Scales all tables to mm using scaling information from arena detection.
- Parameters:
tab_dict (table_dict) – Table dictionary of pandas DataFrames containing the trajectories of all bodyparts.
- Returns:
Scaled table dictionary of pandas DataFrames containing the trajectories of all bodyparts.
- Return type:
tab_dict (table_dict)
- get_distances(tab_dict: deepof_table_dict) dict
Compute the distances between all selected body parts over time for a table dictionary.
- Parameters:
tab_dict (table_dict) – Table dictionary of pandas DataFrames containing the trajectories of all bodyparts.
- Returns:
Table dictionary of pandas DataFrames containing the distances between all bodyparts.
- Return type:
distance_dict
- get_distances_tab(tab: DataFrame) dict
Compute the distances between all selected body parts over time for a single table.
- Parameters:
tab (pd.DataFrame) – Pandas DataFrame containing the trajectories of all bodyparts.
- Returns:
Pandas DataFrame containing the distances between all bodyparts.
- Return type:
distance_tab
- get_angles(tab_dict: deepof_table_dict) dict
Compute all the angles between adjacent bodypart trios per video and per frame in all datasets in the given table dictionary.
- Parameters:
tab_dict (table_dict) – Table dictionary of pandas DataFrames containing the trajectories of all bodyparts.
- Returns:
Table dictionary of pandas DataFrames containing the angles between all bodyparts.
- Return type:
angle_dict
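The angle formed by a trio of body parts can be computed per frame from the dot product of the two vectors that meet at the middle point. The sketch below returns angles in radians and uses plain arrays of shape `(n_frames, 2)`; the function name, the units and the input layout are assumptions, not a description of how get_angles is implemented.

```python
import numpy as np


def trio_angle(a: np.ndarray, b: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Angle at vertex b (in radians) formed by points a-b-c, per frame."""
    ba = a - b
    bc = c - b
    cosine = np.sum(ba * bc, axis=1) / (
        np.linalg.norm(ba, axis=1) * np.linalg.norm(bc, axis=1)
    )
    # Clip to guard against rounding slightly outside [-1, 1]
    return np.arccos(np.clip(cosine, -1.0, 1.0))


a = np.array([[1.0, 0.0], [1.0, 0.0]])
b = np.array([[0.0, 0.0], [0.0, 0.0]])
c = np.array([[0.0, 1.0], [-1.0, 0.0]])
angles = trio_angle(a, b, c)  # a right angle, then a straight line
```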
- get_areas(tab_dict: deepof_table_dict) dict
Compute all relevant areas (head, torso, back) per video and per frame in the data.
- Parameters:
tab_dict (table_dict) – Table dictionary of pandas DataFrames containing the trajectories of all bodyparts.
- Returns:
Table dictionary of pandas DataFrames containing the areas (head, torso, back) between sets of bodyparts.
- Return type:
all_areas_dict
- create(verbose: bool = True, force: bool = False, debug: bool = True, test: bool = False, _to_extend: deepof_coordinates | None = None) deepof_coordinates
Generate a deepof.Coordinates dataset using all the options specified during initialization.
- Parameters:
verbose (bool) – If True, prints progress. Defaults to True.
force (bool) – If True, overwrites existing project. Defaults to False.
debug (bool) – If True, saves arena detection images to disk. Defaults to False.
test (bool) – If True, creates the project in test mode (which, for example, bypasses any manual input). Defaults to False.
_to_extend (coordinates) – Coordinates object to extend with the current dataset. For internal usage only.
- Returns:
Deepof.Coordinates object containing the trajectories of all bodyparts.
- Return type:
coordinates (coordinates)
- property distances
Returns distances table_dict
- property ego
String, name of a body part. If True, computes only the distances between the specified body part and the rest.
- property angles
Returns angles table_dict
- extend(project_to_extend: deepof_coordinates, video_path: str | None = None, table_path: str | None = None, verbose: bool = True, debug: bool = True, test: bool = False) deepof_coordinates
Generate a deepof.Coordinates dataset using all the options specified during initialization.
- Parameters:
project_to_extend (coordinates) – Coordinates object to extend with the current dataset.
video_path (str) – Path to the videos. If not specified, defaults to the project path.
table_path (str) – Path to the tracks. If not specified, defaults to the project path.
verbose (bool) – Prints progress if True. Defaults to True.
debug (bool) – Saves arena detection images to disk if True. Defaults to False.
test (bool) – Runs the project in test mode if True. Defaults to False.
- Returns:
Deepof.Coordinates object containing the trajectories of all body parts.
- Return type:
coordinates (coordinates)
- class deepof.data.Coordinates(project_path: str, project_name: str, arena: str, arena_dims: array, bodypart_graph: str, path: str, quality: dict, scales: dict, frame_rate: float, arena_params: dict, roi_dicts: dict, tables: dict, source_table_path: str, table_paths: List, trained_model_path: str, videos: List, video_path: str, video_resolution: dict, angles: dict | None = None, animal_ids: List = ('',), areas: dict | None = None, distances: dict | None = None, connectivity: Graph | None = None, excluded_bodyparts: list | None = None, exp_conditions: dict | None = None, number_of_rois: int = 0, run_numba: bool = False, very_large_project: bool = False, version: str | None = None)
Bases: object
Class for storing the results of a completed project run. Methods are mostly setters and getters in charge of tidying up the generated tables.
- __init__(project_path: str, project_name: str, arena: str, arena_dims: array, bodypart_graph: str, path: str, quality: dict, scales: dict, frame_rate: float, arena_params: dict, roi_dicts: dict, tables: dict, source_table_path: str, table_paths: List, trained_model_path: str, videos: List, video_path: str, video_resolution: dict, angles: dict | None = None, animal_ids: List = ('',), areas: dict | None = None, distances: dict | None = None, connectivity: Graph | None = None, excluded_bodyparts: list | None = None, exp_conditions: dict | None = None, number_of_rois: int = 0, run_numba: bool = False, very_large_project: bool = False, version: str | None = None)
Class for storing the results of a completed project run. Methods are mostly setters and getters in charge of tidying up the generated tables.
- Parameters:
project_name (str) – name of the current project.
project_path (str) – path to the folder containing the motion tracking output data.
arena (str) – Type of arena used for the experiment. See deepof.data.Project for more information.
arena_dims (np.array) – Dimensions of the arena. See deepof.data.Project for more information.
bodypart_graph (nx.Graph) – Graph containing the body part connectivity. See deepof.data.Project for more information.
path (str) – Path to the folder containing the results of the experiment.
quality (dict) – Dictionary containing the quality of the experiment. See deepof.data.Project for more information.
scales (dict) – Scales used for the experiment. See deepof.data.Project for more information.
frame_rate (float) – frame rate of the processed videos.
arena_params (dict) – Dictionary containing the parameters of the arena. See deepof.data.Project for more information.
roi_dicts (dict) – Dictionary containing all ROIs for all videos, as determined by the user.
tables (dict) – Dictionary containing the tables of the experiment. See deepof.data.Project for more information.
table_paths (List) – List containing the paths to the tables of the experiment. See deepof.data.Project for more information.
trained_model_path (str) – Path to the trained models used for the supervised pipeline. For internal use only.
videos (List) – List containing the videos used for the experiment. See deepof.data.Project for more information.
video_resolution (dict) – Dictionary containing the automatically detected resolution of the videos used for the experiment.
angles (dict) – Dictionary containing the angles of the experiment. See deepof.data.Project for more information.
animal_ids (List) – List containing the animal IDs of the experiment. See deepof.data.Project for more information.
areas (dict) – dictionary with areas to compute. By default, it includes head, torso, and back.
distances (dict) – Dictionary containing the distances of the experiment. See deepof.data.Project for more information.
excluded_bodyparts (list) – list of bodyparts to exclude from analysis.
exp_conditions (dict) – Dictionary containing the experimental conditions of the experiment. See deepof.data.Project for more information.
number_of_rois (int) – number of behavior ROIs to be drawn during project creation. Defaults to 0.
run_numba (bool) – Determines if numba versions of functions should be used (run faster but require initial compilation time on first run)
very_large_project (bool) – Decides if memory efficient data loading and saving should be used
version (str) – version of deepof this object was created with
- get_table_keys()
Get the keys to all experiments in this Coordinates object.
- get_coords(center: str = False, polar: bool = False, speed: int = 0, align: str = False, align_group: bool = False, align_inplace: bool = True, to_video: bool = False, selected_id: str | None = None, roi_number: int | None = None, animals_in_roi: str | None = None, in_roi_criterion: str = 'Center', file_name: str = 'coords', return_path: bool = False) deepof_table_dict
Return a table_dict object with the coordinates of each animal as values.
- Parameters:
center (str) – Name of the body part to which the positions will be centered. If False (the default in the signature), the raw data is returned; if ‘arena’, coordinates are centered in the arena.
polar (bool)
speed (int) – States the derivative of the positions to report. Speed is returned if 1, acceleration if 2, jerk if 3, etc.
align (str) – Selects the body part to which later processes will align the frames with (see preprocess in table_dict documentation).
align_inplace (bool) – Only valid if align is set. Aligns the vector that goes from the origin to the selected body part with the y-axis, for all timepoints (default).
to_video (bool) – Undoes the scaling to mm back to the pixel scaling from the original video
selected_id (str) – Selects a single animal on multi animal settings. Defaults to None (all animals are processed).
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
in_roi_criterion (str) – Bodypart of a mouse that has to be in the ROI to count the mouse as “inside” the ROI.
file_name (str) – Name of the file for saving
return_path (bool) – If True, return only the path to the saving location of the processed table; if False, return the full table.
- Returns:
A table_dict object containing the coordinates of each animal as values.
- Return type:
table_dict
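The speed argument above is described as a derivative order. As a rough illustration only (this is not deepof's implementation, and the helper name nth_difference is hypothetical), repeated finite differences over a position series give speed for order 1 and acceleration for order 2:

```python
def nth_difference(values, order, frame_rate=1.0):
    """Apply `order` successive finite differences to a 1D series.

    Hypothetical helper: each pass replaces the series with
    (x[t+1] - x[t]) * frame_rate, so the output is `order` items shorter.
    """
    result = list(values)
    for _ in range(order):
        result = [(b - a) * frame_rate for a, b in zip(result, result[1:])]
    return result


positions = [0.0, 1.0, 4.0, 9.0, 16.0]  # x = t**2, one sample per frame
speed = nth_difference(positions, order=1)         # first derivative
acceleration = nth_difference(positions, order=2)  # second derivative
```

With order=0 the series is returned unchanged, which mirrors the default speed=0 (raw positions).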
- get_coords_at_key(key: str, scale: array, quality: deepof_table_dict | None = None, center: str = False, polar: bool = False, speed: int = 0, align: str = False, align_group: bool = False, align_inplace: bool = True, to_video: bool = False, selected_id: str | None = None, roi_number: int | None = None, animals_in_roi: str | None = None, in_roi_criterion: str = 'Center') DataFrame
Return a pandas DataFrame with the coordinates for the selected key as values.
- Parameters:
key (str) – key of the experiment whose coordinates are requested.
scale (np.array) – scale of the current arena.
quality (table_dict) – Quality information for the current data frame.
center (str) – Name of the body part to which the positions will be centered. If False (the default in the signature), the raw data is returned; if ‘arena’, coordinates are centered in the arena.
polar (bool)
speed (int) – States the derivative of the positions to report. Speed is returned if 1, acceleration if 2, jerk if 3, etc.
align (str) – Selects the body part to which later processes will align the frames with (see preprocess in table_dict documentation).
align_inplace (bool) – Only valid if align is set. Aligns the vector that goes from the origin to the selected body part with the y-axis, for all timepoints (default).
to_video (bool) – Undoes the scaling to mm back to the pixel scaling from the original video
selected_id (str) – Selects a single animal on multi animal settings. Defaults to None (all animals are processed).
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
in_roi_criterion (str) – Bodypart of a mouse that has to be in the ROI to count the mouse as “inside” the ROI.
- Returns:
A data frame containing the coordinates for the selected key as values.
- Return type:
tab (pd.DataFrame)
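The center and polar options can be pictured with a small stand-alone sketch. It assumes that centering means subtracting a reference body part's position and that polar means converting (x, y) into (radius, angle); the helper names and the body part names other than ‘Center’ are illustrative, not taken from deepof:

```python
import math


def center_on(points, reference):
    """Shift (x, y) points so that `reference` becomes the origin."""
    rx, ry = reference
    return [(x - rx, y - ry) for x, y in points]


def to_polar(points):
    """Convert (x, y) pairs to (rho, phi) pairs, with phi in radians."""
    return [(math.hypot(x, y), math.atan2(y, x)) for x, y in points]


# One frame of made-up positions for three body parts
frame = {"Nose": (4.0, 6.0), "Center": (1.0, 2.0), "Tail_base": (1.0, 0.0)}
centered = center_on(frame.values(), frame["Center"])
polar = to_polar(centered)
```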
- get_distances(speed: int = 0, selected_id: str | None = None, roi_number: int | None = None, animals_in_roi: str | None = None, filter_on_graph: bool = True, file_name: str = 'got_distances', return_path: bool = False) deepof_table_dict
Return a table_dict object with the distances between the body parts of each animal as values.
- Parameters:
speed (int) – The derivative to use for speed.
selected_id (str) – The id of the animal to select.
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
filter_on_graph (bool) – If True, only distances between connected nodes in the DeepOF graph representations are kept. Otherwise, all distances between bodyparts are returned.
file_name (str) – Name of the file for saving
return_path (bool) – If True, return only the path to the processed table; if False, return the full table.
- Returns:
A table_dict object with the distances between the body parts of each animal as values.
- Return type:
table_dict
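The effect of filter_on_graph can be sketched without deepof: compute every pairwise distance between body parts, then keep only the pairs that are edges of a connectivity graph. The function name, body part names, and edge list below are assumptions made for illustration:

```python
import math
from itertools import combinations


def bodypart_distances(frame, edges=None):
    """Return pairwise distances between body parts in one frame.

    If `edges` is given, keep only pairs that are connected in the graph.
    """
    allowed = None if edges is None else {frozenset(e) for e in edges}
    distances = {}
    for a, b in combinations(sorted(frame), 2):
        if allowed is not None and frozenset((a, b)) not in allowed:
            continue
        distances[(a, b)] = math.dist(frame[a], frame[b])
    return distances


frame = {"Nose": (0.0, 0.0), "Center": (3.0, 4.0), "Tail_base": (3.0, 10.0)}
edges = [("Nose", "Center"), ("Center", "Tail_base")]  # illustrative graph

all_pairs = bodypart_distances(frame)              # like filter_on_graph=False
connected_only = bodypart_distances(frame, edges)  # like filter_on_graph=True
```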
- get_distances_at_key(key: str, quality: deepof_table_dict | None = None, speed: int = 0, selected_id: str | None = None, roi_number: int | None = None, animals_in_roi: str | None = None, filter_on_graph: bool = True) DataFrame
Return a pd.DataFrame with the distances between body parts of one animal as values.
- Parameters:
key (str) – key for requested distance
quality (table_dict) – Quality information for the current data frame.
speed (int) – The derivative to use for speed.
selected_id (str) – The id of the animal to select.
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
filter_on_graph (bool) – If True, only distances between connected nodes in the DeepOF graph representations are kept. Otherwise, all distances between bodyparts are returned.
- Returns:
A pd.DataFrame with the distances between body parts of one animal as values.
- Return type:
tab (pd.DataFrame)
- get_angles(degrees: bool = False, speed: int = 0, selected_id: str | None = None, roi_number: int | None = None, animals_in_roi: str | None = None, file_name: str = 'got_angles', return_path: bool = False) deepof_table_dict
Return a table_dict object with the angles between the body parts of each animal as values.
- Parameters:
degrees (bool) – If True, angles are converted to degrees; otherwise they remain in radians (default).
speed (int) – The derivative to use for speed.
selected_id (str) – The id of the animal to select.
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
file_name (str) – Name of the file for saving
return_path (bool) – If True, return only the path to the processed table; if False, return the full table.
- Returns:
A table_dict object with the angles between the body parts of each animal as values.
- Return type:
table_dict
- get_angles_at_key(key: str, quality: deepof_table_dict | None = None, degrees: bool = False, speed: int = 0, selected_id: str | None = None, roi_number: int | None = None, animals_in_roi: str | None = None) DataFrame
Return a pd.DataFrame with the angles between body parts for one animal as values.
- Parameters:
key (str) – key of the experiment whose angles are requested.
quality (table_dict) – Quality information for the current data frame.
degrees (bool) – If True, angles are converted to degrees; otherwise they remain in radians (default).
speed (int) – The derivative to use for speed.
selected_id (str) – The id of the animal to select.
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
- Returns:
A pd.DataFrame with the angles between body parts of one animal as values.
- Return type:
tab (pd.DataFrame)
- get_areas(speed: int = 0, selected_id: str = 'all', roi_number: int | None = None, animals_in_roi: str | None = None, file_name: str = 'got_areas', return_path: bool = False) deepof_table_dict
Return a table_dict object with all relevant areas (head, torso, back, full). Unless specified otherwise, the areas are computed for all animals.
- Parameters:
speed (int) – The derivative to use for speed.
selected_id (str) – The id of the animal to select. “all” (default) computes the areas for all animals. Declared in self._animal_ids.
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
file_name (str) – Name of the file for saving
return_path (bool) – If True, return only the path to the processed table; if False, return the full table.
- Returns:
A table_dict object with the areas of the body parts of each animal as values.
- Return type:
table_dict
- get_areas_at_key(key: str, quality: deepof_table_dict | None = None, speed: int = 0, selected_id: str = 'all', roi_number: int | None = None, animals_in_roi: str | None = None) deepof_table_dict
Return a pd.DataFrame with all relevant areas (head, torso, back, full). Unless specified otherwise, the areas are computed for all animals.
- Parameters:
key (str) – key of the experiment whose areas are requested.
quality (table_dict) – Quality information for the current data frame.
speed (int) – The derivative to use for speed.
selected_id (str) – The id of the animal to select. “all” (default) computes the areas for all animals. Declared in self._animal_ids.
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
- Returns:
A pd.DataFrame object with the areas of the body parts of each animal as values.
- Return type:
tab (pd.DataFrame)
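How deepof computes the head, torso, and back areas is not shown here. One standard way to get the area enclosed by an ordered set of body part positions is the shoelace formula, sketched below with a hypothetical helper and made-up coordinates:

```python
def polygon_area(vertices):
    """Area of a simple polygon from ordered (x, y) vertices (shoelace formula)."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to the first
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Four made-up body part positions forming a 4 x 3 rectangle
torso = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
area = polygon_area(torso)
```

The vertices must be listed in order around the outline; a shuffled order describes a self-intersecting polygon and gives a meaningless area.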
- get_videos(full_paths: bool = False, play: bool = False)
Returns the videos associated with the dataset as a dictionary.
- get_start_times()
Returns the start time for each table in a dictionary
- get_end_times()
Returns the end time for each table in a dictionary
- get_table_lengths()
Returns the length for each table in a dictionary
- property get_exp_conditions
Return the stored dictionary with experimental conditions per subject.
- get_condition_values(exp_cond)
- load_exp_conditions(filepath)
Load experimental conditions from a wide-format csv table.
- Parameters:
filepath (str) – Path to the file containing the experimental conditions.
- get_quality()
Retrieve a dictionary with the tagging quality per video, as reported by DLC or SLEAP.
- property get_arenas
Retrieve all available information associated with the arena.
- edit_arenas(video_keys: list | None = None, arena_type: str | None = None, verbose: bool = True)
Tag the arena in the videos.
- Parameters:
video_keys (list) – A list of keys for videos to reannotate. If None, all videos are loaded.
arena_type (str) – The type of arena to use. Must be one of “polygonal-manual”, “circular-manual”, or “circular-autodetect”. If None (default), the arena type specified when creating the project is used.
verbose (bool) – Whether to print the progress of the annotation.
- save(file=None, filename: str | None = None, timestamp: bool = True)
Save the current state of the Coordinates object to a pickled file.
- Parameters:
file (obj) – Optional object to save. If None, the project itself is saved.
filename (str) – Name of the pickled file to store. If no name is provided, a default is used.
timestamp (bool) – Whether to append a time stamp at the end of the output file name.
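The save behaviour described above (pickle to a file, optionally appending a time stamp to the name) can be sketched with the standard library. The function name, the time stamp format, and the .pkl extension are assumptions; deepof's actual file naming may differ:

```python
import os
import pickle
import tempfile
from datetime import datetime


def save_pickle(obj, directory, filename="saved_object", timestamp=True):
    """Pickle `obj` into `directory`, optionally appending a time stamp."""
    if timestamp:
        filename = f"{filename}_{datetime.now():%Y%m%d_%H%M%S}"
    path = os.path.join(directory, filename + ".pkl")
    with open(path, "wb") as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return path


# Round trip in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_pickle({"frame_rate": 25.0}, tmp, "my_project")
    with open(saved_path, "rb") as handle:
        restored = pickle.load(handle)
```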
- get_graph_dataset(**kwargs)
- get_supervised_parameters() dict
Return the most frequent behaviour in a window of window_size frames.
- Parameters:
hparams (dict) – dictionary containing hyperparameters to overwrite
- Returns:
dictionary with overwritten parameters. Those not specified in the input retain their default values
- Return type:
defaults (dict)
- reset_supervised_parameters() dict
Return the most frequent behaviour in a window of window_size frames.
- Parameters:
hparams (dict) – dictionary containing hyperparameters to overwrite
- Returns:
dictionary with overwritten parameters. Those not specified in the input retain their default values
- Return type:
defaults (dict)
- set_supervised_parameters(hparams: dict = {})
Return the most frequent behaviour in a window of window_size frames.
- Parameters:
hparams (dict) – dictionary containing hyperparameters to overwrite
- Returns:
dictionary with overwritten parameters. Those not specified in the input retain their default values
- Return type:
defaults (dict)
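The Returns text above describes a dictionary in which supplied hyperparameters overwrite defaults and everything else keeps its default value. A minimal sketch of that pattern follows; the function name and the parameter names are placeholders rather than deepof's, and rejecting unknown keys is a design choice of this sketch:

```python
def override_parameters(defaults, hparams=None):
    """Return a copy of `defaults` with entries from `hparams` overwritten."""
    hparams = hparams or {}  # avoids a mutable default argument
    unknown = set(hparams) - set(defaults)
    if unknown:
        raise KeyError(f"Unknown parameters: {sorted(unknown)}")
    return {**defaults, **hparams}


# Placeholder names and values, not deepof defaults
defaults = {"param_a": 1.0, "param_b": 2.0}
updated = override_parameters(defaults, {"param_b": 5.0})
```

Building a new dictionary leaves the defaults untouched, so they can be restored later.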
- supervised_annotation(**kwargs)
- deep_unsupervised_embedding(preprocessed_object: Tuple[ndarray, ndarray, ndarray, ndarray], adjacency_matrix: ndarray | None = None, bin_size=None, bin_index=None, precomputed_bins=None, samples_max=None, embedding_model: str = 'VaDE', encoder_type: str = 'recurrent', batch_size: int = 64, latent_dim: int = 4, epochs: int = 150, log_history: bool = True, log_hparams: bool = False, n_components: int = 10, kmeans_loss: float = 0.0, temperature: float = 0.1, contrastive_similarity_function: str = 'cosine', contrastive_loss_function: str = 'nce', beta: float = 0.1, tau: float = 0.1, output_path: str = '', pretrained: str = False, save_checkpoints: bool = False, save_weights: bool = True, input_type: str = False, run: int = 0, kl_annealing_mode: str = 'linear', kl_warmup: int = 15, reg_cat_clusters: float = 0.0, recluster: bool = False, interaction_regularization: float = 0.0, **kwargs) Tuple
Annotates coordinates using a deep unsupervised autoencoder.
- Parameters:
preprocessed_object (tuple) – Tuple containing a preprocessed object (X_train, y_train, X_test, y_test).
adjacency_matrix (np.ndarray) – adjacency matrix of the connectivity graph to use.
bin_size (Union[int,str]) – bin size for time filtering.
bin_index (Union[int,str]) – index of the bin of size bin_size to select along the time dimension. Denotes exact start position in the time domain if given as string.
precomputed_bins (np.ndarray) – precomputed time bins. If provided, bin_size and bin_index are ignored.
samples_max (int) – Maximum number of samples taken for plotting to avoid excessive computation times. If the number of rows in a data set exceeds this number the data is downsampled accordingly.
embedding_model (str) – Name of the embedding model to use. Must be one of VQVAE, VaDE (the default in the signature), or contrastive.
encoder_type (str) – Encoder architecture to use. Must be one of “recurrent”, “TCN”, and “transformer”.
batch_size (int) – Batch size for training.
latent_dim (int) – Dimension of the latent space.
epochs (int) – Maximum number of epochs to train the model. Actual training might be shorter, as the model will stop training when validation loss stops decreasing.
log_history (bool) – Whether to log the history of the model to TensorBoard.
log_hparams (bool) – Whether to log the hyperparameters of the model to TensorBoard.
n_components (int) – Number of latent clusters for the embedding model to use.
kmeans_loss (float) – Weight of the gram loss, which adds a regularization term to VaDE and VQVAE models which penalizes the correlation between the dimensions in the latent space.
temperature (float) – temperature parameter for the contrastive loss functions. Higher values put harsher penalties on negative pair similarity.
contrastive_similarity_function (str) – similarity function between positive and negative pairs. Must be one of ‘cosine’ (default), ‘euclidean’, ‘dot’, and ‘edit’.
contrastive_loss_function (str) – contrastive loss function. Must be one of ‘nce’ (default), ‘dcl’, ‘fc’, and ‘hard_dcl’. See specific documentation for details.
beta (float) – Beta (concentration) parameter for the hard_dcl contrastive loss. Higher values lead to ‘harder’ negative samples.
tau (float) – Tau parameter for the dcl and hard_dcl contrastive losses, indicating positive class probability.
output_path (str) – Path to save the trained model and all log files.
pretrained (str) – Whether to load a pretrained model. If False, model is trained from scratch. If not, must be the path to a saved model.
save_checkpoints (bool) – Whether to save checkpoints of the model during training. Defaults to False.
save_weights (bool) – Whether to save the weights of the model during training. Defaults to True.
input_type (str) – Type of the preprocessed_object passed as the first parameter. See deepof.data.TableDict for more details.
run (int) – Run number for the model. Used to save the model and log files. Optional.
kl_annealing_mode (str) – Mode of the KL annealing. Must be one of “linear”, or “sigmoid”.
kl_warmup (int) – Number of epochs to warm up the KL annealing.
reg_cat_clusters (bool) – whether to penalize uneven cluster membership in the latent space, by minimizing the KL divergence between cluster membership and a uniform categorical distribution.
recluster (bool) – whether to recluster after training using a Gaussian Mixture Model. Only valid for VaDE.
interaction_regularization (float) – weight of the interaction regularization term for all encoders.
**kwargs – Additional keyword arguments to pass to the model.
- Returns:
Tuple containing all trained models. See specific model documentation under deepof.models for details.
- Return type:
Tuple
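The kl_annealing_mode and kl_warmup parameters suggest that the weight of the KL term is ramped up over the first epochs. The exact schedule used by deepof is not given here; the sketch below shows one common form of linear and sigmoid warm-up, with a hypothetical function name and an arbitrary steepness constant:

```python
import math


def kl_weight(epoch, kl_warmup, mode="linear"):
    """Weight in [0, 1] applied to the KL term at a given epoch."""
    if kl_warmup <= 0:
        return 1.0  # no warm-up requested
    progress = epoch / kl_warmup
    if mode == "linear":
        return min(1.0, progress)
    if mode == "sigmoid":
        # Logistic ramp centred on the middle of the warm-up period;
        # the steepness of 10 is an arbitrary choice for this sketch.
        return 1.0 / (1.0 + math.exp(-10.0 * (progress - 0.5)))
    raise ValueError(f"Unknown kl_annealing_mode: {mode!r}")


schedule = [kl_weight(epoch, kl_warmup=15) for epoch in (0, 15, 30)]
```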
- class deepof.data.TableDict(tabs: Dict, typ: str, table_path: str | None = None, arena: str | None = None, arena_dims: array | None = None, animal_ids: List = ('',), center: str | None = None, connectivity: Graph | None = None, polar: bool | None = None, exp_conditions: dict | None = None, shapes: Dict = {})
Bases: dict
Main class for storing a single dataset as a dictionary with individuals as keys and pandas.DataFrames as values.
Includes methods for generating training and testing datasets for the supervised and unsupervised models.
- __init__(tabs: Dict, typ: str, table_path: str | None = None, arena: str | None = None, arena_dims: array | None = None, animal_ids: List = ('',), center: str | None = None, connectivity: Graph | None = None, polar: bool | None = None, exp_conditions: dict | None = None, shapes: Dict = {})
Store single datasets as dictionaries with individuals as keys and pandas.DataFrames as values.
Includes methods for generating training and testing datasets for the autoencoders.
- Parameters:
tabs (Dict) – Dictionary of pandas.DataFrames with individual experiments as keys.
typ (str) – Type of the dataset. Examples are “coords”, “dists”, and “angles”. For logging purposes only.
table_path (str) – Path to the root directory that is going to be used to save table iterations.
arena (str) – Type of the arena. Must be one of “circular-autodetect”, “circular-manual”, or “polygon-manual”. Handled internally.
arena_dims (np.array) – Dimensions of the arena in mm.
animal_ids (list) – list of animal ids.
center (str) – Type of the center. Handled internally.
connectivity (nx.Graph) – Bodypart graph of a mouse.
polar (bool) – Whether the dataset is in polar coordinates. Handled internally.
exp_conditions (dict) – dictionary with experiment IDs as keys and experimental conditions as values.
shapes (Dict) – Dictionary containing the shapes of all stored tables
- filter_videos(keys: list) deepof_table_dict
Return a subset of the original table_dict object, containing only the specified keys.
Useful, for example, to select data coming from videos of a specified condition.
- Parameters:
keys (list) – List of keys to keep.
- Returns:
Subset of the original table_dict object, containing only the specified keys.
- Return type:
table_dict
- filter_condition(exp_filters: dict) deepof_table_dict
Return a subset of the original table_dict object, containing only videos belonging to the specified experimental condition.
- Parameters:
exp_filters (dict) – experimental conditions and values to filter on.
- Returns:
Subset of the original table_dict object, containing only the specified keys.
- Return type:
table_dict
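Both filter_videos and filter_condition return a subset that is again a table_dict. The toy class below illustrates that pattern on a plain dict subclass. It is not deepof's TableDict: the class name is hypothetical and experimental conditions are held as nested dictionaries for simplicity:

```python
class MiniTableDict(dict):
    """Toy dict subclass holding tables plus per-experiment conditions."""

    def __init__(self, tabs, exp_conditions=None):
        super().__init__(tabs)
        self.exp_conditions = exp_conditions or {}

    def filter_videos(self, keys):
        """Keep only the given experiment keys."""
        wanted = set(keys)
        kept = {k: v for k, v in self.items() if k in wanted}
        conditions = {k: v for k, v in self.exp_conditions.items() if k in kept}
        return MiniTableDict(kept, conditions)

    def filter_condition(self, exp_filters):
        """Keep experiments whose conditions match every entry in `exp_filters`."""
        keys = [
            key
            for key, cond in self.exp_conditions.items()
            if all(cond.get(name) == value for name, value in exp_filters.items())
        ]
        return self.filter_videos(keys)


tables = MiniTableDict(
    {"vid1": [1, 2], "vid2": [3, 4], "vid3": [5, 6]},
    {"vid1": {"group": "ctrl"}, "vid2": {"group": "stress"}, "vid3": {"group": "ctrl"}},
)
controls = tables.filter_condition({"group": "ctrl"})
```

Returning the same class from each filter lets calls be chained, for example a condition filter followed by a key filter.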
- filter_id(selected_id: str | None = None) deepof_table_dict
Filter a TableDict object to keep only those columns related to the selected id.
Leave labels untouched if present.
- Parameters:
selected_id (str) – select a single animal on multi animal settings. Defaults to None (all animals are processed).
- Returns:
Filtered TableDict object, keeping only the selected animal.
- Return type:
table_dict
- new_dict_same_header(tabs: dict | None = None, only_keys: bool = False)
Creates a new table dict based on a given dictionary and the existing header information.
- Parameters:
tabs (dict) – Dictionary of table entries
only_keys (bool) – Copy dictionary keys and create empty dictionary with same keys
- Returns:
New TableDict object, based on given tabs and existing header info.
- Return type:
table_dict
- random_projection(n_components: int = 2, kernel: str = 'linear') Tuple[Any, Any]
Return a training set generated from the 2D original data (time x features) and a random projection to a n_components space.
The sample parameter allows the user to randomly pick a subset of the data for performance or visualization reasons.
- Parameters:
n_components (int) – Number of components to project to. Default is 2.
kernel (str) – Kernel to be used for projections. Defaults to linear.
- Returns:
Tuple containing projected data and projection type.
- Return type:
tuple
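As an illustration of what a linear random projection does (not a reproduction of deepof's method, which is not shown here), each row of a time x features table can be multiplied by a random Gaussian matrix with n_components columns. The function name and the fixed seed are assumptions of this sketch:

```python
import math
import random


def gaussian_random_projection(rows, n_components=2, seed=0):
    """Project each row onto `n_components` random Gaussian directions."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    scale = 1.0 / math.sqrt(n_components)
    # Projection matrix of shape (n_features, n_components)
    matrix = [
        [rng.gauss(0.0, 1.0) * scale for _ in range(n_components)]
        for _ in range(n_features)
    ]
    return [
        [sum(x * matrix[i][j] for i, x in enumerate(row)) for j in range(n_components)]
        for row in rows
    ]


data = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0], [2.0, 2.0, 0.0, 1.0]]
projected = gaussian_random_projection(data, n_components=2)
```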
- pca(n_components: int = 2, kernel: str = 'linear') Tuple[Any, Any]
Return a training set generated from the 2D original data (time x features) and a PCA projection to a n_components space.
The sample parameter allows the user to randomly pick a subset of the data for performance or visualization reasons.
- Parameters:
n_components (int) – Number of components to project to. Default is 2.
kernel (str) – Kernel to be used for projections. Defaults to linear.
- Returns:
Tuple containing projected data and projection type.
- Return type:
tuple
- umap(n_components: int = 2) Tuple[Any, Any]
Return a training set generated from the 2D original data (time x features) and a UMAP projection to a n_components space.
The sample parameter allows the user to randomly pick a subset of the data for performance or visualization reasons.
- Parameters:
n_components (int) – Number of components to project to. Default is 2.
- Returns:
Tuple containing projected data and projection type.
- Return type:
tuple
- merge(*args, ignore_index=False, file_name='merged', save_as_paths=False)
Take a number of table_dict objects and merge them with the current one.
Returns a table_dict object of type ‘merged’. Only annotations of the first table_dict object are kept.
- Parameters:
*args (table_dict) – table_dict objects to be merged.
ignore_index (bool) – ignore index when merging. Defaults to False.
file_name (str) – Name that is used for saving the merged table
save_as_paths (bool) – If True, Saves merged datasets as paths to file locations instead of keeping tables in RAM
- Returns:
Merged table_dict object.
- Return type:
table_dict
- get_training_set(current_table_dict: deepof_table_dict, test_videos: int | list = 0) tuple
Generate training and test sets as table_dicts for model training.
Intended for internal usage only.
- Parameters:
current_table_dict (table_dict) – table_dict object containing the data to be used for training.
test_videos (Union[int, list]) – Number of videos to be used for testing or keys of test videos. Defaults to 0.
- Returns:
If there are no test videos: X_train (table_dict), containing only training data. Otherwise: a tuple containing training data, test data (as table_dicts), and test keys (if any).
- Return type:
table_dict or tuple
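The test_videos argument accepts either a number of videos or a list of keys. A minimal sketch of such a split over experiment keys is shown below; the function name, the random draw, and the seed are assumptions, and deepof's own selection logic may differ:

```python
import random


def split_keys(keys, test_videos=0, seed=0):
    """Split experiment keys into (train, test) lists.

    `test_videos` is either a number of keys to draw at random
    or an explicit list of keys to hold out.
    """
    keys = list(keys)
    if isinstance(test_videos, int):
        test = random.Random(seed).sample(keys, test_videos)
    else:
        test = [k for k in test_videos if k in keys]
    train = [k for k in keys if k not in test]
    return train, test


all_keys = ["vid1", "vid2", "vid3", "vid4"]
train, test = split_keys(all_keys, test_videos=["vid2"])
```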
- preprocess(coordinates, window_size: int = 25, window_step: int = 1, bin_size=None, bin_index=None, precomputed_bins=None, samples_max: int = 227272, scale: str = 'standard', pretrained_scaler=None, test_videos: int = 0, interpolate_normalized: int = 10, filter_low_variance: bool = False, file_name: str = 'preprocessed', save_as_paths: bool | None = None, shuffle: bool = False, quality_to_load=None, dist_standardize: str = 'groupwise', speed_standardize: str = 'groupwise', log_distances: bool = True) tuple
Preprocess pose tables for model training.
Pipeline:
1. Filter by time bins and drop all-NaN tables.
2. Optionally replace speeds with quality scores.
3. Collect samples to fit global scalers (size-normalized but not standardized).
4. Apply full scaling (size + statistical) and save.
5. Extract sliding windows for training.
- Parameters:
quality_to_load – Optional table_dict containing quality scores to replace speed values. Useful when speed reliability varies and you want to weight by tracking quality.
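The last pipeline step, extracting sliding windows, is controlled by window_size and window_step in the signature. A stand-alone sketch of the idea follows, assuming windows of window_size consecutive rows whose starts are window_step rows apart; the helper is hypothetical and works on a plain list:

```python
def sliding_windows(rows, window_size=25, window_step=1):
    """Return windows of `window_size` rows whose starts are `window_step` apart."""
    last_start = len(rows) - window_size
    return [
        rows[start:start + window_size]
        for start in range(0, last_start + 1, window_step)
    ]


frames = list(range(10))  # stand-in for 10 time points
windows = sliding_windows(frames, window_size=4, window_step=2)
```

A series shorter than window_size yields no windows at all.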
- sample_windows_from_data(time_bin_info: Dict[str, ndarray] | None = None, N_windows_tab: int = 10000, return_edges: bool = False, no_nans: bool = False) Tuple[ndarray, Dict] | Tuple[ndarray, ndarray, Dict]
Sample a set of windows from the data entries.
- Parameters:
time_bin_info (dict, optional) – Pre-defined indices to sample for each key. If provided, sampling logic is bypassed. Defaults to None.
N_windows_tab (int) – Max number of windows to sample from each recording if time_bin_info is not given.
return_edges (bool) – If True, returns a second dataset for edges.
no_nans (bool) – If True and time_bin_info is not given, only samples from rows without NaNs. Note: This may result in non-contiguous original indices.
- Returns:
np.array: The concatenated main dataset (X_data).
np.array: The concatenated edge dataset (a_data), if return_edges is True.
dict: A dictionary with the sampled indices for each key (time_bin_info).
- Return type:
tuple
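The sampling described above can be pictured as drawing a limited number of window start indices per recording, optionally skipping windows that contain NaNs. The sketch below is one interpretation under those assumptions; the function name and the seed are hypothetical, and deepof's handling of no_nans may differ in detail:

```python
import math
import random


def sample_window_starts(rows, window_size, n_windows, no_nans=False, seed=0):
    """Randomly pick up to `n_windows` window start indices from `rows`."""
    candidates = list(range(len(rows) - window_size + 1))
    if no_nans:
        # Drop every start whose window contains at least one NaN value
        candidates = [
            start
            for start in candidates
            if not any(
                math.isnan(value)
                for row in rows[start:start + window_size]
                for value in row
            )
        ]
    rng = random.Random(seed)
    picked = rng.sample(candidates, min(n_windows, len(candidates)))
    return sorted(picked)


nan = float("nan")
rows = [[0.0], [1.0], [nan], [3.0], [4.0], [5.0]]
starts = sample_window_starts(rows, window_size=2, n_windows=10, no_nans=True)
```

The returned indices play the role of the sampled indices that the method reports per key.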
deepof.deepof_train_embeddings module
Model training command line tool for the deepof package. usage: python -m examples.model_training -h
deepof.hypermodels module
keras_tuner hypermodels for hyperparameter tuning of deep autoencoders in deepof.models.
- class deepof.hypermodels.VaDE(input_shape: tuple, latent_dim: int, batch_size: int, n_components: int = 10, learn_rate: float = 0.001, edge_feature_shape: tuple | None = None, use_gnn: bool = False, adjacency_matrix: ndarray | None = None)
Bases: HyperModel
Hyperparameter tuning pipeline for deepof.models.VaDE.
- __init__(input_shape: tuple, latent_dim: int, batch_size: int, n_components: int = 10, learn_rate: float = 0.001, edge_feature_shape: tuple | None = None, use_gnn: bool = False, adjacency_matrix: ndarray | None = None)
Build VaDE hypermodel for hyperparameter tuning.
- Parameters:
input_shape (tuple) – shape of the input tensor.
latent_dim (int) – dimension of the latent space.
batch_size (int) – batch size for training.
learn_rate (float) – learning rate for the optimizer.
n_components (int) – number of components in the quantization space.
edge_feature_shape (tuple) – shape of the edge feature tensor.
use_gnn (bool) – whether to use a graph neural network to encode the input data.
adjacency_matrix (np.ndarray) – adjacency matrix of the graph.
- get_hparams(hp)
Retrieve hyperparameters to tune.
- build(hp)
Override Hypermodel’s build method.
- class deepof.hypermodels.VQVAE(input_shape: tuple, latent_dim: int, n_components: int = 10, learn_rate: float = 0.001, edge_feature_shape: tuple | None = None, use_gnn: bool = False, adjacency_matrix: ndarray | None = None)
Bases: HyperModel
Hyperparameter tuning pipeline for deepof.models.VQVAE.
- __init__(input_shape: tuple, latent_dim: int, n_components: int = 10, learn_rate: float = 0.001, edge_feature_shape: tuple | None = None, use_gnn: bool = False, adjacency_matrix: ndarray | None = None)
VQVAE hypermodel for hyperparameter tuning.
- Parameters:
input_shape (tuple) – shape of the input tensor.
latent_dim (int) – dimension of the latent space.
learn_rate (float) – learning rate for the optimizer.
n_components (int) – number of components in the quantization space.
edge_feature_shape (tuple) – shape of the edge feature tensor.
use_gnn (bool) – whether to use a graph neural network to encode the input data.
adjacency_matrix (np.ndarray) – adjacency matrix of the graph.
- get_hparams(hp)
Retrieve hyperparameters to tune, including the encoder type and the weight of the kmeans loss.
- build(hp)
Override Hypermodel’s build method.
- class deepof.hypermodels.Contrastive(input_shape: tuple, latent_dim: int, learn_rate: float = 0.001, edge_feature_shape: tuple | None = None, use_gnn: bool = False, adjacency_matrix: ndarray | None = None)
Bases: HyperModel
Hyperparameter tuning pipeline for deepof.models.Contrastive.
- __init__(input_shape: tuple, latent_dim: int, learn_rate: float = 0.001, edge_feature_shape: tuple | None = None, use_gnn: bool = False, adjacency_matrix: ndarray | None = None)
Contrastive hypermodel for hyperparameter tuning.
- Parameters:
input_shape (tuple) – shape of the input tensor.
latent_dim (int) – dimension of the latent space.
learn_rate (float) – learning rate for the optimizer.
edge_feature_shape (tuple) – shape of the edge feature tensor.
use_gnn (bool) – whether to use a graph neural network to encode the input data.
adjacency_matrix (np.ndarray) – adjacency matrix of the graph.
- get_hparams(hp)
Retrieve hyperparameters to tune, including the encoder type and the weight of the kmeans loss.
- build(hp)
Override Hypermodel’s build method.
deepof.model_utils module
Utility functions for both training autoencoder models in deepof.models and tuning hyperparameters with deepof.hypermodels.
- deepof.model_utils.select_contrastive_loss(history, future, similarity, loss_fn='nce', temperature=0.1, tau=0.1, beta=0.1, elimination_topk=0.1)
Select and apply the contrastive loss function to be used in the Contrastive embedding models.
- Parameters:
history – Tensor of shape (batch_size, seq_len, embedding_dim).
future – Tensor of shape (batch_size, seq_len, embedding_dim).
similarity – Function that computes the similarity between two tensors.
loss_fn – String indicating the loss function to be used.
temperature – Float indicating the temperature to be used in the specified loss function.
tau – Float indicating the tau value to be used if DCL or hard DCL are selected.
beta – Float indicating the beta value to be used if hard DCL is selected.
elimination_topk – Float indicating the top-k value to be used if FC is selected.
- deepof.model_utils.nce_loss(history, future, similarity, temperature=0.1)
Compute the NCE loss function, as described in the paper “A Simple Framework for Contrastive Learning of Visual Representations” (https://arxiv.org/abs/2002.05709).
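The idea behind an NCE-style contrastive loss can be sketched as follows. This is a minimal illustration in plain Python (so it runs without TensorFlow), not deepof's implementation: it assumes one embedding vector per sample instead of the (batch_size, seq_len, embedding_dim) tensors described above, and the names `cosine` and `info_nce` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(history, future, similarity=cosine, temperature=0.1):
    """Average loss where history[i] should match future[i] and no other future[j]."""
    total = 0.0
    for i, h in enumerate(history):
        logits = [similarity(h, f) / temperature for f in future]
        # Numerically stable log-sum-exp over all candidates in the batch.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(history)

history = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(history, [[0.9, 0.1], [0.1, 0.9]])   # positives match
swapped = info_nce(history, [[0.1, 0.9], [0.9, 0.1]])   # positives mismatched
```

In this toy batch the loss is close to zero when each history vector is most similar to its own future vector, and large when the pairs are swapped.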
- deepof.model_utils.dcl_loss(history, future, similarity, temperature=0.1, debiased=True, tau_plus=0.1)
Compute the DCL loss function, as described in the paper “Debiased Contrastive Learning” (https://github.com/chingyaoc/DCL/).
- deepof.model_utils.fc_loss(history, future, similarity, temperature=0.1, elimination_topk=0.1)
Compute the FC loss function, as described in the paper “Fully-Contrastive Learning of Visual Representations” (https://arxiv.org/abs/2004.11362).
- deepof.model_utils.hard_loss(history, future, similarity, temperature, beta=0.0, debiased=True, tau_plus=0.1)
Compute the Hard loss function, as described in the paper “Contrastive Learning with Hard Negative Samples” (https://arxiv.org/abs/2011.03343).
- deepof.model_utils.compute_kmeans_loss(latent_means: Tensor, weight: float = 1.0, batch_size: int = 64)
Add a penalty to the singular values of the Gram matrix of the latent means. It helps disentangle the latent space.
Based on https://arxiv.org/pdf/1610.04794.pdf, and https://www.biorxiv.org/content/10.1101/2020.05.14.095430v3.
- Parameters:
latent_means (tf.Tensor) – tensor containing the means of the latent distribution
weight (float) – weight of the Gram loss in the total loss function
batch_size (int) – batch size of the data to compute the kmeans loss for.
- Returns:
kmeans loss
- Return type:
tf.Tensor
- deepof.model_utils.get_k_nearest_neighbors(tensor, k, index)
Retrieve indices of the k nearest neighbors in tensor to the vector with the specified index.
- Parameters:
tensor (tf.Tensor) – tensor to compute the k nearest neighbors for
k (int) – number of nearest neighbors to retrieve
index (int) – index of the vector to compute the k nearest neighbors for
- Returns:
indices of the k nearest neighbors
- Return type:
tf.Tensor
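A nearest-neighbour lookup of this kind can be sketched in plain Python. The Euclidean metric and the exclusion of the query vector itself are assumptions made for this illustration; the documentation above does not state either, and `k_nearest_neighbors` is a hypothetical name.

```python
import math

def k_nearest_neighbors(vectors, k, index):
    """Return indices of the k vectors closest to vectors[index], excluding itself."""
    query = vectors[index]
    distances = [
        (math.dist(query, v), i) for i, v in enumerate(vectors) if i != index
    ]
    # Sort by distance (ties broken by index) and keep the k closest.
    return [i for _, i in sorted(distances)[:k]]

points = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.0, 2.0]]
neighbors = k_nearest_neighbors(points, k=2, index=0)
```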
- deepof.model_utils.compute_shannon_entropy(tensor)
Compute Shannon entropy for a given tensor.
- Parameters:
tensor (tf.Tensor) – tensor to compute the entropy for
- Returns:
entropy of the tensor
- Return type:
tf.Tensor
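As a rough sketch of the quantity involved, the Shannon entropy of a set of discrete values (for example, cluster labels in a neighbourhood) can be computed from their relative frequencies. Treating the tensor as a flat list of labels and using the natural logarithm are assumptions of this example, and `shannon_entropy` is a hypothetical helper.

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """Entropy (in nats) of the empirical distribution of the values in `labels`."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

uniform = shannon_entropy([0, 1, 2, 3])   # four equally likely labels
single = shannon_entropy([2, 2, 2, 2])    # one label only
```

Entropy is highest when all labels are equally frequent and zero when a single label dominates completely.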
- deepof.model_utils.plot_lr_vs_loss(rates, losses)
Plot learning rate versus the loss function of the model.
- Parameters:
rates (np.ndarray) – array containing the learning rates to plot in the x-axis
losses (np.ndarray) – array containing the losses to plot in the y-axis
- deepof.model_utils.get_angles(pos: int, i: int, d_model: int)
Auxiliary function for positional encoding computation.
- Parameters:
pos (int) – position in the sequence.
i (int) – number of sequences.
d_model (int) – dimensionality of the embeddings.
- deepof.model_utils.get_recurrent_block(x: Tensor, latent_dim: int, gru_unroll: bool, bidirectional_merge: str)
Build a recurrent embedding block, using a 1D convolution followed by two bidirectional GRU layers.
- Parameters:
x (tf.Tensor) – Input tensor.
latent_dim (int) – Number of dimensions of the output tensor.
gru_unroll (bool) – whether to unroll the GRU layers. Defaults to False.
bidirectional_merge (str) – how to merge the forward and backward GRU layers. Defaults to “concat”.
- Returns:
tf.keras.models.Model object with the specified architecture.
- deepof.model_utils.positional_encoding(position: int, d_model: int)
Compute positional encodings, as in https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf.
- Parameters:
position (int) – position in the sequence.
d_model (int) – dimensionality of the embeddings.
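The sinusoidal scheme from the referenced paper can be sketched without TensorFlow as follows. In this illustration `i` is taken to be the index of the embedding dimension, following the paper's formulation; that reading, and the helper names, are assumptions rather than a description of deepof's code.

```python
import math

def get_angle(pos, i, d_model):
    """Angle for sequence position `pos` and embedding dimension index `i`."""
    return pos / (10000 ** (2 * (i // 2) / d_model))

def positional_encoding(position, d_model):
    """Return a position x d_model table: sine on even dimensions, cosine on odd ones."""
    table = []
    for pos in range(position):
        row = []
        for i in range(d_model):
            angle = get_angle(pos, i, d_model)
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

encoding = positional_encoding(position=4, d_model=6)
```

Each row is the vector added to the embedding at that time step, so the model can tell positions apart.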
- deepof.model_utils.create_padding_mask(seq: Tensor)
Create a padding mask, with zeros where data is missing, and ones where data is available.
- Parameters:
seq (tf.Tensor) – Sequence to compute the mask on
- deepof.model_utils.create_look_ahead_mask(size: int)
Create a triangular matrix containing an increasing amount of ones from left to right on each subsequent row.
Useful for transformer decoder, which allows it to go through the data in a sequential manner, without taking the future into account.
- Parameters:
size (int) – number of time steps in the sequence
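The two kinds of mask described above can be illustrated with nested lists. This sketch follows the wording of the descriptions (ones where data is available, and a growing run of ones per row); encoding missing data as 0.0 is an assumption, and both function names are hypothetical.

```python
def padding_mask(seq, missing=0.0):
    """1 where a value is available, 0 where it equals the `missing` marker."""
    return [[0 if value == missing else 1 for value in row] for row in seq]

def look_ahead_mask(size):
    """Lower-triangular matrix: row t has ones for steps 0..t and zeros afterwards."""
    return [[1 if col <= row else 0 for col in range(size)] for row in range(size)]

pad = padding_mask([[0.3, 1.2, 0.0], [0.7, 0.0, 0.0]])
causal = look_ahead_mask(3)
```

Row t of the look-ahead mask marks the time steps a decoder may attend to when producing step t, which is what prevents it from using future values.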
- deepof.model_utils.create_masks(inp: Tensor)
Given an input sequence, it creates all necessary masks to pass it through the transformer architecture.
This includes encoder and decoder padding masks, and a look-ahead mask
- Parameters:
inp (tf.Tensor) – input sequence to create the masks for.
- deepof.model_utils.find_learning_rate(model, data, epochs=1, batch_size=32, min_rate=1e-08, max_rate=0.1)
Train the provided model for an epoch with an exponentially increasing learning rate.
- Parameters:
model (tf.keras.Model) – model to train
data (tuple) – training data
epochs (int) – number of epochs to train the model for
batch_size (int) – batch size to use for training
min_rate (float) – minimum learning rate to consider
max_rate (float) – maximum learning rate to consider
- Returns:
learning rate that resulted in the lowest loss
- Return type:
float
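The search described here can be sketched in two steps: build a learning rate schedule that grows by a constant factor per batch, then pick the rate with the lowest recorded loss. The helpers below are hypothetical and the loss values are made up for illustration; no model is trained.

```python
import math

def exponential_rates(min_rate, max_rate, n_batches):
    """Learning rates growing by a constant factor from min_rate to max_rate."""
    factor = math.exp(math.log(max_rate / min_rate) / n_batches)
    rates = [min_rate]
    for _ in range(n_batches):
        rates.append(rates[-1] * factor)
    return factor, rates

def rate_at_lowest_loss(rates, losses):
    """Pick the rate whose recorded loss was smallest."""
    best = min(range(len(losses)), key=losses.__getitem__)
    return rates[best]

factor, rates = exponential_rates(min_rate=1e-8, max_rate=0.1, n_batches=7)
losses = [2.0, 1.9, 1.7, 1.2, 0.8, 0.9, 3.5, 40.0]  # made-up values for illustration
best_rate = rate_at_lowest_loss(rates, losses)
```

With the default bounds and seven batches the rate is multiplied by roughly ten after each batch. In practice the loss typically diverges near max_rate, which is why the sweep is run once and then discarded.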
- deepof.model_utils.get_hard_counts(soft_counts: Tensor)
Compute hard counts per cluster in a differentiable way.
- Parameters:
soft_counts (tf.Tensor) – soft counts per cluster
- deepof.model_utils.cluster_frequencies_regularizer(soft_counts: Tensor, k: int, n_samples: int = 1000)
Compute the KL divergence between the cluster assignment distribution and a uniform prior across clusters.
While this assumes an equal distribution between clusters, the prior can be tweaked to reflect domain knowledge.
- Parameters:
soft_counts (tf.Tensor) – soft counts per cluster
k (int) – number of clusters
n_samples (int) – number of samples to draw from the categorical distribution modeling cluster assignments.
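The penalty can be illustrated with a small sketch that computes the KL divergence between the average cluster assignment and a uniform prior. Unlike the function documented above, this example averages the soft counts directly instead of drawing n_samples from a categorical distribution; that simplification and the name `kl_to_uniform` are assumptions.

```python
import math

def kl_to_uniform(soft_counts):
    """KL(p || uniform), where p is the batch-averaged cluster assignment distribution."""
    k = len(soft_counts[0])
    n = len(soft_counts)
    p = [sum(row[c] for row in soft_counts) / n for c in range(k)]
    # Terms with p_c == 0 contribute nothing to the divergence.
    return sum(pc * math.log(pc * k) for pc in p if pc > 0)

balanced = kl_to_uniform([[1.0, 0.0], [0.0, 1.0]])   # both clusters used equally
collapsed = kl_to_uniform([[1.0, 0.0], [1.0, 0.0]])  # everything in one cluster
```

The divergence is zero when clusters are equally populated and grows as assignments collapse onto fewer clusters.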
- deepof.model_utils.get_callbacks(embedding_model: str, encoder_type: str, kmeans_loss: float = 1.0, input_type: str = False, cp: bool = False, logparam: dict | None = None, outpath: str = '.', run: int = False) List[Any]
Generate callbacks used for model training.
- Parameters:
embedding_model (str) – name of the embedding model
encoder_type (str) – Architecture used for the encoder. Must be one of “recurrent”, “TCN”, and “transformer”
kmeans_loss (float) – Weight of the gram loss
input_type (str) – Input type to use for training
cp (bool) – Whether to use checkpointing or not
logparam (dict) – Dictionary containing the hyperparameters to log in tensorboard
outpath (str) – Path to the output directory
run (int) – Run number to use for checkpointing
- Returns:
List of callbacks to be used for training
- Return type:
List[Union[Any]]
- class deepof.model_utils.CustomStopper(start_epoch, *args, **kwargs)
Bases: EarlyStopping
Custom early stopping callback. Prevents the model from stopping before warmup is over.
- __init__(start_epoch, *args, **kwargs)
Initialize the CustomStopper callback.
- Parameters:
start_epoch – epoch from which performance will be taken into account when deciding whether to stop training.
*args – arguments passed to the callback.
**kwargs – keyword arguments passed to the callback.
- get_config()
Update callback metadata.
- on_epoch_end(epoch, logs=None)
Check whether to stop training.
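The behaviour of a warmup-aware stopper can be sketched independently of Keras. The class below is a hypothetical stand-in, not the callback itself: it uses a simple patience rule on a validation loss, and whether start_epoch is inclusive is an assumption.

```python
class WarmupEarlyStopper:
    """Patience-based early stopping that ignores epochs before `start_epoch`."""

    def __init__(self, start_epoch, patience):
        self.start_epoch = start_epoch
        self.patience = patience
        self.best = float("inf")
        self.wait = 0

    def should_stop(self, epoch, val_loss):
        if epoch < self.start_epoch:
            return False  # still warming up: neither track nor stop
        if val_loss < self.best:
            self.best, self.wait = val_loss, 0
            return False
        self.wait += 1
        return self.wait >= self.patience

stopper = WarmupEarlyStopper(start_epoch=3, patience=2)
val_losses = [1.0, 1.5, 2.0, 0.9, 1.0, 1.1, 1.2]  # made-up values for illustration
stopped_at = next(
    (epoch for epoch, loss in enumerate(val_losses) if stopper.should_stop(epoch, loss)),
    None,
)
```

The rising losses during the first three epochs are ignored, so training stops only after two epochs without improvement once warmup is over.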
- class deepof.model_utils.ExponentialLearningRate(factor: float)
Bases: Callback
Simple callback that grows the learning rate exponentially during training.
Used to trigger optimal learning rate search in deepof.model_utils.find_learning_rate.
- __init__(factor: float)
Initialize the exponential learning rate callback.
- Parameters:
factor (float) – factor by which to multiply the learning rate
- on_batch_end(batch: int, logs: dict)
Apply on batch end.
- Parameters:
batch – batch number
logs (dict) – dictionary containing the loss for the current batch
- class deepof.model_utils.ProbabilisticDecoder(*args, **kwargs)
Bases: Layer
Map the reconstruction output of a given decoder to a multivariate normal distribution.
- __init__(input_shape, **kwargs)
Initialize the probabilistic decoder.
- call(inputs)
Map the reconstruction output of a given decoder to a multivariate normal distribution.
- Parameters:
inputs (tuple) – tuple containing the reconstruction output and the validity mask
- Returns:
multivariate normal distribution
- Return type:
tf.Tensor
- class deepof.model_utils.ClusterControl(*args, **kwargs)
Bases: Layer
Identity layer.
Evaluates different clustering metrics between the components of the latent Gaussian Mixture using the entropy of the nearest neighbourhood. If self.loss_weight > 0, it also adds a regularization penalty to the loss function which attempts to maximize the number of populated clusters during training.
- __init__(batch_size: int, n_components: int, encoding_dim: int, k: int = 15, *args, **kwargs)
Initialize the ClusterControl layer.
- Parameters:
batch_size (int) – batch size of the model
n_components (int) – number of components in the latent Gaussian Mixture
encoding_dim (int) – dimension of the latent Gaussian Mixture
k (int) – number of nearest components of the latent Gaussian Mixture to consider
loss_weight (float) – weight of the regularization penalty applied to the local entropy of each training instance
*args – additional positional arguments
**kwargs – additional keyword arguments
- get_config()
Update Constraint metadata.
- call(inputs)
Update Layer’s call method.
- class deepof.model_utils.TransformerEncoderLayer(*args, **kwargs)
Bases: Layer
Transformer encoder layer. Based on https://www.tensorflow.org/text/tutorials/transformer.
- __init__(key_dim, num_heads, dff, rate=0.1)
Construct the transformer encoder layer.
- Parameters:
key_dim – dimensionality of the time series
num_heads – number of heads of the multi-head-attention layers
dff – dimensionality of the embeddings
rate – dropout rate
- call(x, training, mask, return_scores=False)
Call the transformer encoder layer.
- class deepof.model_utils.TransformerDecoderLayer(*args, **kwargs)
Bases: Layer
Transformer decoder layer. Based on https://www.tensorflow.org/text/tutorials/transformer.
- __init__(key_dim, num_heads, dff, rate=0.1)
Construct the transformer decoder layer.
- Parameters:
key_dim – dimensionality of the time series
num_heads – number of heads of the multi-head-attention layers
dff – dimensionality of the embeddings
rate – dropout rate
- call(x, enc_output, training, look_ahead_mask, padding_mask)
Call the transformer decoder layer.
- class deepof.model_utils.TransformerEncoder(*args, **kwargs)
Bases: Layer
Transformer encoder.
Based on https://www.tensorflow.org/text/tutorials/transformer. Adapted according to https://academic.oup.com/gigascience/article/8/11/giz134/5626377?login=true and https://arxiv.org/abs/1711.03905.
- __init__(num_layers, seq_dim, key_dim, num_heads, dff, maximum_position_encoding, rate=0.1)
Construct the transformer encoder.
- Parameters:
num_layers – number of transformer layers to include.
seq_dim – dimensionality of the sequence embeddings
key_dim – dimensionality of the time series
num_heads – number of heads of the multi-head-attention layers used on the transformer encoder
dff – dimensionality of the token embeddings
maximum_position_encoding – maximum time series length
rate – dropout rate
- call(x, training)
Call the transformer encoder.
- class deepof.model_utils.TransformerDecoder(*args, **kwargs)
Bases: Layer
Transformer decoder.
Based on https://www.tensorflow.org/text/tutorials/transformer. Adapted according to https://academic.oup.com/gigascience/article/8/11/giz134/5626377?login=true and https://arxiv.org/abs/1711.03905.
- __init__(num_layers, seq_dim, key_dim, num_heads, dff, maximum_position_encoding, rate=0.1)
Construct the transformer decoder.
- Parameters:
num_layers – number of transformer layers to include.
seq_dim – dimensionality of the sequence embeddings
key_dim – dimensionality of the time series
num_heads – number of heads of the multi-head-attention layers used on the transformer encoder
dff – dimensionality of the token embeddings
maximum_position_encoding – maximum time series length
rate – dropout rate
- call(x, enc_output, training, look_ahead_mask, padding_mask)
Call the transformer decoder.
- deepof.model_utils.log_hyperparameters()
Log hyperparameters in tensorboard.
Blueprint for hyperparameter and metric logging in tensorboard during hyperparameter tuning
- Returns:
logparams (list) – List containing the hyperparameters to log in tensorboard.
metrics (list) – List containing the metrics to log in tensorboard.
- deepof.model_utils.embedding_model_fitting(preprocessed_object: Tuple[ndarray, ndarray, ndarray, ndarray], adjacency_matrix: ndarray, embedding_model: str, encoder_type: str, batch_size: int, latent_dim: int, epochs: int, log_history: bool, log_hparams: bool, n_components: int, output_path: str, data_path: str, kmeans_loss: float, pretrained: str, save_checkpoints: bool, save_weights: bool, input_type: str, bin_info: dict, kl_annealing_mode: str, kl_warmup: int, reg_cat_clusters: float, recluster: bool, temperature: float, contrastive_similarity_function: str, contrastive_loss_function: str, beta: float, tau: float, interaction_regularization: float, run: int = 0, **kwargs)
Train the specified embedding model on the preprocessed data.
- Parameters:
preprocessed_object (tuple) – Tuple containing the preprocessed data.
adjacency_matrix (np.ndarray) – adjacency matrix of the connectivity graph to use.
embedding_model (str) – Model to use to embed and cluster the data. Must be one of VQVAE (default), VaDE, and contrastive.
encoder_type (str) – Encoder architecture to use. Must be one of “recurrent”, “TCN”, and “transformer”.
batch_size (int) – Batch size to use for training.
latent_dim (int) – Encoding size to use for training.
epochs (int) – Number of epochs to train the autoencoder for.
log_history (bool) – Whether to log the history of the autoencoder.
log_hparams (bool) – Whether to log the hyperparameters used for training.
n_components (int) – Number of components to fit to the data.
output_path (str) – Path to the output directory.
data_path (str) – Path to the directory where intermediate data is saved
kmeans_loss (float) – Weight of the gram loss, which adds a regularization term to VQVAE models which penalizes the correlation between the dimensions in the latent space.
pretrained (str) – Path to the pretrained weights to use for the autoencoder.
save_checkpoints (bool) – Whether to save checkpoints during training.
save_weights (bool) – Whether to save the weights of the autoencoder after training.
input_type (str) – Input type of the TableDict objects used for preprocessing. For logging purposes only.
bin_info (dict) – Dictionary containing numpy integer arrays for each experiment. Each array denotes the samples to be sampled from the respective experiment.
VaDE model-specific parameters:
kl_annealing_mode (str) – Mode to use for KL annealing. Must be one of “linear” (default), or “sigmoid”.
kl_warmup (int) – Number of epochs during which KL is annealed.
reg_cat_clusters (bool) – whether to penalize uneven cluster membership in the latent space, by minimizing the KL divergence between cluster membership and a uniform categorical distribution.
recluster (bool) – Whether to recluster the data after each training using a Gaussian Mixture Model.
Contrastive model-specific parameters:
temperature (float) – temperature parameter for the contrastive loss functions. Higher values put harsher penalties on negative pair similarity.
contrastive_similarity_function (str) – similarity function between positive and negative pairs. Must be one of ‘cosine’ (default), ‘euclidean’, ‘dot’, and ‘edit’.
contrastive_loss_function (str) – contrastive loss function. Must be one of ‘nce’ (default), ‘dcl’, ‘fc’, and ‘hard_dcl’. See specific documentation for details.
beta (float) – Beta (concentration) parameter for the hard_dcl contrastive loss. Higher values lead to ‘harder’ negative samples.
tau (float) – Tau parameter for the dcl and hard_dcl contrastive losses, indicating positive class probability.
interaction_regularization (float) – Weight of the interaction regularization term (L1 penalization to all features not related to interactions).
run (int) – Run number to use for logging.
- Returns:
List of trained models corresponding to the selected model class. The full trained model is last.
- deepof.model_utils.embedding_per_video(coordinates: deepof_coordinates, to_preprocess: deepof_table_dict, model: Model, scale: str = 'standard', animal_id: str | None = None, global_scaler: Any | None = None, pretrained: bool = False, samples_max: int = 227272, **kwargs)
Use a previously trained model to produce embeddings and soft_counts per experiment in table_dict format.
- Parameters:
coordinates (coordinates) – deepof.Coordinates object for the project at hand.
to_preprocess (table_dict) – dictionary with (merged) features to process.
model (tf.keras.models.Model) – trained deepof unsupervised model to run inference with.
pretrained (bool) – whether to use the specified pretrained model to recluster the data.
scale (str) – The type of scaler to use within animals. Defaults to ‘standard’, but can be changed to ‘minmax’, ‘robust’, or False. Use the same scaler that was used when training the original model.
animal_id (str) – if more than one animal is present, provide the ID(s) of the animal(s) to include.
global_scaler (Any) – trained global scaler produced when processing the original dataset.
samples_max (int) – Maximum number of samples taken for plotting to avoid excessive computation times. If the number of rows in a data set exceeds this number the data is downsampled accordingly.
**kwargs – additional arguments to pass to coordinates.get_graph_dataset().
- Returns:
embeddings (table_dict) – embeddings per experiment.
soft_counts (table_dict) – soft_counts per experiment.
- deepof.model_utils.tune_search(preprocessed_object: tuple, adjacency_matrix: ndarray, encoding_size: int, embedding_model: str, hypertun_trials: int, hpt_type: str, k: int, project_name: str, callbacks: List, batch_size: int = 1024, n_epochs: int = 30, n_replicas: int = 1, outpath: str = 'unsupervised_tuner_search') tuple
Define the search space using keras-tuner and hyperband or bayesian optimization.
- Parameters:
preprocessed_object (tf.data.Dataset) – Dataset object for training and validation.
adjacency_matrix (np.ndarray) – Adjacency matrix for the graph.
encoding_size (int) – Size of the encoding layer.
embedding_model (str) – Model to use to embed and cluster the data. Must be one of VQVAE (default), VaDE, and Contrastive.
hypertun_trials (int) – Number of hypertuning trials to run.
hpt_type (str) – Type of hypertuning to run. Must be one of “hyperband” or “bayesian”.
k (int) – Number of clusters on the latent space.
project_name (str) – Name of the project.
callbacks (List) – List of callbacks to use.
batch_size (int) – Batch size to use.
n_epochs (int) – Maximum number of epochs to train for.
n_replicas (int) – Number of replicas to use.
outpath (str) – Path to save the results.
- Returns:
best_hparams (dict) – Dictionary of the best hyperparameters.
best_run (str) – Name of the best run.
deepof.models module
Deep autoencoder models for unsupervised pose detection.
VQ-VAE: a variational autoencoder with a vector quantization latent-space (https://arxiv.org/abs/1711.00937).
VaDE: a variational autoencoder with a Gaussian mixture latent-space.
Contrastive: an embedding model consisting of a single encoder, trained using a contrastive loss.
- deepof.models.get_recurrent_decoder(input_shape: tuple, latent_dim: int, gru_unroll: bool = False, bidirectional_merge: str = 'concat')
Return a recurrent neural decoder.
Builds a deep neural network capable of decoding the structured latent space generated by one of the compatible classes into a sequence of motion tracking instances, either reconstructing the original input, or generating new data from given clusters.
- Parameters:
input_shape (tuple) – shape of the input data
latent_dim (int) – dimensionality of the latent space
gru_unroll (bool) – whether to unroll the GRU layers. Defaults to False.
bidirectional_merge (str) – how to merge the forward and backward GRU layers. Defaults to “concat”.
- Returns:
a keras model that can be trained to decode the latent space into a series of motion tracking sequences.
- Return type:
keras.Model
- deepof.models.get_TCN_encoder(input_shape: tuple, edge_feature_shape: tuple, adjacency_matrix: ndarray, latent_dim: int, use_gnn: bool = True, conv_filters: int = 32, kernel_size: int = 4, conv_stacks: int = 2, conv_dilations: tuple = (1, 2, 4, 8), padding: str = 'causal', use_skip_connections: bool = True, dropout_rate: int = 0, activation: str = 'relu', interaction_regularization: float = 0.0)
Return a Temporal Convolutional Network (TCN) encoder.
Builds a neural network that can be used to encode motion tracking instances into a vector. Each layer contains a residual block with a convolutional layer and a skip connection. See the following paper for more details: https://arxiv.org/pdf/1803.01271.pdf
- Parameters:
input_shape – shape of the input data
edge_feature_shape (tuple) – shape of the adjacency matrix to use in the graph attention layers. Should be time x edges x features.
adjacency_matrix (np.ndarray) – adjacency matrix for the mice connectivity graph. Shape should be nodes x nodes.
latent_dim – dimensionality of the latent space
use_gnn (bool) – If True, the encoder uses a graph representation of the input, with coordinates and speeds as node attributes, and distances as edge attributes. If False, a regular 3D tensor is used as input.
conv_filters – number of filters in the TCN layers
kernel_size – size of the convolutional kernels
conv_stacks – number of TCN layers
conv_dilations – list of dilation factors for each TCN layer
padding – padding mode for the TCN layers
use_skip_connections – whether to use skip connections between TCN layers
dropout_rate – dropout rate for the TCN layers
activation – activation function for the TCN layers
interaction_regularization (float) – Regularization parameter for the interaction features
- Returns:
a keras model that can be trained to encode a sequence of motion tracking instances into a latent space using temporal convolutional networks.
- Return type:
keras.Model
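To get a feel for how kernel_size, conv_stacks and conv_dilations interact, the receptive field of a stacked, dilated causal TCN can be estimated with a short calculation. The formula below assumes each residual block contains two dilated convolutions (the convs_per_block argument), which is common in TCN implementations but is not stated in this documentation; the function name is hypothetical.

```python
def tcn_receptive_field(kernel_size, conv_stacks, conv_dilations, convs_per_block=2):
    """Time steps (including the current one) visible to a single output step."""
    return 1 + convs_per_block * (kernel_size - 1) * conv_stacks * sum(conv_dilations)

# Default encoder settings from the signature above.
encoder_field = tcn_receptive_field(kernel_size=4, conv_stacks=2, conv_dilations=(1, 2, 4, 8))
# Default decoder settings (one stack, same dilations in reverse order).
decoder_field = tcn_receptive_field(kernel_size=4, conv_stacks=1, conv_dilations=(8, 4, 2, 1))
```

If the input windows are shorter than the receptive field, part of the network's temporal context is never used, so this estimate can help when choosing dilations for a given window length.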
- deepof.models.get_TCN_decoder(input_shape: tuple, latent_dim: int, conv_filters: int = 64, kernel_size: int = 4, conv_stacks: int = 1, conv_dilations: tuple = (8, 4, 2, 1), padding: str = 'causal', use_skip_connections: bool = True, dropout_rate: int = 0, activation: str = 'relu')
Return a Temporal Convolutional Network (TCN) decoder.
Builds a neural network that can be used to decode a latent space into a sequence of motion tracking instances. Each layer contains a residual block with a convolutional layer and a skip connection. See the following paper for more details: https://arxiv.org/pdf/1803.01271.pdf
- Parameters:
input_shape – shape of the input data
latent_dim – dimensionality of the latent space
conv_filters – number of filters in the TCN layers
kernel_size – size of the convolutional kernels
conv_stacks – number of TCN layers
conv_dilations – list of dilation factors for each TCN layer
padding – padding mode for the TCN layers
use_skip_connections – whether to use skip connections between TCN layers
dropout_rate – dropout rate for the TCN layers
activation – activation function for the TCN layers
- Returns:
a keras model that can be trained to decode a latent space into a sequence of motion tracking instances using temporal convolutional networks.
- Return type:
keras.Model
- deepof.models.get_transformer_encoder(input_shape: tuple, edge_feature_shape: tuple, adjacency_matrix: ndarray, latent_dim: int, use_gnn: bool = True, num_layers: int = 4, num_heads: int = 64, dff: int = 128, dropout_rate: float = 0.1, interaction_regularization: float = 0.0)
Build a Transformer encoder.
Based on https://www.tensorflow.org/text/tutorials/transformer. Adapted according to https://academic.oup.com/gigascience/article/8/11/giz134/5626377?login=true and https://arxiv.org/abs/1711.03905.
- Parameters:
input_shape (tuple) – shape of the input data
edge_feature_shape (tuple) – shape of the adjacency matrix to use in the graph attention layers. Should be time x edges x features.
adjacency_matrix (np.ndarray) – adjacency matrix for the mice connectivity graph. Shape should be nodes x nodes.
latent_dim (int) – dimensionality of the latent space
use_gnn (bool) – If True, the encoder uses a graph representation of the input, with coordinates and speeds as node attributes, and distances as edge attributes. If False, a regular 3D tensor is used as input.
num_layers (int) – number of transformer layers to include
num_heads (int) – number of heads of the multi-head-attention layers used on the transformer encoder
dff (int) – dimensionality of the token embeddings
dropout_rate (float) – dropout rate
interaction_regularization (float) – regularization parameter for the interaction features
- deepof.models.get_transformer_decoder(input_shape, latent_dim, num_layers=2, num_heads=8, dff=128, dropout_rate=0.1)
Build a Transformer decoder.
Based on https://www.tensorflow.org/text/tutorials/transformer. Adapted according to https://academic.oup.com/gigascience/article/8/11/giz134/5626377?login=true and https://arxiv.org/abs/1711.03905.
- Parameters:
input_shape (tuple) – shape of the input data
latent_dim (int) – dimensionality of the latent space
num_layers (int) – number of transformer layers to include
num_heads (int) – number of heads of the multi-head-attention layers used on the transformer encoder
dff (int) – dimensionality of the token embeddings
dropout_rate (float) – dropout rate
- class deepof.models.VectorQuantizer(*args, **kwargs)
Bases: Model
Vector quantizer layer.
Quantizes the input vectors into a fixed number of clusters using L2 norm. Based on https://arxiv.org/pdf/1509.03700.pdf, and adapted for clustering using https://arxiv.org/abs/1806.02199. Implementation based on https://keras.io/examples/generative/vq_vae/.
- __init__(n_components, embedding_dim, beta, kmeans_loss: float = 0.0, **kwargs)
Initialize the VQ layer.
- Parameters:
n_components (int) – number of embeddings to use
embedding_dim (int) – dimensionality of the embeddings
beta (float) – beta value for the loss function
kmeans_loss (float) – regularization parameter for the Gram matrix
**kwargs – additional arguments for the parent class
- call(x)
Compute the VQ layer.
- Parameters:
x (tf.Tensor) – input tensor
- Returns:
output tensor
- Return type:
x (tf.Tensor)
- get_code_indices(flattened_inputs, return_soft_counts=False)
Getter for the code indices at any given time.
- Parameters:
flattened_inputs (tf.Tensor) – flattened input tensor (encoder output)
return_soft_counts (bool) – whether to return soft counts based on the distance to the codes, instead of the code indices
- Returns:
code indices tensor with cluster assignments.
- Return type:
encoding_indices (tf.Tensor)
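The quantization step can be sketched as a nearest-code lookup under the L2 norm. In this plain-Python illustration the codebook is passed explicitly (in the layer it would be an internal weight), and the soft counts are computed as a softmax over negative distances, which is an assumption rather than the documented formula.

```python
import math

def get_code_indices(flattened_inputs, codebook, return_soft_counts=False):
    """Assign each input vector to its nearest code under the L2 norm."""
    results = []
    for vector in flattened_inputs:
        distances = [math.dist(vector, code) for code in codebook]
        if return_soft_counts:
            # Softmax over negative distances: closer codes get more weight.
            weights = [math.exp(-d) for d in distances]
            total = sum(weights)
            results.append([w / total for w in weights])
        else:
            results.append(distances.index(min(distances)))
    return results

codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
latents = [[0.1, -0.2], [3.5, 0.4], [0.9, 1.2]]
hard = get_code_indices(latents, codebook)
soft = get_code_indices(latents, codebook, return_soft_counts=True)
```

The hard indices act as cluster assignments, while each row of soft counts is a distribution over codes whose largest entry corresponds to the hard assignment.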
- deepof.models.get_vqvae(input_shape: tuple, edge_feature_shape: tuple, adjacency_matrix: ndarray, latent_dim: int, use_gnn: bool, n_components: int, beta: float = 1.0, kmeans_loss: float = 0.0, encoder_type: str = 'recurrent', interaction_regularization: float = 0.0)
Build a Vector-Quantization variational autoencoder (VQ-VAE) model, adapted to the DeepOF setting.
- Parameters:
input_shape (tuple) – shape of the input to the encoder.
edge_feature_shape (tuple) – shape of the edge feature matrix used for graph representations.
adjacency_matrix (np.ndarray) – adjacency matrix of the connectivity graph to use.
latent_dim (int) – dimension of the latent space.
use_gnn (bool) – If True, the encoder uses a graph representation of the input, with coordinates and speeds as node attributes, and distances as edge attributes. If False, a regular 3D tensor is used as input.
n_components (int) – number of embeddings in the embedding layer.
beta (float) – beta parameter of the VQ loss.
kmeans_loss (float) – regularization parameter for the Gram matrix.
encoder_type (str) – type of encoder to use. Can be set to “recurrent” (default), “TCN”, or “transformer”.
interaction_regularization (float) – Regularization parameter for the interaction features.
- Returns:
encoder (tf.keras.Model) – connected encoder of the VQ-VAE model. Outputs a vector of shape (latent_dim,).
decoder (tf.keras.Model) – connected decoder of the VQ-VAE model.
grouper (tf.keras.Model) – connected embedder layer of the VQ-VAE model. Outputs cluster indices of shape (batch_size,).
vqvae (tf.keras.Model) – complete VQ-VAE model.
- class deepof.models.VQVAE(*args, **kwargs)
Bases: Model
VQ-VAE model adapted to the DeepOF setting.
- __init__(input_shape: tuple, edge_feature_shape: tuple, adjacency_matrix: ndarray | None = None, latent_dim: int = 8, n_components: int = 15, beta: float = 1.0, kmeans_loss: float = 0.0, use_gnn: bool = True, encoder_type: str = 'recurrent', interaction_regularization: float = 0.0, **kwargs)
Initialize a VQ-VAE model.
- Parameters:
input_shape (tuple) – Shape of the input to the full model.
edge_feature_shape (tuple) – shape of the edge feature matrix used for graph representations.
adjacency_matrix (np.ndarray) – adjacency matrix of the connectivity graph to use.
latent_dim (int) – Dimensionality of the latent space.
n_components (int) – Number of embeddings (clusters) in the embedding layer.
beta (float) – Beta parameter of the VQ loss, as described in the original VQVAE paper.
kmeans_loss (float) – Regularization parameter for the Gram matrix.
encoder_type (str) – Type of encoder to use. Can be set to “recurrent” (default), “TCN”, or “transformer”.
interaction_regularization (float) – Regularization parameter for the interaction features.
**kwargs – Additional keyword arguments.
- call(inputs, **kwargs)
Call the VQVAE model.
- property metrics
Initialize VQVAE tracked metrics.
- train_step(data)
Perform a training step.
- test_step(data)
Perform a test step.
- class deepof.models.GaussianMixtureLatent(*args, **kwargs)
Bases: Model
Gaussian Mixture probabilistic latent space model.
Used to represent the embedding of motion tracking data in a mixture of Gaussians with a provided number of components, with means, covariances and weights. Implementation based on VaDE (https://arxiv.org/abs/1611.05148) and VaDE-SC (https://openreview.net/forum?id=RQ428ZptQfU).
- __init__(input_shape: tuple, n_components: int, latent_dim: int, batch_size: int, kl_warmup: int = 5, kl_annealing_mode: str = 'linear', mc_kl: int = 100, mmd_warmup: int = 15, mmd_annealing_mode: str = 'linear', kmeans_loss: float = 0.0, reg_cluster_variance: bool = False, **kwargs)
Initialize the Gaussian Mixture Latent layer.
- Parameters:
input_shape (tuple) – shape of the input data
n_components (int) – number of components in the Gaussian mixture.
latent_dim (int) – dimensionality of the latent space.
batch_size (int) – batch size for training.
kl_warmup (int) – number of epochs to warm up the KL divergence.
kl_annealing_mode (str) – mode to use for annealing the KL divergence. Must be one of “linear” and “sigmoid”.
mc_kl (int) – number of Monte Carlo samples to use for computing the KL divergence.
mmd_warmup (int) – number of epochs to warm up the MMD.
mmd_annealing_mode (str) – mode to use for annealing the MMD. Must be one of “linear” and “sigmoid”.
kmeans_loss (float) – weight of the Gram matrix regularization loss.
reg_cluster_variance (bool) – whether to penalize uneven cluster variances in the latent space.
**kwargs – keyword arguments passed to the parent class
- call(inputs, training=False)
Compute the output of the layer.
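The kl_warmup and kl_annealing_mode parameters describe a weight on the KL term that ramps up during the first epochs of training. A minimal sketch of what a linear and a sigmoid schedule could look like is given below. The function kl_weight, the steepness constant, and the rescaling of the sigmoid to span exactly [0, 1] are assumptions made for illustration, not DeepOF's implementation.

```python
import math


def kl_weight(epoch: int, warmup: int, mode: str = "linear") -> float:
    """Hypothetical annealing weight in [0, 1] applied to the KL term."""
    if warmup <= 0:
        return 1.0
    progress = min(max(epoch / warmup, 0.0), 1.0)
    if mode == "linear":
        return progress
    if mode == "sigmoid":
        # Assumed shape: logistic curve centred at half the warm-up period,
        # rescaled so that it starts at 0 and ends at 1.
        def raw(p: float) -> float:
            return 1.0 / (1.0 + math.exp(-10.0 * (p - 0.5)))

        return (raw(progress) - raw(0.0)) / (raw(1.0) - raw(0.0))
    raise ValueError("mode must be 'linear' or 'sigmoid'")


linear_schedule = [kl_weight(e, 5, "linear") for e in range(7)]
sigmoid_schedule = [kl_weight(e, 5, "sigmoid") for e in range(7)]
```

Both schedules reach 1.0 once the warm-up period has elapsed. The sigmoid variant stays close to 0 for longer and then rises more steeply.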
- deepof.models.get_vade(input_shape: tuple, edge_feature_shape: tuple, adjacency_matrix: ndarray, latent_dim: int, use_gnn: bool, n_components: int, batch_size: int = 64, kl_warmup: int = 15, kl_annealing_mode: str = 'sigmoid', mc_kl: int = 100, kmeans_loss: float = 1.0, reg_cluster_variance: bool = False, encoder_type: str = 'recurrent', interaction_regularization: float = 0.0)
Build a Gaussian mixture variational autoencoder (VaDE) model, adapted to the DeepOF setting.
- Parameters:
input_shape (tuple) – shape of the input data.
edge_feature_shape (tuple) – shape of the edge feature matrix used for graph representations.
adjacency_matrix (np.ndarray) – adjacency matrix of the connectivity graph to use.
latent_dim (int) – dimensionality of the latent space.
use_gnn (bool) – If True, the encoder uses a graph representation of the input, with coordinates and speeds as node attributes, and distances as edge attributes. If False, a regular 3D tensor is used as input.
n_components (int) – number of components in the Gaussian mixture.
batch_size (int) – batch size for training.
kl_warmup (int) – Number of iterations during which to warm up the KL divergence.
kl_annealing_mode (str) – mode to use for annealing the KL divergence. Must be one of “linear” and “sigmoid”.
mc_kl (int) – number of Monte Carlo samples to use for computing the KL divergence.
kmeans_loss (float) – weight of the Gram matrix loss as described in deepof.model_utils.compute_kmeans_loss.
reg_cluster_variance (bool) – whether to penalize uneven cluster variances in the latent space.
encoder_type (str) – type of encoder to use. Can be set to “recurrent” (default), “TCN”, or “transformer”.
interaction_regularization (float) – weight of the interaction regularization term.
- Returns:
encoder (tf.keras.Model) – connected encoder of the VaDE model. Outputs a vector of shape (latent_dim,).
decoder (tf.keras.Model) – connected decoder of the VaDE model.
grouper (tf.keras.Model) – deep clustering branch of the VaDE model. Outputs a vector of shape (n_components,) for each training instance, corresponding to the soft counts for each cluster.
vade (tf.keras.Model) – complete VaDE model.
- class deepof.models.Classifier(*args, **kwargs)
Bases:
Model
Classifier for supervised pose motif elucidation.
- __init__(input_shape: tuple, edge_feature_shape: tuple, adjacency_matrix: ndarray | None = None, use_gnn: bool = True, batch_size: int = 2048, bias_initializer: float = 0.0, encoder_type: str = 'recurrent', **kwargs)
Initialize a classifier model.
- Parameters:
input_shape (tuple) – shape of the input data.
edge_feature_shape (tuple) – shape of the edge feature matrix used for graph representations.
adjacency_matrix (np.ndarray) – adjacency matrix of the connectivity graph to use.
use_gnn (bool) – If True, the encoder uses a graph representation of the input, with coordinates and speeds as node attributes, and distances as edge attributes. If False, a regular 3D tensor is used as input.
batch_size (int) – batch size for training.
encoder_type (str) – type of encoder to use. Can be set to “recurrent” (default), “TCN”, or “transformer”.
bias_initializer (float) – value to initialize the bias of the last layer to (default: 0.0).
- call(inputs, training=None, mask=None)
Apply a forward pass of the classifier.
- Parameters:
inputs – input data.
training – whether the model is in training mode.
mask – mask for the input data.
- class deepof.models.VaDE(*args, **kwargs)
Bases:
Model
Gaussian Mixture Variational Autoencoder for pose motif elucidation.
- __init__(input_shape: tuple, edge_feature_shape: tuple, adjacency_matrix: ndarray | None = None, latent_dim: int = 8, use_gnn: bool = True, n_components: int = 15, batch_size: int = 64, kl_annealing_mode: str = 'linear', kl_warmup_epochs: int = 15, montecarlo_kl: int = 100, kmeans_loss: float = 1.0, reg_cat_clusters: float = 1.0, reg_cluster_variance: bool = False, encoder_type: str = 'recurrent', interaction_regularization: float = 0.0, **kwargs)
Init a VaDE model.
- Parameters:
input_shape (tuple) – Shape of the input to the full model.
edge_feature_shape (tuple) – shape of the edge feature matrix used for graph representations.
adjacency_matrix (np.ndarray) – adjacency matrix of the connectivity graph to use.
batch_size (int) – Batch size for training.
latent_dim (int) – Dimensionality of the latent space.
use_gnn (bool) – If True, the encoder uses a graph representation of the input, with coordinates and speeds as node attributes, and distances as edge attributes. If False, a regular 3D tensor is used as input.
kl_annealing_mode (str) – Annealing mode for KL annealing. Can be one of ‘linear’ and ‘sigmoid’.
kl_warmup_epochs (int) – Number of epochs to warmup KL annealing.
montecarlo_kl (int) – Number of Monte Carlo samples for KL divergence.
n_components (int) – Number of mixture components in the latent space.
kmeans_loss (float) – weight of the gram matrix regularization loss.
reg_cat_clusters (float) – weight of the penalty on uneven cluster membership in the latent space, applied by minimizing the KL divergence between cluster membership and a uniform categorical distribution.
reg_cluster_variance (bool) – whether to penalize uneven cluster variances in the latent space.
encoder_type (str) – type of encoder to use. Can be set to “recurrent” (default), “TCN”, or “transformer”.
interaction_regularization (float) – Regularization parameter for the interaction features.
**kwargs – Additional keyword arguments.
- property metrics
Initializes tracked metrics of VaDE model.
- property get_gmm_params
Return the GMM parameters of the model.
- set_pretrain_mode(switch)
Set the pretrain mode of the model.
- pretrain(data, embed_x, embed_a, epochs=10, samples=10000, gmm_initialize=True, **kwargs)
Run a GMM directed pretraining of the encoder, to minimize the likelihood of getting stuck in a local minimum.
- call(inputs, **kwargs)
Call the VaDE model.
- train_step(data)
Perform a training step.
- test_step(data)
Perform a test step.
- class deepof.models.Contrastive(*args, **kwargs)
Bases:
Model
Self-supervised contrastive embeddings.
- __init__(input_shape: tuple, edge_feature_shape: tuple, adjacency_matrix: ndarray | None = None, encoder_type: str = 'TCN', latent_dim: int = 8, use_gnn: bool = True, temperature: float = 0.1, similarity_function: str = 'cosine', loss_function: str = 'nce', beta: float = 0.1, tau: float = 0.1, interaction_regularization: float = 0.0, **kwargs)
Init a self-supervised Contrastive embedding model.
- Parameters:
input_shape (tuple) – Shape of the input to the full model.
edge_feature_shape (tuple) – shape of the edge feature matrix used for graph representations.
adjacency_matrix (np.ndarray) – adjacency matrix of the connectivity graph to use.
encoder_type (str) – type of encoder to use. Can be set to “recurrent”, “TCN” (default), or “transformer”.
latent_dim (int) – Dimensionality of the latent space.
use_gnn (bool) – If True, the encoder uses a graph representation of the input, with coordinates and speeds as node attributes, and distances as edge attributes. If False, a regular 3D tensor is used as input.
temperature (float)
similarity_function (str)
loss_function (str)
beta (float)
tau (float)
interaction_regularization (float) – Regularization parameter for the interaction features.
**kwargs – Additional keyword arguments.
- property metrics
Initializes tracked metrics of the contrastive model.
- call(inputs, **kwargs)
Call the contrastive model.
- train_step(data)
Perform a training step.
- test_step(data)
Perform a test step.
deepof.post_hoc module
Data structures and functions for analyzing supervised and unsupervised model results.
- deepof.post_hoc.get_contrastive_soft_counts(coordinates, embeddings: Dict[str, ndarray], states: str | int = 'bic', min_states: int = 2, max_states: int = 25, reg_covar: float = 1e-05, sample_size: int = 500000, random_state: int = 0, p_stay: float = 0.95, soft_counts: Dict[str, ndarray] | None = None, min_confidence: float | None = 0.75, prior_weight: float = 1.0)
Extract soft counts for contrastive model.
If soft_counts is provided, it is used as a per-frame prior over states (clusters), biasing the forward–backward posteriors (HMM smoothing) without running EM training.
Notes
If soft_counts is provided, K is taken from its second dimension (and AIC/BIC search is skipped).
Priors are applied as: log_emiss += prior_weight * log(soft_counts).
If min_confidence is not None, frames with max prior <= min_confidence are replaced by uniform priors.
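The two rules in these notes (adding the weighted log prior to the emission log-likelihoods, and falling back to a uniform prior on low-confidence frames) can be sketched in NumPy as follows. The helper name apply_soft_count_prior and the small eps guard against log(0) are assumptions for illustration; the sketch does not reproduce the surrounding HMM smoothing.

```python
import numpy as np


def apply_soft_count_prior(log_emiss, soft_counts, prior_weight=1.0,
                           min_confidence=0.75, eps=1e-12):
    """Bias per-frame emission log-likelihoods with a prior (hypothetical helper)."""
    priors = np.asarray(soft_counts, dtype=float).copy()  # shape (n_frames, K)
    n_states = priors.shape[1]
    if min_confidence is not None:
        # Frames whose most likely state is not confident enough get a uniform prior.
        low_confidence = priors.max(axis=1) <= min_confidence
        priors[low_confidence] = 1.0 / n_states
    # eps is an assumed guard that keeps log() finite for zero-valued priors.
    return np.asarray(log_emiss, dtype=float) + prior_weight * np.log(priors + eps)


biased = apply_soft_count_prior(
    log_emiss=np.zeros((2, 2)),
    soft_counts=[[0.9, 0.1], [0.6, 0.4]],
)
```

In this example the first frame keeps its confident prior, while the second frame (maximum prior 0.6, below the 0.75 threshold) is treated as uninformative.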
- deepof.post_hoc.get_contrastive_soft_counts_gmm(coordinates, embeddings: Dict[str, ndarray], animal_ids: list, *, window_size: int = 12, supervised_annotations=None, K_pose: int = 8, M_bins: int = 3, binning: str = 'quantile', fixed_edges: list | None = None, reg_covar: float = 1e-05, sample_size: int = 200000, random_state: int = 0, embedding_gates: Any = 'Center', smoothing: float = 0.0001)
Distance/behavior-gated GMM decoder.
- Parameters:
coordinates – deepOF project where the data is stored.
embeddings (table_dict) – table dict with neural embeddings per animal experiment across time.
animal_ids (list) – list of animal ids of all animals that should be included in the gating
window_size (int) – size of the window that should be used for binning
supervised_annotations (table_dict) – table dict with supervised annotations per video.
K_pose (int) – bins per gate
M_bins (int) – number of gates
binning (str) – binning process to be used for gating. Can be “quantile” for even sized bins or “fixed” for specific bins. “quantile” is default.
fixed_edges (list) – Optional list of edges for binning; ignored when binning is not “fixed”.
reg_covar (float) – Covariance regularization for the GMM to ensure positive covariance matrices.
sample_size (int) – Sample size to be used for cluster prediction.
random_state (int) – Random state for reproducibility
embedding_gates (any) – Either a bodypart name for distance binning or, if supervised_annotations are given, alternatively a behavior name.
smoothing (float)
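The difference between the “quantile” and “fixed” binning options can be illustrated with plain NumPy. This is a sketch under assumptions: bin_gate_values is a hypothetical helper, and the way DeepOF derives and applies its bin edges internally is not shown in this reference.

```python
import numpy as np


def bin_gate_values(values, n_bins=3, binning="quantile", fixed_edges=None):
    """Map a 1D gating series (e.g. distances) to integer bin labels."""
    values = np.asarray(values, dtype=float)
    if binning == "quantile":
        # Interior edges at evenly spaced quantiles give bins of similar size.
        edges = np.quantile(values, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    elif binning == "fixed":
        if fixed_edges is None:
            raise ValueError("fixed_edges is required when binning='fixed'")
        edges = np.asarray(fixed_edges, dtype=float)
    else:
        raise ValueError("binning must be 'quantile' or 'fixed'")
    return np.digitize(values, edges)


distances = np.arange(9.0)
quantile_bins = bin_gate_values(distances, n_bins=3)
fixed_bins = bin_gate_values(distances, binning="fixed", fixed_edges=[2.0, 6.0])
```

With quantile binning each of the three bins receives the same number of samples, whereas the fixed edges produce bins whose occupancy depends on the data.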
- deepof.post_hoc.get_pairwise_distances(coordinates, window_len: int, supervised_annotations=None, embedding_gates: Any = 'Nose', behavior_combinations: bool = True) Dict[str, Dict]
Per-window gating series: pairwise distances OR behavior-combination codes.
- Fixes vs original:
deterministic behavior ordering (sorted, not set)
guards against all-NaN distance columns
reports which behaviors were dropped
validates bodypart existence in distance mode
- deepof.post_hoc.get_contrastive_soft_counts_msm_pcca(coordinates, embeddings: Dict[str, ndarray], animal_ids: list, *, window_size: int = 12, supervised_annotations=None, K_pose: int = 10, M_bins: int = 3, binning: str = 'quantile', fixed_edges: list | None = None, sample_size: int = 200000, random_state: int = 0, embedding_gates: Any = 'Center', smoothing: float = 0.0001, temporal_smooth_win: int | None = 3, n_micro: int = 400, min_micro_per_macro: int = 3, lagtime: int = 3)
Distance/behavior-gated MSM + PCCA with k-means microstates.
- Parameters:
coordinates – deepOF project where the data is stored.
embeddings (table_dict) – table dict with neural embeddings per animal experiment across time.
animal_ids (list) – list of animal ids of all animals that should be included in the gating
window_size (int) – size of the window that should be used for binning
supervised_annotations (table_dict) – table dict with supervised annotations per video.
K_pose (int) – bins per gate
M_bins (int) – number of gates
binning (str) – binning process to be used for gating. Can be “quantile” for even sized bins or “fixed” for specific bins. “quantile” is default.
fixed_edges (list) – Optional list of edges for binning; ignored when binning is not “fixed”.
sample_size (int) – Sample size to be used for cluster prediction.
random_state (int) – Random state for reproducibility
embedding_gates (any) – Either a bodypart name for distance binning or, if supervised_annotations are given, alternatively a behavior name.
smoothing (float)
temporal_smooth_win (int) – Length of the temporal smoothing window. Longer windows give smoother results.
n_micro (int) – Number of micro states
min_micro_per_macro (int) – Minimum number of micro states within each macro state
lagtime (int) – number of continuous frames used for state transitions (in practice, the input lagtime + 2)
- deepof.post_hoc.recluster(coordinates: deepof_coordinates, embeddings: deepof_table_dict, soft_counts: deepof_table_dict | None = None, min_confidence: float = 0.75, states: str | int = 'aic', pretrained: bool | str = False, covariance_type: str = 'diag', min_states: int = 2, max_states: int = 12, save: bool = True)
Recluster the data using a HMM-based approach. If soft_counts is provided, the model will use the soft cluster assignments as priors for a semi-supervised HMM.
- Parameters:
coordinates – deepOF project where the data is stored.
embeddings (table_dict) – table dict with neural embeddings per animal experiment across time.
soft_counts (table_dict) – table dict with soft cluster assignments per animal experiment across time.
min_confidence (float) – minimum confidence the model should assign to a data point for the model to avoid resorting to a uniform prior around it.
states – Number of states to use for the HMM. If “aic” or “bic”, the number of states is chosen by minimizing the AIC or BIC criteria (respectively) over a predefined range of states.
pretrained – Whether to use a pretrained model or not. If True, DeepOF will search for an existing file with the provided parameters. If a string, DeepOF will search for a file with the provided name.
covariance_type – Type of covariance matrix to use for the HMM. Can be either “full”, “diag”, or “sphere”.
min_states – Minimum number of states to use for the HMM if automatic search is enabled.
max_states – Maximum number of states to use for the HMM if automatic search is enabled.
exclude_keys (list) – list of keys to exclude
save – Whether to save the trained model or not.
- Returns:
table dict with soft cluster assignments per animal experiment across time, using the new HMM-based segmentation on the embedding space.
- Return type:
soft_counts (table_dict)
- deepof.post_hoc.get_time_on_cluster(soft_counts: deepof_table_dict, normalize: bool = True, reduce_dim: bool = False, bin_info: dict | ndarray | None = None, roi_number: int | None = None, animals_in_roi: list | None = None)
Compute how much time each animal spends on each cluster.
Requires a set of cluster assignments.
- Parameters:
soft_counts (TableDict) – A dictionary of soft counts, where the keys are the names of the experimental conditions, and the values are the soft counts for each condition.
normalize (bool) – Whether to normalize the time by the total number of frames in each condition.
reduce_dim (bool) – Whether to reduce the dimensionality of the embeddings to 2D. If False, the embeddings are kept in their original dimensionality.
bin_info (Union[dict,np.ndarray]) – A dictionary or single array containing start and end positions of all sections for given embeddings and ROIs
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
- Returns:
A dataframe with the time spent on each cluster for each experiment.
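Conceptually, time on cluster follows from taking the most likely cluster per frame and counting frames per cluster. The sketch below assumes soft counts are arrays of shape (frames, clusters) stored in a plain dictionary and returns a dictionary of arrays rather than a dataframe. The name time_on_cluster_sketch is hypothetical, and binning and ROI filtering are left out.

```python
import numpy as np


def time_on_cluster_sketch(soft_counts, normalize=True):
    """Count frames per cluster for each experiment (illustrative only)."""
    result = {}
    for key, counts in soft_counts.items():
        counts = np.asarray(counts, dtype=float)
        hard = counts.argmax(axis=1)  # most likely cluster per frame
        occupancy = np.bincount(hard, minlength=counts.shape[1]).astype(float)
        if normalize:
            occupancy /= occupancy.sum()  # fraction of frames per cluster
        result[key] = occupancy
    return result


example = time_on_cluster_sketch(
    {"exp_1": [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.6, 0.4]]}
)
```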
- deepof.post_hoc.condition_distance_binning(embedding: deepof_table_dict, soft_counts: deepof_table_dict, exp_conditions: dict, start_bin: int | None = None, end_bin: int | None = None, step_bin: int | None = None, scan_mode: str = 'growing_window', precomputed_bins: ndarray | None = None, agg: str = 'mean', metric: str = 'auc', n_jobs: int = 2)
Compute the distance between the embeddings of two conditions, using the specified aggregation method.
- Parameters:
embedding (TableDict) – A dictionary of embeddings, where the keys are the names of the experimental conditions, and the values are the embeddings for each condition.
soft_counts (TableDict) – A dictionary of soft counts, where the keys are the names of the experimental conditions, and the values are the soft counts for each condition.
exp_conditions (dict) – A dictionary of experimental conditions, where the keys are the names of the experiments, and the values are the names of their corresponding experimental conditions.
start_bin (int) – The index of the first bin to compute the distance for.
end_bin (int) – The index of the last bin to compute the distance for.
step_bin (int) – The step size of the bins to compute the distance for.
scan_mode (str) – The mode to use for computing the distance. Can be one of “growing_window” (default; used to select optimal binning), “per-bin” (used to evaluate how discriminability evolves in subsequent bins of a specified size) or “precomputed”, which requires a numpy ndarray with bin IDs to be passed to precomputed_bins.
precomputed_bins (np.ndarray) – numpy array with integer bin sizes in frames; the bins do not need to have the same size. The difference across conditions is reported for each of these bins.
agg (str) – The aggregation method to use. Can be either “mean”, “median”, or “time_on_cluster”.
metric (str) – The distance metric to use. Can be either “auc” (where the reported ‘distance’ is based on performance of a classifier when separating aggregated embeddings), or “wasserstein” (which computes distances based on optimal transport).
n_jobs (int) – The number of jobs to use for parallel processing.
- Returns:
An array with distances between conditions across the resulting time bins
- deepof.post_hoc.separation_between_conditions(cur_embedding: deepof_table_dict, cur_soft_counts: deepof_table_dict, bin_info: dict | ndarray, exp_conditions: dict, agg: str, metric: str)
Compute the distance between the embeddings of two conditions, using the specified aggregation method.
- Parameters:
cur_embedding (TableDict) – A dictionary of embeddings, where the keys are the names of the experimental conditions, and the values are the embeddings for each condition.
cur_soft_counts (TableDict) – A dictionary of soft counts, where the keys are the names of the experimental conditions, and the values are the soft counts for each condition.
bin_info (Union[dict,np.ndarray]) – A dictionary or single array containing start and end positions or indices of all sections for given embeddings
exp_conditions (dict) – A dictionary of experimental conditions, where the keys are the names of the experiments, and the values are the names of their corresponding experimental conditions.
agg (str) – The aggregation method to use. Can be one of “time on cluster”, “mean”, or “median”.
metric (str) – The distance metric to use. Can be either “auc” (where the reported ‘distance’ is based on performance of a classifier when separating aggregated embeddings), or “wasserstein” (which computes distances based on optimal transport).
- Returns:
The distance between the embeddings of the two conditions.
- deepof.post_hoc.fit_normative_global_model(global_normal_embeddings: DataFrame)
Fit a global model to the normal embeddings.
- Parameters:
global_normal_embeddings (pd.DataFrame) – A dictionary of embeddings, where the keys are the names of the experimental conditions, and the values are the embeddings for each condition.
- Returns:
A fitted global model.
- deepof.post_hoc.enrichment_across_conditions(soft_counts: deepof_table_dict | None = None, supervised_annotations: deepof_table_dict | None = None, exp_conditions: dict | None = None, plot_speed: bool = False, bin_info: dict | None = None, roi_number: int | None = None, animals_in_roi: list | None = None, roi_mode: str = 'mousewise', normalize: bool = False)
Compute the population of each cluster across conditions.
- Parameters:
soft_counts (TableDict) – A dictionary of soft counts, where the keys are the names of the experimental conditions, and the values are the soft counts for each condition.
supervised_annotations (tableDict) – table dict with supervised annotations per animal experiment across time.
exp_conditions (dict) – A dictionary of experimental conditions, where the keys are the names of the experiments, and the values are the names of their corresponding experimental conditions.
plot_speed (bool) – plot “speed” behavior
bin_info (dict) – A dictionary containing start and end positions or indices of all sections for given embeddings and ROIs
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
roi_mode (str) – Determines how the ROIs are applied to different behaviors. Options are “mousewise” (default; the selected mice need to be inside the ROI) and “behaviorwise” (only the mice involved in a behavior need to be inside the ROI; available only for supervised behaviors).
normalize (bool) – Whether to normalize the population of each cluster across conditions.
- Returns:
A long format dataframe with the population of each cluster across conditions.
- deepof.post_hoc.get_transitions(state_sequence: list, n_states: int, index_sequence: list | None = None)
Compute the transitions between states in a state sequence.
- Parameters:
state_sequence (list) – A list of states.
n_states (int) – The number of states.
index_sequence (list) – An optional list of index positions for the states. Will ensure that state transitions between non-neighboring sequence entries are skipped
- Returns:
The resulting transition matrix.
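The role of index_sequence can be made concrete with a small counting routine: a transition is tallied only when two consecutive entries are also neighbors in the original index. This is a sketch under assumptions. The name count_transitions is hypothetical, and it returns raw counts, whereas the reference above does not state whether the library normalizes the matrix.

```python
import numpy as np


def count_transitions(state_sequence, n_states, index_sequence=None):
    """Tally state-to-state transitions, skipping gaps in the index."""
    counts = np.zeros((n_states, n_states))
    for i in range(len(state_sequence) - 1):
        if index_sequence is not None and index_sequence[i + 1] - index_sequence[i] != 1:
            continue  # entries are not adjacent in time, so no transition is counted
        counts[state_sequence[i], state_sequence[i + 1]] += 1
    return counts


states = [0, 1, 1, 0, 2]
all_transitions = count_transitions(states, n_states=3)
gapped = count_transitions(states, n_states=3, index_sequence=[0, 1, 2, 5, 6])
```

In the second call the jump from index 2 to index 5 removes the 1 to 0 transition from the tally.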
- deepof.post_hoc.compute_transition_matrix_per_condition(soft_counts: deepof_table_dict, exp_conditions: dict, silence_diagonal: bool = False, bin_info: dict | None = None, roi_number: int | None = None, animals_in_roi: list | None = None, aggregate: str = True, normalize: str = True)
Compute the transition matrices specific to each condition.
- Parameters:
soft_counts (TableDict) – A dictionary of soft counts, where the keys are the names of the experimental conditions, and the values are the soft counts for each condition.
exp_conditions (dict) – A dictionary of experimental conditions, where the keys are the names of the experiments, and the values are the names of their corresponding experimental conditions.
silence_diagonal (bool) – If True, diagonal elements on the transition matrix are set to zero.
bin_info (dict) – A dictionary containing start and end positions or indices of all sections for given embeddings and ROI information
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
aggregate (bool) – Whether to aggregate the embeddings across time.
normalize (bool) – Whether to normalize the population of each cluster across conditions.
- Returns:
A dictionary of transition matrices, where the keys are the names of the experimental conditions, and the values are the transition matrices for each condition.
- deepof.post_hoc.compute_steady_state(transition_matrices: dict, return_entropy: bool = False, n_iters: int = 100000)
Compute the steady state of each transition matrix provided in a dictionary.
- Parameters:
transition_matrices (dict) – A dictionary of transition matrices, where the keys are the names of the experimental conditions, and the values are the transition matrices for each condition.
return_entropy (bool) – Whether to return the entropy of the steady state. If False, the steady states themselves are returned.
n_iters (int) – The number of iterations to use for the Markov chain.
- Returns:
A dictionary of steady states, where the keys are the names of the experimental conditions, and the values are the steady states for each condition. If return_entropy is True, values correspond to the entropy of each steady state.
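One common way to obtain a steady state from a transition matrix is power iteration: repeatedly multiply a starting distribution by the row-normalized matrix until it stops changing. The sketch below uses that technique for illustration and does not claim to match how the library uses n_iters. The helper names, the natural-log entropy, and the requirement that every row contains at least one transition are assumptions.

```python
import numpy as np


def steady_state_sketch(transition_matrix, n_iters=1000):
    """Approximate the stationary distribution by power iteration."""
    P = np.asarray(transition_matrix, dtype=float)
    P = P / P.sum(axis=1, keepdims=True)  # assumes no all-zero rows
    pi = np.full(P.shape[0], 1.0 / P.shape[0])  # start from a uniform distribution
    for _ in range(n_iters):
        pi = pi @ P
    return pi


def entropy_sketch(distribution):
    """Shannon entropy in nats, ignoring zero-probability states."""
    p = np.asarray(distribution, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


pi = steady_state_sketch([[0.9, 0.1], [0.5, 0.5]])
pi_entropy = entropy_sketch(pi)
```

For this two-state chain the balance condition 0.1 * pi[0] = 0.5 * pi[1] gives the stationary distribution (5/6, 1/6).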
- deepof.post_hoc.compute_UMAP(embeddings, cluster_assignments)
Compute UMAP embeddings for visualization purposes.
- deepof.post_hoc.align_deepof_kinematics_with_unsupervised_labels(deepof_project: deepof_coordinates, kin_derivative: int = 1, center: str = 'Center', align: str = 'Spine_1', include_feature_derivatives: bool = False, include_distances: bool = True, include_angles: bool = True, include_areas: bool = True, animal_id: str | None = None, file_name: str = 'kinematics', return_path: bool = False)
Align kinematics with unsupervised labels.
In order to annotate time chunks with as many relevant features as possible, this function aligns the kinematics of a deepof project (speed and acceleration of body parts, distances, and angles) with the hard cluster assignments obtained from the unsupervised pipeline.
- Parameters:
deepof_project (coordinates) – A deepof.Project object.
kin_derivative (int) – The order of the derivative to use for the kinematics. 1 = speed, 2 = acceleration, etc.
center (str) – Body part to center coordinates on. “Center” by default.
align (str) – Body part to rotationally align the body parts with. “Spine_1” by default.
include_feature_derivatives (bool) – Whether to compute speed on distances, angles, and areas, if they are included.
include_distances (bool) – Whether to include distances in the alignment.
include_angles (bool) – Whether to include angles in the alignment.
include_areas (bool) – Whether to include areas in the alignment.
animal_id (str) – The animal ID to use, in case of multi-animal projects.
file_name (str) – Name of table for saving
return_path (bool) – If True, return only the path to the processed table; if False, return the full table.
- Returns:
A dictionary of aligned kinematics, where the keys are the names of the experimental conditions, and the values are the aligned kinematics for each condition.
- deepof.post_hoc.chunk_summary_statistics(chunked_dataset: ndarray, body_part_names: list)
Extract summary statistics from a chunked dataset using seglearn.
- Parameters:
chunked_dataset (np.ndarray) – Preprocessed training set (of shape chunks x time x features), where each entry corresponds to a time chunk of data.
body_part_names (list) – A list of the names of the body parts.
- Returns:
A dataframe of kinematic features, of shape chunks by features.
- deepof.post_hoc.annotate_time_chunks(deepof_project: deepof_coordinates, soft_counts: deepof_table_dict, supervised_annotations: deepof_table_dict | None = None, window_size: int | None = None, window_step: int = 1, animal_id: str | None = None, samples: int = 10000, min_confidence: float = 0.0, kin_derivative: int = 1, include_distances: bool = True, include_angles: bool = True, include_areas: bool = True, aggregate: str = 'mean')
Annotate time chunks produced after change-point detection using the unsupervised pipeline.
Uses a set of summary statistics coming from kinematics, distances, angles, and supervised labels when provided.
- Parameters:
deepof_project (coordinates) – Project object.
soft_counts (table_dict) – matrix with soft cluster assignments produced by the unsupervised pipeline.
supervised_annotations (table_dict) – set of supervised annotations produced by the supervised pipeline within deepof.
window_size (int) – Minimum size of the applied ruptures. If automatic_changepoints is False, specifies the size of the sliding window to pass through the data to generate training instances. None defaults to video frame-rate.
window_step (int) – Specifies the minimum jump for the rupture algorithms. If automatic_changepoints is False, specifies the step to take when sliding the aforementioned window. In this case, a value of 1 indicates a true sliding window, and a value equal to window_size splits the data into non-overlapping chunks.
animal_id (str) – The animal ID to use, in case of multi-animal projects.
samples (int) – Number of time chunks to sample in order to reduce computational time. Defaults to the minimum between 10000 and the number of available chunks.
min_confidence (float) – minimum confidence in cluster assignments used for quality control filtering.
kin_derivative (int) – The order of the derivative to use for the kinematics. 1 = speed, 2 = acceleration, etc.
include_distances (bool) – Whether to include distances in the alignment. kin_derivative is taken into account.
include_angles (bool) – Whether to include angles in the alignment. kin_derivative is taken into account.
include_areas (bool) – Whether to include areas in the alignment. kin_derivative is taken into account.
aggregate (str) – aggregation mode. Can be either “mean” (computationally cheapest), which uses the average per feature, or “seglearn”, which runs a thorough feature extraction and selection pipeline on each time series.
- Returns:
A dataframe of kinematic features, of shape chunks by features.
- deepof.post_hoc.chunk_cv_splitter(chunk_stats: DataFrame, bin_info: dict, n_folds: int | None = None)
Split a dataset into training and testing sets, grouped by video.
Given a matrix with extracted features per chunk, returns a list containing a set of cross-validation folds, grouped by experimental video. This makes sure that chunks coming from the same experiment will never be leaked between training and testing sets.
- Parameters:
chunk_stats (pd.DataFrame) – matrix with statistics per chunk, sorted by experiment.
bin_info (dict) – A dictionary containing start and end positions or indices of all sections for given embeddings
n_folds (int) – number of cross-validation folds to compute.
- Returns:
list containing a training and testing set per CV fold.
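The idea of grouping folds by video, so that chunks from one experiment never appear on both sides of a split, can be sketched with NumPy alone. The function grouped_folds and the per-chunk groups array are hypothetical. The library derives its grouping from bin_info, which is not reproduced here, and treating n_folds=None as leave-one-group-out is an assumption.

```python
import numpy as np


def grouped_folds(groups, n_folds=None):
    """Build (train_idx, test_idx) pairs that keep each group on one side."""
    groups = np.asarray(groups)
    unique_groups = np.unique(groups)
    if n_folds is None:
        n_folds = len(unique_groups)  # assumed default: leave one group out
    folds = []
    for held_out in np.array_split(unique_groups, n_folds):
        test_mask = np.isin(groups, held_out)
        folds.append((np.flatnonzero(~test_mask), np.flatnonzero(test_mask)))
    return folds


video_per_chunk = np.array(["v1", "v1", "v2", "v2", "v2", "v3"])
folds = grouped_folds(video_per_chunk)
```

Each element of folds is a pair of index arrays, and no video contributes chunks to both arrays of the same pair.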
- deepof.post_hoc.train_supervised_cluster_detectors(chunk_stats: DataFrame, hard_counts: ndarray, bin_info: dict, n_folds: int | None = None, verbose: int = 1)
Train supervised models to detect clusters from kinematic features.
- Parameters:
chunk_stats (pd.DataFrame) – table with descriptive statistics for a series of sequences (‘chunks’).
hard_counts (np.ndarray) – cluster assignments for the corresponding ‘chunk_stats’ table.
bin_info (dict) – A dictionary containing start and end positions or indices of all sections for given embeddings
n_folds (int) – number of folds for cross validation. If None (default) leave-one-experiment-out CV is used.
verbose (int) – verbosity level. Must be an integer between 0 (nothing printed) and 3 (all is printed).
- Returns:
full_cluster_clf (imblearn.pipeline.Pipeline) – trained supervised model on the full dataset, mapping chunk stats to cluster assignments. Useful to run the SHAP explainability pipeline.
cluster_gbm_performance (dict) – cross-validated dictionary containing trained estimators and performance metrics.
groups (list) – cross-validation indices. Data from the same animal are never shared between train and test sets.
deepof.utils module
Functions and general utilities for the deepof package.
- class deepof.utils.KeyErrorMessage
Bases:
str
- deepof.utils.rts_smoother_numba(measurements, F, H, Q, R)
Implements the Rauch-Tung-Striebel (RTS) smoother for state estimation.
This function performs both forward and backward passes to estimate the optimal state sequence given a set of noisy measurements. It first applies the Kalman filter in a forward pass and then refines the estimates using the RTS smoother in a backward pass.
- Parameters:
measurements (np.ndarray) – Array of measurements, shape (n_timesteps, n_dim_measurement).
F (np.ndarray) – State transition matrix, shape (n_dim_state, n_dim_state).
H (np.ndarray) – Observation matrix, shape (n_dim_measurement, n_dim_state).
Q (np.ndarray) – Process noise covariance matrix, shape (n_dim_state, n_dim_state).
R (np.ndarray) – Measurement noise covariance matrix, shape (n_dim_measurement, n_dim_measurement).
- Returns:
Smoothed state estimates, shape (n_timesteps, n_dim_state).
- Return type:
smoothed_states (np.ndarray)
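The forward (Kalman filter) and backward (RTS) passes described above can be sketched in plain NumPy as follows. This is a textbook formulation written for illustration, not the numba code in DeepOF; the name `rts_smooth` and the initial state (zero mean, identity covariance) are assumptions.

```python
import numpy as np

def rts_smooth(measurements, F, H, Q, R):
    n, dim = measurements.shape[0], F.shape[0]
    x_pred = np.zeros((n, dim)); P_pred = np.zeros((n, dim, dim))
    x_filt = np.zeros((n, dim)); P_filt = np.zeros((n, dim, dim))
    # Assumption: start from a zero state with identity covariance
    x, P = np.zeros(dim), np.eye(dim)
    for t in range(n):
        # Forward pass, prediction step
        x = F @ x
        P = F @ P @ F.T + Q
        x_pred[t], P_pred[t] = x, P
        # Forward pass, measurement update
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (measurements[t] - H @ x)
        P = (np.eye(dim) - K @ H) @ P
        x_filt[t], P_filt[t] = x, P
    # Backward pass: refine each filtered state using the following one
    x_smooth = x_filt.copy()
    for t in range(n - 2, -1, -1):
        C = P_filt[t] @ F.T @ np.linalg.inv(P_pred[t + 1])
        x_smooth[t] = x_filt[t] + C @ (x_smooth[t + 1] - x_pred[t + 1])
    return x_smooth

# Example: noisy observations of a roughly constant scalar state
rng = np.random.default_rng(0)
z = 5.0 + rng.normal(scale=1.0, size=(50, 1))
I = np.eye(1)
smoothed = rts_smooth(z, F=I, H=I, Q=0.01 * I, R=I)
```

The output has one smoothed state per time step, matching the (n_timesteps, n_dim_state) shape listed above.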
- deepof.utils.enforce_skeleton_constraints_numba(data, skeleton_constraints, original_pos, tolerance=0.1, correction_factor=0.5)
Adjusts the positions of body parts in each frame to ensure that the distances between connected parts adhere to predefined skeleton constraints within a specified tolerance.
- Parameters:
data (np.ndarray) – Motion capture data, shape (n_frames, n_body_parts, 2).
skeleton_constraints (list) – List of tuples (part1, part2, dist) defining the constraints between body parts and their expected distances.
original_pos (np.ndarray) – Boolean array indicating original (non-interpolated) positions, shape (n_frames, n_body_parts, 2).
tolerance (float) – Allowable deviation from the constraint distance (default: 0.1).
correction_factor (float) – Factor to control the strength of position adjustments (default: 0.5).
- Returns:
Adjusted motion capture data with enforced skeleton constraints.
- Return type:
np.ndarray
- class deepof.utils.MouseTrackingImputer(n_iterations=10, connectivity=None, full_imputation=False)
Bases:
object
A class for imputing and processing mouse tracking data.
This class provides methods for interpolating missing data points, enforcing skeleton constraints, and smoothing trajectories in mouse tracking experiments.
- n_iterations
Number of iterations for imputation (default: 10).
- Type:
int
- connectivity
Connectivity information for body parts.
- Type:
object
- full_imputation
Whether to perform full imputation or only a partial linear imputation (default: False).
- Type:
bool
- body_part_indices
Mapping of body part names to indices.
- Type:
OrderedDict
- skeleton_constraints
List of skeleton constraints.
- Type:
list
- mouse_body_estimation_samples
Number of sample frames with non-nan data to estimate valid mouse shapes (default: 100).
- Type:
int
- lin_interp_limit
Limit for linear interpolation (default: 3).
- Type:
int
- __init__(n_iterations=10, connectivity=None, full_imputation=False)
- fit_transform(**kwargs)
- deepof.utils.connect_mouse(animal_ids=None, exclude_bodyparts: list | None = None, graph_preset: str = 'deepof_14') Graph
Create a nx.Graph object with the connectivity of the bodyparts in the DLC topview model for a single mouse.
Used later for angle computing, among others.
- Parameters:
animal_ids (str) – if more than one animal is tagged, specify the animal identifier as a string.
exclude_bodyparts (list) – Remove the specified nodes from the graph.
graph_preset (str) – Connectivity preset to use. Currently supported: “deepof_14”, “deepof_11” and “deepof_8”.
- Returns:
connectivity (nx.Graph)
- deepof.utils.edges_to_weighted_adj(adj: ndarray, edges: ndarray)
Convert an edge feature matrix to a weighted adjacency matrix.
- Parameters:
adj (np.ndarray) – binary adjacency matrix of the current graph.
edges (np.ndarray) – edge feature matrix. Last two axes should be of shape nodes x features.
- deepof.utils.enumerate_all_bridges(G: networkx.classes.graph) list
Enumerate all 3-node connected sequences in the given graph.
- Parameters:
G (nx.Graph) – Animal connectivity graph.
- Returns:
List with all 3-node connected sequences in the provided graph.
- Return type:
bridges (list)
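As a rough illustration of what a "3-node connected sequence" is, the sketch below lists every path a-b-c in which b is adjacent to both a and c. It uses a plain adjacency dictionary instead of a networkx graph, and the function name, the body part names and the choice to list each path once are all assumptions.

```python
def three_node_paths(adjacency):
    """Return every [a, mid, c] where mid is connected to both a and c."""
    paths = []
    for mid, neighbours in adjacency.items():
        nbrs = sorted(neighbours)
        for i, a in enumerate(nbrs):
            for c in nbrs[i + 1:]:
                # Each unordered pair of neighbours gives one path through mid
                paths.append([a, mid, c])
    return paths

# Hypothetical connectivity: the nose is linked to both ears
adjacency = {
    "Nose": {"Left_ear", "Right_ear"},
    "Left_ear": {"Nose"},
    "Right_ear": {"Nose"},
}
bridges = three_node_paths(adjacency)
```

Sequences of this kind are what an angle computation needs: two outer points and the vertex between them.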
- deepof.utils.str2bool(v: str) bool
Return the passed string as a boolean.
- Parameters:
v (str) – String to transform to boolean value.
- Returns:
Boolean value of the passed string. Raises an error if conversion is not possible.
- Return type:
bool
- deepof.utils.compute_animal_presence_mask(quality: deepof_table_dict, threshold: float = 0.5) deepof_table_dict
Compute a mask of the animal presence in the video.
- Parameters:
quality (table_dict) – Dictionary with the quality of the tracking for each body part and animal.
threshold (float) – Threshold for the quality of the tracking. If the quality is below this threshold, the animal is considered to be absent.
- Returns:
Dictionary with the animal presence mask for each bodypart and animal.
- Return type:
animal_presence_mask (table_dict)
- deepof.utils.iterative_imputation(project: deepof_project, tab_dict: dict, lik_dict: dict, full_imputation: bool = False)
Perform iterative imputation on occluded body parts. Run per animal and experiment.
- Parameters:
project (project) – Project object.
tab_dict (dict) – Dictionary with the coordinates of the body parts.
lik_dict (dict) – Dictionary with the likelihood of the tracking for each body part and animal.
full_imputation (bool) – Determines whether only small gaps are linearly imputed (False), or whether IterativeImputer and a few additional steps are also run to close all gaps (True).
- Returns:
Dictionary with the coordinates of the body parts after imputation.
- Return type:
tab_dict (dict)
- deepof.utils.set_missing_animals(coordinates: deepof_project, tab_dict: dict, lik_dict: dict, animal_ids: list | None = None)
Set the coordinates of the missing animals to NaN.
- Parameters:
coordinates (project) – Project object.
tab_dict (dict) – Dictionary with the coordinates of the body parts.
lik_dict (dict) – Dictionary with the likelihood of the tracking for each body part and animal.
animal_ids (list) – List with the animal ids to remove. If None, all the animals with missing data are processed.
- Returns:
Dictionary with the coordinates of the body parts after removing missing animals.
- Return type:
tab_dict (dict)
- deepof.utils.bp2polar(tab: DataFrame) DataFrame
Return the DataFrame in polar coordinates.
- Parameters:
tab (pandas.DataFrame) – Table with cartesian coordinates.
- Returns:
Equivalent to input, but with values in polar coordinates.
- Return type:
polar (pandas.DataFrame)
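For reference, the cartesian-to-polar conversion can be sketched on a bare (N, 2) array. This is an illustration only: the function name `cartesian_to_polar` and the output column order (radius, then angle in radians) are assumptions, and the DataFrame handling of the real function is left out.

```python
import numpy as np

def cartesian_to_polar(xy):
    xy = np.asarray(xy, dtype=float)
    rho = np.hypot(xy[:, 0], xy[:, 1])    # distance from the origin
    phi = np.arctan2(xy[:, 1], xy[:, 0])  # angle in radians, in (-pi, pi]
    return np.column_stack([rho, phi])

polar = cartesian_to_polar([[1.0, 0.0], [0.0, 2.0]])
```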
- deepof.utils.tab2polar(cartesian_df: DataFrame) DataFrame
Return a pandas.DataFrame in which all the coordinates are polar.
- Parameters:
cartesian_df (pandas.DataFrame) – DataFrame containing tables with cartesian coordinates.
- Returns:
Equivalent to input, but with values in polar coordinates.
- Return type:
result (pandas.DataFrame)
- deepof.utils.compute_dist(pair_array: array) DataFrame
Return a pandas.DataFrame with the scaled distances between a pair of body parts.
- Parameters:
pair_array (numpy.array) – np.array of shape N * 4 containing X, y positions over time for a given pair of body parts.
- Returns:
pandas.DataFrame with the absolute distances between a pair of body parts.
- Return type:
result (pd.DataFrame)
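A minimal sketch of the per-frame distance for one pair of body parts, assuming the four columns are ordered x1, y1, x2, y2 (the column order and the name `pair_distance` are assumptions, and no scaling is applied here):

```python
import numpy as np

def pair_distance(pair_array):
    pair_array = np.asarray(pair_array, dtype=float)
    first, second = pair_array[:, :2], pair_array[:, 2:]
    # Euclidean distance between the two body parts in every frame
    return np.linalg.norm(first - second, axis=1)

dists = pair_distance([[0.0, 0.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0]])
```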
- deepof.utils.bpart_distance(dataframe: DataFrame) DataFrame
Return a pandas.DataFrame with the scaled distances between all pairs of body parts.
- Parameters:
dataframe (pandas.DataFrame) – pd.DataFrame of shape N*(2*bp) containing X,y positions over time for a given set of bp body parts.
- Returns:
pandas.DataFrame with the absolute distances between all pairs of body parts.
- Return type:
result (pd.DataFrame)
- deepof.utils.angle(bpart_array: array) array
Return a numpy.ndarray with the angles between the provided instances.
- Parameters:
bpart_array (numpy.array) – 2D positions over time for a bodypart.
- Returns:
1D angles between the three-point-instances.
- Return type:
ang (np.array)
- deepof.utils.signed_angle(bpart_array: array) array
Return a numpy.ndarray with the signed angles between the provided instances.
- Parameters:
bpart_array (numpy.array) – 2D positions over time for a bodypart.
- Returns:
1D angles between the three-point-instances.
- Return type:
ang (np.array)
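The difference between the unsigned and the signed angle at the middle point of a three-point instance can be sketched as below. The function names, the argument layout (three separate (N, 2) arrays) and the sign convention (counter-clockwise positive) are assumptions made for illustration.

```python
import numpy as np

def unsigned_angle(a, b, c):
    """Angle at b between the segments b->a and b->c, in [0, pi]."""
    ba, bc = a - b, c - b
    cos = np.sum(ba * bc, axis=-1) / (
        np.linalg.norm(ba, axis=-1) * np.linalg.norm(bc, axis=-1)
    )
    return np.arccos(np.clip(cos, -1.0, 1.0))

def signed_angle_2d(a, b, c):
    """Same angle, but with a sign given by the turning direction."""
    ba, bc = a - b, c - b
    cross = ba[..., 0] * bc[..., 1] - ba[..., 1] * bc[..., 0]
    dot = np.sum(ba * bc, axis=-1)
    return np.arctan2(cross, dot)

# Two frames with mirrored geometry around the origin
a = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.zeros((2, 2))
c = np.array([[0.0, 1.0], [1.0, 0.0]])
unsigned = unsigned_angle(a, b, c)
signed = signed_angle_2d(a, b, c)
```

Both frames have the same unsigned angle, while the signed version tells the two mirrored configurations apart.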
- deepof.utils.compute_areas(polygon_xy_stack: array) array
Compute polygon areas for the provided stack of sets of data point-xy coordinates.
- Parameters:
polygon_xy_stack – 3D numpy array [NPolygons (i.e. NFrames), Npoints, NDim (x,y)]
- Returns:
areas for the provided xy coordinates.
- Return type:
areas (np.ndarray)
- deepof.utils.compute_areas_numba(polygon_xy_stack: array) array
Compute polygon areas for the provided stack of sets of data point-xy coordinates.
- Parameters:
polygon_xy_stack (np.ndarray) – 3D numpy array [NPolygons (i.e. NFrames), Npoints, NDim (x,y)]
- Returns:
areas for the provided xy coordinates.
- Return type:
areas (np.ndarray)
- deepof.utils.polygon_area_numba(vertices: ndarray) float
Calculate the area of a single polygon given its vertices.
- Parameters:
vertices (np.ndarray) – Array of shape [Npoints, 2] containing the (x, y) coordinates of the polygon’s vertices.
- Returns:
Area of the polygon.
- Return type:
float
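The area of a simple polygon from its vertices is commonly computed with the shoelace formula. The sketch below shows that formula; whether the DeepOF functions use exactly this formulation is an assumption, and `shoelace_area` is a hypothetical name.

```python
import numpy as np

def shoelace_area(vertices):
    v = np.asarray(vertices, dtype=float)
    x, y = v[:, 0], v[:, 1]
    # Sum of cross products between consecutive vertices (wrapping around)
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

area = shoelace_area([[0, 0], [2, 0], [2, 3], [0, 3]])  # 2 x 3 rectangle
```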
- deepof.utils.extend_behaviors_numba(behaviors: ndarray, delta_T: float = 2.0, frame_rate: float = 1) ndarray
Takes a boolean array of behavior detections and extends each behavior detection by delta_T.
- Parameters:
behaviors (np.ndarray) – Boolean array of shape [N_behaviors, N_frames] containing the detection results (True / False) of each behavior for each frame.
delta_T (float) – Time by which each behavior should be extended
frame_rate (float) – Frame rate of the corresponding project
- Returns:
Boolean array of shape [N_behaviors, N_frames] containing the detection results (True / False) of each behavior for each frame after extension.
- Return type:
extended_behaviors (np.ndarray)
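One way to picture the extension is shown below for a single behavior (a 1D array). It assumes that detections are extended forward in time by round(delta_T * frame_rate) frames; that direction, the rounding and the name `extend_detections` are assumptions, not confirmed details of the numba implementation.

```python
import numpy as np

def extend_detections(behavior, delta_T=2.0, frame_rate=1.0):
    behavior = np.asarray(behavior, dtype=bool)
    n_extra = int(round(delta_T * frame_rate))  # extension length in frames
    extended = behavior.copy()
    for shift in range(1, n_extra + 1):
        # A frame becomes True if a detection occurred `shift` frames earlier
        extended[shift:] |= behavior[:-shift]
    return extended

extended = extend_detections([1, 0, 0, 0, 0, 1, 0], delta_T=2.0, frame_rate=1.0)
```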
- deepof.utils.count_transitions(tab_dict: deepof_table_dict, exp_conditions: dict, bin_info: dict | None = None, animals_in_roi: list | None = None, delta_T: float = 0.5, frame_rate: float = 1, silence_diagonal: bool = False, aggregate: str = True, normalize: str = True, diagonal_behavior_counting: str = 'Transitions')
Count transitions between successive behaviors for all experiments in tab_dict.
- Parameters:
tab_dict (table_dict) – Dictionary with behavior data (supervised or unsupervised soft_counts)
exp_conditions (dict) – Dictionary containing the experiment conditions for each experiment.
bin_info (dict) – dictionary containing indices to plot for all experiments
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
delta_T – Time after the offset of one behavior during which the onset of the next behavior counts as a transition
frame_rate (float) – Frame rate of the corresponding project
silence_diagonal (bool) – If True, diagonals are set to zero.
aggregate (bool) – If True, sums matrices per experimental condition; else per experiment.
normalize (bool) – Row-normalizes transition probabilities if True. Default=True.
diagonal_behavior_counting (str) – How to count diagonals (self-transitions). Options:
- “Frames”: total frames where the behavior is active (after extension)
- “Time”: total time where the behavior is active
- “Events”: number of instances of the behavior occurring
- “Transitions”: number of frame-wise internal behavior transitions, e.g. a behavior of 4 frames in length would have 3 transitions.
- Returns:
- Dictionary of transition matrices. Keys:
If aggregate=True: Condition labels (e.g., {‘control’: array(…)})
If aggregate=False: Experiment IDs (e.g., {‘exp1’: array(…)})
columns (list): Behavior names (columns after dropping non-binary features).
combined_columns (list): All possible behavior transition pairs (e.g., [‘BehaviorA-x-BehaviorB’, …]).
- Return type:
transitions_dict (dict)
- deepof.utils.count_events(binary_behavior: ndarray, counting_mode: str = 'Events', frame_rate: int = 1) int
Counts the number of continuous blocks of 1s in a binary behavior vector in different ways
- Parameters:
binary_behavior (numpy.ndarray) – Binary 1D Array containing behavior detections.
counting_mode (str) – Counting mode. Options are:
- "Frames": counts the total number of frames in all events
- "Time": counts the total time duration of all events (requires frame_rate input)
- "Events": counts the number of continuous blocks of 1s
- "Transitions": counts the number of frame-to-frame transitions within the events, e.g. an event of 10 frames in length would have 9 transitions.
frame_rate (float) – Frame rate of the recording.
- Returns:
counted events
- Return type:
num_events (float)
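The four counting modes can be illustrated with the sketch below. It follows the descriptions above (for example, an event of L frames contributes L - 1 transitions), but the function name `count_blocks` and the way time is derived (frames divided by frame rate) are assumptions.

```python
import numpy as np

def count_blocks(binary_behavior, counting_mode="Events", frame_rate=1.0):
    b = np.asarray(binary_behavior).astype(int)
    n_frames = int(b.sum())
    # An event starts wherever the signal rises from 0 to 1
    n_events = int(np.sum(np.diff(np.concatenate([[0], b])) == 1))
    if counting_mode == "Frames":
        return n_frames
    if counting_mode == "Time":
        return n_frames / frame_rate
    if counting_mode == "Events":
        return n_events
    if counting_mode == "Transitions":
        # Each event of L frames has L - 1 internal transitions
        return n_frames - n_events
    raise ValueError(f"Unknown counting mode: {counting_mode}")

signal = [0, 1, 1, 1, 0, 1, 1, 0]  # two events, of 3 and 2 frames
```

For this signal, "Frames" gives 5, "Events" gives 2 and "Transitions" gives 3.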
- deepof.utils.rotate(p: array, angles: array, origin: array = array([0, 0])) array
Return a 2D numpy.ndarray with the initial values rotated by angles radians.
- Parameters:
p (numpy.ndarray) – 2D Array containing positions of bodyparts over time.
angles (numpy.ndarray) – Set of angles (in radians) to rotate p with.
origin (numpy.ndarray) – Rotation axis (zero vector by default).
- Returns:
rotated positions over time
- Return type:
rotated (numpy.ndarray)
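A minimal sketch of rotating 2D points around an origin, here with a single angle applied to all points. The counter-clockwise convention and the name `rotate_points` are assumptions; the functions documented here accept an array of angles instead.

```python
import numpy as np

def rotate_points(p, angle, origin=(0.0, 0.0)):
    p = np.asarray(p, dtype=float)
    origin = np.asarray(origin, dtype=float)
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s], [s, c]])  # counter-clockwise rotation matrix
    # Shift to the origin, rotate, then shift back
    return (p - origin) @ R.T + origin

rotated = rotate_points([[1.0, 0.0]], np.pi / 2)
```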
- deepof.utils.rotate_all_numba(data: array, angles: array) array
Return a 2D numpy.ndarray with the initial values rotated by angles radians.
- Parameters:
data (numpy.ndarray) – 2D Array containing positions of bodyparts over time.
angles (numpy.ndarray) – Set of angles (in radians) to rotate data with.
- Returns:
rotated positions over time
- Return type:
rotated (numpy.ndarray)
- deepof.utils.rotate_numba(p: array, angles: array, origin: array = array([0, 0])) array
Return a 2D numpy.ndarray with the initial values rotated by angles radians.
- Parameters:
p (numpy.ndarray) – 2D Array containing positions of bodyparts over time.
angles (numpy.ndarray) – Set of angles (in radians) to rotate p with.
origin (numpy.ndarray) – Rotation axis (zero vector by default).
- Returns:
rotated positions over time
- Return type:
rotated (numpy.ndarray)
- deepof.utils.point_in_polygon(points: array, polygon: Polygon) array
Check if a set of points is inside a polygon.
- Parameters:
points (np.ndarray) – An array of shape (M, 2) containing the coordinates of the points.
polygon (shapely.geometry.polygon.Polygon) – Shapely polygon.
- Returns:
A boolean array of shape (M,) indicating whether each point is inside the polygon.
- Return type:
np.ndarray
- deepof.utils.point_in_polygon_numba(points: array, polygon: array) array
Check if a set of points is inside a polygon. This function was generated by Perplexity.ai.
- Parameters:
points (np.ndarray) – An array of shape (M, 2) containing the coordinates of the points.
polygon (np.ndarray) – An array of shape (N, 2) containing the coordinates of the polygon vertices.
- Returns:
A boolean array of shape (M,) indicating whether each point is inside the polygon.
- Return type:
np.ndarray
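A point-in-polygon test on plain vertex arrays is often implemented with ray casting (the even-odd rule). The sketch below shows that technique; it is not claimed to be the algorithm used by point_in_polygon_numba, and `points_in_polygon` is a hypothetical name. Points lying exactly on an edge are not treated specially.

```python
import numpy as np

def points_in_polygon(points, polygon):
    points = np.asarray(points, dtype=float)
    polygon = np.asarray(polygon, dtype=float)
    inside = np.zeros(len(points), dtype=bool)
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Edges that straddle the horizontal line through each point
        straddles = (y1 > points[:, 1]) != (y2 > points[:, 1])
        # x coordinate where the edge crosses that line; horizontal edges
        # divide by zero here, but they never straddle and are masked out
        with np.errstate(divide="ignore", invalid="ignore"):
            x_cross = x1 + (points[:, 1] - y1) * (x2 - x1) / (y2 - y1)
        # Every crossing to the right of the point flips the inside state
        inside ^= straddles & (points[:, 0] < x_cross)
    return inside

square = [[0, 0], [4, 0], [4, 4], [0, 4]]
mask = points_in_polygon([[2, 2], [5, 2], [-1, 1]], square)
```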
- deepof.utils.get_point_polygon_distance(points: ndarray, polygon: Polygon) ndarray
Calculates array of distances between 2D points and a polygon (roi)
- deepof.utils.get_point_polygon_distance_numba(points, poly_xy)
- deepof.utils.in_field_of_view(mouse_pts: ndarray, fov_angle_deg: float, roi: Polygon, plot: bool = True, eps: float = 1e-10) ndarray
mouse_pts: array of shape (N, 3, 2) or (3, 2), in the order [left_ear, nose, right_ear].
Returns a float array of shape (N,):
1.0 -> ROI intersects FOV
0.0 -> ROI does not intersect FOV
np.nan -> cannot be calculated (invalid/degenerate geometry or non-finite points)
The apex of the FOV triangle is the midpoint between the ears.
- deepof.utils.in_field_of_view_numba(mouse_pts, fov_angle_deg, roi_poly, eps=1e-10)
Numba version of in_field_of_view (no plotting, no shapely).
mouse_pts: (N, 3, 2) float64
roi_poly: (M, 2) float64 (not closed)
returns: (N,) float64 in {1.0, 0.0, nan}
- deepof.utils.mouse_in_roi(tab, aid, in_roi_criterion, roi_polygon, run_numba=False)
Checks if a given animal for a given table is in a given roi by given criterion.
- Parameters:
tab (dataTable) – Datatable containing mouse tracking data.
aid (str) – Animal id of the mouse to check.
in_roi_criterion (str) – Criterion for the in-ROI check; can be a single bodypart, a list of bodyparts, or “all” bodyparts of a mouse.
roi_polygon (np.ndarray) – 2D numpy array containing the coordinates of the ROI.
run_numba (bool) – Determines if numba versions of functions should be used (run faster but require initial compilation time on first run)
- Returns:
A boolean array indicating whether the mouse is inside the ROI.
- Return type:
mouse_in_polygon (np.ndarray)
- deepof.utils.align_trajectories(data: array, mode: str = 'all', run_numba: bool = False) array
Remove rotational variance on the trajectories.
Returns a numpy.array with the positions rotated in a way that the center (0 vector), and body part in the first column of data are aligned with the y-axis.
- Parameters:
data (numpy.ndarray) – 3D array containing positions of body parts over time, where shape is N (sliding window instances) * m (sliding window size) * l (features)
mode (string) – Specifies if all instances of each sliding window get aligned, or only the center
run_numba (bool) – Determines if numba versions of functions should be used (run faster but require initial compilation time on first run)
- Returns:
2D aligned positions over time.
- Return type:
aligned_trajs (np.ndarray)
- deepof.utils.load_table(tab: str, table_path: str, table_format: str, rename_bodyparts_dict: dict | None = None, animal_ids: list | None = None)
Loads a table into a structured pandas data frame.
Supports inputs from both DeepLabCut and (S)LEAP.
- Parameters:
tab (str) – Name of the file containing the tracks.
table_path (string) – Full path to the file containing the tracks.
table_format (str) – type of the files to load, coming from either DeepLabCut (CSV and H5) or (S)LEAP (NPY).
rename_bodyparts_dict (dict) – dictionary mapping the bodypart names given in the table to DeepOF's bodypart names.
animal_ids (list) – List with the animal ids in case of multiple tracked animals. Is expected to be None if there is only a single animal getting tracked.
- Returns:
Data frame containing the loaded tracks. Likelihood for (S)LEAP files is imputed as 1.0 (tracked values) or 0.0 (missing values).
- Return type:
loaded_tab (pd.DataFrame)
- deepof.utils.rename_track_bps(loaded_tab: DataFrame, rename_bodyparts_dict: list, animal_ids: list)
Renames all body parts in the provided dataframe.
- Parameters:
loaded_tab (pd.DataFrame) – Data frame containing the loaded tracks. Likelihood for (S)LEAP files is imputed as 1.0 (tracked values) or 0.0 (missing values).
rename_bodyparts_dict (dict) – dictionary mapping the bodypart names given in the table to DeepOF's bodypart names.
animal_ids (list) – list of IDs to use for the animals present in the provided tracking files.
bodypart_graph (str) – DeepOF bodypart graph that is going to be used
- Returns:
Data frame with renamed body parts
- Return type:
renamed_tab (pd.DataFrame)
- deepof.utils.infer_scalar_cols(df: DataFrame)
- deepof.utils.infer_column_types(df)
Identify coord, speed, distance, and angle columns from a pose table.
- deepof.utils.scale_table(df: DataFrame, scale: str = 'standard', animal_ids=None, size_ref=('Nose', 'Tail_base'), inter_scale: str = 'mean', standardize: bool = True, dist_standardize: str = 'per_column', speed_standardize: str = 'per_column', log_distances: bool = True) DataFrame
- deepof.utils.kleinberg(offsets: list, s: float = 2.0, gamma: float = 1.0, n=None, T=None, k=None)
Apply Kleinberg’s algorithm (described in ‘Bursty and Hierarchical Structure in Streams’).
The algorithm models activity bursts in a time series as an infinite hidden Markov model.
Taken from pybursts (https://github.com/romain-fontugne/pybursts/blob/master/pybursts/pybursts.py) and adapted for dependency compatibility reasons.
- Parameters:
offsets (list) – a list of time offsets (numeric)
s (float) – the base of the exponential distribution that is used for modeling the event frequencies
gamma (float) – coefficient for the transition costs between states
n – used to adjust the fixed cost function (independent of the given offsets), which is needed if you want to compare bursts for different inputs.
T – used to adjust the fixed cost function (independent of the given offsets), which is needed if you want to compare bursts for different inputs.
k – maximum burst level
- deepof.utils.kleinberg_core_numba(gaps: array, s: float64, gamma: float64, n: int, T: float64, k: int) array
Computation intensive core part of Kleinberg’s algorithm (described in ‘Bursty and Hierarchical Structure in Streams’).
The algorithm models activity bursts in a time series as an infinite hidden Markov model.
Taken from pybursts (https://github.com/romain-fontugne/pybursts/blob/master/pybursts/pybursts.py) and rewritten for compatibility with numba.
- Parameters:
gaps (np.array) – an array of gap sizes between time offsets (numeric)
s (float) – the base of the exponential distribution that is used for modeling the event frequencies
gamma (float) – coefficient for the transition costs between states
n – used to adjust the fixed cost function (independent of the given offsets), which is needed if you want to compare bursts for different inputs.
T – used to adjust the fixed cost function (independent of the given offsets), which is needed if you want to compare bursts for different inputs.
k – maximum burst level / number of hidden states
- deepof.utils.smooth_boolean_array(a: array, scale: int = 1, sigma=2.0, batch_size: int = 50000) array
LEGACY FILTER FOR BEHAVIORAL ANALYSIS. REPLACED BY multi_step_paired_smoothing.
Return a boolean array in which isolated appearances of a feature are smoothed.
- Parameters:
a (numpy.ndarray) – Boolean instances.
scale (int) – Kleinberg scale parameter. Higher values result in stricter smoothing.
batch_size (int) – Batch size for input processing.
- Returns:
Smoothed boolean instances.
- Return type:
a (numpy.ndarray)
- deepof.utils.multi_step_paired_smoothing(behavior_in: array, not_behavior: array | None = None, exclude: array | None = None, min_length: int = 6, get_both: bool = False) array
This filtering approach first gradually merges very close behavioral instances (how close is regulated by min_length), then filters out the remaining short instances. In this way, multiple instances close to each other are kept and united, while isolated very short bursts are filtered out. It replaces the Kleinberg filtering approach with a similar idea, as Kleinberg was too prone to merging events that were relatively distant on the time scale.
- Parameters:
behavior_in (numpy.ndarray) – Boolean instances of detected raw behavior.
not_behavior (numpy.ndarray) – Boolean instances of raw behavior not occurring.
exclude (numpy.ndarray) – Additional boolean instances that will always be rated as “no behavior”.
min_length (int) – Determines the degree of smoothing. The smaller the value, the more short behavioral instances are kept and the sharper the behavioral edges remain.
get_both (bool) – If True, will also return the not_behavior instances that get smoothed along with the behavior instances.
- Returns:
behavior (numpy.ndarray): Smoothed boolean instances.
not_behavior (numpy.ndarray): Smoothed boolean not-behavior instances.
- deepof.utils.rolling_window(a: ndarray, window_size: int, window_step: int) ndarray
Return a 3D numpy.array with a sliding-window extra dimension.
- Parameters:
a (np.ndarray) – N (instances) * m (features) shape
window_size (int) – Size of the window to apply
window_step (int) – Step of the window to apply
- Returns:
N (sliding window instances) * l (sliding window size) * m (features)
- Return type:
rolled_a (np.ndarray)
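The sliding-window reshaping can be sketched as follows. This version simply stacks slices and drops any incomplete window at the end; both that choice and the name `sliding_windows` are assumptions.

```python
import numpy as np

def sliding_windows(a, window_size, window_step):
    a = np.asarray(a)
    # Start index of every complete window
    starts = range(0, a.shape[0] - window_size + 1, window_step)
    return np.stack([a[s:s + window_size] for s in starts])

a = np.arange(12).reshape(6, 2)  # 6 instances, 2 features
windows = sliding_windows(a, window_size=3, window_step=2)
```

Here the result has shape (2, 3, 2): two windows of three instances each, with both features kept.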
- deepof.utils.extract_windows(to_window: deepof_table_dict, window_size: int, window_step: int, save_as_paths: bool = False, shuffle: bool = False, aggregate: str | None = None, windows_desc: str = 'Get windows') ndarray
Apply the rupture method independently to each experiment, and concatenate into a single dataset at the end.
Returns a dataset and the rupture indices, adapted to be used in a concatenated version of the labels.
- Parameters:
to_window (table_dict) – table_dict with all experiments.
window_size (int) – specifies the length of the sliding window.
window_step (int) – specifies the stride of the sliding window.
save_as_paths (bool) – save result as paths in dictionary instead of keeping it in RAM
shuffle (bool) – Whether to shuffle the data for each dataset. Defaults to False.
aggregate (str) – Aggregate instead of extracting full windows. Extracts full windows if None (default); otherwise the options are:
- “mean”: average windows to one value
- “mid”: take the middle of each window as the window value
- “wta”: winner takes all; whichever behavior or behavior combination is the most frequent is set as the window value
- “lta”: loser takes all; whichever behavior or behavior combination is the rarest is set as the window value
windows_desc (str) – Progress bar label
- Returns:
Dictionary containing stacks of windowed data samples for each table. Shape of the stacks: [N_samples, window_size, N_features]
output_shape (Tuple): shape of the output array (N_samples, window_size, N_features).
- Return type:
to_window (dict)
- deepof.utils.smooth_mult_trajectory(series: array, alpha: int = 0, w_length: int = 15) ndarray
Return a smoothed trajectory using a Savitzky-Golay 1D filter.
- Parameters:
series (numpy.ndarray) – 1D trajectory array with N (instances)
alpha (int) – 0 <= alpha < w_length; indicates the difference between the degree of the polynomial and the window length for the Savitzky-Golay filter used for smoothing. Higher values produce a worse fit, hence more smoothing.
w_length (int) – Length of the sliding window to which the filter is fit. Higher values yield a coarser fit, hence more smoothing.
- Returns:
smoothed version of the input, with equal shape
- Return type:
smoothed_series (np.ndarray)
- deepof.utils.moving_average(time_series: Series, lag: int = 5) Series
Fast implementation of a moving average function.
- Parameters:
time_series (pd.Series) – Uni-variate time series to take the moving average of.
lag (int) – size of the convolution window used to compute the moving average.
- Returns:
Uni-variate moving average over time_series.
- Return type:
moving_avg (pd.Series)
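A moving average expressed as a convolution can be sketched with NumPy as below. It works on a plain array rather than a pd.Series and keeps only fully overlapping windows (mode="valid"), so the output is shorter than the input; how the documented function handles the edges is not specified here, so treat that as an assumption.

```python
import numpy as np

def moving_avg(values, lag=5):
    kernel = np.ones(lag) / lag  # uniform weights that sum to 1
    # "valid" keeps only positions where the window fits entirely
    return np.convolve(np.asarray(values, dtype=float), kernel, mode="valid")

avg = moving_avg([1, 2, 3, 4, 5], lag=3)
```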
- deepof.utils.binary_moving_median_numba(time_series, lag)
Applies a moving-median-like filter to a binary signal: if a window of size lag contains more 1s than 0s, the frame is set to 1, otherwise to 0. Only works for windows of odd length N, i.e. it returns the same result for lag=N and lag=N+1.
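The majority vote described above can be sketched with a convolution. The centered window, the edge padding (repeating the first and last value) and the name `binary_majority_filter` are assumptions; only the odd-window behavior is taken from the description.

```python
import numpy as np

def binary_majority_filter(signal, lag):
    signal = np.asarray(signal).astype(int)
    half = (lag - 1) // 2  # an even lag falls back to the odd length below it
    padded = np.pad(signal, half, mode="edge")
    # Number of 1s inside each centered window of length 2 * half + 1
    counts = np.convolve(padded, np.ones(2 * half + 1, dtype=int), mode="valid")
    return counts > half   # True where 1s are the majority

signal = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1]
filtered = binary_majority_filter(signal, lag=3)
```

In this example the isolated 1 at index 2 is removed and the isolated 0 at index 8 is filled in.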
- deepof.utils.mask_outliers(time_series: DataFrame, likelihood: DataFrame, likelihood_tolerance: float, lag: int, n_std: int, mode: str) DataFrame
Return a mask over the bivariate trajectory of a body part, identifying as True all detected outliers.
An outlier can be marked with one of two criteria: 1) the likelihood reported by DLC is below likelihood_tolerance, and/or 2) the deviation from a moving average model is greater than n_std.
- Parameters:
time_series (pd.DataFrame) – Bi-variate time series representing the x, y positions of a single body part
likelihood (pd.DataFrame) – Data frame with likelihood data per body part as extracted from deeplabcut
likelihood_tolerance (float) – Minimum tolerated likelihood, below which an outlier is called
lag (int) – Size of the convolution window used to compute the moving average
n_std (int) – Number of standard deviations over the moving average to be considered an outlier
mode (str) – If “and” (default) both x and y have to be marked in order to call an outlier. If “or”, one is enough.
- Returns:
Bi-variate mask over time_series. True indicates an outlier.
- Return type:
mask (pd.DataFrame)
- deepof.utils.full_outlier_mask(experiment: DataFrame, likelihood: DataFrame, likelihood_tolerance: float, exclude: str, lag: int, n_std: int, mode: str) DataFrame
Iterate over all body parts of experiment, and outputs a dataframe where all x, y positions are replaced by a boolean mask, where True indicates an outlier.
- Parameters:
experiment (pd.DataFrame) – Data frame with time series representing the x, y positions of every body part
likelihood (pd.DataFrame) – Data frame with likelihood data per body part as extracted from deeplabcut
likelihood_tolerance (float) – Minimum tolerated likelihood, below which an outlier is called
exclude (str) – Body part to exclude from the analysis (to concatenate with bpart alignment)
lag (int) – Size of the convolution window used to compute the moving average
n_std (int) – Number of standard deviations over the moving average to be considered an outlier
mode (str) – If “and” (default) both x and y have to be marked in order to call an outlier. If “or”, one is enough.
- Returns:
Mask over all body parts in experiment. True indicates an outlier
- Return type:
full_mask (pd.DataFrame)
- deepof.utils.remove_outliers(experiment: DataFrame, likelihood: DataFrame, likelihood_tolerance: float, exclude: str = '', lag: int = 5, n_std: int = 3, mode: str = 'or') DataFrame
Mark all outliers in experiment and replaces them using a uni-variate linear interpolation approach.
Note that this approach only works for equally spaced data (constant camera acquisition rates).
- Parameters:
experiment (pd.DataFrame) – Data frame with time series representing the x, y positions of every body part.
likelihood (pd.DataFrame) – Data frame with likelihood data per body part as extracted from deeplabcut.
likelihood_tolerance (float) – Minimum tolerated likelihood, below which an outlier is called.
exclude (str) – Body part to exclude from the analysis (to concatenate with bpart alignment).
lag (int) – Size of the convolution window used to compute the moving average.
n_std (int) – Number of standard deviations over the moving average to be considered an outlier.
mode (str) – If “and” both x and y have to be marked in order to call an outlier. If “or” (default), one is enough.
- Returns:
Interpolated version of experiment.
- Return type:
interpolated_exp (pd.DataFrame)
- deepof.utils.filter_animal_id_in_table(table: DataFrame, selected_id: str | None = None, table_type: str | None = None)
Filter a DataFrame to keep only those columns related to the selected id.
Leave labels untouched if present.
- Parameters:
table (pd.DataFrame) – a dataFrame to be filtered
selected_id (str) – select a single animal on multi animal settings. Defaults to None (all animals are processed).
table_type (str) – type of the tableDict
- Returns:
Filtered dataFrame, keeping only the selected animal.
- Return type:
pd.DataFrame
- deepof.utils.filter_columns(columns: list, selected_id: str, table_type: str | None = None) list
Given a set of TableDict columns, returns those that correspond to a given animal, specified in selected_id.
- Parameters:
columns (list) – List of columns to filter.
selected_id (str) – Animal ID to filter for.
table_type (str) – Type of the table (relevant if “supervised”)
- Returns:
List of filtered columns.
- Return type:
filtered_columns (list)
- deepof.utils.load_precompiled_model(path, download_path, model_path, model_name)
Loads model for automatic arena segmentation
- deepof.utils.rolling_speed(dframe: DatetimeIndex, frame_rate: int = 1, window: int = 3, rounds: int = 3, deriv: int = 1, shift: int = 2, typ: str = 'coords') DataFrame
Return the average speed over n frames in millimeters per second.
- Parameters:
dframe (pandas.DataFrame) – Position over time dataframe.
frame_rate (int) – Number of frames per second.
window (int) – Number of frames to average over.
rounds (int) – Float rounding decimals.
deriv (int) – Position derivative order; 1 for speed, 2 for acceleration, 3 for jerk, etc.
shift (int) – Window shift for rolling speed calculation.
typ (str) – Type of dataset. Intended for internal usage only.
- Returns:
Data frame containing 2D speeds for each body part in the original data or their consequent derivatives.
- Return type:
speeds (pd.DataFrame)
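A minimal sketch of the underlying computation for a single body part, assuming positions are given as an (n_frames, 2) array: the frame-to-frame displacement is scaled by the frame rate and then averaged over a trailing window. The function name speed_sketch, the trailing (rather than centered) window and the zero speed assigned to the first frame are assumptions; the shift, rounds and deriv options are not modelled here.

```python
import numpy as np


def speed_sketch(xy: np.ndarray, frame_rate: float = 1.0, window: int = 3) -> np.ndarray:
    """Speed of one body part from an (n_frames, 2) array of x, y positions."""
    xy = np.asarray(xy, dtype=float)
    # Frame-to-frame displacement; the first frame has no predecessor, so it is set to 0.
    step = np.linalg.norm(np.diff(xy, axis=0), axis=1)
    step = np.concatenate([[0.0], step])
    # Distance per frame times frames per second gives distance per second.
    inst = step * frame_rate
    # Trailing moving average over up to `window` frames.
    out = np.empty_like(inst)
    for i in range(len(inst)):
        out[i] = inst[max(0, i - window + 1) : i + 1].mean()
    return out


positions = np.array([[0, 0], [3, 4], [6, 8], [9, 12]])
speeds = speed_sketch(positions, frame_rate=2, window=2)
```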
- deepof.utils.get_behavior_mask_and_confidence(tab: DataFrame, behaviors: List[str], supervised_export: bool) Tuple[DataFrame, DataFrame]
Generates a boolean mask and a confidence dataframe for given behaviors.
- Parameters:
tab (pd.DataFrame) – Table with supervised or unsupervised behaviors, converted to a data frame.
behaviors (List[str]) – List of behavior names.
supervised_export (bool) – Whether the given table contains supervised (rather than unsupervised) behaviors.
- Returns:
Boolean mask and confidence data frame for the given behaviors.
- Return type:
Tuple[pd.DataFrame, pd.DataFrame]
- deepof.utils.row_nanargmax(arr)
Compute the argmax per row, ignoring NaNs. Returns NaN for all-NaN rows.
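For reference, np.nanargmax raises a ValueError when a slice contains only NaNs, which is why a helper of this kind is useful. The following is a sketch of one way to obtain the described behavior, not necessarily how deepof implements it; row_nanargmax_sketch is a hypothetical name.

```python
import numpy as np


def row_nanargmax_sketch(arr) -> np.ndarray:
    """Per-row argmax that ignores NaNs and yields NaN for all-NaN rows."""
    arr = np.asarray(arr, dtype=float)
    all_nan = np.isnan(arr).all(axis=1)
    # Replace NaNs by -inf so they never win the argmax.
    filled = np.where(np.isnan(arr), -np.inf, arr)
    # Cast to float so that NaN can be stored in the result.
    result = filled.argmax(axis=1).astype(float)
    result[all_nan] = np.nan
    return result


example = np.array([[1.0, np.nan, 3.0], [np.nan, np.nan, np.nan], [5.0, 2.0, np.nan]])
indices = row_nanargmax_sketch(example)
```

The result is a float array, since integer arrays cannot hold NaN.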
- deepof.utils.filter_short_bouts(cluster_assignments: ndarray, cluster_confidence: ndarray, confidence_indices: ndarray, min_confidence: float = 0.0, min_bout_duration: int | None = None)
Filter out cluster assignment bouts shorter than min_bout_duration.
- Parameters:
cluster_assignments (np.ndarray) – Array of cluster assignments.
cluster_confidence (np.ndarray) – Array of cluster confidence values.
confidence_indices (np.ndarray) – Array of confidence indices.
min_confidence (float) – Minimum confidence value.
min_bout_duration (int) – Minimum bout duration in frames.
- Returns:
Mask of confidence indices to keep.
- Return type:
np.ndarray
- deepof.utils.filter_short_true_segments(array: ndarray, min_length: int)
Filters out short “True” sections from the boolean array “array”.
- Parameters:
array (np.ndarray) – Boolean array
min_length (int) – Minimum length of “true” sections within array.
- Returns:
Mask of confidence indices to keep.
- Return type:
np.ndarray
- deepof.utils.filter_short_true_segments_numba(array: ndarray, min_length: int)
Filters out short “True” sections from the boolean array “array”.
- Parameters:
array (np.ndarray) – Boolean array
min_length (int) – Minimum length of “true” sections within array.
- Returns:
Mask of confidence indices to keep.
- Return type:
np.ndarray
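The filtering of short “True” sections can be sketched with a run-length pass over the array. This is an illustrative version under two assumptions: runs strictly shorter than min_length are removed, and the function returns a filtered copy. The name drop_short_true_runs is hypothetical.

```python
import numpy as np


def drop_short_true_runs(array, min_length: int) -> np.ndarray:
    """Set runs of True that are shorter than min_length to False."""
    array = np.asarray(array, dtype=bool)
    # Pad with False on both sides so every run has a start and an end.
    padded = np.concatenate([[False], array, [False]])
    changes = np.diff(padded.astype(int))
    starts = np.flatnonzero(changes == 1)  # first index of each run
    ends = np.flatnonzero(changes == -1)  # exclusive end index of each run
    out = array.copy()
    for start, end in zip(starts, ends):
        if end - start < min_length:
            out[start:end] = False
    return out


mask = np.array([True, True, False, True, False, True, True, True])
filtered = drop_short_true_runs(mask, min_length=2)
```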
- deepof.utils.gmm_compute(x: array, n_components: int, cv_type: str) list
Fit a Gaussian Mixture Model to the provided data and return evaluation metrics.
- Parameters:
x (numpy.ndarray) – Data matrix to train the model
n_components (int) – Number of Gaussian components to use
cv_type (str) – Covariance matrix type to use. Must be one of “spherical”, “tied”, “diag”, “full”.
- Returns:
model and associated BIC for downstream selection.
- Return type:
gmm_eval (list)
- deepof.utils.gmm_model_selection(x: DataFrame, n_components_range: range, part_size: int, n_runs: int = 100, n_cores: int = False, cv_types: Tuple = ('spherical', 'tied', 'diag', 'full')) Tuple[List[list], List[ndarray], int | Any]
Run GMM clustering model selection on the specified X dataframe.
Outputs the BIC distribution per model, a vector with the median BICs, and an object with the overall best model.
- Parameters:
x (pandas.DataFrame) – Data matrix to train the models
n_components_range (range) – Generator with numbers of components to evaluate
part_size (int) – Size of bootstrap samples for each model
n_runs (int) – Number of bootstraps for each model
n_cores (int) – Number of cores to use for computation
cv_types (tuple) – Covariance Matrices to try. All four available by default
- Returns:
All recorded BIC values for all attempted parameter combinations (useful for plotting).
m_bic (list): All minimum BIC values recorded throughout the process (useful for plotting).
best_bic_gmm (sklearn.GMM): Unfitted version of the best found model.
- Return type:
bic (list)
- deepof.utils.cluster_transition_matrix(cluster_sequence: array, nclusts: int, autocorrelation: bool = True, return_graph: bool = False) Tuple[Graph | Any, ndarray]
Compute the transition matrix between clusters and the autocorrelation in the sequence.
- Parameters:
cluster_sequence (numpy.array) – Sequence of cluster assignments.
nclusts (int) – Number of clusters in the sequence.
autocorrelation (bool) – Whether to compute the autocorrelation of the sequence.
return_graph (bool) – Whether to return the transition matrix as a networkx.DiGraph object.
- Returns:
Transition matrix as numpy.ndarray or networkx.DiGraph.
autocorr (numpy.array): If autocorrelation is True, returns a numpy.ndarray with all autocorrelation values on cluster assignment.
- Return type:
trans_normed (numpy.ndarray / networkx.Graph)
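A transition matrix of this kind can be sketched by counting consecutive pairs in the sequence and normalizing each row. The version below is only an illustration: the name transition_matrix_sketch, the row normalization and the treatment of clusters without outgoing transitions (rows left at zero) are assumptions, and the autocorrelation and graph outputs are omitted.

```python
import numpy as np


def transition_matrix_sketch(cluster_sequence, nclusts: int) -> np.ndarray:
    """Row-normalized matrix of transition frequencies between consecutive labels."""
    seq = np.asarray(cluster_sequence, dtype=int)
    counts = np.zeros((nclusts, nclusts))
    # Count each consecutive pair (from, to); np.add.at handles repeated index pairs.
    np.add.at(counts, (seq[:-1], seq[1:]), 1)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Row-normalize; rows without outgoing transitions stay at zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


sequence = [0, 0, 1, 2, 1, 0]
trans = transition_matrix_sketch(sequence, nclusts=3)
```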
- deepof.utils.get_total_Frames(video_paths: dict) int
Get the total number of frames across all videos listed in the input dictionary.
- Parameters:
video_paths (dict) – Paths to all videos in a dictionary
- Returns:
Total number of all video frames
- Return type:
total_frames (int)
- deepof.utils.validate_parameter(param_name: str, param_value: Any, valid_options: List[Any], is_list: bool = False, custom_error_if_empty: str | None = None, only_one_of_many: bool | None = True, can_be_dict: bool | None = False)
A generic helper to validate a single parameter against a list of valid options.
- Parameters:
param_name (str) – The name of the parameter being checked (for error messages).
param_value (Any) – The value of the parameter provided by the user.
valid_options (List[Any]) – The list of allowed values.
is_list (bool) – If True, checks if param_value is a subset of valid_options. Otherwise, checks if it is a member of valid_options.
custom_error_if_empty (Optional[str]) – A specific error to raise if the parameter is provided but the list of valid options is empty.
only_one_of_many (Optional[bool]) – True if only one of the valid options is allowed; False if a subset of the valid options is allowed.
can_be_dict (Optional[bool]) – Parameter can also be given as a dict (e.g. allowed for experiment_id)
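The membership and subset checks described for is_list can be sketched as follows. This is a simplified, hypothetical version (validate_parameter_sketch) that covers only param_name, param_value, valid_options and is_list; the error messages and the decision to skip validation for None are assumptions.

```python
from typing import Any, List


def validate_parameter_sketch(
    param_name: str, param_value: Any, valid_options: List[Any], is_list: bool = False
) -> None:
    """Raise ValueError if param_value is not allowed by valid_options."""
    if param_value is None:
        return  # assumption: an unset parameter is not validated
    if is_list:
        # Subset check: every entry must be a valid option.
        invalid = [value for value in param_value if value not in valid_options]
        if invalid:
            raise ValueError(f"{param_name} contains invalid entries: {invalid}")
    elif param_value not in valid_options:
        # Membership check for a single value.
        raise ValueError(f"{param_name} must be one of {valid_options}")


validate_parameter_sketch("visualization", "networks", ["networks", "heatmaps"])
validate_parameter_sketch("behaviors", ["climbing"], ["climbing", "sniffing"], is_list=True)
```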
deepof.visuals module
General plotting functions for the deepof package.
- deepof.visuals.plot_heatmaps(coordinates: deepof_coordinates, bodyparts: list, center: str = 'arena', align: str | None = None, exp_condition: str | None = None, condition_value: str | None = None, experiment_id: int = 'average', bin_size: int | str | None = None, bin_index: int | str | None = None, precomputed_bins: ndarray | None = None, samples_max: int = 20000, roi_number: int | None = None, animals_in_roi: list | None = None, display_rois: bool = True, in_roi_criterion: str = 'Center', display_arena: bool = True, xlim: float | None = None, ylim: float | None = None, extrapolate_heatmap: bool = True, save: bool = False, dpi: int = 100, ax: Any | None = None, show: bool = True, **kwargs) figure
Plot heatmaps of the specified body parts (bodyparts) of the specified animal.
- Parameters:
coordinates (coordinates) – deepof Coordinates object.
bodyparts (list) – list of body parts to plot.
center (str) – Name of the body part to which the positions will be centered. If false, the raw data is returned; if ‘arena’ (default), coordinates are centered in the pitch.
align (str) – Selects the body part with which later processes will align the frames (see preprocess in table_dict documentation).
exp_condition (str) – Experimental condition on which to base filters.
condition_value (str) – Experimental condition value to plot. If available, it filters the experiments to keep only those whose condition value matches the given string in the provided exp_condition.
experiment_id (str) – Name of the experiment to display. When given as “average”, positions of all animals are averaged.
bin_size (Union[int,str]) – bin size for time filtering.
bin_index (Union[int,str]) – index of the bin of size bin_size to select along the time dimension. Denotes exact start position in the time domain if given as string.
precomputed_bins (np.ndarray) – precomputed time bins. If provided, bin_size and bin_index are ignored. Note: providing precomputed bins with gaps will result in an incorrect time vector depiction.
samples_max (int) – Maximum number of samples taken for plotting to avoid excessive computation times. If the number of rows in a data set exceeds this number the data is downsampled accordingly.
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
in_roi_criterion (str) – Criterion for in roi check, can be a single bodypart, a list of bodyparts or “all” bodyparts of a mouse
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
display_rois (bool) – Display the active ROI, if a ROI was selected. Defaults to True.
display_arena (bool) – whether to plot a dashed line with an overlying arena perimeter. Defaults to True.
xlim (float) – x-axis limits.
ylim (float) – y-axis limits.
extrapolate_heatmap (bool) – show full heatmap including extrapolated parts (default=True)
save (str) – if provided, the figure is saved to the specified path.
dpi (int) – resolution of the figure.
ax (plt.AxesSubplot) – axes where to plot the current figure. If not provided, a new figure will be created.
show (bool) – whether to show the created figure. If False, returns all axes.
kwargs – additional arguments to pass to the seaborn kdeplot function.
- Returns:
figure with the specified characteristics
- Return type:
heatmaps (plt.figure)
- deepof.visuals.plot_gantt(coordinates: deepof_project, instance_id: str, supervised_annotations: deepof_table_dict | None = None, soft_counts: deepof_table_dict | None = None, bin_index: int | str | None = None, bin_size: int | str | None = None, precomputed_bins: ndarray | None = None, samples_max=20000, roi_number: int | None = None, animals_in_roi: list | None = None, roi_mode: str = 'mousewise', in_roi_criterion: str = 'Center', additional_checkpoints: DataFrame | None = None, signal_overlay: Series | None = None, instances_to_plot: list | None = None, ax: Any | None = None, save: bool = False)
Return a Gantt plot of the selected behaviors or experiments over time. Allows for temporal and ROI-based filtering.
- Parameters:
coordinates (project) – deepOF project where the data is stored.
instance_id (str) – Name of the instance to display (can either be an experiment or a behavior).
supervised_annotations (table_dict) – table dict with supervised annotations per video.
soft_counts (table_dict) – table dict with soft cluster assignments per animal experiment across time.
bin_index (Union[int,str]) – index of the bin of size bin_size to select along the time dimension. Denotes exact start position in the time domain if given as string.
bin_size (Union[int,str]) – bin size for time filtering.
precomputed_bins (np.ndarray) – precomputed time bins. If provided, bin_size and bin_index are ignored. Note: providing precomputed bins with gaps will result in an incorrect time vector depiction.
samples_max (int) – Maximum number of samples taken for plotting to avoid excessive computation times. If the number of rows in a data set exceeds this number the data is downsampled accordingly.
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
in_roi_criterion (str) – Criterion for in roi check, can be a single bodypart, a list of bodyparts or “all” bodyparts of a mouse
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
roi_mode (str) – Determines how the ROIs should be applied to different behaviors. Options are “mousewise” (default, selected mice need to be inside the ROI) and “behaviorwise” (only mice involved in a behavior need to be inside of the ROI)
additional_checkpoints (pd.DataFrame) – table with additional checkpoints to plot.
signal_overlay (pd.Series) – overlays a continuous signal with all selected behaviors. None by default.
instances_to_plot (list) – list of either behaviors or experiments to plot. If instance_id is an experiment this needs to be a list of behaviors and vice versa. If None, all options are plotted.
ax (plt.AxesSubplot) – axes where to plot the current figure. If not provided, new figure will be created.
save (bool) – Saves a time-stamped vectorized version of the figure if True.
- deepof.visuals.gantt_plotter(coordinates: deepof_project, gantt_matrix: ndarray, plot_type: str, instance_id: str, n_available_instances: int, instances_to_plot: list, colors: list, bin_indices: ndarray, additional_checkpoints: DataFrame | None = None, signal_overlay: Series | None = None, ax: Any | None = None, save: bool = False)
Render a Gantt plot from a precomputed Gantt matrix.
- Parameters:
coordinates (project) – deepOF project where the data is stored.
gantt_matrix (np.ndarray) – 2D integer matrix denoting time sections with present or absent behavior
plot_type (str) – type of plot, either “supervised” or “unsupervised”
instance_id (str) – Name of the experiment or behavior to display.
n_available_instances (int) – number of all possibly available instances (may be behaviors or experiments)
instances_to_plot (list) – selected instances for plotting as a list (may be behaviors or experiments)
colors (list) – list of color hexcodes for plotting
bin_indices (np.ndarray) – indices to plot
additional_checkpoints (pd.DataFrame) – table with additional checkpoints to plot.
signal_overlay (pd.Series) – overlays a continuous signal with all selected behaviors. None by default.
ax (plt.AxesSubplot) – axes where to plot the current figure. If not provided, new figure will be created.
save (bool) – Saves a time-stamped vectorized version of the figure if True.
- deepof.visuals.plot_enrichment(coordinates: deepof_coordinates, embeddings: deepof_table_dict | None = None, soft_counts: deepof_table_dict | None = None, supervised_annotations: deepof_table_dict | None = None, bin_index: int | str | None = None, bin_size: int | str | None = None, precomputed_bins: ndarray | None = None, samples_max: int = 100000, roi_number: int | None = None, animals_in_roi: list | None = None, roi_mode: str = 'mousewise', in_roi_criterion: str = 'Center', polar_depiction: bool = False, plot_speed: bool = False, add_stats: str = 'Mann-Whitney', exp_condition: str | None = None, exp_condition_order: list | None = None, normalize: bool = False, verbose: bool = False, unit_time: str = 's', unit_distance: str = 'm', ax: Any | None = None, save: bool = False)
Violin plots per cluster per condition.
- Parameters:
coordinates (coordinates) – deepOF project where the data is stored.
embeddings (table_dict) – table dict with neural embeddings per animal experiment across time.
soft_counts (table_dict) – table dict with soft cluster assignments per animal experiment across time.
supervised_annotations (table_dict) – table dict with supervised annotations per animal experiment across time.
bin_index (Union[int,str]) – index of the bin of size bin_size to select along the time dimension. Denotes exact start position in the time domain if given as string.
bin_size (Union[int,str]) – bin size for time filtering.
precomputed_bins (np.ndarray) – precomputed time bins. If provided, bin_size and bin_index are ignored.
samples_max (int) – Maximum number of samples taken for plotting to avoid excessive computation times. If the number of rows in a data set exceeds this number the data is downsampled accordingly.
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
roi_mode (str) – Determines how the ROIs should be applied to different behaviors. Options are “mousewise” (default, selected mice need to be inside the ROI) and “behaviorwise” (only mice involved in a behavior need to be inside of the ROI, only for supervised behaviors)
in_roi_criterion (str) – Criterion for in roi check, can be a single bodypart, a list of bodyparts or “all” bodyparts of a mouse
polar_depiction (bool) – if True, display as polar plot.
plot_speed (bool) – if supervised annotations are provided, display only speed. Useful to visualize speed.
add_stats (str) – test to use. Mann-Whitney (non-parametric) by default. See statannotations documentation for details.
exp_condition (str) – Name of the experimental condition to use when plotting. If None (default) the first one available is used.
exp_condition_order (list) – Order in which to plot experimental conditions. If None (default), the order is determined by the order of the keys in the table dict.
normalize (bool) – whether to represent time fractions or actual time in seconds on the y axis.
verbose (bool) – if True, prints test results and p-value cutoffs. False by default.
unit_time (str) – Time unit (frames, seconds, minutes, hours) to display the result in the given unit
unit_distance (str) – Distance unit (millimeters, centimeters, meters) to display the result in the given unit
ax (plt.AxesSubplot) – axes where to plot the current figure. If not provided, new figure will be created.
save (bool) – Saves a time-stamped vectorized version of the figure if True.
- deepof.visuals.return_transitions(coordinates: deepof_coordinates, supervised_annotations: deepof_table_dict | None = None, soft_counts: deepof_table_dict | None = None, bin_size: int | str | None = None, bin_index: int | str | None = None, precomputed_bins: ndarray | None = None, samples_max: int = 20000, roi_number: int | None = None, animals_in_roi: list | None = None, exp_condition: str | None = None, delta_T: float = 0.0, silence_diagonal: bool = False, diagonal_behavior_counting: str = 'Transitions', normalize: bool = True, visualization='networks')
Return the data underlying plot_transitions, using the same input options.
- deepof.visuals.plot_transitions(coordinates: deepof_coordinates, supervised_annotations: deepof_table_dict | None = None, soft_counts: deepof_table_dict | None = None, bin_size: int | str | None = None, bin_index: int | str | None = None, precomputed_bins: ndarray | None = None, samples_max: int = 20000, roi_number: int | None = None, animals_in_roi: list | None = None, exp_condition: str | None = None, delta_T: float = 0.0, silence_diagonal: bool = False, diagonal_behavior_counting: str = 'Transitions', normalize: bool = True, visualization='networks', ax: list | None = None, save: bool = False, **kwargs)
Compute and plot transition matrices for all data or per condition. Plots can be heatmaps or networks.
- Parameters:
coordinates (coordinates) – deepOF project where the data is stored.
supervised_annotations (table_dict) – table dict with supervised annotations.
soft_counts (table_dict) – table dict with soft cluster assignments per animal experiment across time.
bin_size (Union[int,str]) – bin size for time filtering.
bin_index (Union[int,str]) – index of the bin of size bin_size to select along the time dimension. Denotes exact start position in the time domain if given as string.
precomputed_bins (np.ndarray) – precomputed time bins. If provided, bin_size and bin_index are ignored.
samples_max (int) – Maximum number of samples taken for plotting to avoid excessive computation times. If the number of rows in a data set exceeds this number the data is downsampled accordingly.
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
exp_condition (str) – Name of the experimental condition to use when plotting. If None (default) the first one available is used.
delta_T (float) – Time after the offset of one behavior during which the onset of the next behavior counts as a transition.
silence_diagonal (bool) – If True, diagonals are set to zero.
diagonal_behavior_counting (str) – How to count diagonals (self-transitions). Options:
- “Frames”: total frames where the behavior is active (after extension)
- “Time”: total time where the behavior is active
- “Events”: number of instances of the behavior occurring
- “Transitions”: number of frame-wise internal behavior transitions; e.g., a behavior of 4 frames in length would have 3 transitions.
normalize (bool) – Row-normalizes transition probabilities if True. Default=True.
visualization (str) – visualization mode. Can be either ‘networks’, or ‘heatmaps’.
ax (list) – axes where to plot the current figure. If not provided, a new figure will be created.
save (bool) – Saves a time-stamped vectorized version of the figure if True.
kwargs – additional arguments to pass to the seaborn kdeplot function.
- deepof.visuals.count_all_events(coordinates: deepof_coordinates, supervised_annotations: deepof_table_dict | None = None, soft_counts: deepof_table_dict | None = None, bin_size: int | str | None = None, bin_index: int | str | None = None, precomputed_bins: ndarray | None = None, samples_max: int = 20000, roi_number: int | None = None, animals_in_roi: list | None = None, in_roi_criterion: str = 'Center', counting_mode='Events')
Counts all events in supervised or soft_counts dataset and returns a data table.
- Parameters:
coordinates (coordinates) – deepOF project where the data is stored.
supervised_annotations (table_dict) – table dict with supervised annotations.
soft_counts (table_dict) – table dict with soft cluster assignments per animal experiment across time.
bin_size (Union[int,str]) – bin size for time filtering.
bin_index (Union[int,str]) – index of the bin of size bin_size to select along the time dimension. Denotes exact start position in the time domain if given as string.
precomputed_bins (np.ndarray) – precomputed time bins. If provided, bin_size and bin_index are ignored.
samples_max (int) – Maximum number of samples taken for plotting to avoid excessive computation times. If the number of rows in a data set exceeds this number the data is downsampled accordingly.
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
in_roi_criterion (str) – Criterion for in roi check, can be a single bodypart, a list of bodyparts or “all” bodyparts of a mouse
counting_mode (str) – How to count behaviors. Options:
- “Frames”: total frames where the behavior is active (after extension)
- “Time”: total time where the behavior is active
- “Events”: number of instances of the behavior occurring
- “Transitions”: number of frame-wise internal behavior transitions; e.g., a behavior of 4 frames in length would have 3 transitions.
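The four counting modes can be illustrated on a single boolean behavior trace. The sketch below is an assumption-laden illustration rather than the library code: count_behavior and its frame_rate argument are hypothetical, and the “extension” of behaviors mentioned for the “Frames” mode is not modelled.

```python
import numpy as np


def count_behavior(active, mode: str = "Events", frame_rate: float = 1.0) -> float:
    """Count one behavior in a boolean per-frame trace according to mode."""
    active = np.asarray(active, dtype=bool)
    # An event starts wherever the trace switches from False to True.
    padded = np.concatenate([[False], active])
    onsets = int(np.sum(~padded[:-1] & padded[1:]))
    frames = int(active.sum())
    if mode == "Frames":
        return frames
    if mode == "Time":
        return frames / frame_rate  # seconds, given frames per second
    if mode == "Events":
        return onsets
    if mode == "Transitions":
        # A bout of n frames contains n - 1 internal frame-to-frame transitions.
        return frames - onsets
    raise ValueError(f"Unknown counting mode: {mode}")


# One bout of 4 frames followed by one bout of 2 frames.
trace = [False, True, True, True, True, False, True, True]
summary = {mode: count_behavior(trace, mode, frame_rate=2.0)
           for mode in ("Frames", "Time", "Events", "Transitions")}
```

For this trace the 4-frame bout contributes 3 transitions and the 2-frame bout contributes 1, in line with the example given in the option description.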
- deepof.visuals.plot_stationary_entropy(coordinates: deepof_coordinates, embeddings: deepof_table_dict, soft_counts: deepof_table_dict, bin_size: int | str | None = None, bin_index: int | str | None = None, precomputed_bins: ndarray | None = None, samples_max=20000, roi_number: int | None = None, animals_in_roi: list | None = None, in_roi_criterion: str = 'Center', add_stats: str = 'Mann-Whitney', exp_condition: str | None = None, verbose: bool = False, ax: Any | None = None, save: bool = False)
Compute and plot transition stationary distribution entropy per condition.
- Parameters:
coordinates (coordinates) – deepOF project where the data is stored.
embeddings (table_dict) – table dict with neural embeddings per animal experiment across time.
soft_counts (table_dict) – table dict with soft cluster assignments per animal experiment across time.
bin_size (Union[int,str]) – bin size for time filtering.
bin_index (Union[int,str]) – index of the bin of size bin_size to select along the time dimension. Denotes exact start position in the time domain if given as string.
precomputed_bins (np.ndarray) – precomputed time bins. If provided, bin_size and bin_index are ignored.
samples_max (int) – Maximum number of samples taken for plotting to avoid excessive computation times. If the number of rows in a data set exceeds this number the data is downsampled accordingly.
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
in_roi_criterion (str) – Criterion for in roi check, can be a single bodypart, a list of bodyparts or “all” bodyparts of a mouse
add_stats (str) – test to use. Mann-Whitney (non-parametric) by default. See statannotations documentation for details.
exp_condition (str) – Name of the experimental condition to use when plotting. If None (default) the first one available is used.
verbose (bool) – if True, prints test results and p-value cutoffs. False by default.
ax (plt.AxesSubplot) – axes where to plot the current figure. If not provided, new figure will be created.
save (bool) – Saves a time-stamped vectorized version of the figure if True.
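To make the quantity concrete, the sketch below computes the stationary distribution of a row-normalized transition matrix and its Shannon entropy. Whether deepof uses exactly this definition, and which logarithm base it applies, is not stated here; the function name stationary_entropy_sketch and the choice of base 2 are assumptions.

```python
import numpy as np


def stationary_entropy_sketch(transitions) -> float:
    """Shannon entropy (in bits) of the stationary distribution of a Markov chain."""
    transitions = np.asarray(transitions, dtype=float)
    # The stationary distribution is the left eigenvector with eigenvalue 1,
    # i.e. a right eigenvector of the transposed matrix.
    eigenvalues, eigenvectors = np.linalg.eig(transitions.T)
    idx = np.argmin(np.abs(eigenvalues - 1.0))
    stationary = np.real(eigenvectors[:, idx])
    stationary = stationary / stationary.sum()  # normalize to a probability vector
    nonzero = stationary[stationary > 0]  # treat 0 * log(0) as 0
    return float(-np.sum(nonzero * np.log2(nonzero)))


uniform_chain = np.array([[0.5, 0.5], [0.5, 0.5]])
biased_chain = np.array([[0.9, 0.1], [0.5, 0.5]])
entropy_uniform = stationary_entropy_sketch(uniform_chain)
entropy_biased = stationary_entropy_sketch(biased_chain)
```

A chain that spends most of its time in one state has a lower entropy than one that visits all states equally often.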
- deepof.visuals.plot_normative_log_likelihood(embeddings: deepof_table_dict, exp_condition: str, embedding_dataset: DataFrame, normative_model: str, ax: Any, add_stats: str, verbose: bool)
Plot a bar chart with normative log likelihoods per experimental condition, and compute statistics.
- Parameters:
embeddings (table_dict) – table dictionary containing supervised annotations or unsupervised embeddings per animal.
exp_condition (str) – Name of the experimental condition to use when plotting. If None (default) the first one available is used.
embedding_dataset (pd.DataFrame) – global animal embeddings, alongside their respective experimental conditions
normative_model (str) – Name of the cohort to use as controls. If provided, fits a Gaussian density to the control global animal embeddings, and reports the difference in likelihood across all instances of the provided experimental condition. Statistical parameters can be controlled via **kwargs (see full documentation for details).
ax (plt.AxesSubplot) – matplotlib axes where to render the plot
add_stats (str) – test to use. Mann-Whitney (non-parametric) by default. See statannotations documentation for details.
verbose (bool) – if True, prints test results and p-value cutoffs. False by default.
- Returns:
embedding data frame with added normative scores per sample
- Return type:
embedding_dataset (pd.DataFrame)
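The normative scoring idea, fitting a Gaussian density to control embeddings and evaluating other samples under it, can be sketched with NumPy alone. This is a hedged illustration: gaussian_log_likelihood is a hypothetical helper, a single full-covariance Gaussian is assumed, and the synthetic data stands in for real embeddings.

```python
import numpy as np


def gaussian_log_likelihood(control, samples) -> np.ndarray:
    """Log-likelihood of samples under a Gaussian fitted to the control rows."""
    control = np.asarray(control, dtype=float)
    samples = np.asarray(samples, dtype=float)
    mean = control.mean(axis=0)
    cov = np.atleast_2d(np.cov(control, rowvar=False))
    dim = control.shape[1]
    diff = samples - mean
    # Mahalanobis term diff^T cov^{-1} diff, computed row by row.
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (maha + logdet + dim * np.log(2 * np.pi))


rng = np.random.default_rng(0)
control_embeddings = rng.normal(size=(200, 2))  # synthetic control cohort
candidates = np.array([[0.0, 0.0], [5.0, 5.0]])  # one typical, one atypical sample
scores = gaussian_log_likelihood(control_embeddings, candidates)
```

Samples close to the control cohort receive a higher log-likelihood than samples far from it, which is the contrast the normative score is meant to capture.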
- deepof.visuals.plot_embeddings(coordinates: deepof_coordinates, embeddings: deepof_table_dict | None = None, soft_counts: deepof_table_dict | None = None, supervised_annotations: deepof_table_dict | None = None, bin_size: int | str | None = None, bin_index: int | str | None = None, precomputed_bins: ndarray | None = None, samples_max=20000, roi_number: int | None = None, animals_in_roi: str | list | None = None, roi_mode: str = 'mousewise', in_roi_criterion: str = 'Center', min_confidence: float = 0.0, normative_model: str | None = None, add_stats: str = 'Mann-Whitney', verbose: bool = False, exp_condition: str | None = None, aggregate_experiments: str | None = None, samples: int = 500, show_aggregated_density: bool = True, colour_by: str = 'cluster', ax: Any | None = None, save: bool = False)
Return a scatter plot of the passed projection. Allows for temporal and quality filtering, animal aggregation, and changepoint detection size visualization.
- Parameters:
coordinates (coordinates) – deepOF project where the data is stored.
embeddings (table_dict) – table dict with neural embeddings per animal experiment across time.
soft_counts (table_dict) – table dict with soft cluster assignments per animal experiment across time.
supervised_annotations (table_dict) – table dict with supervised annotations per experiment.
bin_size (Union[int,str]) – bin size for time filtering.
bin_index (Union[int,str]) – index of the bin of size bin_size to select along the time dimension. Denotes exact start position in the time domain if given as string.
precomputed_bins (np.ndarray) – precomputed time bins. If provided, bin_size and bin_index are ignored.
samples_max (int) – Maximum number of samples taken for plotting to avoid excessive computation times. If the number of rows in a data set exceeds this number the data is downsampled accordingly.
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
roi_mode (str) – Determines how the ROIs should be applied to different behaviors. Options are “mousewise” (default, selected mice need to be inside the ROI) and “behaviorwise” (only mice involved in a behavior need to be inside of the ROI, only for supervised behaviors)
in_roi_criterion (str) – Criterion for in roi check, can be a single bodypart, a list of bodyparts or “all” bodyparts of a mouse
min_confidence (float) – minimum confidence in cluster assignments used for quality control filtering.
normative_model (str) – Name of the cohort to use as controls. If provided, fits a Gaussian density to the control global animal embeddings, and reports the difference in likelihood across all instances of the provided experimental condition. Statistical parameters can be controlled via **kwargs (see full documentation for details).
add_stats (str) – test to use. Mann-Whitney (non-parametric) by default. See statannotations documentation for details.
verbose (bool) – if True, prints test results and p-value cutoffs. False by default.
exp_condition (str) – Name of the experimental condition to use when plotting. If None (default) the first one available is used.
aggregate_experiments (str) – Whether to aggregate embeddings by experiment (by time on cluster, mean, or median) or not (default).
samples (int) – Number of samples to take from the time embeddings. None leads to plotting all time-points, which may hurt performance.
show_aggregated_density (bool) – if True, a density plot is added to the aggregated embeddings.
colour_by (str) – hue by which to colour the embeddings. Can be one of ‘cluster’ (default), ‘exp_condition’, ‘exp_id’ or, if supervised behaviors are given, also any supervised behavior.
ax (plt.AxesSubplot) – axes where to plot the current figure. If not provided, new figure will be created.
save (bool) – Saves a time-stamped vectorized version of the figure if True.
- deepof.visuals.animate_skeleton(coordinates: deepof_coordinates, experiment_id: str, embeddings: deepof_table_dict | None = None, soft_counts: deepof_table_dict | None = None, bin_size: int | str | None = None, bin_index: int | str | None = None, precomputed_bins: ndarray | None = None, samples_max: int = 20000, roi_number: int | None = None, animals_in_roi: str | Sequence[str] | None = None, in_roi_criterion: str | Sequence[str] = 'Center', animal_id: str | Sequence[str] | None = None, center: str | bool = 'arena', align: str | None = None, sampling_rate: float | None = None, min_confidence: float = 0.0, min_bout_duration: int | None = None, selected_cluster: ndarray | None = None, display_arena: bool = True, legend: bool = True, save: bool | str | None = None, dpi: int = 100)
Render a FuncAnimation object with embeddings and/or motion trajectories over time.
- Parameters:
coordinates (coordinates) – deepof Coordinates object.
experiment_id (str) – Name of the experiment to display.
embeddings (table_dict) – UMAP or latent embedding for each experiment. If not None, a second animation shows the embedding, colored by cluster if available.
soft_counts (table_dict) – soft cluster assignments for all instances in data. If provided together with selected_cluster, only instances of the specified component are rendered. Defaults to None.
bin_size (Union[int, str, None]) – bin size for time filtering.
bin_index (Union[int, str, None]) – index of the bin of size bin_size to select along the time dimension. Denotes exact start position in the time domain if given as string.
precomputed_bins (np.ndarray, optional) – precomputed time bins. If provided, bin_size and bin_index are ignored.
samples_max (int) – Maximum number of samples taken for plotting to avoid excessive computation times. If the number of rows in a data set exceeds this number the data is downsampled accordingly.
roi_number (int, optional) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded).
animals_in_roi (str or list of str, optional) – IDs of animals that need to be inside the active ROI. All frames in which any of the given animals are not inside the ROI get excluded.
in_roi_criterion (str or list of str) – Criterion for in-roi check: a single bodypart, a list of bodyparts or “all” bodyparts of a mouse.
animal_id (str or list of str, optional) – ID list of animals to display. If None (default) it shows all animals.
center (str or bool) – Name of the body part to which the positions will be centered. If False, the raw data is returned; if ‘arena’ (default), coordinates are centered in the pitch.
align (str, optional) – Body part to which later processes will align the frames.
sampling_rate (float, optional) – Sampling rate for the video. If None is given, the same one as in the video recordings will be used.
min_confidence (float) – Minimum confidence threshold to render a cluster assignment bout.
min_bout_duration (int, optional) – Minimum number of frames to render a cluster assignment bout.
selected_cluster (np.ndarray, optional) – Cluster to filter.
display_arena (bool) – Whether to plot a dashed line with an overlying arena perimeter.
legend (bool) – Whether to add a color-coded legend to multi-animal plots.
save (bool or str, optional) – If not None, save the animation. If a string is provided, it is added as a suffix in the auto-generated file name.
dpi (int) – Dots per inch of the figure to create.
- deepof.visuals.export_annotated_video(coordinates: deepof_coordinates, soft_counts: dict | None = None, supervised_annotations: deepof_table_dict | None = None, bin_size: int | str | None = None, bin_index: int | str | None = None, precomputed_bins: ndarray | None = None, frame_limit_per_video: int | None = None, roi_number: int | None = None, animals_in_roi: list | None = None, roi_mode: str = 'mousewise', in_roi_criterion: str = 'Center', behaviors: list | None = None, experiment_id: str | None = None, min_confidence: float = 0.75, min_bout_duration: int | None = None, display_time: bool = False, display_counter: bool = False, display_arena: bool = False, display_markers: bool = False, display_mouse_labels: bool = False, exp_conditions: dict = {}, cluster_names: str | None = None)
Export annotated videos from both supervised and unsupervised pipelines.
- Parameters:
coordinates (coordinates) – coordinates object for the current project. Used to get video paths.
soft_counts (dict) – dictionary with soft_counts per experiment.
supervised_annotations (table_dict) – table dict with supervised annotations per experiment.
bin_size (Union[int,str]) – bin size for time filtering.
bin_index (Union[int,str]) – index of the bin of size bin_size to select along the time dimension. Denotes exact start position in the time domain if given as string.
precomputed_bins (np.ndarray) – precomputed time bins. If provided, bin_size and bin_index are ignored.
frame_limit_per_video (int) – number of frames to render per video. If None, all frames are included for all videos.
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
roi_mode (str) – Determines how the ROIs should be applied to different behaviors. Options are “mousewise” (default, selected mice need to be inside the ROI) and “behaviorwise” (only mice involved in a behavior need to be inside of the ROI, only for supervised behaviors)
in_roi_criterion (str) – Criterion for in roi check, can be a single bodypart, a list of bodyparts or “all” bodyparts of a mouse
behaviors (list) – Behaviors or clusters that get exported. If None is given, all are exported for soft counts and only nose2nose is exported for supervised annotations. If multiple behaviors are given as a list, one video can be annotated with multiple different behaviors.
experiment_id (str) – if provided, data coming from a particular experiment is used. If not, all experiments are exported.
min_confidence (float) – minimum confidence threshold for a frame to be considered part of a cluster.
min_bout_duration (int) – Minimum number of frames to render a cluster assignment bout.
display_time (bool) – Displays current time in top left corner of the video frame
display_counter (bool) – Displays event counter for each displayed event.
display_arena (bool) – Displays arena for each video.
display_markers (bool) – Displays mouse body parts on top of the mice.
display_mouse_labels (bool) – Displays identities of the mice
exp_conditions (dict) – if provided, data coming from a particular condition is used. If not, all conditions are exported. If a dictionary with more than one entry is provided, the intersection of all conditions (e.g. male, stressed) is used.
cluster_names (dict) – dictionary with user-defined names for each cluster (useful to output interpretation).
- deepof.visuals.plot_distance_between_conditions(coordinates: deepof_coordinates, embedding: dict, soft_counts: dict, exp_condition: str, embedding_aggregation_method: str = 'median', distance_metric: str = 'wasserstein', n_jobs: int = -1, save: bool = False, ax: Any | None = None)
Plot the distance between conditions across a growing time window.
Finds an optimal separation binning based on the distance between conditions, and plots it across all non-overlapping bins. Useful, for example, to measure habituation over time.
- Parameters:
coordinates (coordinates) – coordinates object for the current project. Used to get video paths.
embedding (dict) – embedding object for the current project. Used to get video paths.
soft_counts (dict) – dictionary with soft_counts per experiment.
exp_condition (str) – experimental condition to use for the distance calculation.
embedding_aggregation_method (str) – method to use for aggregating the embedding. Options are ‘time_on_cluster’, ‘mean’ and ‘median’ (default).
distance_metric (str) – distance metric to use for the distance calculation. Options are ‘wasserstein’ and ‘euclidean’.
n_jobs (int) – number of jobs to use for the distance calculation.
save (bool) – if True, saves the figure to the project directory.
ax (plt.AxesSubplot) – axes where to plot the current figure. If not provided, new figure will be created.
- deepof.visuals.return_mouse_roi_interaction(coordinates: deepof_coordinates, bodyparts: list | None = None, animal_id: str | None = None, N_time_bins: int = 24, custom_time_bins: List[List[int | str]] | None = None, samples_max=20000, roi_number: int | None = None, hide_time_bins: list[bool] | None = None, experiment_ids: list | None = None, exp_condition: str | None = None, condition_values: str | List[str] | None = None, mode: str = 'distance', add_stats: str = 'Mann-Whitney', error_bars: str = 'sem', unit_distance: str = 'm', fov_angle_deg: int = 90, get_raw_data: bool = False)
Return binned statistics and effect sizes for mouse-ROI interaction over time.
Computes either the distance of selected bodyparts to a ROI/arena boundary or the fraction of time a ROI/arena falls within a mouse’s field of view, aggregated into time bins. When get_raw_data=True, the raw per-frame interaction values are returned instead.
- Parameters:
coordinates (coordinates) – deepOF project containing the stored data.
bodyparts (list) – List of bodyparts whose distance to the ROI/arena is measured. Used in “distance” mode.
animal_id (str) – ID of the animal to use. Used in “fov” mode to construct the required bodypart triplet (Left_ear, Nose, Right_ear).
N_time_bins (int) – Number of time bins for data separation. Defaults to 24.
custom_time_bins (List[List[Union[int, str]]]) – Custom time bins array consisting of pairs of start- and stop positions given as integers or time strings. Overrides N_time_bins if provided.
samples_max (int) – Maximum number of samples taken per bin to avoid excessive computation times. Defaults to 20000.
roi_number (int) – Number of the ROI to measure interaction with. If None, the arena boundary is used.
hide_time_bins (list[bool]) – List of booleans denoting which bins should be visible (False) or hidden (True). Defaults to displaying all time bins.
experiment_ids (list) – List of experiment IDs to include. If None, all experiments are used. Ignored when a valid exp_condition/condition_values combination is provided.
exp_condition (str) – Experimental condition to compare.
condition_values (str) – Condition values to compare. If a string is provided it is wrapped in a list.
mode (str) – Interaction measure to compute. Must be one of “distance” (bodypart-ROI distance) or “fov” (field-of-view overlap). Defaults to “distance”.
add_stats (str) – Statistical test to use for pairwise comparisons. Mann-Whitney (non-parametric) by default. See statsannotations documentation for details.
error_bars (str) – Type of error bars to compute (either standard deviation (“std”) or standard error (“sem”)). Defaults to standard error.
unit_distance (str) – Distance unit (m, cm, mm, …) used when mode is “distance”.
fov_angle_deg (int) – Angle of the field of view of the mouse. Defaults to 90 degrees.
get_raw_data (bool) – If True, returns the raw per-frame interaction DataFrame instead of binned statistics. Defaults to False.
- Returns:
If get_raw_data=True, a DataFrame with raw per-frame interaction values per experiment. Otherwise, a tuple of (binned_effect_sizes_df, binned_group_df) containing binned statistics, means, error values and effect sizes.
- Return type:
DataFrame or tuple
- deepof.visuals.plot_mouse_roi_interaction(coordinates: deepof_coordinates, bodyparts: list | None = None, animal_id: str | None = None, N_time_bins: int = 24, custom_time_bins: List[List[int | str]] | None = None, samples_max=20000, roi_number: int | None = None, hide_time_bins: list[bool] | None = None, experiment_ids: list | None = None, exp_condition: str | None = None, condition_values: str | List[str] | None = None, mode: str = 'distance', add_stats: str = 'Mann-Whitney', error_bars: str = 'sem', unit_distance: str = 'm', fov_angle_deg: int = 90, ax: Any | None = None, polar_depiction: bool = False, show_histogram: bool = True)
Plot mouse-ROI interaction over time as a polar plot or line chart with optional effect-size histogram.
Visualises either the distance of selected bodyparts to a ROI/arena boundary or the fraction of time a ROI/arena falls within a mouse’s field of view, aggregated into time bins. Supports statistical annotations and effect-size overlays when exactly two experimental conditions are compared.
- Parameters:
coordinates (coordinates) – deepOF project containing the stored data.
bodyparts (list) – List of bodyparts whose distance to the ROI/arena is measured. Used in “distance” mode.
animal_id (str) – ID of the animal to use. Used in “fov” mode to construct the required bodypart triplet (Left_ear, Nose, Right_ear).
N_time_bins (int) – Number of time bins for data separation. Defaults to 24.
custom_time_bins (List[List[Union[int, str]]]) – Custom time bins array consisting of pairs of start- and stop positions given as integers or time strings. Overrides N_time_bins if provided.
samples_max (int) – Maximum number of samples taken per bin to avoid excessive computation times. Defaults to 20000.
roi_number (int) – Number of the ROI to measure interaction with. If None, the arena boundary is used.
hide_time_bins (list[bool]) – List of booleans denoting which bins should be visible (False) or hidden (True). Defaults to displaying all time bins.
experiment_ids (list) – List of experiment IDs to include. If None, all experiments are used. Ignored when a valid exp_condition/condition_values combination is provided.
exp_condition (str) – Experimental condition to compare.
condition_values (str) – Condition values to compare. If a string is provided it is wrapped in a list.
mode (str) – Interaction measure to compute. Must be one of “distance” (bodypart-ROI distance) or “fov” (field-of-view overlap). Defaults to “distance”.
add_stats (str) – Statistical test to use for pairwise comparisons. Mann-Whitney (non-parametric) by default. See statsannotations documentation for details.
error_bars (str) – Type of error bars to display (either standard deviation (“std”) or standard error (“sem”)). Defaults to standard error.
unit_distance (str) – Distance unit (m, cm, mm, …) used when mode is “distance”.
fov_angle_deg (int) – Angle of the field of view of the mouse. Defaults to 90 degrees.
ax (Any) – Matplotlib axis for plotting. If None, creates a new figure.
polar_depiction (bool) – If True, display as polar plot. Defaults to False.
show_histogram (bool) – If True, displays a histogram with rough effect size estimations. Defaults to True.
- Returns:
The Matplotlib axis containing the plot.
- Return type:
ax
- deepof.visuals.get_roi_data(coordinates: deepof_coordinates, table_dict: deepof_table_dict, roi_number: int, animals_in_roi: list | None = None, roi_mode: str = 'mousewise', in_roi_criterion: str = 'Center', bin_index: int | str | None = None, bin_size: int | str | None = None, precomputed_bins: ndarray | None = None, samples_max: int = 100000, experiment_id: str | None = None)
Get data in ROIs.
- Parameters:
coordinates (coordinates) – deepOF project where the data is stored.
table_dict (table_dict) – table dict with information for ROI extraction. Can be supervised or unsupervised data.
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
roi_mode (str) – Determines how the ROIs should be applied to different behaviors. Options are “mousewise” (default, selected mice need to be inside the ROI) and “behaviorwise” (only mice involved in a behavior need to be inside of the ROI, only for supervised behaviors)
in_roi_criterion (str) – Criterion for in roi check, can be a single bodypart, a list of bodyparts or “all” bodyparts of a mouse
bin_index (Union[int,str]) – index of the bin of size bin_size to select along the time dimension. Denotes exact start position in the time domain if given as string.
bin_size (Union[int,str]) – bin size for time filtering.
precomputed_bins (np.ndarray) – precomputed time bins. If provided, bin_size and bin_index are ignored.
samples_max (int) – Maximum number of samples taken for plotting to avoid excessive computation times. If the number of rows in a data set exceeds this number the data is downsampled accordingly.
experiment_id (str) – Name of the experiment id to extract. If None (default) a dictionary of all entries will be exported.
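The samples_max parameter above caps the number of rows used for plotting. As a rough illustration of what such downsampling can look like, the sketch below picks evenly spaced row indices. The helper name downsample_indices and the even-spacing strategy are assumptions made for this example; deepof may select samples differently.

```python
def downsample_indices(n_rows: int, samples_max: int) -> list:
    """Return at most samples_max evenly spaced row indices (hypothetical helper)."""
    if n_rows <= samples_max:
        # Nothing to do: the data set is already small enough.
        return list(range(n_rows))
    # Assumed strategy: keep every (n_rows / samples_max)-th row.
    step = n_rows / samples_max
    return [int(i * step) for i in range(samples_max)]


print(downsample_indices(10, 5))
```

The returned indices could then be used to subset a table, for example with DataFrame.iloc.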
- deepof.visuals.return_supervised_summary(coordinates: deepof_coordinates, supervised_annotations: deepof_table_dict, roi_number: int | None = None, animals_in_roi: list | None = None, roi_mode: str = 'mousewise', in_roi_criterion: str = 'Center', N_time_bins: int = 10, custom_time_bins: List[List[int | str]] | None = None, hide_time_bins: List[bool] | None = None, samples_max=20000, unit_time: str = 's', unit_distance: str = 'm', save_table=True)
Return a summary of supervised information.
- Parameters:
N_time_bins (int) – Number of time bins for data separation. Defaults to 10.
custom_time_bins (List[List[Union[int, str]]]) – Custom time bins array consisting of pairs of start- and stop positions given as integers or time strings. Overrides N_time_bins if provided.
unit_time (str) – Time unit (frames, seconds, minutes, hours) in which to display the result.
unit_distance (str) – Distance unit (millimeters, centimeters, meters) in which to display the result.
deepof.visuals_utils module
Plotting utility functions for the deepof package.
- deepof.visuals_utils.time_to_seconds(time_string: str) float
Compute seconds as float based on a time string.
- Parameters:
time_string (str) – time string as input (format HH:MM:SS or HH:MM:SS.SSS…).
- Returns:
time in seconds
- Return type:
seconds (float)
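A minimal sketch of the conversion described above, assuming the input always has exactly three colon-separated fields. The helper name hhmmss_to_seconds is hypothetical and is not the deepof implementation.

```python
def hhmmss_to_seconds(time_string: str) -> float:
    """Convert 'HH:MM:SS' or 'HH:MM:SS.SSS' to seconds (hypothetical helper)."""
    hours, minutes, seconds = time_string.split(":")
    # The seconds field may carry a fractional part, so parse it as float.
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


print(hhmmss_to_seconds("01:02:03.5"))  # 3723.5
```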
- deepof.visuals_utils.seconds_to_time(seconds: float, cut_milliseconds: bool = True) str
Compute a time string based on seconds as float.
- Parameters:
seconds (float) – time in seconds
cut_milliseconds (bool) – decides if milliseconds should be part of the output, defaults to True
- Returns:
time string (format HH:MM:SS or HH:MM:SS.SSS…)
- Return type:
time_string (str)
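The reverse conversion can be sketched in the same spirit. The helper seconds_to_hhmmss is hypothetical; it assumes non-negative input and a three-digit millisecond suffix, neither of which is confirmed for the deepof function.

```python
def seconds_to_hhmmss(seconds: float, cut_milliseconds: bool = True) -> str:
    """Format seconds as 'HH:MM:SS' or 'HH:MM:SS.mmm' (hypothetical helper)."""
    if cut_milliseconds:
        whole, millis = int(seconds), None
    else:
        # Work in integer milliseconds so rounding cannot produce '.1000'.
        whole, millis = divmod(round(seconds * 1000), 1000)
    hours, remainder = divmod(whole, 3600)
    minutes, secs = divmod(remainder, 60)
    time_string = f"{hours:02d}:{minutes:02d}:{secs:02d}"
    if millis is not None:
        time_string += f".{millis:03d}"
    return time_string


print(seconds_to_hhmmss(3723.5))
print(seconds_to_hhmmss(3723.5, cut_milliseconds=False))
```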
- deepof.visuals_utils.hex_to_BGR(hex_color)
- deepof.visuals_utils.BGR_to_hex(bgr_color)
- deepof.visuals_utils.RGB_to_hex(bgr_color)
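The three color helpers above are undocumented. Judging only by their names, they convert between hex color strings and channel tuples in BGR order (as used by OpenCV) or RGB order. The sketch below shows one plausible form of such conversions; the function names, the 0-255 integer tuples and the lowercase hex output are all assumptions, not the deepof behavior.

```python
def hex_to_bgr(hex_color: str) -> tuple:
    """Convert '#RRGGBB' to a (B, G, R) tuple of ints in 0-255 (hypothetical helper)."""
    hex_color = hex_color.lstrip("#")
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    return (b, g, r)


def bgr_to_hex(bgr_color) -> str:
    """Convert a (B, G, R) tuple of ints in 0-255 to '#rrggbb' (hypothetical helper)."""
    b, g, r = bgr_color
    return f"#{r:02x}{g:02x}{b:02x}"


print(hex_to_bgr("#9370DB"))
print(bgr_to_hex((219, 112, 147)))
```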
- deepof.visuals_utils.get_behavior_colors(behaviors: list, animal_ids: list | DataFrame | None = None)
Gets corresponding colors for all supervised behaviors or clusters within behaviors list.
- Parameters:
behaviors (list) – List of strings containing behaviors
animal_ids (Union[list, pd.DataFrame]) – Either list of strings representing animal ids or supervised dataframe from which said list can be automatically extracted.
- Returns:
A list of strings that contain hex color codes for each behavior. Will return None and display a warning for unknown behaviors.
- Return type:
list
- deepof.visuals_utils.generate_behavior_combinations(animal_ids, symmetric_behaviors=True, asymmetric_behaviors=True, single_behaviors=True, continuous_behaviors=True)
Generates combinations of animal IDs with different types of behaviors exactly as in supervised annotations.
- Parameters:
animal_ids (list) – List of strings representing animal IDs.
symmetric_behaviors (list) – List of symmetric paired behaviors.
asymmetric_behaviors (list) – List of asymmetric paired behaviors.
single_behaviors (list) – List of single mouse behaviors.
- Returns:
A list of strings with the combined animal IDs and behaviors.
- Return type:
list
- deepof.visuals_utils.calculate_average_arena(all_vertices: dict[List[Tuple[float, float]]], num_points: int = 10000) array
Calculates the average arena based on a collection of polygon vertex lists representing arenas. The vertex lists can have different lengths and start at different positions.
- Parameters:
all_vertices (dict[List[Tuple[float, float]]]) – A dictionary of lists of 2D tuples representing the vertices of the arenas.
num_points (int) – number of points in the averaged arena.
- Returns:
A 2D NumPy array containing the averaged arena.
- Return type:
numpy.ndarray
- deepof.visuals_utils.create_bin_pairs(L_array: int, N_time_bins: int)
Creates a List of bin_index and bin_size pairs when splitting a list in N_time_bins
- Parameters:
L_array (int) – Length of the array to index.
N_time_bins (int) – number of time bins to create.
- Returns:
A 2D list containing start and end positions of each bin.
- Return type:
bin_pairs (list(tuple))
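One way to split an array of a given length into N bins is sketched below. It returns half-open (start, end) index pairs; since the description above mentions both "bin_index and bin_size pairs" and "start and end positions", the exact pair format used by deepof is not clear, and both the format and the helper name make_bin_pairs are assumptions.

```python
def make_bin_pairs(length: int, n_bins: int) -> list:
    """Split range(length) into n_bins half-open (start, end) pairs (hypothetical helper)."""
    # Integer arithmetic keeps the edges exact and spreads any remainder over the bins.
    edges = [i * length // n_bins for i in range(n_bins + 1)]
    return [(edges[i], edges[i + 1]) for i in range(n_bins)]


print(make_bin_pairs(10, 4))
```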
- deepof.visuals_utils.validate_custom_bins(coordinates, N_time_bins, L_shortest, custom_time_bins=None, hide_time_bins=None, min_bins_required=4)
- deepof.visuals_utils.postprocess_df_bins(df: DataFrame, bin_lengths, hide_time_bins)
- deepof.visuals_utils.cohend(array_a: array, array_b: array)
Calculate Cohen’s d effect size. Does not assume equal population standard deviations, and can still be used for unequal sample sizes.
- Parameters:
array_a (np.array) – First array of values to compare.
array_b (np.array) – Second array of values to compare.
- Return type:
Cohens d (int)
Cohen’s d measures the standardized difference between two groups, i.e. the difference between the means of two Gaussian-distributed variables expressed in numbers of standard deviations. The magnitude of Cohen’s d ranges from 0 to infinity; in the standard definition its sign indicates the direction of the difference. It complements a hypothesis test: the p-value gives the likelihood of observing the data under the null hypothesis, whereas the effect size quantifies the size of the effect assuming that the effect is present. Because the score is standardized, results are commonly interpreted as follows:
Small effect size: d = 0.20. Medium effect size: d = 0.50. Large effect size: d = 0.80.
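For orientation, the textbook form of Cohen’s d with a pooled standard deviation can be sketched as follows. This is an illustration of the general formula only: the helper name cohens_d is hypothetical, and since the description above says the deepof function does not assume equal population standard deviations, its implementation may differ from this pooled variant.

```python
import math
import statistics


def cohens_d(a, b) -> float:
    """Textbook Cohen's d using the pooled sample standard deviation (hypothetical helper)."""
    n_a, n_b = len(a), len(b)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd


print(cohens_d([2, 4, 6], [1, 3, 5]))
```

In this example both groups have a sample variance of 4, so the pooled standard deviation is 2 and the mean difference of 1 gives d = 0.5, a medium effect.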
- deepof.visuals_utils.cohend_effect_size(d: float)
Categorizes Cohen’s d effect size.
- Parameters:
d (float) – Cohens d
- Returns:
Categorized effect size
- Return type:
int
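A categorization along the conventional thresholds of 0.20, 0.50 and 0.80 might look like the sketch below. The integer codes 0 to 3 and the helper name categorize_effect_size are assumptions; the documentation only states that an int is returned, not which values it takes.

```python
def categorize_effect_size(d: float) -> int:
    """Map Cohen's d to an assumed integer category (hypothetical helper).

    0 = negligible, 1 = small, 2 = medium, 3 = large.
    """
    magnitude = abs(d)  # the category depends on the size of the effect, not its sign
    if magnitude >= 0.8:
        return 3
    if magnitude >= 0.5:
        return 2
    if magnitude >= 0.2:
        return 1
    return 0


print([categorize_effect_size(d) for d in (0.1, 0.3, -0.6, 1.2)])
```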
- deepof.visuals_utils.get_supervised_behaviors_in_roi(cur_supervised: DataFrame, local_bin_info: dict, animal_ids: str | list, roi_mode: str = 'mousewise')
Filter supervised behaviors based on rois given by animal_ids.
- Parameters:
cur_supervised (pd.DataFrame) – data frame with supervised behaviors.
local_bin_info (dict) – bin_info dictionary for one experiment, containing field “time” with array of included frames and fields “animal_id” with boolean arrays that denote which mice were within the selected ROI for these frames
animal_ids (Union[str, list]) – single or multiple animal ids
roi_mode (str) – Determines how the ROIs should be applied to different behaviors. Options are “mousewise” (default, selected mice need to be inside the ROI) and “behaviorwise” (only mice involved in a behavior need to be inside of the ROI, only for supervised behaviors)
- Returns:
data frame with supervised behaviors with detections outside of the ROI set to NaN
- Return type:
cur_supervised (pd.DataFrame)
- deepof.visuals_utils.get_unsupervised_behaviors_in_roi(cur_unsupervised: array, local_bin_info: dict, animal_ids: str)
Filter unsupervised behaviors based on rois given by animal_ids.
- Parameters:
cur_unsupervised (np.array) – 1D or 2D array with unsupervised behaviors (can be soft or hard counts).
local_bin_info (dict) – bin_info dictionary for one experiment, containing field “time” with array of included frames and fields “animal_id” with boolean arrays that denote which mice were within the selected ROI for these frames
animal_ids (Union[str, list]) – single or multiple animal ids
- Returns:
1D or 2D array with unsupervised behaviors with detections outside of the ROI set to NaN (2D) or -1 (1D)
- Return type:
cur_unsupervised (np.array)
- deepof.visuals_utils.get_behavior_frames_in_roi(behavior: str, local_bin_info: dict, animal_ids: str | list)
Get the frames of a behavior for which the animals given by animal_ids are within the ROI.
- Parameters:
behavior (str) – Behavior for which the frames in the ROI are determined.
local_bin_info (dict) – bin_info dictionary for one experiment, containing field “time” with array of included frames and fields “animal_id” with boolean arrays that denote which mice were within the selected ROI for these frames
animal_ids (Union[str, list]) – single or multiple animal ids
- Returns:
1D array containing all frames for which the animal is (animals are) within the ROI
- Return type:
frames (np.array)
- deepof.visuals_utils.calculate_FSTTC(preceding_behavior: Series, proximate_behavior: Series, frame_rate: float, delta_T: float = 2.0)
Calculates the association measure FSTTC between two behaviors given as boolean series
- deepof.visuals_utils.calculate_simple_association(preceding_behavior: ndarray, proximate_behavior: ndarray, frame_rate: float, min_T: float = 10.0)
Calculates Yule’s coefficient Q between two behaviors given as boolean arrays
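Yule’s coefficient Q for two boolean sequences is computed from their 2x2 contingency table as Q = (n11·n00 − n10·n01) / (n11·n00 + n10·n01). The sketch below implements only this textbook formula; the helper name yules_q is hypothetical, and the frame_rate and min_T parameters of the deepof function are not modeled because their role is not described here.

```python
def yules_q(x, y) -> float:
    """Textbook Yule's Q for two equally long boolean sequences (hypothetical helper)."""
    n11 = sum(1 for a, b in zip(x, y) if a and b)          # both present
    n00 = sum(1 for a, b in zip(x, y) if not a and not b)  # both absent
    n10 = sum(1 for a, b in zip(x, y) if a and not b)      # only x present
    n01 = sum(1 for a, b in zip(x, y) if not a and b)      # only y present
    denominator = n11 * n00 + n10 * n01
    if denominator == 0:
        return float("nan")  # Q is undefined for a degenerate table
    return (n11 * n00 - n10 * n01) / denominator


print(yules_q([True, True, False, False], [True, True, False, False]))
```

Q is 1 for perfectly co-occurring behaviors, -1 for mutually exclusive ones, and 0 when the two are independent.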
- deepof.visuals_utils.contiguous_segments(mask: ndarray)
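No description is given for contiguous_segments. Going by its name and its mask argument, it presumably locates runs of consecutive True values. The sketch below shows one way to do that; the helper name find_true_runs and the half-open (start, end) return format are assumptions, not the documented behavior.

```python
def find_true_runs(mask) -> list:
    """Return half-open (start, end) index pairs of consecutive True runs (hypothetical helper)."""
    runs = []
    start = None
    for i, value in enumerate(mask):
        if value and start is None:
            start = i                  # a new run begins
        elif not value and start is not None:
            runs.append((start, i))    # the current run ended at i - 1
            start = None
    if start is not None:
        runs.append((start, len(mask)))  # a run reaches the end of the mask
    return runs


print(find_true_runs([False, True, True, False, True]))
```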
- deepof.visuals_utils.scale_units(coordinates, key, data, unit: str, target_distance: str | None = None, target_time: str | None = None)
Scale data from unit to requested target units and return (scaled, new_unit). unit can be “<u>” or “<u_num>/<u_den>”, where each u is in TimeUnit or DistanceUnit.
- deepof.visuals_utils.get_square_shape_for_gridlike_plot(N)
Get the best number of rows and columns for grid-like plots.
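A common heuristic for choosing a near-square grid for N subplots is sketched below. Whether deepof uses this exact rule is an assumption, and the helper name grid_shape is hypothetical.

```python
import math


def grid_shape(n: int) -> tuple:
    """Return (rows, cols) of a near-square grid holding n >= 1 plots (hypothetical helper)."""
    # isqrt(n - 1) + 1 equals ceil(sqrt(n)) without floating-point rounding issues.
    cols = math.isqrt(n - 1) + 1
    rows = math.ceil(n / cols)
    return rows, cols


print([grid_shape(n) for n in (1, 4, 6, 7)])
```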
- deepof.visuals_utils.plot_arena(coordinates: deepof_coordinates, center: str, color: str, ax: Any, key: str, roi_number: int | None = None)
Plot the arena in the given canvas.
- Parameters:
coordinates (coordinates) – deepof Coordinates object.
center (str) – Name of the body part to which the positions will be centered. If false, the raw data is returned; if ‘arena’ (default), coordinates are centered in the pitch.
color (str) – color of the displayed arena.
ax (Any) – axes where to plot the arena.
key (str) – key of the animal to plot with optional “all of them” (if key==”average”).
roi_number (int) – number of a ROI, if given.
- deepof.visuals_utils.heatmap(dframe: DataFrame, bodyparts: List, xlim: tuple | None = None, ylim: tuple | None = None, title: str | None = None, mask: ndarray | None = None, extrapolate_heatmap: bool = True, save: str = False, dpi: int = 200, ax: Any | None = None, **kwargs) figure
Return a heatmap of the movement of a specific bodypart in the arena.
If more than one bodypart is passed, it returns one subplot for each.
- Parameters:
dframe (pandas.DataFrame) – table_dict value with info to plot.
bodyparts (list) – list of body parts to plot (at least 1).
xlim (float) – limits of the x-axis.
ylim (float) – limits of the y-axis.
title (str) – title of the figure.
mask (np.ndarray) – mask to apply to the heatmap across time.
extrapolate_heatmap (bool) – Show full heatmap including extrapolated parts (default = True)
save (str) – if provided, saves the figure to the specified file.
dpi (int) – dots per inch of the figure to create.
ax (plt.AxesSubplot) – axes where to plot the current figure. If not provided, new figure will be created.
kwargs – additional arguments to pass to the seaborn kdeplot function.
- Returns:
figure with the specified characteristics
- Return type:
heatmaps (plt.figure)
- deepof.visuals_utils.process_df(df: DataFrame, error_bars: str = 'sem')
Process binned behavioral DF independent of number of exp conditions.
- Returns:
mean_values (dict[str, np.ndarray]) – Mapping condition -> array of mean values per time_bin (sorted by time_bin).
error_values (dict[str, np.ndarray]) – Mapping condition -> array of error values per time_bin (sorted by time_bin).
binned_effect_sizes_df (pd.DataFrame) – Pairwise effect sizes (Cohen’s d) for all condition pairs per time_bin. Columns: [“time_bin”,”cond_a”,”cond_b”,”Absolute_Cohens_d”,”Effect_Size_Category”] Empty if <2 conditions.
time_bins (np.ndarray) – Sorted unique time_bin values used for the arrays.
conditions (list[str]) – Sorted unique exp_condition values (keys of the dicts).
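The error_bars argument selects between the standard deviation ("std") and the standard error of the mean ("sem"). The difference for a single bin can be sketched as follows. The helper name mean_and_error is hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption about how the error values are computed.

```python
import math
import statistics


def mean_and_error(values, error_bars: str = "sem") -> tuple:
    """Return (mean, error) for one bin of values (hypothetical helper)."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation
    if error_bars == "std":
        return mean, std
    if error_bars == "sem":
        # The standard error shrinks with the square root of the sample size.
        return mean, std / math.sqrt(len(values))
    raise ValueError("error_bars must be 'std' or 'sem'")


print(mean_and_error([1, 2, 3, 4], "sem"))
```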
- deepof.visuals_utils.plot_binned_line(ax, x, y, yerr=None, hide_time_bins=None, color='C0', label=None, smooth_points_per_interval: int = 10, mean_linewidth: float = 3.0, mean_alpha: float = 0.8, err_linewidth: float = 1.0, err_alpha: float = 0.15, marker: str = 'o')
Plot a binned mean line with interpolation + markers + error band, leaving gaps for hidden bins and NaNs.
- Parameters:
ax (matplotlib axis)
x (array-like, shape (n_bins,)) – X positions (must be strictly increasing).
y (array-like, shape (n_bins,)) – Mean values per bin.
yerr (array-like or None, shape (n_bins,)) – Error values per bin (sem/std). If None, no error band is drawn.
hide_time_bins (array-like of bool or None, shape (n_bins,)) – True bins will be hidden (gaps in line, no marker/error there).
color (str)
label (str)
smooth_points_per_interval (int) – Number of points per bin-to-bin interval for mean interpolation (>=2).
- deepof.visuals_utils.ensure_axis(ax=None, polar_depiction=False, figsize=(12, 4))
If ax is None, create a proper axis and return (fig, ax, show=True). If ax is given, convert it in place to a polar axis when polar_depiction=True and ax is not polar, then return (ax.figure, ax, show=False).
- deepof.visuals_utils.get_binned_geometry(bin_lengths)
Returns a dict with centers/widths/edges in radians (0..2π) + labels “1..N”.
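To make the returned structure more concrete, the sketch below maps a list of bin lengths onto the full circle. It assumes that each bin’s angular width is proportional to its length and that the dictionary keys are named centers, widths, edges and labels; both points, like the helper name binned_geometry, are guesses based on the one-line description above.

```python
import math


def binned_geometry(bin_lengths) -> dict:
    """Map bins onto 0..2*pi, with widths proportional to bin length (hypothetical helper)."""
    total = sum(bin_lengths)
    edges = [0.0]
    for length in bin_lengths:
        edges.append(edges[-1] + 2 * math.pi * length / total)
    n = len(bin_lengths)
    return {
        "edges": edges,
        "widths": [edges[i + 1] - edges[i] for i in range(n)],
        "centers": [(edges[i] + edges[i + 1]) / 2 for i in range(n)],
        "labels": [str(i + 1) for i in range(n)],  # labels "1..N"
    }


geom = binned_geometry([10, 10, 20])
print(geom["labels"], geom["widths"])
```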
- deepof.visuals_utils.format_time_binned_axis(ax, geom, polar_depiction, max_value, title=None, xlabel=None, ylabel=None)
- deepof.visuals_utils.add_polar_bin_labels(ax, geom, radius_factor=1.05)
Call after histogram so rmax is final.
- deepof.visuals_utils.plot_binned_groups(ax, x_radians, mean_values, error_values, condition_values, hide_time_bins, colors, plot_binned_line_func)
Plots mean +/- error for each condition using your existing plot_binned_line. Returns (handles, max_value).
- deepof.visuals_utils.plot_effectsize_histogram(ax, geom, effect_size_categories, hide_time_bins, max_value, bottom, show_histogram=True, cmap=('#9370DB', '#6A5ACD', '#4B0082'), hidden_color='#C0C0C0', alpha=0.8)
Draws effect size histogram bars. Returns (legend_handles, stat_text_col).
- deepof.visuals_utils.annotate_binwise_stats(ax, test_dict, geom, polar_depiction, text_color='k')
- deepof.visuals_utils.add_binned_legends(ax, condition_handles, condition_labels, effect_handles=None, polar_depiction=False, show_histogram=True, first_plot=True)
Adds condition legend + effect-size legend with consistent placement. Only adds legends if first_plot=True.