deepof package

Submodules

deepof.annotation_utils module

Functions and general utilities for supervised pose estimation. See documentation for details.

class deepof.annotation_utils.Behavior_scope(value)

Bases: Enum

An enumeration.

INDIVIDUAL = 1

PAIR_DIRECTIONAL = 2

PAIR_NONDIRECTIONAL = 3
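
The three scopes can be read as "which animal ids a behavior is evaluated on". The sketch below is only an illustration of that reading, inferred from the compute_* descriptions in this module: the Scope enum is a stand-in that mirrors the members above, and the targets helper is hypothetical, not part of deepof.

```python
from enum import Enum
from itertools import combinations, permutations


class Scope(Enum):  # stand-in mirroring the members of Behavior_scope
    INDIVIDUAL = 1
    PAIR_DIRECTIONAL = 2
    PAIR_NONDIRECTIONAL = 3


def targets(scope, animal_ids):
    """Ids or id pairs a behavior of the given scope would be evaluated on (assumed)."""
    if scope is Scope.INDIVIDUAL:
        return list(animal_ids)
    if scope is Scope.PAIR_DIRECTIONAL:
        return list(permutations(animal_ids, 2))  # (a, b) and (b, a) are distinct
    return list(combinations(animal_ids, 2))      # unordered pairs


print(targets(Scope.PAIR_DIRECTIONAL, ["A", "B"]))     # [('A', 'B'), ('B', 'A')]
print(targets(Scope.PAIR_NONDIRECTIONAL, ["A", "B"]))  # [('A', 'B')]
```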

class deepof.annotation_utils.Behavior_output(value)

Bases: Enum

An enumeration.

BINARY = 1

CONTINUOUS = 2

class deepof.annotation_utils.BehaviorContext(key: str, animal_ids: list[str], frame_rate: float, arena_type: Any, arena_params: Any, roi_dict: dict, raw_coords: pandas.core.frame.DataFrame, coords: pandas.core.frame.DataFrame, dists: pandas.core.frame.DataFrame, angles: pandas.core.frame.DataFrame, speeds: pandas.core.frame.DataFrame, likelihoods: pandas.core.frame.DataFrame, full_features: Any, params: dict[str, typing.Any], run_numba: bool = False, extra: dict[str, typing.Any] = <factory>)

Bases: object

key: str

animal_ids: list[str]

frame_rate: float

arena_type: Any

arena_params: Any

roi_dict: dict

raw_coords: DataFrame

coords: DataFrame

dists: DataFrame

angles: DataFrame

speeds: DataFrame

likelihoods: DataFrame

full_features: Any

params: dict[str, Any]

run_numba: bool = False

extra: dict[str, Any]

prefix(animal_id: str) → str: For multi-animal data, columns are e.g. ‘A_Nose’, ‘B_Nose’, etc.; for single-animal data, just ‘Nose’.

bp(animal_id: str, bodypart: str) → str: Convenience: ctx.bp("A", "Nose") -> "A_Nose"; ctx.bp("", "Nose") -> "Nose".
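
The naming rule behind prefix and bp can be sketched with two free functions. These are stand-ins written from the descriptions above, not the BehaviorContext methods themselves; the trailing underscore returned by prefix is an assumption.

```python
def prefix(animal_id):
    """Column prefix: 'A_' for animal 'A', empty for single-animal data (assumed)."""
    return f"{animal_id}_" if animal_id else ""


def bp(animal_id, bodypart):
    """Full column name of a body part for the given animal."""
    return prefix(animal_id) + bodypart


print(bp("A", "Nose"))  # A_Nose
print(bp("", "Nose"))   # Nose
```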

__init__(key: str, animal_ids: list[str], frame_rate: float, arena_type: ~typing.Any, arena_params: ~typing.Any, roi_dict: dict, raw_coords: ~pandas.core.frame.DataFrame, coords: ~pandas.core.frame.DataFrame, dists: ~pandas.core.frame.DataFrame, angles: ~pandas.core.frame.DataFrame, speeds: ~pandas.core.frame.DataFrame, likelihoods: ~pandas.core.frame.DataFrame, full_features: ~typing.Any, params: dict[str, ~typing.Any], run_numba: bool = False, extra: dict[str, ~typing.Any] = <factory>) → None

deepof.annotation_utils.postprocess_median_filtering(y: ndarray, ctx: BehaviorContext, behavior_output: Behavior_output) → ndarray: Default postprocessing for most binary behaviors.

deepof.annotation_utils.postprocess_following(y: ndarray, ctx: BehaviorContext, animal_ids: str | Tuple[str, str] | None) → ndarray: Standard postprocessing, then removal of short segments

deepof.annotation_utils.postprocess_identity(y: ndarray, ctx: BehaviorContext, animal_ids: str | Tuple[str, str] | None) → ndarray: Does not apply any postprocessing, used for e.g. continuous behaviors
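
Median filtering of a binary annotation can be sketched as below. This is a minimal pure-Python illustration, not the deepof implementation: the function name, the width argument and the edge padding by repetition are assumptions, and a plain list stands in for the ndarray.

```python
from statistics import median


def median_filter_sketch(y, width=3):
    """Sliding-window median of a 0/1 sequence; edges are padded by repetition."""
    if width % 2 == 0:
        raise ValueError("width must be odd")
    half = width // 2
    padded = [y[0]] * half + list(y) + [y[-1]] * half
    return [int(median(padded[i:i + width])) for i in range(len(y))]


raw = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0]
print(median_filter_sketch(raw, width=3))
# [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0]
```

The isolated detection at index 2 disappears and the one-frame gap at index 7 is filled, which is the effect a median filter has on binary labels.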

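class deepof.annotation_utils.DeepOF_behavior(name: str, scope: Behavior_scope, output_type: Behavior_output, compute: Callable[[BehaviorContext, str | Tuple[str, str] | None], ndarray | Series | Mapping[str, ndarray | Series]], unit: str | None = 'a.u.', color: str | None = None, postprocess: Callable[[ndarray, BehaviorContext, str | Tuple[str, str] | None], ndarray] | None = None, requires: Tuple[str, ...] = (), order: int = 0)

Bases: object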

Class for the different types of behaviors that the supervised annotation of DeepOF can process.

name: str

scope: Behavior_scope

output_type: Behavior_output

compute: Callable[[BehaviorContext, str | Tuple[str, str] | None], ndarray | Series | Mapping[str, ndarray | Series]]

unit: str | None = 'a.u.'

color: str | None = None

postprocess: Callable[[ndarray, BehaviorContext, str | Tuple[str, str] | None], ndarray] | None = None

requires: Tuple[str, ...] = ()

order: int = 0

set_color(color: str | None) → DeepOF_behavior

column_name(ctx: BehaviorContext, animal_ids: str | Tuple[str, str] | None) → str

annotate_behavior(ctx: BehaviorContext, animal_ids: str | Tuple[str, str] | None) → ndarray

__init__(name: str, scope: Behavior_scope, output_type: Behavior_output, compute: Callable[[BehaviorContext, str | Tuple[str, str] | None], ndarray | Series | Mapping[str, ndarray | Series]], unit: str | None = 'a.u.', color: str | None = None, postprocess: Callable[[ndarray, BehaviorContext, str | Tuple[str, str] | None], ndarray] | None = None, requires: Tuple[str, ...] = (), order: int = 0) → None

deepof.annotation_utils.compute_nose2nose(ctx: BehaviorContext, mice_pair: str | Tuple[str, str] | None) → ndarray: Nondirectional: noses of both mice are close.

deepof.annotation_utils.compute_sidebyside(ctx: BehaviorContext, mice_pair: str | Tuple[str, str] | None) → ndarray: Nondirectional: mice are next to each other, nose to nose.

deepof.annotation_utils.compute_sidereside(ctx: BehaviorContext, mice_pair: str | Tuple[str, str] | None) → ndarray: Nondirectional: mice are next to each other, nose to tail.

deepof.annotation_utils.compute_nose2tail(ctx: BehaviorContext, mice_pair: str | Tuple[str, str] | None) → ndarray: Directional: (a,b) means a_nose close to b_tailbase

deepof.annotation_utils.compute_nose2body(ctx: BehaviorContext, mice_pair: str | Tuple[str, str] | None) → ndarray: Directional: (a,b) means a_nose close to any of b main_body parts.

deepof.annotation_utils.compute_following(ctx: BehaviorContext, mice_pair: str | Tuple[str, str] | None) → ndarray: Directional: (a,b) means a follows b.

deepof.annotation_utils.compute_climb_arena(ctx: BehaviorContext, animal_id: str | Tuple[str, str] | None) → ndarray

deepof.annotation_utils.compute_sniff_arena(ctx: BehaviorContext, animal_id: str | Tuple[str, str] | None) → ndarray

deepof.annotation_utils.compute_immobility(ctx: BehaviorContext, animal_id: str | Tuple[str, str] | None) → ndarray

deepof.annotation_utils.compute_stat_lookaround(ctx: BehaviorContext, animal_id: str | Tuple[str, str] | None) → ndarray

deepof.annotation_utils.compute_detect_activity(ctx: BehaviorContext, animal_id: str | Tuple[str, str] | None) → dict[str, ndarray]

deepof.annotation_utils.compute_sniffing(ctx: BehaviorContext, animal_id: str | Tuple[str, str] | None) → ndarray

deepof.annotation_utils.compute_rearing(ctx: BehaviorContext, animal_id: str | Tuple[str, str] | None) → ndarray

deepof.annotation_utils.close_single_contact(pos_dframe: DataFrame, left: str, right: str, tol: float) → array

Return a boolean array that’s True if the specified body parts are closer than tol.

Parameters:

pos_dframe (pandas.DataFrame) – DLC output as pandas.DataFrame; only applicable to two-animal experiments.
left (string) – First member of the potential contact
right (string) – Second member of the potential contact
tol (float) – maximum distance for which a contact is reported

Returns:

True if the distance between the two specified points is less than tol, False otherwise

Return type:

contact_array (np.array)
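
The contact test reduces to a per-frame distance comparison. The sketch below assumes each body part is given as a list of (x, y) positions, one per frame, in place of the DataFrame columns; the function name and the sample coordinates are hypothetical.

```python
import math


def single_contact_sketch(left_xy, right_xy, tol):
    """Per-frame flag: True where the two body parts are closer than tol."""
    return [math.dist(p, q) < tol for p, q in zip(left_xy, right_xy)]


# Hypothetical nose positions of two animals over three frames.
nose_a = [(0.0, 0.0), (10.0, 0.0), (20.0, 5.0)]
nose_b = [(30.0, 40.0), (13.0, 4.0), (20.0, 6.0)]
print(single_contact_sketch(nose_a, nose_b, tol=6.0))  # [False, True, True]
```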

deepof.annotation_utils.close_double_contact(pos_dframe: DataFrame, left1: str, left2: str, right1: str, right2: str, rel_tol: float, rev: bool = False) → array

Return a boolean array that’s True if the specified body parts are closer than tol.

Parameters:

pos_dframe (pandas.DataFrame) – DLC output as pandas.DataFrame; only applicable to two-animal experiments.
left1 (string) – First contact point of animal 1
left2 (string) – Second contact point of animal 1
right1 (string) – First contact point of animal 2
right2 (string) – Second contact point of animal 2
rel_tol (float) – relative share that affects the maximum distance for which a contact is reported
rev (bool) – reverses the default behaviour (nose2tail contact for both mice)

Returns:

True if the distance between the two specified points is less than tol, False otherwise

Return type:

double_contact (np.array)

deepof.annotation_utils.rotate(origin, point, ang)

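Auxiliary function for climb_wall and sniff_object. Rotates x,y coordinates about a pivot.

Parameters:

origin – pivot point (x, y) of the rotation
point – point (x, y) to rotate
ang – rotation angle

Returns:

qx – x coordinate of the rotated point
qy – y coordinate of the rotated point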
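
Rotating a point about a pivot is the standard 2D rotation applied to the offset from the pivot. The sketch below assumes the angle is in radians and the rotation is counterclockwise; neither convention is stated in the docstring, and the function name is a stand-in.

```python
import math


def rotate_sketch(origin, point, ang):
    """Rotate point counterclockwise about origin by ang radians (assumed unit)."""
    ox, oy = origin
    px, py = point
    qx = ox + math.cos(ang) * (px - ox) - math.sin(ang) * (py - oy)
    qy = oy + math.sin(ang) * (px - ox) + math.cos(ang) * (py - oy)
    return qx, qy


qx, qy = rotate_sketch((1.0, 1.0), (2.0, 1.0), math.pi / 2)
print(round(qx, 6), round(qy, 6))  # 1.0 2.0
```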

deepof.annotation_utils.outside_ellipse(x, y, e_center, e_axes, e_angle, threshold=0.0)

Auxiliary function for climb_wall and sniff_object.

Returns True if the passed x, y coordinates are outside the ellipse denoted by e_center, e_axes and e_angle, with a certain threshold.
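
One way to test a point against a rotated ellipse is to express the point in the ellipse's own frame and evaluate the canonical ellipse equation. The sketch below makes three assumptions that the docstring does not confirm: e_axes holds semi-axis lengths, e_angle is in degrees, and threshold is added to both semi-axes.

```python
import math


def outside_ellipse_sketch(x, y, e_center, e_axes, e_angle, threshold=0.0):
    """True if (x, y) lies outside the ellipse grown by threshold on each semi-axis."""
    cx, cy = e_center
    a, b = e_axes[0] + threshold, e_axes[1] + threshold
    t = math.radians(e_angle)
    # Express the point in the ellipse's own (unrotated) frame.
    dx, dy = x - cx, y - cy
    u = dx * math.cos(t) + dy * math.sin(t)
    v = -dx * math.sin(t) + dy * math.cos(t)
    return (u / a) ** 2 + (v / b) ** 2 > 1.0


print(outside_ellipse_sketch(0.0, 1.5, (0, 0), (2, 1), 0))   # True
print(outside_ellipse_sketch(0.0, 1.5, (0, 0), (2, 1), 90))  # False
```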

deepof.annotation_utils.climb_arena(arena_type: str, arena: array, pos_dict: DataFrame, rel_tol: float, id: str, mouse_len: 50, centered_data: bool = False, run_numba: bool = False) → array

Return True if the specified mouse is climbing the wall.

Parameters:

arena_type (str) – arena type; must be one of [‘polygonal-manual’, ‘circular-autodetect’]
arena (np.array) – contains arena location and shape details
pos_dict (table_dict) – position over time for all videos in a project
rel_tol (float) – relative tolerance (to mouse length) to report a hit
id (str) – indicates the id + subcondition of the animal
centered_data (bool) – indicates whether the input data is centered
run_numba (bool) – Determines if numba versions of functions should be used (run faster but require initial compilation time on first run)

Returns:

boolean array. True if selected animal is climbing the walls of the arena

Return type:

climbing (np.array)

deepof.annotation_utils.sniff_object(speed_dframe: DataFrame, arena: array, pos_dict: DataFrame, tol: float, tol_speed: float, nose: str, center_name: str = 'Center', centered_data: bool = False, s_object: str = 'arena', animal_id: str = '', run_numba: bool = False)

Return True if the specified mouse is sniffing an object.

Parameters:

speed_dframe (pandas.DataFrame) – speed of body parts over time.
arena (np.array) – contains arena location and shape details.
pos_dict (table_dict) – position over time for all videos in a project.
tol (float) – minimum tolerance to report a hit.
tol_speed (float) – minimum speed to report a hit.
center_name (str) – Body part to center coordinates on. “Center” by default.
nose (str) – indicates the name of the body part representing the nose of the selected animal.
centered_data (bool) – indicates whether the input data is centered.
s_object (str) – indicates the object to sniff. Must be one of [‘arena’, ‘object’].
animal_id (str) – indicates the animal to sniff. Must be one of animal_ids.
run_numba (bool) – Determines if numba versions of functions should be used (run faster but require initial compilation time on first run)

Returns:

boolean array. True if selected animal is sniffing the selected object

Return type:

sniffing (np.array)

deepof.annotation_utils.immobility(X_huddle: ndarray, huddle_estimator: Pipeline, animal_id: str = '', median_filter_width: int = 11, min_immobility: int = 25, max_immobility: int = 3000) → array

Return true when the mouse is huddling, as predicted by a pretrained model.

Parameters:

X_huddle (pandas.DataFrame) – mouse features over time.
huddle_estimator (sklearn.pipeline.Pipeline) – pre-trained model to predict feature occurrence.
animal_id (str) – indicates the animal to analyse. Must be one of animal_ids.
median_filter_width (int) – width of median filter for smoothing results
min_immobility (int) – minimum length of behavior to be considered immobility
max_immobility (int) – maximum length of behavior to be considered immobility (longer is labeled as “sleeping”)

Returns:

1 if the animal is huddling, 0 otherwise

Return type:

y_huddle (np.array)

deepof.annotation_utils.augment_with_neighbors(X_huddle, window=5, step=1, window_out=11)

Expands a given set of features with leading and lagging features on the time axis. Will only return speed based features.

Parameters:

X_huddle (pandas.DataFrame) – mouse features over time.
window (int) – steps to go forward and backward in time for each feature
step (int) – step size for the window
window_out (int) – total length of the output window

Returns:

mouse features over time including leading and lagging features (only speed features) for each frame

Return type:

X_augmented (pandas.DataFrame)

deepof.annotation_utils.digging(speed_dframe: DataFrame, dist_dframe: DataFrame, likelihood_dframe: DataFrame, mouse_identity: str, close_range: ndarray, tol_speed: float, tol_likelihood: float, min_length: int, center_name: str = 'Center', animal_id: str = '')

Return true when the mouse is digging. Experimental and currently not included.

Parameters:

speed_dframe (pandas.DataFrame) – speed of body parts over time
dist_dframe (pandas.DataFrame) – distance between body parts over time
likelihood_dframe (pandas.DataFrame) – likelihood of body part tracker over time, as directly obtained from DeepLabCut
mouse_identity (str) – animal id without the _
close_range (np.ndarray) – boolean array that denotes if the nose of the current mouse is close to any other mouse for each frame.
tol_speed (float) – Maximum tolerated speed for the center of the mouse
tol_likelihood (float) – Maximum tolerated likelihood for the nose.
min_length (int) – minimum length that True segments need to have to not get filtered out.
center_name (str) – Body part to center coordinates on. “Center” by default.
animal_id (str) – ID of the current animal.

Returns:

stationary_active (np.array) – True if the animal is standing still and is active, False otherwise
stationary_passive (np.array) – True if the animal is standing still and is passive, False otherwise

deepof.annotation_utils.stationary_lookaround(speed_dframe: DataFrame, dist_dframe: DataFrame, likelihood_dframe: DataFrame, mouse_identity: str, close_range: ndarray, tol_speed: float, tol_likelihood: float, min_length: int, animal_id: str = '')

Return true when the mouse is standing still and looking around (moving nose without head being tilted too much).

Design considerations: Detecting immobility and activity is relatively straightforward, as it mostly consists of checking speed thresholds on bodyparts. The main problem is the “flickering” of the detections, as bodypart speeds may be just above or below the threshold from frame to frame. Accordingly, most of the detect_activity algorithm is a series of filtering steps that alternately smooth the predictions and sharpen the edges of the predicted behavior.

Parameters:

speed_dframe (pandas.DataFrame) – speed of body parts over time
dist_dframe (pandas.DataFrame) – distance between body parts over time
likelihood_dframe (pandas.DataFrame) – likelihood of body part tracker over time, as directly obtained from DeepLabCut
mouse_identity (str) – animal id without the _
close_range (np.ndarray) – boolean array that denotes if the nose of the current mouse is close to any other mouse for each frame.
tol_speed (float) – Maximum tolerated speed for the center of the mouse
tol_likelihood (float) – Maximum tolerated likelihood for the nose.
min_length (int) – minimum length that True segments need to have to not get filtered out.
animal_id (str) – ID of the current animal.

Returns:

True if the animal is standing still and looking around (moving nose without head being tilted too much), False otherwise

Return type:

stationary_lookaround (np.array)

deepof.annotation_utils.detect_activity(speed_dframe: DataFrame, likelihood_dframe: DataFrame, tol_speed: float, tol_likelihood: float, min_length: int, center_name: str = 'Center', animal_id: str = '')

Return, per frame, whether the mouse is moving (mobile), standing still but active (stationary active), or standing still and passive (stationary passive).

Design considerations: Detecting immobility and activity is relatively straightforward, as it mostly consists of checking speed thresholds on bodyparts. The main problem is the “flickering” of the detections, as bodypart speeds may be just above or below the threshold from frame to frame. Accordingly, most of the detect_activity algorithm is a series of filtering steps that alternately smooth the predictions and sharpen the edges of the predicted behavior.

Parameters:

speed_dframe (pandas.DataFrame) – speed of body parts over time
likelihood_dframe (pandas.DataFrame) – likelihood of body part tracker over time, as directly obtained from DeepLabCut
tol_speed (float) – Maximum tolerated speed for the center of the mouse
tol_likelihood (float) – Maximum tolerated likelihood for the nose.
min_length (int) – minimum length that True segments need to have to not get filtered out.
center_name (str) – Body part to center coordinates on. “Center” by default.
animal_id (str) – ID of the current animal.

Returns:

stationary_active (np.array) – True if the animal is standing still and is active, False otherwise
stationary_passive (np.array) – True if the animal is standing still and is passive, False otherwise
mobile (np.array) – True if the animal is not standing still, False otherwise
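
The flickering problem from the design considerations can be illustrated with a two-step filter on a thresholded speed trace. This is a simplified sketch of the general idea only: the helper names are hypothetical, and it does not reproduce the actual sequence of filters in detect_activity.

```python
from itertools import groupby


def flip_short_runs(mask, value, min_length):
    """Invert every run of `value` in `mask` that is shorter than min_length."""
    out = []
    for run_value, run in groupby(mask):
        run = list(run)
        if run_value == value and len(run) < min_length:
            run = [not value] * len(run)
        out.extend(run)
    return out


def stationary_mask(speed, tol_speed, min_length):
    """Threshold the speed, then fill short gaps and drop short detections."""
    raw = [s < tol_speed for s in speed]
    closed = flip_short_runs(raw, False, min_length)  # smooth: fill brief interruptions
    return flip_short_runs(closed, True, min_length)  # sharpen: discard brief detections


speed = [0.1, 0.2, 1.5, 0.1, 0.2, 0.1, 3.0, 3.0, 3.0, 0.1, 3.0, 3.0]
print(stationary_mask(speed, tol_speed=1.0, min_length=2))
# [True, True, True, True, True, True, False, False, False, False, False, False]
```

The single fast frame at index 2 and the single slow frame at index 9 are both absorbed into their surroundings, leaving one stationary segment followed by one mobile segment.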

deepof.annotation_utils.sniff_around(speed_dframe: DataFrame, likelihood_dframe: DataFrame, tol_speed: float, tol_likelihood: float, center_name: str = 'Center', animal_id: str = '')

Return true when the mouse is sniffing around using simple rules.

Parameters:

speed_dframe (pandas.DataFrame) – speed of body parts over time
likelihood_dframe (pandas.DataFrame) – likelihood of body part tracker over time, as directly obtained from DeepLabCut
tol_speed (float) – Maximum tolerated speed for the center of the mouse
tol_likelihood (float) – Maximum tolerated likelihood for the nose.
center_name (str) – Body part to center coordinates on. “Center” by default.
animal_id (str) – ID of the current animal.

Returns:

True if the animal is standing still and sniffing around, False otherwise

Return type:

lookaround (np.array)

deepof.annotation_utils.rearing(pos_dframe: DataFrame, speed_dframe: DataFrame, likelihood_dframe: DataFrame, rearing_tol: float, tol_likelihood: float, tol_speed: float, animal_id: str = '')

Return true when the mouse is rearing, using simple rules.

Parameters:

speed_dframe (pandas.DataFrame) – speed of body parts over time
likelihood_dframe (pandas.DataFrame) – likelihood of body part tracker over time, as directly obtained from DeepLabCut
tol_speed (float) – Maximum tolerated speed for the center of the mouse
tol_likelihood (float) – Maximum tolerated likelihood for the nose.
center_name (str) – Body part to center coordinates on. “Center” by default.
animal_id (str) – ID of the current animal.

Returns:

True if the animal is rearing, False otherwise

Return type:

lookaround (np.array)

deepof.annotation_utils.following_path(distance_dframe: DataFrame, position_dframe: DataFrame, speed_dframe: DataFrame, follower: str, followed: str, frames: int = 20, tol: float = 0, tol_speed: float = 0) → array

Return True if ‘follower’ is closer than tol to the path that followed has walked over the last specified number of frames.

For multi animal videos only.

Parameters:

distance_dframe (pandas.DataFrame) – distances between bodyparts; generated by the preprocess module
position_dframe (pandas.DataFrame) – position of bodyparts; generated by the preprocess module
speed_dframe (pandas.DataFrame) – speed of body parts over time
follower (str) – identifier for the animal who’s following
followed (str) – identifier for the animal who’s followed
frames (int) – frames in which to track whether the process consistently occurs
tol (float) – Maximum distance for which True is returned
tol_speed (float) – Minimum speed for the following mouse

Returns:

boolean sequence, True if conditions are fulfilled, False otherwise

Return type:

follow (np.array)

deepof.annotation_utils.max_behaviour(behaviour_dframe: DataFrame, window_size: int = 10, stepped: bool = False) → array

Return the most frequent behaviour in a window of window_size frames.

Parameters:

behaviour_dframe (pd.DataFrame) – boolean matrix containing occurrence of tagged behaviours per frame in the video
window_size (int) – size of the window to use when computing the maximum behaviour per time slot
stepped (bool) – sliding windows don’t overlap if True. False by default

Returns:

string array with the most common behaviour per instance of the sliding window

Return type:

max_array (np.array)
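
The windowed majority vote can be sketched as follows. A list of dicts (behaviour name to bool, one dict per frame) stands in for the boolean DataFrame; the function name, the handling of incomplete trailing windows (dropped here) and the None returned for windows without any behaviour are assumptions.

```python
from collections import Counter


def max_behaviour_sketch(frames, window_size=10, stepped=False):
    """Most frequent behaviour label per window of window_size frames."""
    stride = window_size if stepped else 1
    result = []
    for start in range(0, len(frames) - window_size + 1, stride):
        counts = Counter()
        for frame in frames[start:start + window_size]:
            counts.update(name for name, active in frame.items() if active)
        result.append(counts.most_common(1)[0][0] if counts else None)
    return result


frames = [
    {"climb": True, "sniff": False},
    {"climb": True, "sniff": True},
    {"climb": False, "sniff": True},
    {"climb": False, "sniff": True},
]
print(max_behaviour_sketch(frames, window_size=2, stepped=True))   # ['climb', 'sniff']
print(max_behaviour_sketch(frames, window_size=2, stepped=False))  # ['climb', 'sniff', 'sniff']
```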

deepof.annotation_utils.frame_corners(w, h, corners: dict = {})

Return a dictionary with the corner positions of the video frame.

Parameters:

w (int) – width of the frame in pixels
h (int) – height of the frame in pixels
corners (dict) – dictionary containing corners to overwrite

Returns:

dictionary with overwritten parameters. Those not specified in the input retain their default values

Return type:

defaults (dict)

deepof.annotation_utils.supervised_tagging(coord_object: deepof_coordinates, raw_coords: deepof_table_dict, coords: deepof_table_dict, dists: deepof_table_dict, angles: deepof_table_dict, speeds: deepof_table_dict, full_features: dict, key: str, immobility_estimator: str | None = None, center: str = 'Center', params: dict = {}, run_numba: bool = False, custom_behaviors: list[DeepOF_behavior] | None = None, custom_behavior_context: dict = {}) → DataFrame

Output a dataframe with the registered motifs per frame.

If specified, produces a labeled video displaying the information in real time

Parameters:

coord_object (deepof.data.coordinates) – coordinates object containing the project information
raw_coords (deepof.data.table_dict) – table_dict with raw coordinates
coords (deepof.data.table_dict) – table_dict with already processed (centered and aligned) coordinates
dists (deepof.data.table_dict) – table_dict with already processed distances
angles (deepof.data.table_dict) – table_dict with already processed angles
speeds (deepof.data.table_dict) – table_dict with already processed speeds
full_features (dict) – A dictionary of aligned kinematics, where the keys are the names of the experimental conditions. The values are the aligned kinematics for each condition.
key (str) – key to the experiment to tag and current set of objects (videos, tables, distances etc.)
immobility_estimator (str) – classifier to determine if a mouse is immobile or not.
center (str) – Body part to center coordinates on. “Center” by default.
params (dict) – dictionary to overwrite the default values of the parameters of the functions that the rule-based pose estimation utilizes. See documentation for details.
run_numba (bool) – Determines if numba versions of functions should be used (run faster but require initial compilation time on first run)
custom_behaviors (list[DeepOF_behavior]) – a list of custom DeepOF_behavior objects. Added at the beginning of supervised behaviors if provided
custom_behavior_context (dict) – a dictionary containing additional information you need for your custom behaviors

Returns:

table with traits as columns and frames as rows. Each value is a boolean indicating trait detection at a given time

Return type:

tag_df (pandas.DataFrame)

deepof.annotation_utils.calculate_close_range(df: DataFrame, mouse_id: str, bodypart: str, threshold: float)

Detects for a given set of mouse coordinates if the selected bodypart of the selected mouse is close to any bodypart of any other mouse for each frame.

Parameters:

df (pd.DataFrame) – Dataframe containing coordinates of multiple mice
mouse_id (str) – Id of the target mouse
bodypart (str) – Bodypart of the target mouse that should be used for distance calculation
threshold (float) – Maximum distance that triggers “closeness”

Returns:

Boolean numpy array set to True for each frame in which the selected bodypart of the selected mouse was closer than threshold to any other mouse, False otherwise.

Return type:

proximity_mask (np.array)
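
The proximity mask can be sketched as an "any other animal's body part within threshold" test per frame. Here a dict mapping column names of the form 'ID_Bodypart' to per-frame (x, y) positions stands in for the DataFrame; that layout, the function name and the sample data are assumptions made for illustration.

```python
import math


def close_range_sketch(coords, mouse_id, bodypart, threshold):
    """True per frame if the target body part is near any body part of another animal."""
    own_prefix = f"{mouse_id}_"
    target = coords[own_prefix + bodypart]
    others = [track for name, track in coords.items() if not name.startswith(own_prefix)]
    return [
        any(math.dist(p, track[i]) < threshold for track in others)
        for i, p in enumerate(target)
    ]


coords = {
    "A_Nose": [(0.0, 0.0), (0.0, 0.0)],
    "A_Tail": [(1.0, 0.0), (1.0, 0.0)],
    "B_Nose": [(10.0, 0.0), (2.0, 0.0)],
    "B_Tail": [(20.0, 0.0), (9.0, 0.0)],
}
print(close_range_sketch(coords, "A", "Nose", threshold=3.0))  # [False, True]
```

Animal A's own tail is always within range of its nose, so excluding same-animal columns is what keeps the first frame False.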

deepof.annotation_utils.validate_custom_behaviors(custom_behaviors: list[DeepOF_behavior] | None = None, custom_behavior_inputs: dict = {})

deepof.annotation_utils.assign_custom_behavior_colors(custom_behaviors: list[DeepOF_behavior] | None = None): Returns a list of hex colors in the same order as custom_behaviors, using user-defined colors where available.

deepof.arena_utils module

Functions and general utilities for the deepof package.

class deepof.arena_utils.Arena_GUI_exit_flag(value)

Bases: Enum

An enumeration.

UNKNOWN = 1

PREVIOUS = 2

NEXT = 3

PROPAGATE = 4

UNOPENED = 5

deepof.arena_utils.get_arenas(coordinates: deepof_project | deepof_coordinates, arena: str, arena_dims: int, number_of_rois: int, segmentation_model_path: str, video_path: str, videos: list | None = None, test: bool = False, roi_dicts: dict | None = None, arena_params: dict | None = None, edit_tag: str = '', scales: dict | None = None)

Extract arena parameters from a project or coordinates object.

Parameters:

coordinates (coordinates) – Coordinates object.
tables (table_dict) – TableDict object containing tracklets per animal.
arena (str) – Arena type (must be either “polygonal-manual”, “circular-manual”, “polygonal-autodetect”, or “circular-autodetect”).
arena_dims (int) – Arena dimensions.
number_of_rois (int) – number of behavior rois.
segmentation_model_path (str) – Path to segmentation model used for automatic arena detection.
video_path (str) – Path to folder with videos.
videos (dict) – Dictionary of videos to extract arena parameters from. Defaults to None (all videos are used).
debug (bool) – If True, a frame per video with the detected arena is saved. Defaults to False.
edit_tag (str) – optional affix for arena or roi name to prevent overwriting of existing images.
test (bool) – If True, the function is run in test mode. This means that instead of waiting for user input, fixed artificial inputs are used. Defaults to False.

Returns:

scales (dict) – Dictionary of scaling information. Each scales object consists of:
- x position of the center of the arena in mm
- y position of the center of the arena in mm
- diameter of the arena (when circular) or length of first edge in pixels
- diameter of the arena (when circular) or length of first edge in mm
arena_params (dict) – Dictionary of arena parameters. Each arena parameter object consists of:
- (when circular) x position of the center of the arena in pixels, y position of the center of the arena in pixels, x-axis radius of the arena in pixels, y-axis radius of the arena in pixels, and angle of the ellipse
- (when polygonal) x and y positions of the polygon vertices in pixels
video_resolution (dict) – Dictionary of video resolutions. Each video resolution object consists of:
- height of the video in pixels
- width of the video in pixels
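
Because each scales entry carries the same reference length in pixels and in mm, it yields a pixel-to-mm conversion factor. The sketch below assumes the four values are stored in the order listed above; the helper name and the numbers are hypothetical.

```python
def mm_per_pixel(scale_entry):
    """Conversion factor from a scales entry ordered as listed above (assumed order)."""
    _center_x_mm, _center_y_mm, length_px, length_mm = scale_entry
    return length_mm / length_px


# Hypothetical entry: a 380 mm arena spanning 760 pixels.
entry = (190.0, 190.0, 760.0, 380.0)
factor = mm_per_pixel(entry)
print(factor)          # 0.5
print(120.0 * factor)  # 60.0
```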

deepof.arena_utils.simplify_polygon(polygon: list, n_points: int | None = None, relative_tolerance: float = 0.05, preserve_topology: bool = False)

Simplify a polygon using shapely’s RDP simplify, and if n_points is given, return exactly n_points denoised vertices whose sides are aligned with the dominant sides of the (noisy) input polygon.

Strategy for n_points:

Pick exactly n_points “corners” (fixed-number Douglas–Peucker style).
For each side between corners, fit a best-fit line (TLS/PCA) to its points.
Export vertex points for best line segment fits.
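
For orientation, the Douglas–Peucker idea that the strategy builds on can be written in a few lines. This is a generic pure-Python version for an open polyline, given purely as an illustration: it is neither shapely's simplify nor the fixed-number variant with line fitting described above, and the function names are stand-ins.

```python
import math


def _point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    length = math.hypot(bx - ax, by - ay)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / length


def rdp(points, tolerance):
    """Ramer-Douglas-Peucker simplification of an open polyline."""
    if len(points) < 3:
        return list(points)
    distances = [_point_line_distance(p, points[0], points[-1]) for p in points[1:-1]]
    index = max(range(len(distances)), key=distances.__getitem__) + 1
    if distances[index - 1] <= tolerance:
        return [points[0], points[-1]]
    # Keep the farthest point and simplify both halves recursively.
    left = rdp(points[: index + 1], tolerance)
    right = rdp(points[index:], tolerance)
    return left[:-1] + right


noisy = [(0, 0), (1, 0.05), (2, -0.04), (3, 0), (3.03, 1), (3, 2)]
print(rdp(noisy, tolerance=0.1))  # [(0, 0), (3, 0), (3, 2)]
```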

deepof.arena_utils.closest_side(polygon: list, reference_side: list)

Find the closest side in other polygons to a reference side in the first polygon.

Parameters:

polygon (list) – List of polygons.
reference_side (list) – List of coordinates of the reference side.

Returns:

List of coordinates of the closest side.

Return type:

closest_side_points (list)

deepof.arena_utils.automatically_recognize_arena(videos: dict, vid_key: str, path: str = '.', arena_type: str = 'circular-autodetect', arena_reference: list | None = None, segmentation_model: Module | None = None, num_sample_frames: int = 100) → Tuple[array, int, int]

Return numpy.ndarray with information about the arena recognised from the first frames of the video.

WARNING: estimates won’t be reliable if the camera moves along the video.

Parameters:

coordinates (coordinates) – Coordinates object.
videos (list) – Relative paths of the videos to analyse.
vid_key (str) – key of video to use.
path (str) – Full path of the directory where the videos are.
arena_type (string) – Arena type; must be one of [‘circular-autodetect’, ‘circular-manual’, ‘polygon-manual’].
arena_reference (list) – List of coordinates defining the reference arena annotated by the user.
segmentation_model (torch.nn.Module) – Model used for automatic arena detection.
num_sample_frames (int) – Number of frames to randomly sample from video that are then averaged for arena detection.
debug (bool) – If True, save a video frame with the arena detected.

Returns:

arena (np.ndarray) – 1D-array containing information about the arena. If the arena is circular, returns a 3-element array (center, radius, and angle). If the arena is polygonal, returns a list with the x-y position of each of the n vertices of the polygon.
h (int) – Height of the video in pixels.
w (int) – Width of the video in pixels.

deepof.arena_utils.save_arena_image(numpy_im, roi, image_export_path, name, arena_reference=None, color=None): Saves one video frame of the arena with annotations (detected arena or chosen rois)

deepof.arena_utils.display_message(message: List[str])

Opens a window that displays a message for the user

Parameters:

message – List of strings containing the message

deepof.arena_utils.get_random_frame(video_path: str)

deepof.arena_utils.extract_polygonal_arena_coordinates(video_path: str, arena_type: str, video_index: int, videos: list, list_of_rois: list = 0, roi_dicts: dict = {}, arena_dict: dict = {}, key_current: str | None = None, get_arena: bool = True, arena_dims: float = 1.0, norm_dist: float | None = None, image_export_path: str | None = None, edit_tag: str = '', test: bool = False)

Read a random frame from the selected video, and opens an interactive GUI to let the user delineate the arena manually.

Parameters:

video_path (str) – Path to the video file.
arena_type (str) – Type of arena to be used. Must be one of the following: “circular-manual”, “polygonal-manual”.
video_index (int) – Index of the current video in the list of videos.
videos (list) – List of videos to be processed.
list_of_rois (int) – list of roi numbers to draw
get_arena (bool) – retrieve arena or skip step (default is True)
arena_dims (float) – Distance as taken from video in pixels
norm_dist (float) – Same distance as arena_dims for normalization in mm
arena_params (np.ndarray) – nx2 array containing the x-y coordinates of all n corners of the polygonal arena.
edit_tag (str) – optional affix for arena or roi name to prevent overwriting of existing images.
test (bool) – Runs project in test mode and bypasses manual inputs, defaults to false

Returns:

arena_corners (np.ndarray) – nx2 array containing the x-y coordinates of all n corners of the polygonal arena.
int – Height of the video.
int – Width of the video.

deepof.arena_utils.fit_ellipse_to_polygon(polygon: list, return_ellipse=True)

Fit an ellipse to the provided polygon.

Parameters:

polygon (list) – List of (x,y) coordinates of the corners of the polygon.

Returns (if return_ellipse is True):

center_coordinates (tuple) – (x,y) coordinates of the center of the ellipse.
axes_length (tuple) – (a,b) semi-major and semi-minor axes of the ellipse.
ellipse_angle (float) – Angle of the ellipse.

Returns (otherwise):

vertices-coordinates (np.array) – array of (x,y) points on the ellipse edge

deepof.arena_utils.get_first_length(arena_corners, w_ratio, h_ratio): gets the length of the first edge in arena_corners

deepof.arena_utils.arena_parameter_extraction(frame: ndarray, arena_type: str) → array

Return x,y position of the center, the lengths of the major and minor axes, and the angle of the recognised arena.

Parameters:

frame (np.ndarray) – numpy.ndarray representing an individual frame of a video
arena_type (str) – Type of arena to be used. Must be either “circular” or “polygonal”.

Returns:

If arena_type == “circular”:

center_coordinates (tuple) – (x,y) coordinates of the center of the ellipse.
axes_length (tuple) – (a,b) semi-major and semi-minor axes of the ellipse.
ellipse_angle (float) – Angle of the ellipse.

If arena_type == “polygonal”:

np.ndarray – (x,y) coordinates of all points of the polygon

deepof.arena_utils.create_inner_polygon(outer_vertices, target_area_ratio=0.7, tolerance=0.01, max_iterations=100, return_inner=True)

Creates an inner polygon that covers approximately a target_area_ratio fraction of the outer polygon’s area. Returns either the inner polygon or the difference between the outer and inner polygons.

Parameters:

outer_vertices (numpy.ndarray) – Nx2 array of vertices defining the outer polygon
target_area_ratio (float) – Target ratio of inner to outer polygon area (default: 0.7)
tolerance (float) – Acceptable tolerance for area ratio (default: 0.01)
max_iterations (int) – Maximum number of iterations for binary search (default: 100)
return_inner (bool) – If True, returns inner polygon; if False, returns outer ring (default: True)

Returns:

Mx2 array of vertices defining either the inner polygon or outer ring

Return type:

vertices (numpy.ndarray)
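The parameter list above mentions a binary search with a tolerance and an iteration cap. The following is a minimal sketch of that idea in plain Python, not deepof's implementation: it assumes the inner polygon is obtained by scaling the outer vertices towards their mean point, which may differ from how deepof shrinks the polygon. The names polygon_area, shrink_towards_centroid and inner_polygon_sketch are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(vertices: List[Point]) -> float:
    """Absolute polygon area via the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def shrink_towards_centroid(vertices: List[Point], factor: float) -> List[Point]:
    """Scale vertices towards their mean point (assumed shrinking strategy)."""
    cx = sum(x for x, _ in vertices) / len(vertices)
    cy = sum(y for _, y in vertices) / len(vertices)
    return [(cx + factor * (x - cx), cy + factor * (y - cy)) for x, y in vertices]


def inner_polygon_sketch(
    outer: List[Point],
    target_area_ratio: float = 0.7,
    tolerance: float = 0.01,
    max_iterations: int = 100,
) -> List[Point]:
    """Binary search on the scale factor until the area ratio is within tolerance."""
    outer_area = polygon_area(outer)
    low, high = 0.0, 1.0
    inner = list(outer)
    for _ in range(max_iterations):
        mid = (low + high) / 2.0
        inner = shrink_towards_centroid(outer, mid)
        ratio = polygon_area(inner) / outer_area
        if abs(ratio - target_area_ratio) <= tolerance:
            break
        if ratio < target_area_ratio:
            low = mid  # inner polygon too small: shrink less
        else:
            high = mid  # inner polygon too large: shrink more
    return inner


square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
inner = inner_polygon_sketch(square)
area_ratio = polygon_area(inner) / polygon_area(square)
```

With pure scaling the area ratio is simply the squared scale factor, so the search is not strictly needed here. It becomes useful when the shrinking step has no closed-form relation to the area.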

deepof.arena_utils.extract_corners_from_arena(arena_params: list | tuple | ndarray, num_points: int = 100)

Extracts polygon corner coordinates from given arena parameters.

For polygonal arenas, the input is returned directly. For circular arenas, the input is converted into a polygon with num_points vertices.

Parameters:

arena_params (Union[Tuple, np.ndarray]) –
- For a circular arena: A tuple containing ((center_x, center_y), (radius_x, radius_y), angle_degrees).
- For a polygonal arena: A NumPy array of shape (N, 2) with vertex coordinates.
num_points (int) – Number of vertices for the ellipse. Defaults to 100.

Returns:

A NumPy array of shape (M, 2) representing the polygon vertices.

Return type:

polygon (np.ndarray)

Raises:

TypeError – If the input params is not a recognized type or format.
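The circular case can be illustrated by sampling points on a rotated ellipse. This is a sketch only, assuming the tuple layout ((center_x, center_y), (radius_x, radius_y), angle_degrees) described above and a counter-clockwise rotation in degrees. The function name ellipse_to_polygon is hypothetical and the point ordering in deepof may differ.

```python
import math
from typing import List, Tuple


def ellipse_to_polygon(
    center: Tuple[float, float],
    radii: Tuple[float, float],
    angle_degrees: float,
    num_points: int = 100,
) -> List[Tuple[float, float]]:
    """Sample num_points vertices on the edge of a rotated ellipse."""
    cx, cy = center
    rx, ry = radii
    theta = math.radians(angle_degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    points = []
    for i in range(num_points):
        t = 2.0 * math.pi * i / num_points
        # Point on the axis-aligned ellipse, then rotated and shifted.
        x, y = rx * math.cos(t), ry * math.sin(t)
        points.append((cx + x * cos_t - y * sin_t, cy + x * sin_t + y * cos_t))
    return points


polygon = ellipse_to_polygon((100.0, 50.0), (40.0, 20.0), 30.0, num_points=100)
```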

deepof.arena_utils.confirm_action(message: str, window_name: str = 'Confirm')

Displays a confirmation dialog using OpenCV with multi-line support.

Parameters:

message – The message to display (use ‘\n’ for line breaks).
window_name – Name of the OpenCV window.

Returns:

True if ‘y’ pressed, False if ‘n’ pressed.

Return type:

bool

class deepof.arena_utils.DropdownConfig(margin_right: int = 10, margin_top: int = 10, width: int = 60, height: int = 25, option_height: int = 25, font_scale: float = 0.5, font_thickness: int = 1, border_color: Tuple[int, int, int] = (100, 100, 100), fill_color: Tuple[int, int, int] = (200, 200, 200), text_color: Tuple[int, int, int] = (0, 0, 0), main_box_color: Tuple[int, int, int] = (220, 220, 220))

Bases: object

margin_right: int = 10

margin_top: int = 10

width: int = 60

height: int = 25

option_height: int = 25

font_scale: float = 0.5

font_thickness: int = 1

border_color: Tuple[int, int, int] = (100, 100, 100)

fill_color: Tuple[int, int, int] = (200, 200, 200)

text_color: Tuple[int, int, int] = (0, 0, 0)

main_box_color: Tuple[int, int, int] = (220, 220, 220)

__init__(margin_right: int = 10, margin_top: int = 10, width: int = 60, height: int = 25, option_height: int = 25, font_scale: float = 0.5, font_thickness: int = 1, border_color: Tuple[int, int, int] = (100, 100, 100), fill_color: Tuple[int, int, int] = (200, 200, 200), text_color: Tuple[int, int, int] = (0, 0, 0), main_box_color: Tuple[int, int, int] = (220, 220, 220)) → None

class deepof.arena_utils.DropdownUI(window_name: str, options: List[str], window_width: int, hidden: bool = False, config: DropdownConfig | None = None)

Bases: object

__init__(window_name: str, options: List[str], window_width: int, hidden: bool = False, config: DropdownConfig | None = None)

draw(img: ndarray) → None

handle_mouse(event: int, x: int, y: int): Returns the newly selected option if changed, None otherwise

deepof.arena_utils.retrieve_corners_from_image(frame: ndarray, arena_type: str, cur_vid: int, videos: list, current_roi: int = 0, arena_dims: float = 1.0, norm_dist: float | None = None, arena_corners: ndarray | None = None, corners: list = [], test: bool = False)

Opens a window and waits for the user to click on all corners of the polygonal arena.

The user should click on the corners in sequential order.

Parameters:

frame (np.ndarray) – Frame to display.
arena_type (str) – Type of arena to be used. Must be one of the following: “circular-manual”, “polygon-manual”.
cur_vid (int) – Index of the current video in the list of videos.
videos (list) – List of videos to be processed.
current_roi (int) – Current ROI to be extracted. 0 is the global arena ROI
arena_dims (float) – Reference distance as measured in the video, in pixels.
norm_dist (float) – The same distance as arena_dims, in mm, used for normalization.
arena_corners (np.ndarray) – Corners of the arena, relevant for automatic ROIs.
test (bool) – Runs the project in test mode and bypasses manual inputs. Defaults to False.

Returns:

nx2 array containing the x-y coordinates of all n corners.

Return type:

corners (np.ndarray)

deepof.config module

class deepof.config.DistanceUnit(value)

Bases: Enum

An enumeration.

pixel = 0.0

px = 0.0

mm = 1.0

millimeter = 1.0

cm = 10

centimeter = 10

m = 1000

meter = 1000

km = 1000000

kilometer = 1000000

inch = 25.4

foot = 304.8

yard = 914.4

mile = 1609000

factor(mm_to_pix=None): Multiplier to convert mm -> this unit. mm_to_pix can be scalar or array-like.

classmethod parse(unit: str) → DistanceUnit
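The enum values above read naturally as millimetres per unit (cm = 10, inch = 25.4), so a mm -> unit multiplier would be the reciprocal of the value. A minimal sketch under that assumption follows. MM_PER_UNIT and mm_to_unit_factor are hypothetical names, not part of deepof, and the pixel entries (value 0.0), which presumably rely on mm_to_pix, are left out.

```python
# Millimetres per unit, copied from the enum values listed above.
MM_PER_UNIT = {"mm": 1.0, "cm": 10.0, "m": 1000.0, "inch": 25.4}


def mm_to_unit_factor(unit: str) -> float:
    """Multiplier that converts a length in mm into the given unit (assumed rule)."""
    return 1.0 / MM_PER_UNIT[unit]


length_mm = 254.0
length_cm = length_mm * mm_to_unit_factor("cm")
length_inch = length_mm * mm_to_unit_factor("inch")
```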

class deepof.config.TimeUnit(value)

Bases: Enum

An enumeration.

fr = 0.0

frames = 0.0

s = 1.0

seconds = 1.0

min = 60.0

minutes = 60.0

h = 3600.0

hours = 3600.0

factor(fps: float) → float: Multiplier to convert frames -> this unit.

classmethod parse(unit: str) → TimeUnit
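The values above look like seconds per unit, so converting a frame count needs the frame rate as well. The sketch below assumes that reading and treats the frame entries (value 0.0) as an identity conversion. SECONDS_PER_UNIT and frames_to_unit_factor are hypothetical names, not deepof API.

```python
# Seconds per unit, copied from the enum values listed above.
SECONDS_PER_UNIT = {"s": 1.0, "min": 60.0, "h": 3600.0}


def frames_to_unit_factor(unit: str, fps: float) -> float:
    """Multiplier that converts a number of frames into the given unit (assumed rule)."""
    if unit == "frames":
        return 1.0  # assumption: frames are left unchanged
    return 1.0 / (fps * SECONDS_PER_UNIT[unit])


n_frames = 4500
fps = 25.0
duration_s = n_frames * frames_to_unit_factor("s", fps)
duration_min = n_frames * frames_to_unit_factor("min", fps)
```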

class deepof.config.SpeedUnit(value)

Bases: Enum

An enumeration.

mm_s = 1

m_s = 0.001

m_h = 3.6

class deepof.config.BitPrecision(value)

Bases: Enum

An enumeration.

f16 = 16

f32 = 32

f64 = 64

fauto = 0

property dtype

classmethod parse(unit: int) → BitPrecision

deepof.data_loading module

Data loading functionality for the deepof package.

deepof.data_loading.get_dt(tab_dict: dict, key: str, return_path: bool = False, only_metainfo: bool = False, load_index: bool = False, load_range: ndarray | None = None)

deepof.data_loading.save_dt(dt: DataFrame | ndarray | Tuple[ndarray, ndarray], folder_path: str, return_path: bool = False)

deepof.data_manager module

deepof.data_manager.sanitize_table_name(table_name: str) → str

deepof.data_manager.suppress_warnings(warn_messages)

class deepof.data_manager.DataManager(db_path: str)

Bases: object

__init__(db_path: str)

close()

save(key: str, data: DataFrame | ndarray | Tuple[ndarray, ndarray])

load(key: str, return_path: bool = False, only_metainfo: bool = False, load_index: bool = False, load_range: ndarray | None = None)

deepof.export_video module

Plotting utility functions for the deepof package.

class deepof.export_video.VideoExportConfig(display_behavior_names: bool = True, display_video_name: bool = False, display_time: bool = False, display_counter: bool = False, display_arena: bool = False, display_markers: bool = False, display_mouse_labels: bool = False, display_loading_bar: bool = True, display_roi: int | None = None, supervised_export: bool = True)

Bases: object

Configuration for video annotations.

display_behavior_names: bool = True

display_video_name: bool = False

display_time: bool = False

display_counter: bool = False

display_arena: bool = False

display_markers: bool = False

display_mouse_labels: bool = False

display_loading_bar: bool = True

display_roi: int = None

supervised_export: bool = True

__init__(display_behavior_names: bool = True, display_video_name: bool = False, display_time: bool = False, display_counter: bool = False, display_arena: bool = False, display_markers: bool = False, display_mouse_labels: bool = False, display_loading_bar: bool = True, display_roi: int | None = None, supervised_export: bool = True) → None

class deepof.export_video.VideoExportProps(font: int = 2, font_scale: float = 0.5, thickness: int = 1, padding: int = 5, text_color: Tuple[int, int, int] = (255, 255, 255), outline_color: Tuple[int, int, int] = (0, 0, 0), arena_color: Tuple[int, int, int] = (40, 86, 236), arena_thickness: int = 3, marker_radius: int = 3)

Bases: object

Parameters for drawing text and shapes on the video frame.

font: int = 2

font_scale: float = 0.5

thickness: int = 1

padding: int = 5

text_color: Tuple[int, int, int] = (255, 255, 255)

outline_color: Tuple[int, int, int] = (0, 0, 0)

arena_color: Tuple[int, int, int] = (40, 86, 236)

arena_thickness: int = 3

marker_radius: int = 3

__init__(font: int = 2, font_scale: float = 0.5, thickness: int = 1, padding: int = 5, text_color: Tuple[int, int, int] = (255, 255, 255), outline_color: Tuple[int, int, int] = (0, 0, 0), arena_color: Tuple[int, int, int] = (40, 86, 236), arena_thickness: int = 3, marker_radius: int = 3) → None

deepof.export_video.output_videos_per_cluster(coordinates: deepof_coordinates, exp_conditions: dict, behavior_dict: dict, behaviors: str | list, behaviors_renamed: list, frame_limit_per_video: int = inf, bin_info: dict | None = None, roi_number: int | None = None, animals_in_roi: list | None = None, single_output_resolution: tuple | None = None, min_confidence: float = 0.0, min_bout_duration: int | None = None, video_export_config: VideoExportConfig = VideoExportConfig(display_behavior_names=True, display_video_name=False, display_time=False, display_counter=False, display_arena=False, display_markers=False, display_mouse_labels=False, display_loading_bar=True, display_roi=None, supervised_export=True), out_path: str = '.', roi_mode: str = 'mousewise'): Generates one consolidated video per behavior, compiled from multiple experiments.

deepof.export_video.output_annotated_video(coordinates: deepof_coordinates, experiment_id: str, tab: DataFrame, behaviors: List[str], video_export_config: VideoExportConfig = VideoExportConfig(display_behavior_names=True, display_video_name=False, display_time=False, display_counter=False, display_arena=False, display_markers=False, display_mouse_labels=False, display_loading_bar=True, display_roi=None, supervised_export=True), frames: array | None = None, cap: Any | None = None, out: Any | None = None, v_width: int | None = None, v_height: int | None = None, frame_limit: int = inf, out_path: Path = PosixPath('.'), behaviors_renamed: List | None = None)

Generates a video with frames annotated with specified behaviors and other metadata.

Parameters:

coordinates – Coordinates object for the project, used to access video paths and metadata.
experiment_id – ID of the experiment to export.
tab – DataFrame with behavior probabilities/scores per frame.
behaviors – A list of behavior names (columns in tab) to annotate.
video_export_config – A dataclass object specifying video export information (what to display, export mode).
frames – An array of specific frame indices to include in the output video.
cap – An existing cv2.VideoCapture object. If None, one will be created.
out – An existing cv2.VideoWriter object. If None, one will be created.
v_width – Desired output video width. Defaults to source video width.
v_height – Desired output video height. Defaults to source video height.
frame_limit – Maximum number of frames to process.
out_path – The directory where the output video will be saved.
behaviors_renamed – List of updated behavior names for display

deepof.data module

Data structures for preprocessing and wrangling of motion tracking output data. This is the main module handled by the user.

There are three main data structures to pay attention to:

- Project, which serves as a configuration hub for the whole pipeline,
- Coordinates, which acts as an intermediary between project configuration and data, and contains a plethora of processing methods to apply, and
- TableDict, which is the main data structure to store the data, having experiment IDs as keys and processed time-series as values in a dictionary-like object.

For a detailed tutorial on how to use this module, see the advanced tutorials in the main section.

deepof.data.is_display_available()

deepof.data.load_project(project_path: str, animal_ids: List | None = None, arena: str = 'polygonal-autodetect', bodypart_graph: str | dict = 'deepof_14', iterative_imputation: str = 'partial', exclude_bodyparts: List = ('',), exp_conditions: dict | None = None, start_markers: dict | None = None, remove_outliers: bool = True, interpolation_limit: int = 5, interpolation_std: int = 3, likelihood_tol: float = 0.75, model: str = 'mouse_topview', project_name: str = 'deepof_project', video_path: str | None = None, table_path: str | None = None, rename_bodyparts: list | None = None, sam_checkpoint_path: str | None = None, smooth_alpha: float = 1, table_format: str = 'autodetect', video_format: str = '.mp4', video_scale: int = 1, number_of_rois=0, fast_implementations_threshold: int = 50000, bit_precision: int | None = None) → deepof_coordinates

Load a pre-saved pickled Coordinates object. Coordinates objects from older versions of deepof (down to 0.7) are updated to work with this version. Very old projects are recreated during loading with the current version of deepof. For this purpose, input arguments can be set just as in a regular project definition.

Parameters:

animal_ids (list) – list of animal ids.
arena (str) – arena type. Can be one of “circular-autodetect”, “circular-manual”, “polygonal-autodetect”, or “polygonal-manual”.
bodypart_graph (str) – body part scheme to use for the analysis. The signature default is ‘deepof_14’. If set to None, the program will attempt to select it automatically based on the available body parts.
iterative_imputation (str) – whether to use iterative imputation for occluded body parts, options are “full” and “partial”. if set to None, no imputation takes place.
exclude_bodyparts (list) – list of bodyparts to exclude from analysis.
exp_conditions (dict) – dictionary with experiment IDs as keys and experimental conditions as values.
start_markers (dict) – dictionary with experiment IDs as keys and start markers as values.
remove_outliers (bool) – whether outliers should be removed during project creation.
interpolation_limit (int) – maximum number of missing frames to interpolate.
interpolation_std (int) – maximum number of standard deviations to interpolate.
likelihood_tol (float) – likelihood threshold for outlier detection.
model (str) – model to use for pose estimation. Defaults to ‘mouse_topview’ (as described in the documentation).
project_name (str) – name of the current project.
project_path (str) – path to the folder containing the motion tracking output data.
video_path (str) – path where to find the videos to use. If not specified, deepof assumes they are in your project path.
table_path (str) – path where to find the tracks to use. If not specified, deepof assumes they are in your project path.
rename_bodyparts (list) – list of names to use for the body parts in the provided tracking files. The order should match that of the columns in your DLC tables or the node dimensions on your (S)LEAP .npy files.
sam_checkpoint_path (str) – path to the checkpoint file for the SAM model. If not specified, the model will be saved in the installation folder.
smooth_alpha (float) – smoothing intensity. The higher the value, the more smoothing.
table_format (str) – format of the table. Defaults to ‘autodetect’, but can be set to “csv” or “h5” for DLC output, and “npy”, “slp” or “analysis.h5” for (S)LEAP.
video_format (str) – video format. Defaults to ‘.mp4’.
video_scale (int) – diameter of the arena in mm (if the arena is round) or length of the first specified arena side (if the arena is polygonal).
number_of_rois (int) – number of behavior ROIs to be drawn during project creation. Defaults to 0.
fast_implementations_threshold (int) – If the total number of frames in the project is larger than this, numba implementations of all functions with a numba option will be used.
bit_precision (int) – Minimum bit precision used for the biggest tables. Default is None for auto precision (64 and gets decreased for large projects). Can be manually set to 32 or 64

Returns:

Pre-run coordinates object.

Return type:

coordinates (deepof_coordinates)

class deepof.data.Project(animal_ids: List | None = None, arena: str = 'polygonal-autodetect', bodypart_graph: str | dict = 'deepof_14', iterative_imputation: str = 'partial', exclude_bodyparts: List = ('',), exp_conditions: str | dict | None = None, start_markers: str | dict | None = None, remove_outliers: bool = True, interpolation_limit: int = 5, interpolation_std: int = 3, likelihood_tol: float = 0.75, model: str = 'mouse_topview', project_name: str = 'deepof_project', project_path: str = '.', video_path: str | None = None, table_path: str | None = None, rename_bodyparts: list | None = None, sam_checkpoint_path: str | None = None, smooth_alpha: float = 1, table_format: str = 'autodetect', video_format: str = '.mp4', video_scale: str | None = None, number_of_rois: int = 0, frame_rate: float | None = None, fast_implementations_threshold: int = 50000, bit_precision: int | None = None)

Bases: object

Class for loading and preprocessing motion tracking data of individual and multiple animals.

All main computations are handled from here.

__init__(animal_ids: List | None = None, arena: str = 'polygonal-autodetect', bodypart_graph: str | dict = 'deepof_14', iterative_imputation: str = 'partial', exclude_bodyparts: List = ('',), exp_conditions: str | dict | None = None, start_markers: str | dict | None = None, remove_outliers: bool = True, interpolation_limit: int = 5, interpolation_std: int = 3, likelihood_tol: float = 0.75, model: str = 'mouse_topview', project_name: str = 'deepof_project', project_path: str = '.', video_path: str | None = None, table_path: str | None = None, rename_bodyparts: list | None = None, sam_checkpoint_path: str | None = None, smooth_alpha: float = 1, table_format: str = 'autodetect', video_format: str = '.mp4', video_scale: str | None = None, number_of_rois: int = 0, frame_rate: float | None = None, fast_implementations_threshold: int = 50000, bit_precision: int | None = None)

Initialize a Project object.

Parameters:

animal_ids (list) – list of animal ids.
arena (str) – arena type. Can be one of “circular-autodetect”, “circular-manual”, “polygonal-autodetect”, or “polygonal-manual”.
bodypart_graph (str) – body part scheme to use for the analysis. The signature default is ‘deepof_14’. If set to None, the program will attempt to select it automatically based on the available body parts.
iterative_imputation (str) – whether to use iterative imputation for occluded body parts, options are “full” and “partial”. if set to None, no imputation takes place.
exclude_bodyparts (list) – list of bodyparts to exclude from analysis.
exp_conditions (Union[str, dict]) – path to .csv-file or dictionary with experiment IDs as keys and experimental conditions as values.
start_markers (Union[str, dict]) – path to .csv-file or dictionary with experiment IDs as keys and start markers as values.
remove_outliers (bool) – whether outliers should be removed during project creation.
interpolation_limit (int) – maximum number of missing frames to interpolate.
interpolation_std (int) – maximum number of standard deviations to interpolate.
likelihood_tol (float) – likelihood threshold for outlier detection.
model (str) – model to use for pose estimation. Defaults to ‘mouse_topview’ (as described in the documentation).
project_name (str) – name of the current project.
project_path (str) – path to the folder containing the motion tracking output data.
video_path (str) – path where to find the videos to use. If not specified, deepof assumes they are in your project path.
table_path (str) – path where to find the tracks to use. If not specified, deepof assumes they are in your project path.
rename_bodyparts (list) – list of names to use for the body parts in the provided tracking files. The order should match that of the columns in your DLC tables or the node dimensions on your (S)LEAP .npy files.
sam_checkpoint_path (str) – path to the checkpoint file for the SAM model. If not specified, the model will be saved in the installation folder.
smooth_alpha (float) – smoothing intensity. The higher the value, the more smoothing.
table_format (str) – format of the table. Defaults to ‘autodetect’, but can be set to “csv” or “h5” for DLC output, and “npy”, “slp” or “analysis.h5” for (S)LEAP.
video_format (str) – video format. Defaults to ‘.mp4’.
video_scale (int) – diameter of the arena in mm (if the arena is round) or length of the first specified arena side (if the arena is polygonal).
number_of_rois (int) – number of behavior ROIs to be drawn during project creation. Defaults to 0.
fast_implementations_threshold (int) – If the total number of frames in the project is larger than this, numba implementations of all functions with a numba option will be used.
bit_precision (int) – Minimum bit precision used for the biggest tables. Default is None for auto precision (64 and gets decreased for large projects). Can be manually set to 32 or 64

set_up_project_directory(debug=False): Create a project directory where to save all produced results.

rename_bodyparts(rename_bodyparts, table_format)

load_start_markers(filepath): Load start markers analogous to experimental conditions and do some checks

load_exp_conditions(filepath)

Load experimental conditions from a wide-format csv table.

Parameters:: filepath (str) – Path to the file containing the experimental conditions.
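A wide-format table has one row per experiment and one column per condition. The sketch below shows how such a file maps onto the dictionary form accepted by exp_conditions (experiment IDs as keys, conditions as values). The column names, including experiment_id, and the helper read_conditions are illustrative assumptions. The exact layout expected by load_exp_conditions is not specified here.

```python
import csv
import io
from typing import Dict, TextIO

# Hypothetical wide-format content: one row per experiment, one column per condition.
csv_text = (
    "experiment_id,CSDS,sex\n"
    "video_01,Stressed,M\n"
    "video_02,Nonstressed,F\n"
)


def read_conditions(handle: TextIO, id_column: str = "experiment_id") -> Dict[str, Dict[str, str]]:
    """Map each experiment ID to a dict of its condition columns."""
    conditions = {}
    for row in csv.DictReader(handle):
        exp_id = row.pop(id_column)
        conditions[exp_id] = dict(row)
    return conditions


exp_conditions = read_conditions(io.StringIO(csv_text))
```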

save_arena_data(arena_path: str, arena_params: dict | None = None, roi_dicts: dict | None = None, scales: dict | None = None, video_resolution: dict | None = None) → None

Save ROI dictionaries, arena parameters, and scales as a single pickle file.

Parameters:

arena_path (str) – Output path to the .pkl file.
arena_params (dict) – arena info.
roi_dicts (dict)
scales (dict)

load_arena_data(arena_path: str, load_also_rois: bool = False) → Tuple[Dict, Dict, Dict, Dict]

Load ROI dictionaries, arena parameters, and scales from a pickle file, with checks.

Parameters:

arena_path (str) – Path to the .pkl file to load.
load_also_rois (bool) – If False, skip ROI loading/validation and return roi_dicts as None.

Returns:

roi_dicts is None if load_also_rois is False, otherwise the (possibly truncated) ROI dict. arena_params and scales are always returned.

Return type:

(roi_dicts, arena_params, scales)

get_arena(tables: dict, arena_path: str | None = None, debug: str = False, test: bool = False, load_also_rois: bool = False) → array

Return the arena as recognised from the videos.

Parameters:

tables (dict) – dictionary containing coordinate tables
arena_path (str) – path to saved arena data. If not None, deepof will try to load arena and ROI data from it.
debug (str) – if True, saves intermediate results to disk
test (bool) – if True, runs the function in test mode

Returns:

arena parameters, as recognised from the videos. The shape depends on the arena type

Return type:

arena (np.ndarray)

preprocess_tables() → Tuple[deepof_table_dict, deepof_table_dict]: Loads and preprocesses tracking data through a series of modular steps, then saves the results and returns table dictionaries.

scale_tables(tab_dict: deepof_table_dict) → deepof_table_dict

Scales all tables to mm using scaling information from arena detection.

Parameters:: tab_dict (table_dict) – Table dictionary of pandas DataFrames containing the trajectories of all bodyparts.
Returns:: Scaled table dictionary of pandas DataFrames containing the trajectories of all bodyparts.
Return type:: tab_dict (table_dict)
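Scaling to mm amounts to multiplying pixel coordinates by a mm-per-pixel factor. A minimal sketch, assuming that factor is the known real-world distance in mm divided by the same distance measured in pixels (compare norm_dist and arena_dims in deepof.arena_utils.retrieve_corners_from_image). The function px_to_mm is hypothetical, and deepof stores its scaling information in its own format.

```python
from typing import List, Tuple


def px_to_mm(
    coords_px: List[Tuple[float, float]],
    arena_dims_px: float,
    norm_dist_mm: float,
) -> List[Tuple[float, float]]:
    """Scale (x, y) pixel coordinates to mm with a single mm-per-pixel factor."""
    mm_per_px = norm_dist_mm / arena_dims_px
    return [(x * mm_per_px, y * mm_per_px) for x, y in coords_px]


# An arena spanning 760 px in the video that measures 380 mm in reality.
track_px = [(0.0, 0.0), (190.0, 95.0), (380.0, 380.0)]
track_mm = px_to_mm(track_px, arena_dims_px=760.0, norm_dist_mm=380.0)
```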

get_distances(tab_dict: deepof_table_dict) → dict

Compute the distances between all selected body parts over time for a table dictionary.

Parameters:: tab_dict (table_dict) – Table dictionary of pandas DataFrames containing the trajectories of all bodyparts.
Returns:: Table dictionary of pandas DataFrames containing the distances between all bodyparts.
Return type:: distance_dict

get_distances_tab(tab: DataFrame) → dict

Compute the distances between all selected body parts over time for a single table.

Parameters:: tab (pd.DataFrame) – Pandas DataFrame containing the trajectories of all bodyparts.
Returns:: Pandas DataFrame containing the distances between all bodyparts.
Return type:: distance_tab
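Per frame, the distance computation reduces to Euclidean distances between pairs of body parts. The sketch below uses plain dictionaries instead of pandas DataFrames and computes every pair. The body part names and the helper pairwise_distances are illustrative, and deepof may restrict the pairs it reports.

```python
import itertools
import math
from typing import Dict, Tuple

# Two frames of hypothetical (x, y) positions in mm.
frames = [
    {"Nose": (0.0, 0.0), "Center": (3.0, 4.0), "Tail_base": (6.0, 8.0)},
    {"Nose": (1.0, 1.0), "Center": (1.0, 6.0), "Tail_base": (1.0, 13.0)},
]


def pairwise_distances(frame: Dict[str, Tuple[float, float]]) -> Dict[Tuple[str, str], float]:
    """Euclidean distance for every unordered pair of body parts in one frame."""
    return {
        (a, b): math.dist(frame[a], frame[b])
        for a, b in itertools.combinations(sorted(frame), 2)
    }


distance_rows = [pairwise_distances(frame) for frame in frames]
```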

get_angles(tab_dict: deepof_table_dict) → dict

Compute all the angles between adjacent bodypart trios per video and per frame in all datasets in the given table dictionary.

Parameters:: tab_dict (table_dict) – Table dictionary of pandas DataFrames containing the trajectories of all bodyparts.
Returns:: Table dictionary of pandas DataFrames containing the angles between all bodyparts.
Return type:: angle_dict
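For a trio of adjacent body parts, the angle is taken at the middle point between the two segments that meet there. A minimal sketch of that geometry follows. The function angle_at_vertex is hypothetical, and the units and sign convention used by deepof are not stated here, so unsigned radians are assumed.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def angle_at_vertex(a: Point, b: Point, c: Point) -> float:
    """Unsigned angle in radians at b, formed by the segments b-a and b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    return abs(math.atan2(cross, dot))


right_angle = angle_at_vertex((1.0, 0.0), (0.0, 0.0), (0.0, 1.0))
straight_line = angle_at_vertex((-1.0, 0.0), (0.0, 0.0), (2.0, 0.0))
```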

get_areas(tab_dict: deepof_table_dict) → dict

Compute all relevant areas (head, torso, back) per video and per frame in the data.

Parameters:: tab_dict (table_dict) – Table dictionary of pandas DataFrames containing the trajectories of all bodyparts.
Returns:: Table dictionary of pandas DataFrames containing the areas (head, torso, back) between sets of bodyparts.
Return type:: all_areas_dict

create(verbose: bool = True, force: bool = False, arena_path: str | None = None, debug: bool = True, test: bool = False, _to_extend: deepof_coordinates | None = None) → deepof_coordinates

Generate a deepof.Coordinates dataset using all the options specified during initialization.

Parameters:

verbose (bool) – If True, prints progress. Defaults to True.
force (bool) – If True, overwrites existing project. Defaults to False.
arena_path (str) – path to saved arena data. If not None, deepof will try to load arena and ROI data from it.
debug (bool) – If True, saves arena detection images to disk. Defaults to True.
test (bool) – If True, creates the project in test mode (which, for example, bypasses any manual input). Defaults to False.
_to_extend (coordinates) – Coordinates object to extend with the current dataset. For internal usage only.

Returns:

Deepof.Coordinates object containing the trajectories of all bodyparts.

Return type:

coordinates (coordinates)

property distances: Returns distances table_dict

property ego: String, name of a body part. If True, computes only the distances between the specified body part and the rest.

property angles: Returns angles table_dict

extend(project_to_extend: deepof_coordinates, video_path: str | None = None, table_path: str | None = None, verbose: bool = True, debug: bool = True, test: bool = False) → deepof_coordinates

Generate a deepof.Coordinates dataset using all the options specified during initialization.

Parameters:

project_to_extend (coordinates) – Coordinates object to extend with the current dataset.
video_path (str) – Path to the videos. If not specified, defaults to the project path.
table_path (str) – Path to the tracks. If not specified, defaults to the project path.
verbose (bool) – Prints progress if True. Defaults to True.
debug (bool) – Saves arena detection images to disk if True. Defaults to True.
test (bool) – Runs the project in test mode if True. Defaults to False.

Returns:

Deepof.Coordinates object containing the trajectories of all body parts.

Return type:

coordinates (coordinates)

class deepof.data.Coordinates(project_path: str, project_name: str, arena: str, arena_dims: array, bodypart_graph: str, path: str, quality: dict, scales: dict, frame_rate: float, arena_params: dict, roi_dicts: dict, tables: dict, source_table_path: str, table_paths: List, trained_model_path: str, videos: List, video_path: str, video_resolution: dict, angles: dict | None = None, animal_ids: List = ('',), areas: dict | None = None, distances: dict | None = None, connectivity: Graph | None = None, excluded_bodyparts: list | None = None, exp_conditions: dict | None = None, start_markers: dict | None = None, number_of_rois: int = 0, run_numba: bool = False, very_large_project: bool = False, version: str | None = None, bit_precision: int = 64)

Bases: object

Class for storing the results of a run project. Methods are mostly setters and getters in charge of tidying up the generated tables.

__init__(project_path: str, project_name: str, arena: str, arena_dims: array, bodypart_graph: str, path: str, quality: dict, scales: dict, frame_rate: float, arena_params: dict, roi_dicts: dict, tables: dict, source_table_path: str, table_paths: List, trained_model_path: str, videos: List, video_path: str, video_resolution: dict, angles: dict | None = None, animal_ids: List = ('',), areas: dict | None = None, distances: dict | None = None, connectivity: Graph | None = None, excluded_bodyparts: list | None = None, exp_conditions: dict | None = None, start_markers: dict | None = None, number_of_rois: int = 0, run_numba: bool = False, very_large_project: bool = False, version: str | None = None, bit_precision: int = 64)

Class for storing the results of a run project. Methods are mostly setters and getters in charge of tidying up the generated tables.

Parameters:

project_name (str) – name of the current project.
project_path (str) – path to the folder containing the motion tracking output data.
arena (str) – Type of arena used for the experiment. See deepof.data.Project for more information.
arena_dims (np.array) – Dimensions of the arena. See deepof.data.Project for more information.
bodypart_graph (nx.Graph) – Graph containing the body part connectivity. See deepof.data.Project for more information.
path (str) – Path to the folder containing the results of the experiment.
quality (dict) – Dictionary containing the quality of the experiment. See deepof.data.Project for more information.
scales (dict) – Scales used for the experiment. See deepof.data.Project for more information.
frame_rate (float) – frame rate of the processed videos.
arena_params (dict) – Dictionary containing the parameters of the arena. See deepof.data.Project for more information.
roi_dicts (dict) – Dictionary containing all ROIs for all videos as determined by the user.
tables (dict) – Dictionary containing the tables of the experiment. See deepof.data.Project for more information.
table_paths (List) – List containing the paths to the tables of the experiment. See deepof.data.Project for more information.
trained_model_path (str) – Path to the trained models used for the supervised pipeline. For internal use only.
videos (List) – List containing the videos used for the experiment. See deepof.data.Project for more information.
video_resolution (dict) – Dictionary containing the automatically detected resolution of the videos used for the experiment.
angles (dict) – Dictionary containing the angles of the experiment. See deepof.data.Project for more information.
animal_ids (List) – List containing the animal IDs of the experiment. See deepof.data.Project for more information.
areas (dict) – dictionary with areas to compute. By default, it includes head, torso, and back.
distances (dict) – Dictionary containing the distances of the experiment. See deepof.data.Project for more information.
excluded_bodyparts (list) – list of bodyparts to exclude from analysis.
exp_conditions (dict) – Dictionary containing the experimental conditions of the experiment. See deepof.data.Project for more information.
start_markers (dict) – Dictionary containing the start markers of the experiment. See deepof.data.Project for more information.
number_of_rois (int) – number of behavior ROIs to be drawn during project creation. Defaults to 0.
run_numba (bool) – Determines if numba versions of functions should be used (run faster but require initial compilation time on first run)
very_large_project (bool) – Decides if memory efficient data loading and saving should be used
version (str) – version of deepof this object was created with
bit_precision (int) – Minimum bit precision used for the biggest tables. Default is 0 for auto precision (64 and gets decreased for large projects). Can be manually set to 16, 32 or 64

get_table_keys(): get the keys to all experiments in this coordinates object

get_coords(center: str = False, polar: bool = False, speed: int = 0, align: str = False, align_group: bool = False, align_inplace: bool = True, to_video: bool = False, selected_id: str | None = None, roi_number: int | None = None, animals_in_roi: str | None = None, in_roi_criterion: str = 'Center', invert_roi: bool = False, file_name: str = 'coords', return_path: bool = False) → deepof_table_dict

Return a table_dict object with the coordinates of each animal as values.

Parameters:

center (str) – Name of the body part to which the positions will be centered. If False (default), the raw data is returned; if ‘arena’, coordinates are centered on the arena.
polar (bool)
speed (int) – States the derivative of the positions to report. Speed is returned if 1, acceleration if 2, jerk if 3, etc.
align (str) – Selects the body part with which later processes will align the frames (see preprocess in table_dict documentation).
align_inplace (bool) – Only valid if align is set. Aligns the vector that goes from the origin to the selected body part with the y-axis, for all timepoints (default).
to_video (bool) – Undoes the scaling to mm back to the pixel scaling from the original video
selected_id (str) – Selects a single animal on multi animal settings. Defaults to None (all animals are processed).
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
in_roi_criterion (str) – Bodypart of a mouse that has to be in the ROI to count the mouse as “inside” the ROI.
file_name (str) – Name of the file for saving
return_path (bool) – If True, return only the path to the saving location of the processed table; if False, return the full table.

Returns:

A table_dict object containing the coordinates of each animal as values.

Return type:

table_dict
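
The speed argument can be read as the order of a time derivative of the positions. The following is a minimal sketch using plain finite differences, assuming evenly spaced frames. The helper nth_derivative and its frame_rate argument are hypothetical and not part of deepof, whose own computation may smooth or scale the result differently.

```python
def nth_derivative(positions, order, frame_rate=1.0):
    """Differentiate a 1D position series `order` times with finite differences."""
    values = list(positions)
    for _ in range(order):
        # Each pass shortens the series by one frame.
        values = [(b - a) * frame_rate for a, b in zip(values, values[1:])]
    return values


track = [0.0, 1.0, 4.0, 9.0]       # positions along one axis (illustrative)
speed = nth_derivative(track, 1)   # first derivative, as with speed=1
accel = nth_derivative(track, 2)   # second derivative, as with speed=2
```

With order=0 the series is returned unchanged, which mirrors the default speed=0.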

get_coords_at_key(key: str, scale: array, quality: deepof_table_dict | None = None, center: str = False, polar: bool = False, speed: int = 0, align: str = False, align_group: bool = False, align_inplace: bool = True, to_video: bool = False, selected_id: str | None = None, roi_number: int | None = None, animals_in_roi: str | None = None, in_roi_criterion: str = 'Center', invert_roi: bool = False) → DataFrame

Return a pandas dataFrame with the coordinates for the selected key as values.

Parameters:

key (str) – Key of the experiment whose coordinates are requested.
scale (np.array) – Scale of the current arena.
quality (table_dict) – Quality information for the current data frame.
center (str) – Name of the body part to which the positions will be centered. If False (default), the raw data is returned; if ‘arena’, coordinates are centered on the arena.
polar (bool)
speed (int) – States the derivative of the positions to report. Speed is returned if 1, acceleration if 2, jerk if 3, etc.
align (str) – Selects the body part with which later processes will align the frames (see preprocess in table_dict documentation).
align_inplace (bool) – Only valid if align is set. Aligns the vector that goes from the origin to the selected body part with the y-axis, for all timepoints (default).
to_video (bool) – Undoes the scaling to mm back to the pixel scaling from the original video
selected_id (str) – Selects a single animal on multi animal settings. Defaults to None (all animals are processed).
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
in_roi_criterion (str) – Bodypart of a mouse that has to be in the ROI to count the mouse as “inside” the ROI.

Returns:

A data frame containing the coordinates for the selected key as values.

Return type:

tab (pd.DataFrame)
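
The effect described for align_inplace, rotating each frame so that the vector from the origin to the selected body part points along the y-axis, can be sketched as a 2D rotation. The function align_to_y_axis and the body part names below are illustrative assumptions, not deepof internals.

```python
import math


def align_to_y_axis(points, reference):
    """Rotate 2D points about the origin so that `reference` lies on the positive y-axis."""
    rx, ry = reference
    # Rotation angle that maps the reference vector onto +y.
    phi = math.pi / 2 - math.atan2(ry, rx)
    cos_phi, sin_phi = math.cos(phi), math.sin(phi)
    return [(x * cos_phi - y * sin_phi, x * sin_phi + y * cos_phi) for x, y in points]


# One centered frame with hypothetical body part labels.
frame = {"Center": (0.0, 0.0), "Nose": (3.0, 4.0), "Tail_base": (-3.0, -4.0)}
aligned = dict(zip(frame, align_to_y_axis(frame.values(), frame["Nose"])))
```

Applying the same rotation to every frame removes the heading of the animal from the coordinates.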

get_distances(speed: int = 0, selected_id: str | None = None, roi_number: int | None = None, animals_in_roi: str | None = None, invert_roi: bool = False, filter_on_graph: bool = True, file_name: str = 'got_distances', return_path: bool = False) → deepof_table_dict

Return a table_dict object with the distances between the body parts of each animal as values.

Parameters:

speed (int) – The derivative to use for speed.
selected_id (str) – The id of the animal to select.
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
filter_on_graph (bool) – If True, only distances between connected nodes in the DeepOF graph representations are kept. Otherwise, all distances between bodyparts are returned.
file_name (str) – Name of the file for saving
return_path (bool) – If True, return only the path to the processed table; if False, return the full table.

Returns:

A table_dict object with the distances between the body parts of each animal as values.

Return type:

table_dict
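
To illustrate what filter_on_graph changes, the sketch below computes either all pairwise body part distances or only those along the edges of a connectivity graph. The helper bodypart_distances, the body part labels, and the skeleton are hypothetical; the actual DeepOF graph is defined by the library.

```python
import math
from itertools import combinations


def bodypart_distances(coords, edges=None):
    """Return pairwise distances; if `edges` is given, keep only connected pairs."""
    allowed = None if edges is None else {frozenset(e) for e in edges}
    out = {}
    for a, b in combinations(sorted(coords), 2):
        if allowed is None or frozenset((a, b)) in allowed:
            out[(a, b)] = math.dist(coords[a], coords[b])
    return out


coords = {"Nose": (0.0, 4.0), "Center": (0.0, 0.0), "Tail_base": (3.0, 0.0)}
skeleton = [("Nose", "Center"), ("Center", "Tail_base")]  # hypothetical graph edges
all_pairs = bodypart_distances(coords)                    # like filter_on_graph=False
on_graph = bodypart_distances(coords, edges=skeleton)     # like filter_on_graph=True
```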

get_distances_at_key(key: str, quality: deepof_table_dict | None = None, speed: int = 0, selected_id: str | None = None, roi_number: int | None = None, animals_in_roi: str | None = None, invert_roi: bool = False, filter_on_graph: bool = True) → DataFrame

Return a pd.DataFrame with the distances between body parts of one animal as values.

Parameters:

key (str) – key for requested distance
quality (table_dict) – Quality information for the current data frame.
speed (int) – The derivative to use for speed.
selected_id (str) – The id of the animal to select.
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
filter_on_graph (bool) – If True, only distances between connected nodes in the DeepOF graph representations are kept. Otherwise, all distances between bodyparts are returned.

Returns:

A pd.DataFrame with the distances between body parts of one animal as values.

Return type:

tab (pd.DataFrame)

get_angles(degrees: bool = False, speed: int = 0, selected_id: str | None = None, roi_number: int | None = None, animals_in_roi: str | None = None, invert_roi: bool = False, file_name: str = 'got_angles', return_path: bool = False) → deepof_table_dict

Return a table_dict object with the angles between the body parts of each animal as values.

Parameters:

degrees (bool) – If True, angles are converted to degrees; otherwise they remain in radians (default).
speed (int) – The derivative to use for speed.
selected_id (str) – The id of the animal to select.
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
file_name (str) – Name of the file for saving
return_path (bool) – If True, return only the path to the processed table; if False, return the full table.

Returns:

A table_dict object with the angles between the body parts of each animal as values.

Return type:

table_dict
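
As a small illustration of the degrees flag, the sketch below computes the angle formed at a middle body part by its two neighbours and optionally converts it from radians to degrees. The function joint_angle is a hypothetical helper; which body part triplets deepof actually uses is not shown here.

```python
import math


def joint_angle(a, b, c, degrees=False):
    """Angle at vertex b formed by points a and c, in radians unless degrees=True."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    angle = abs(math.atan2(cross, dot))
    return math.degrees(angle) if degrees else angle


rad = joint_angle((1.0, 0.0), (0.0, 0.0), (0.0, 1.0))
deg = joint_angle((1.0, 0.0), (0.0, 0.0), (0.0, 1.0), degrees=True)
```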

get_angles_at_key(key: str, quality: deepof_table_dict | None = None, degrees: bool = False, speed: int = 0, selected_id: str | None = None, roi_number: int | None = None, animals_in_roi: str | None = None, invert_roi: bool = False) → DataFrame

Return a pd.DataFrame with the angles between body parts of one animal as values.

Parameters:

key (str) – Key of the experiment whose angles are requested.
quality (table_dict) – Quality information for the current data frame.
degrees (bool) – If True, angles are converted to degrees; otherwise they remain in radians (default).
speed (int) – The derivative to use for speed.
selected_id (str) – The id of the animal to select.
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded

Returns:

A pd.DataFrame with the angles between body parts of one animal as values.

Return type:

tab (pd.DataFrame)

get_areas(speed: int = 0, selected_id: str = 'all', roi_number: int | None = None, animals_in_roi: str | None = None, invert_roi: bool = False, file_name: str = 'got_areas', return_path: bool = False) → deepof_table_dict

Return a table_dict object with all relevant areas (head, torso, back, full). Unless specified otherwise, the areas are computed for all animals.

Parameters:

speed (int) – The derivative to use for speed.
selected_id (str) – The id of the animal to select. “all” (default) computes the areas for all animals. Declared in self._animal_ids.
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
file_name (str) – Name of the file for saving
return_path (bool) – If True, return only the path to the processed table; if False, return the full table.

Returns:

A table_dict object with the body areas of each animal as values.

Return type:

table_dict
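
One common way to obtain the area enclosed by a set of body parts is the shoelace formula over the polygon they span. The sketch below assumes this approach for illustration only; polygon_area and the vertex list are hypothetical, and deepof may compute its areas differently.

```python
def polygon_area(vertices):
    """Area of a simple polygon given its vertices in order (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical outline of one body region, as an ordered list of (x, y) points.
torso = [(0.0, 0.0), (2.0, 0.0), (2.0, 3.0), (0.0, 3.0)]
area = polygon_area(torso)
```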

get_areas_at_key(key: str, quality: deepof_table_dict | None = None, speed: int = 0, selected_id: str = 'all', roi_number: int | None = None, animals_in_roi: str | None = None, invert_roi: bool = False) → deepof_table_dict

Return a pd.DataFrame with all relevant areas (head, torso, back, full). Unless specified otherwise, the areas are computed for all animals.

Parameters:

key (str) – Key of the experiment whose areas are requested.
quality (table_dict) – Quality information for the current data frame.
speed (int) – The derivative to use for speed.
selected_id (str) – The id of the animal to select. “all” (default) computes the areas for all animals. Declared in self._animal_ids.
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded

Returns:

A pd.DataFrame object with the body areas of each animal as values.

Return type:

tab (pd.DataFrame)

get_videos(full_paths: bool = False, play: bool = False): Returns the videos associated with the dataset as a dictionary.

get_start_times(start_marker=None): Returns the start time for each table in a dictionary

get_end_times(): Returns the end time for each table in a dictionary

get_table_lengths(tab_dict_for_binning=None, start_marker=None): Returns the length for each table in a dictionary

property get_exp_conditions: Return the stored dictionary with experimental conditions per subject.

property get_start_markers: Return the stored dictionary with start markers per subject.

get_condition_values(exp_cond)

get_start_marker_values(start_marker, return_frames=True)

load_start_markers(filepath): Load start markers analogous to experimental conditions and do some checks

load_exp_conditions(filepath)

Load experimental conditions from a wide-format csv table.

Parameters:: filepath (str) – Path to the file containing the experimental conditions.

get_quality(): Retrieve a dictionary with the tagging quality per video, as reported by DLC or SLEAP.

property get_arenas: Retrieve all available information associated with the arena.

edit_arenas(video_keys: list | None = None, arena_type: str | None = None, verbose: bool = True)

Tag the arena in the videos.

Parameters:

video_keys (list) – A list of keys for videos to reannotate. If None, all videos are loaded.
arena_type (str) – The type of arena to use. Must be one of “polygonal-manual”, “circular-manual”, or “circular-autodetect”. If None (default), the arena type specified when creating the project is used.
verbose (bool) – Whether to print the progress of the annotation.

save(file=None, filename: str | None = None, timestamp: bool = True)

Save the current state of the Coordinates object to a pickled file.

Parameters:

file (obj) – Optional object to save. If None, the project is saved.
filename (str) – Name of the pickled file to store. If no name is provided, a default is used.
timestamp (bool) – Whether to append a time stamp at the end of the output file name.
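
The general pattern of pickling an object under an optionally time-stamped file name can be sketched with the standard library alone. The helper save_pickle, the default name "project", the ".pkl" extension, and the time stamp format are all assumptions made for this example and do not describe the files deepof writes.

```python
import datetime
import os
import pickle
import tempfile


def save_pickle(obj, directory, filename="project", timestamp=True):
    """Pickle `obj` into `directory`, optionally appending a time stamp to the name."""
    if timestamp:
        filename += "_" + datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    path = os.path.join(directory, filename + ".pkl")
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)
    return path


with tempfile.TemporaryDirectory() as tmp:
    saved = save_pickle({"version": "demo"}, tmp, timestamp=False)
    with open(saved, "rb") as handle:
        restored = pickle.load(handle)
```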

get_graph_dataset(**kwargs)

get_supervised_parameters() → dict

Return the parameters used for supervised annotation.

Parameters:: hparams (dict) – dictionary containing hyperparameters to overwrite
Returns:: dictionary with overwritten parameters. Those not specified in the input retain their default values
Return type:: defaults (dict)

reset_supervised_parameters() → dict

Reset the parameters used for supervised annotation to their default values.

Parameters:: hparams (dict) – dictionary containing hyperparameters to overwrite
Returns:: dictionary with overwritten parameters. Those not specified in the input retain their default values
Return type:: defaults (dict)

set_supervised_parameters(hparams: dict = {})

Overwrite the parameters used for supervised annotation with the values given in hparams.

Parameters:: hparams (dict) – dictionary containing hyperparameters to overwrite
Returns:: dictionary with overwritten parameters. Those not specified in the input retain their default values
Return type:: defaults (dict)

supervised_annotation(**kwargs)

deep_unsupervised_embedding(preprocessed_object: Tuple[ndarray, ndarray, ndarray, ndarray], adjacency_matrix: ndarray | None = None, bin_size=None, bin_index=None, precomputed_bins=None, samples_max=None, embedding_model: str = 'VaDE', encoder_type: str = 'recurrent', batch_size: int = 64, latent_dim: int = 4, epochs: int = 150, log_history: bool = True, log_hparams: bool = False, n_clusters: int = 10, kmeans_loss: float = 0.0, temperature: float = 0.1, contrastive_similarity_function: str = 'cosine', contrastive_loss_function: str = 'nce', beta: float = 0.1, tau: float = 0.1, output_path: str = '', pretrained: str = False, save_checkpoints: bool = False, save_weights: bool = True, input_type: str = False, run: int = 0, kl_annealing_mode: str = 'linear', kl_warmup: int = 15, reg_cat_clusters: float = 0.0, recluster: bool = False, interaction_regularization: float = 0.0, bootstrap_training: bool = False, bootstrap_block_len: int = 250, random_seed: int = 0, **kwargs) → Tuple

Annotates coordinates using a deep unsupervised autoencoder.

Parameters:

preprocessed_object (tuple) – Tuple containing a preprocessed object (X_train, y_train, X_test, y_test).
adjacency_matrix (np.ndarray) – adjacency matrix of the connectivity graph to use.
bin_size (Union[int,str]) – bin size for time filtering.
bin_index (Union[int,str]) – index of the bin of size bin_size to select along the time dimension. Denotes exact start position in the time domain if given as string.
precomputed_bins (np.ndarray) – precomputed time bins. If provided, bin_size and bin_index are ignored.
samples_max (int) – Maximum number of samples taken for plotting to avoid excessive computation times. If the number of rows in a data set exceeds this number the data is downsampled accordingly.
embedding_model (str) – Name of the embedding model to use. Must be one of VQVAE, VaDE (default), or contrastive.
encoder_type (str) – Encoder architecture to use. Must be one of “recurrent”, “TCN”, or “transformer”.
batch_size (int) – Batch size for training.
latent_dim (int) – Dimensionality of the latent space.
epochs (int) – Maximum number of epochs to train the model. Actual training might be shorter, as the model will stop training when validation loss stops decreasing.
log_history (bool) – Whether to log the history of the model to TensorBoard.
log_hparams (bool) – Whether to log the hyperparameters of the model to TensorBoard.
n_clusters (int) – Number of latent clusters for the embedding model to use.
kmeans_loss (float) – Weight of the gram loss, which adds a regularization term to VaDE and VQVAE models which penalizes the correlation between the dimensions in the latent space.
temperature (float) – temperature parameter for the contrastive loss functions. Higher values put harsher penalties on negative pair similarity.
contrastive_similarity_function (str) – similarity function between positive and negative pairs. Must be one of ‘cosine’ (default), ‘euclidean’, ‘dot’, and ‘edit’.
contrastive_loss_function (str) – contrastive loss function. Must be one of ‘nce’ (default), ‘dcl’, ‘fc’, and ‘hard_dcl’. See specific documentation for details.
beta (float) – Beta (concentration) parameter for the hard_dcl contrastive loss. Higher values lead to ‘harder’ negative samples.
tau (float) – Tau parameter for the dcl and hard_dcl contrastive losses, indicating positive class probability.
output_path (str) – Path to save the trained model and all log files.
pretrained (str) – Whether to load a pretrained model. If False, model is trained from scratch. If not, must be the path to a saved model.
save_checkpoints (bool) – Whether to save checkpoints of the model during training. Defaults to False.
save_weights (bool) – Whether to save the weights of the model during training. Defaults to True.
input_type (str) – Type of the preprocessed_object passed as the first parameter. See deepof.data.TableDict for more details.
run (int) – Run number for the model. Used to save the model and log files. Optional.
kl_annealing_mode (str) – Mode of the KL annealing. Must be one of “linear” or “sigmoid”.
kl_warmup (int) – Number of epochs to warm up the KL annealing.
reg_cat_clusters (float) – Penalizes uneven cluster membership in the latent space, by minimizing the KL divergence between cluster membership and a uniform categorical distribution.
recluster (bool) – whether to recluster after training using a Gaussian Mixture Model. Only valid for VaDE.
interaction_regularization (float) – weight of the interaction regularization term for all encoders.
bootstrap_training (bool) – If true, will train by sampling from data with replacement for stability estimation. False per default.
bootstrap_block_len (int) – Minimum number of samples that stay in the same block during bootstrapping, to reduce the effect of window overlap. Will be rounded up to a multiple of the batch size.
random_seed (int) – Random seed, used mainly for data loader shuffling.
**kwargs – Additional keyword arguments to pass to the model.

Returns:

Tuple containing all trained models. See specific model documentation under deepof.clustering.training for details.

Return type:

Tuple
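
Two of the arguments above lend themselves to a short numeric illustration: the rounding of bootstrap_block_len up to a multiple of batch_size, and a linear KL warm-up over kl_warmup epochs. The sketch below is written under stated assumptions. Both helper functions are hypothetical, and the exact annealing schedule used by deepof is not given in this section, so the linear ramp from 0 to 1 is an assumption.

```python
import math


def round_up_to_batch(block_len, batch_size):
    """Smallest multiple of batch_size that is >= block_len."""
    return math.ceil(block_len / batch_size) * batch_size


def linear_kl_weight(epoch, kl_warmup):
    """KL weight rising linearly from 0 to 1 over kl_warmup epochs (assumed schedule)."""
    if kl_warmup <= 0:
        return 1.0
    return min(1.0, epoch / kl_warmup)


block_len = round_up_to_batch(250, 64)   # default values from the signature
weights = [linear_kl_weight(e, 15) for e in (0, 5, 15, 30)]
```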

class deepof.data.TableDict(tabs: Dict, typ: str, table_path: str | None = None, arena: str | None = None, arena_dims: array | None = None, animal_ids: List = ('',), center: str | None = None, connectivity: Graph | None = None, polar: bool | None = None, exp_conditions: dict | None = None, shapes: Dict = {})

Bases: dict

Main class for storing a single dataset as a dictionary with individuals as keys and pandas.DataFrames as values.

Includes methods for generating training and testing datasets for the supervised and unsupervised models.

__init__(tabs: Dict, typ: str, table_path: str | None = None, arena: str | None = None, arena_dims: array | None = None, animal_ids: List = ('',), center: str | None = None, connectivity: Graph | None = None, polar: bool | None = None, exp_conditions: dict | None = None, shapes: Dict = {})

Store single datasets as dictionaries with individuals as keys and pandas.DataFrames as values.

Includes methods for generating training and testing datasets for the autoencoders.

Parameters:

tabs (Dict) – Dictionary of pandas.DataFrames with individual experiments as keys.
typ (str) – Type of the dataset. Examples are “coords”, “dists”, and “angles”. For logging purposes only.
table_path (str) – Path to the root directory that is going to be used to save table iterations.
arena (str) – Type of the arena. Must be one of “circular-autodetect”, “circular-manual”, or “polygonal-manual”. Handled internally.
arena_dims (np.array) – Dimensions of the arena in mm.
animal_ids (list) – list of animal ids.
center (str) – Type of the center. Handled internally.
connectivity (nx.Graph) – Bodypart graph of a mouse.
polar (bool) – Whether the dataset is in polar coordinates. Handled internally.
exp_conditions (dict) – dictionary with experiment IDs as keys and experimental conditions as values.
start_markers (dict) – dictionary with experiment IDs as keys and start markers as values.
shapes (Dict) – Dictionary containing the shapes of all stored tables

filter_videos(keys: list) → deepof_table_dict

Return a subset of the original table_dict object, containing only the specified keys.

Useful, for example, to select data coming from videos of a specified condition.

Parameters:: keys (list) – List of keys to keep.
Returns:: Subset of the original table_dict object, containing only the specified keys.
Return type:: TableDict

filter_condition(exp_filters: dict) → deepof_table_dict

Return a subset of the original table_dict object, containing only videos belonging to the specified experimental condition.

Parameters:: exp_filters (dict) – experimental conditions and values to filter on.
Returns:: Subset of the original table_dict object, containing only the specified keys.
Return type:: TableDict
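
Conceptually, filtering by condition amounts to finding the experiment keys whose conditions match every requested value and then keeping only those keys. The following is a minimal sketch on plain dictionaries. The helpers keys_matching and subset, the experiment keys, and the condition names are made up for illustration, and the way deepof stores conditions internally may differ.

```python
def keys_matching(exp_conditions, exp_filters):
    """Keys of experiments whose conditions equal every value in exp_filters."""
    return [
        key
        for key, conds in exp_conditions.items()
        if all(conds.get(name) == value for name, value in exp_filters.items())
    ]


def subset(tables, keys):
    """New dictionary holding only the requested keys."""
    return {key: tables[key] for key in keys}


exp_conditions = {
    "vid_01": {"group": "control", "sex": "F"},
    "vid_02": {"group": "treated", "sex": "F"},
    "vid_03": {"group": "control", "sex": "M"},
}
tables = {key: f"table for {key}" for key in exp_conditions}  # stand-ins for DataFrames

control_keys = keys_matching(exp_conditions, {"group": "control"})
control_tables = subset(tables, control_keys)
```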

filter_id(selected_id: str | None = None) → deepof_table_dict

Filter a TableDict object to keep only those columns related to the selected id.

Leave labels untouched if present.

Parameters:: selected_id (str) – select a single animal on multi animal settings. Defaults to None (all animals are processed).
Returns:: Filtered TableDict object, keeping only the selected animal.
Return type:: table_dict

new_dict_same_header(tabs: dict | None = None, only_keys: bool = False)

Creates a new table dict based on a given dictionary and the existing header information.

Parameters:

tabs (dict) – Dictionary of table entries
only_keys (bool) – Copy dictionary keys and create empty dictionary with same keys

Returns:

New TableDict object, based on given tabs and existing header info.

Return type:

table_dict

random_projection(n_components: int = 2, kernel: str = 'linear') → Tuple[Any, Any]

Return a training set generated from the 2D original data (time x features) and a random projection to a n_components space.

The sample parameter allows the user to randomly pick a subset of the data for performance or visualization reasons.

Parameters:

n_components (int) – Number of components to project to. Default is 2.
kernel (str) – Kernel to be used for projections. Defaults to linear.

Returns:

Tuple containing projected data and projection type.

Return type:

tuple

pca(n_components: int = 2, kernel: str = 'linear') → Tuple[Any, Any]

Return a training set generated from the 2D original data (time x features) and a PCA projection to a n_components space.

The sample parameter allows the user to randomly pick a subset of the data for performance or visualization reasons.

Parameters:

n_components (int) – Number of components to project to. Default is 2.
kernel (str) – Kernel to be used for projections. Defaults to linear.

Returns:

Tuple containing projected data and projection type.

Return type:

tuple

umap(n_components: int = 2) → Tuple[Any, Any]

Return a training set generated from the 2D original data (time x features) and a UMAP projection to a n_components space.

The sample parameter allows the user to randomly pick a subset of the data for performance or visualization reasons.

Parameters:: n_components (int) – Number of components to project to. Default is 2.
Returns:: Tuple containing projected data and projection type.
Return type:: tuple

merge(*args, ignore_index=False, file_name='merged', save_as_paths=False)

Take a number of table_dict objects and merge them into the current one.

Returns a table_dict object of type ‘merged’. Only annotations of the first table_dict object are kept.

Parameters:

*args (table_dict) – table_dict objects to be merged.
ignore_index (bool) – ignore index when merging. Defaults to False.
file_name (str) – Name that is used for saving the merged table
save_as_paths (bool) – If True, Saves merged datasets as paths to file locations instead of keeping tables in RAM

Returns:

Merged table_dict object.

Return type:

table_dict

get_training_set(current_table_dict: deepof_table_dict, test_videos: int | list = 0) → tuple

Generate training and test sets as table_dicts for model training.

Intended for internal usage only.

Parameters:

current_table_dict (table_dict) – table_dict object containing the data to be used for training.
test_videos (Union[int, list]) – Number of videos to be used for testing or keys of test videos. Defaults to 0.

Returns:

If there are no test videos, X_train (table_dict) with only the training data. Otherwise, a tuple containing training data, test data (as table_dicts), and test keys (if any).

Return type:

table_dict or tuple

preprocess(coordinates, window_size: int | None = None, window_step: int = 1, bin_size=None, bin_index=None, precomputed_bins=None, samples_max: int = 227272, scale: str = 'standard', pretrained_scaler=None, test_videos: int = 0, interpolate_normalized: int = 10, filter_low_variance: bool = False, file_name: str = 'preprocessed', save_as_paths: bool | None = None, shuffle: bool = False, quality_to_load=None, dist_standardize: str = 'groupwise', speed_standardize: str = 'groupwise', coord_standardize: str = 'groupwise', log_distances: bool = True) → tuple

Preprocess pose tables for model training (refactor of preprocess).

Pipeline:

1. Filter by time bins, drop all-NaN tables
2. Optionally replace speeds with quality scores
3. Collect samples to fit global scalers (size-normalized but not standardized)
4. Apply full scaling (size + statistical) and save
5. Extract sliding windows for training
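
The final pipeline step, extracting sliding windows, is governed by window_size and window_step. The sketch below shows the idea on a plain list of frames. The helper sliding_windows is hypothetical, and the handling of trailing frames that do not fill a whole window (dropped here) is an assumption.

```python
def sliding_windows(rows, window_size, window_step=1):
    """Split a sequence of frames into overlapping windows of fixed length."""
    last_start = len(rows) - window_size
    # Windows that would run past the end of the sequence are not produced.
    return [rows[i:i + window_size] for i in range(0, last_start + 1, window_step)]


frames = list(range(10))   # stand-in for ten rows of a pose table
windows = sliding_windows(frames, window_size=4, window_step=3)
```

A window_step smaller than window_size yields overlapping windows, which is why neighbouring samples can share frames.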

sample_windows_from_data(time_bin_info: Dict[str, ndarray] | None = None, N_windows_tab: int = 10000, return_edges: bool = False, no_nans: bool = False) → Tuple[ndarray, Dict] | Tuple[ndarray, ndarray, Dict]

Sample a set of windows from data entries.

Parameters:

time_bin_info (dict, optional) – Pre-defined indices to sample for each key. If provided, sampling logic is bypassed. Defaults to None.
N_windows_tab (int) – Max number of windows to sample from each recording if time_bin_info is not given.
return_edges (bool) – If True, returns a second dataset for edges.
no_nans (bool) – If True and time_bin_info is not given, only samples from rows without NaNs. Note: This may result in non-contiguous original indices.

Returns:

np.array: The concatenated main dataset (X_data).
np.array: The concatenated edge dataset (a_data), if return_edges is True.
dict: A dictionary with the sampled indices for each key (time_bin_info).

Return type:

Tuple[np.array, dict] or Tuple[np.array, np.array, dict]
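
The interplay of N_windows_tab and no_nans can be sketched for a single recording: candidate rows are optionally restricted to those without NaNs, and at most the requested number of them is drawn at random. The helper sample_window_starts and its seed argument are assumptions for illustration, not the deepof implementation.

```python
import math
import random


def sample_window_starts(table, n_windows, no_nans=False, seed=0):
    """Pick up to n_windows row indices from one recording, sorted in time order."""
    candidates = [
        i
        for i, row in enumerate(table)
        if not (no_nans and any(math.isnan(v) for v in row))
    ]
    rng = random.Random(seed)
    k = min(n_windows, len(candidates))
    return sorted(rng.sample(candidates, k))


nan = float("nan")
recording = [[0.1, 0.2], [nan, 0.3], [0.4, 0.5], [0.6, nan], [0.7, 0.8]]
picked = sample_window_starts(recording, n_windows=10, no_nans=True)
```

Because rows containing NaNs are skipped, the selected indices need not be contiguous, as the no_nans description notes.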

deepof.post_hoc module

Data structures and functions for analyzing supervised and unsupervised model results.

deepof.post_hoc.get_contrastive_soft_counts(coordinates, embeddings: Dict[str, ndarray], states: str | int = 'bic', min_states: int = 2, max_states: int = 25, reg_covar: float = 1e-05, sample_size: int = 500000, random_state: int = 0, p_stay: float = 0.95, soft_counts: Dict[str, ndarray] | None = None, min_confidence: float | None = 0.75, prior_weight: float = 1.0)

Extract soft counts for contrastive model.

If soft_counts is provided, it is used as a per-frame prior over states (clusters), biasing the forward–backward posteriors (HMM smoothing) without running EM training.

Notes

If soft_counts is provided, K is taken from its second dimension (and AIC/BIC search is skipped).
Priors are applied as: log_emiss += prior_weight * log(soft_counts).
If min_confidence is not None, frames with max prior <= min_confidence are replaced by uniform priors.
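
The two rules in the notes, adding weighted log-priors to the log-emissions and falling back to a uniform prior for low-confidence frames, can be sketched on plain lists. The helper apply_priors and the small eps that guards against log(0) are assumptions added for this example.

```python
import math


def apply_priors(log_emiss, soft_counts, prior_weight=1.0, min_confidence=0.75, eps=1e-12):
    """Add weighted log-priors to per-frame log-emissions (frames x states)."""
    out = []
    for emiss_row, prior_row in zip(log_emiss, soft_counts):
        if min_confidence is not None and max(prior_row) <= min_confidence:
            # Low-confidence frame: replace its prior by a uniform one.
            prior_row = [1.0 / len(prior_row)] * len(prior_row)
        out.append(
            [e + prior_weight * math.log(p + eps) for e, p in zip(emiss_row, prior_row)]
        )
    return out


log_emiss = [[-1.0, -2.0], [-1.0, -2.0]]
soft_counts = [[0.9, 0.1], [0.6, 0.4]]   # second frame falls below the confidence cut
biased = apply_priors(log_emiss, soft_counts)
```

A uniform prior shifts every state of a frame by the same amount, so it leaves the relative ranking of the states in that frame unchanged.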

deepof.post_hoc.get_supervised_chaos(coordinates, quality_threshold: float = 0.75, frac_bps_below: float = 0.5, chaos_suffix: str = 'chaos')

Create a supervised-annotations-like table dict containing only quality-based chaos labels.

Parameters:

coordinates (coordinates) – deepof.Coordinates object for the project at hand.
quality_threshold (float) – Per-bodypart quality threshold below which a bodypart is counted as low quality.
frac_bps_below (float) – Fraction of bodyparts that need to fall below quality_threshold for a frame to be flagged as chaotic for a given animal.
chaos_suffix (str) – Suffix used for per-animal chaos columns. Resulting columns are of the form "{animal_id}_{chaos_suffix}".

Returns:

table dict with one table per experiment containing: per-animal chaos columns and one additional "any_chaos" column.

Return type:

supervised_chaos (table_dict)
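
The labelling rule described by quality_threshold and frac_bps_below can be sketched per frame: count the body parts whose quality falls below the threshold and flag the frame when their fraction is large enough. The helper chaos_flags, the sample qualities, and the use of >= for the fraction comparison are assumptions. The column names follow the "{animal_id}_{chaos_suffix}" pattern given above.

```python
def chaos_flags(frames, quality_threshold=0.75, frac_bps_below=0.5):
    """Flag each frame in which enough body parts fall below the quality threshold."""
    flags = []
    for frame in frames:
        low = sum(1 for q in frame.values() if q < quality_threshold)
        flags.append(low / len(frame) >= frac_bps_below)
    return flags


# Per-animal tracking quality for two frames (made-up values).
quality = {
    "A": [{"Nose": 0.9, "Center": 0.95}, {"Nose": 0.2, "Center": 0.9}],
    "B": [{"Nose": 0.8, "Center": 0.9}, {"Nose": 0.99, "Center": 0.97}],
}
table = {f"{animal}_chaos": chaos_flags(frames) for animal, frames in quality.items()}
table["any_chaos"] = [any(flags) for flags in zip(*table.values())]
```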

deepof.post_hoc.add_chaos_gates(coordinates, soft_counts_dict, soft_counts_chaos_dict, supervised_chaos, window_size: int)

Combine regular and chaos-specific soft counts gate-wise.

Parameters:

soft_counts_dict (dict) – Dictionary mapping gate -> TableDict with regular soft counts.
soft_counts_chaos_dict (dict) – Dictionary mapping gate -> TableDict with chaos soft counts. Typically this contains a single gate generated from embedding_gates="any_chaos".
supervised_chaos (table_dict) – Table dict with frame-wise chaos annotations. Expected columns are "{animal_id}_chaos" and "any_chaos".
extract_pair (list) – Tuple of animal ids to extract.
window_size (int) – Window size used to produce the soft counts from frame-wise annotations.

Returns:

Dictionary mapping gate -> TableDict with regular and chaos-specific soft counts concatenated along the feature axis.

Return type:

soft_counts_out (dict)

deepof.post_hoc.compute_gate_edges(coordinates, animal_ids: list, *, keys: list | None = None, window_size: int = 12, supervised_annotations=None, M_gates: int = 3, embedding_gates: Any = 'Center', fixed_edges: list | None = None) → Dict[Any, ndarray] | None

Precompute bin edges for distance-gated extraction.

Behavior is intentionally identical to the original in-function logic:

supervised gating -> return None
fixed_edges provided -> validate and use them
otherwise -> compute quantile edges from the full gating series
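
For the quantile case, the idea is to split the gating series into M_gates bins of roughly equal occupancy and then assign each value to a bin. The sketch below uses the standard library. The helpers quantile_edges and gate_index, the example series, and the choice of the "inclusive" quantile method are assumptions rather than the deepof implementation.

```python
import bisect
import statistics


def quantile_edges(series, m_gates=3):
    """Edges that split `series` into m_gates bins of roughly equal occupancy."""
    cuts = statistics.quantiles(series, n=m_gates, method="inclusive")
    return [min(series)] + cuts + [max(series)]


def gate_index(value, edges):
    """Index of the gate that `value` falls into, using the inner edges only."""
    return bisect.bisect_right(edges[1:-1], value)


distances = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]   # hypothetical gating series
edges = quantile_edges(distances)
gates = [gate_index(d, edges) for d in (2, 3, 7)]
```

Precomputing the edges once and reusing them keeps the gate assignment consistent across calls that work on different subsets of the data.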

deepof.post_hoc.get_pairwise_distances(coordinates, window_len: int, supervised_annotations=None, embedding_gates: Any = 'Nose', behavior_combinations: bool = True) → Dict[str, Dict]

Per-window gating series: pairwise distances OR behavior-combination codes.

Fixes vs original:

deterministic behavior ordering (sorted, not set)
guards against all-NaN distance columns
reports which behaviors were dropped
validates bodypart existence in distance mode

deepof.post_hoc.get_contrastive_soft_counts_gmm(coordinates, embeddings: Dict[str, ndarray], animal_ids: list, window_size: int = 12, supervised_annotations=None, N_clusters_per_gate: int = 8, M_gates: int = 3, gate_edges: Dict[Any, ndarray] | None = None, reg_covar: float = 1e-05, sample_size: int = 200000, random_state: int = 0, embedding_gates: Any = 'Center', temporal_smooth_win: int | None = 3)

Distance/behavior-gated GMM decoder.

Returns:: one soft-count TableDict per gate. For pairwise distance gating, keys are animal pairs like (“A”, “B”).
Return type:: Dict[Any, TableDict]

deepof.post_hoc.get_contrastive_soft_counts_msm_pcca(coordinates, embeddings: Dict[str, ndarray], animal_ids: list, window_size: int = 12, supervised_annotations=None, N_clusters_per_gate: int = 10, M_gates: int = 3, gate_edges: Dict[Any, ndarray] | None = None, sample_size: int = 200000, random_state: int = 0, embedding_gates: Any = 'Center', temporal_smooth_win: int | None = 3, n_micro: int = 400, min_micro_per_macro: int = 3, lagtime: int = 3)

Distance/behavior-gated MSM + PCCA with k-means microstates.

Returns:

One soft-count TableDict per gate. For pairwise distance gating, keys are animal pairs like (“A”, “B”).

Return type:

Dict[Any, TableDict]

deepof.post_hoc.recluster(coordinates: deepof_coordinates, embeddings: deepof_table_dict, soft_counts: deepof_table_dict | None = None, min_confidence: float = 0.75, states: str | int = 'aic', pretrained: bool | str = False, covariance_type: str = 'diag', min_states: int = 2, max_states: int = 12, save: bool = True)

Recluster the data using an HMM-based approach. If soft_counts is provided, the model will use the soft cluster assignments as priors for a semi-supervised HMM.

Parameters:

coordinates – deepOF project where the data is stored.
embeddings (table_dict) – table dict with neural embeddings per animal experiment across time.
soft_counts (table_dict) – table dict with soft cluster assignments per animal experiment across time.
min_confidence (float) – minimum confidence the model should assign to a data point for the model to avoid resorting to a uniform prior around it.
states – Number of states to use for the HMM. If “aic” or “bic”, the number of states is chosen by minimizing the AIC or BIC criteria (respectively) over a predefined range of states.
pretrained – Whether to use a pretrained model or not. If True, DeepOF will search for an existing file with the provided parameters. If a string, DeepOF will search for a file with the provided name.
covariance_type – Type of covariance matrix to use for the HMM. Can be either “full”, “diag”, or “sphere”.
min_states – Minimum number of states to use for the HMM if automatic search is enabled.
max_states – Maximum number of states to use for the HMM if automatic search is enabled.
exclude_keys (list) – list of keys to exclude
save – Whether to save the trained model or not.

Returns:

table dict with soft cluster assignments per animal experiment across time, using the new HMM-based segmentation on the embedding space.

Return type:

soft_counts (table_dict)
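The automatic choice of states by AIC or BIC can be illustrated with the standard definitions of both criteria. The sketch below is not taken from deepof: the function names are hypothetical, and the log-likelihoods and parameter counts are made-up toy values standing in for fitted HMMs.

```python
import math


def information_criterion(log_likelihood, n_params, n_samples, criterion="aic"):
    """Standard AIC / BIC; lower values indicate a better trade-off."""
    if criterion == "aic":
        return 2 * n_params - 2 * log_likelihood
    if criterion == "bic":
        return n_params * math.log(n_samples) - 2 * log_likelihood
    raise ValueError(f"unknown criterion: {criterion!r}")


def select_n_states(candidates, n_samples, criterion="aic"):
    """Pick the number of states with the lowest criterion value.

    candidates maps n_states -> (log_likelihood, n_params).
    """
    scores = {
        n_states: information_criterion(ll, n_params, n_samples, criterion)
        for n_states, (ll, n_params) in candidates.items()
    }
    return min(scores, key=scores.get), scores


# Toy values only: not results from any real model.
toy_candidates = {2: (-1200.0, 10), 3: (-1100.0, 18), 4: (-1095.0, 28)}
best_aic, aic_scores = select_n_states(toy_candidates, n_samples=500, criterion="aic")
best_bic, bic_scores = select_n_states(toy_candidates, n_samples=500, criterion="bic")
```

In this toy case the fourth state barely improves the likelihood, so both criteria penalize it and settle on three states.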

deepof.post_hoc.get_time_on_cluster(soft_counts: deepof_table_dict, normalize: bool = True, reduce_dim: bool = False, bin_info: dict | ndarray | None = None, roi_number: int | None = None, animals_in_roi: list | None = None)

Compute how much time each animal spends in each cluster.

Requires a set of cluster assignments.

Parameters:

soft_counts (TableDict) – A dictionary of soft counts, where the keys are the names of the experimental conditions, and the values are the soft counts for each condition.
normalize (bool) – Whether to normalize the time by the total number of frames in each condition.
reduce_dim (bool) – Whether to reduce the dimensionality of the embeddings to 2D. If False, the embeddings are kept in their original dimensionality.
bin_info (Union[dict,np.ndarray]) – A dictionary or single array containing start and end positions of all sections for given embeddings and ROIs
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded

Returns:

A dataframe with the time spent on each cluster for each experiment.
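For a single experiment, the underlying computation can be sketched as below. This is an assumption-laden illustration rather than the library's code: it takes the most likely cluster per frame (argmax) as the hard assignment, and the helper name time_on_cluster is hypothetical.

```python
import numpy as np


def time_on_cluster(soft_counts, normalize=True):
    """Count frames per cluster from a (frames x clusters) soft-count matrix."""
    soft_counts = np.asarray(soft_counts, dtype=float)
    hard = soft_counts.argmax(axis=1)  # assumption: hard label = most likely cluster
    counts = np.bincount(hard, minlength=soft_counts.shape[1]).astype(float)
    if normalize and counts.sum() > 0:
        counts /= counts.sum()  # fraction of frames instead of raw frame counts
    return counts


toy_soft_counts = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]])
fractions = time_on_cluster(toy_soft_counts)
frames = time_on_cluster(toy_soft_counts, normalize=False)
```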

deepof.post_hoc.condition_distance_binning(embedding: deepof_table_dict, soft_counts: deepof_table_dict, exp_conditions: dict, start_bin: int | None = None, end_bin: int | None = None, step_bin: int | None = None, scan_mode: str = 'growing_window', precomputed_bins: ndarray | None = None, agg: str = 'mean', metric: str = 'auc', n_jobs: int = 2)

Compute the distance between the embeddings of two conditions, using the specified aggregation method.

Parameters:

embedding (TableDict) – A dictionary of embeddings, where the keys are the names of the experimental conditions, and the values are the embeddings for each condition.
soft_counts (TableDict) – A dictionary of soft counts, where the keys are the names of the experimental conditions, and the values are the soft counts for each condition.
exp_conditions (dict) – A dictionary of experimental conditions, where the keys are the names of the experiments, and the values are the names of their corresponding experimental conditions.
start_bin (int) – The index of the first bin to compute the distance for.
end_bin (int) – The index of the last bin to compute the distance for.
step_bin (int) – The step size of the bins to compute the distance for.
scan_mode (str) – The mode to use for computing the distance. Can be one of “growing_window” (used to select optimal binning), “per-bin” (used to evaluate how discriminability evolves in subsequent bins of a specified size) or “precomputed”, which requires a numpy ndarray with bin IDs to be passed to precomputed_bins.
precomputed_bins (np.ndarray) – numpy array with integer bin sizes in frames. Bins do not need to have the same size. The difference across conditions is reported for each of these bins.
agg (str) – The aggregation method to use. Can be either “mean”, “median”, or “time_on_cluster”.
metric (str) – The distance metric to use. Can be either “auc” (where the reported ‘distance’ is based on performance of a classifier when separating aggregated embeddings), or “wasserstein” (which computes distances based on optimal transport).
n_jobs (int) – The number of jobs to use for parallel processing.

Returns:

An array with distances between conditions across the resulting time bins

deepof.post_hoc.separation_between_conditions(cur_embedding: deepof_table_dict, cur_soft_counts: deepof_table_dict, bin_info: dict | ndarray, exp_conditions: dict, agg: str, metric: str)

Compute the distance between the embeddings of two conditions, using the specified aggregation method.

Parameters:

cur_embedding (TableDict) – A dictionary of embeddings, where the keys are the names of the experimental conditions, and the values are the embeddings for each condition.
cur_soft_counts (TableDict) – A dictionary of soft counts, where the keys are the names of the experimental conditions, and the values are the soft counts for each condition.
bin_info (Union[dict,np.ndarray]) – A dictionary or single array containing start and end positions or indices of all sections for given embeddings
exp_conditions (dict) – A dictionary of experimental conditions, where the keys are the names of the experiments, and the values are the names of their corresponding experimental conditions.
agg (str) – The aggregation method to use. Can be one of “time_on_cluster”, “mean”, or “median”.
metric (str) – The distance metric to use. Can be either “auc” (where the reported ‘distance’ is based on performance of a classifier when separating aggregated embeddings), or “wasserstein” (which computes distances based on optimal transport).

Returns:

The distance between the embeddings of the two conditions.

deepof.post_hoc.fit_normative_global_model(global_normal_embeddings: DataFrame)

Fit a global model to the normal embeddings.

Parameters:

global_normal_embeddings (pd.DataFrame) – A dictionary of embeddings, where the keys are the names of the experimental conditions, and the values are the embeddings for each condition.

Returns:

A fitted global model.

deepof.post_hoc.enrichment_across_conditions(soft_counts: deepof_table_dict | None = None, supervised_annotations: deepof_table_dict | None = None, exp_conditions: dict | None = None, plot_speed: bool = False, bin_info: dict | None = None, roi_number: int | None = None, animals_in_roi: list | None = None, roi_mode: str = 'mousewise', normalize: bool = False, custom_continuous_behavior_names: list = [])

Compute the population of each cluster across conditions.

Parameters:

soft_counts (TableDict) – A dictionary of soft counts, where the keys are the names of the experimental conditions, and the values are the soft counts for each condition.
supervised_annotations (tableDict) – table dict with supervised annotations per animal experiment across time.
exp_conditions (dict) – A dictionary of experimental conditions, where the keys are the names of the experiments, and the values are the names of their corresponding experimental conditions.
plot_speed (bool) – Whether to plot the “speed” behavior.
bin_info (dict) – A dictionary containing start and end positions or indices of all sections for given embeddings and ROIs
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
roi_mode (str) – Determines how the ROIs should be applied to different behaviors. Options are “mousewise” (default, selected mice need to be inside the ROI) and “behaviorwise” (only mice involved in a behavior need to be inside of the ROI, only for supervised behaviors).
normalize (bool) – Whether to normalize the population of each cluster across conditions.
custom_continuous_behavior_names (list) – list of potentially added names of custom continuous behaviors (should get sorted out)

Returns:

A long format dataframe with the population of each cluster across conditions.

deepof.post_hoc.get_transitions(state_sequence: list, n_states: int, index_sequence: list | None = None)

Compute the transitions between states in a state sequence.

Parameters:

state_sequence (list) – A list of states.
n_states (int) – The number of states.
index_sequence (list) – An optional list of index positions for the states. When given, state transitions between non-neighboring sequence entries are skipped.

Returns:

The resulting transition matrix.
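A minimal sketch of the counting logic, including the skip for non-neighboring entries, could look like this. It is illustrative only: the function name count_transitions is hypothetical, and returning raw counts rather than normalized probabilities is an assumption.

```python
import numpy as np


def count_transitions(state_sequence, n_states, index_sequence=None):
    """Count state-to-state transitions between consecutive sequence entries."""
    counts = np.zeros((n_states, n_states), dtype=int)
    for i in range(len(state_sequence) - 1):
        # Skip pairs whose original index positions are not adjacent.
        if index_sequence is not None and index_sequence[i + 1] - index_sequence[i] != 1:
            continue
        counts[state_sequence[i], state_sequence[i + 1]] += 1
    return counts


states = [0, 0, 1, 2, 1]
contiguous = count_transitions(states, n_states=3)
# The jump from index 2 to index 10 drops the 1 -> 2 transition.
gapped = count_transitions(states, n_states=3, index_sequence=[0, 1, 2, 10, 11])
```

The index check matters when frames have been filtered out (for example by an ROI), since two entries that are adjacent in the filtered list may be far apart in the original recording.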

deepof.post_hoc.compute_transition_matrix_per_condition(soft_counts: deepof_table_dict, exp_conditions: dict, silence_diagonal: bool = False, bin_info: dict | None = None, roi_number: int | None = None, animals_in_roi: list | None = None, aggregate: str = True, normalize: str = True)

Compute the transition matrices specific to each condition.

Parameters:

soft_counts (TableDict) – A dictionary of soft counts, where the keys are the names of the experimental conditions, and the values are the soft counts for each condition.
exp_conditions (dict) – A dictionary of experimental conditions, where the keys are the names of the experiments, and the values are the names of their corresponding experimental conditions.
silence_diagonal (bool) – If True, diagonal elements on the transition matrix are set to zero.
bin_info (dict) – A dictionary containing start and end positions or indices of all sections for given embeddings and ROI information
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
aggregate (str) – Whether to aggregate the embeddings across time.
normalize (str) – Whether to normalize the population of each cluster across conditions.

Returns:

A dictionary of transition matrices, where the keys are the names of the experimental conditions, and the values are the transition matrices for each condition.

deepof.post_hoc.compute_steady_state(transition_matrices: dict, return_entropy: bool = False, n_iters: int = 100000)

Compute the steady state of each transition matrix provided in a dictionary.

Parameters:

transition_matrices (dict) – A dictionary of transition matrices, where the keys are the names of the experimental conditions, and the values are the transition matrices for each condition.
return_entropy (bool) – Whether to return the entropy of the steady state. If False, the steady states themselves are returned.
n_iters (int) – The number of iterations to use for the Markov chain.

Returns:

A dictionary of steady states, where the keys are the names of the experimental conditions, and the values are the steady states for each condition. If return_entropy is True, values correspond to the entropy of each steady state.
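One common way to obtain a steady state is to propagate a starting distribution through the chain repeatedly, which is what the n_iters parameter suggests. The sketch below assumes that approach, row-stochastic matrices in which every row has at least one transition, and natural-log Shannon entropy; none of these details are confirmed by the source, and the function names are hypothetical.

```python
import numpy as np


def steady_state(transition_matrix, n_iters=1000):
    """Approximate the stationary distribution by repeated propagation."""
    matrix = np.asarray(transition_matrix, dtype=float)
    matrix = matrix / matrix.sum(axis=1, keepdims=True)  # make rows sum to 1
    dist = np.full(matrix.shape[0], 1.0 / matrix.shape[0])  # uniform start
    for _ in range(n_iters):
        dist = dist @ matrix
    return dist


def shannon_entropy(dist):
    """Entropy in nats; zero-probability states contribute nothing."""
    p = np.asarray(dist, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


toy_matrices = {"control": [[0.9, 0.1], [0.5, 0.5]]}
steady_states = {name: steady_state(m) for name, m in toy_matrices.items()}
entropies = {name: shannon_entropy(s) for name, s in steady_states.items()}
```

For the toy chain, balance between the two states requires 0.1 * pi_0 = 0.5 * pi_1, so the result approaches (5/6, 1/6).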

deepof.post_hoc.compute_UMAP(embeddings, cluster_assignments, random_state=0)

Compute UMAP embeddings for visualization purposes.

deepof.post_hoc.align_deepof_kinematics_with_unsupervised_labels(deepof_project: deepof_coordinates, kin_derivative: int = 1, center: str = 'Center', align: str = 'Spine_1', include_feature_derivatives: bool = False, include_distances: bool = True, include_angles: bool = True, include_areas: bool = True, animal_id: str | None = None, file_name: str = 'kinematics', return_path: bool = False)

Align kinematics with unsupervised labels.

In order to annotate time chunks with as many relevant features as possible, this function aligns the kinematics of a deepof project (speed and acceleration of body parts, distances, and angles) with the hard cluster assignments obtained from the unsupervised pipeline.

Parameters:

deepof_project (coordinates) – A deepof.Project object.
kin_derivative (int) – The order of the derivative to use for the kinematics. 1 = speed, 2 = acceleration, etc.
center (str) – Body part to center coordinates on. “Center” by default.
align (str) – Body part to rotationally align the body parts with. “Spine_1” by default.
include_feature_derivatives (bool) – Whether to compute speed on distances, angles, and areas, if they are included.
include_distances (bool) – Whether to include distances in the alignment.
include_angles (bool) – Whether to include angles in the alignment.
include_areas (bool) – Whether to include areas in the alignment.
animal_id (str) – The animal ID to use, in case of multi-animal projects.
file_name (str) – Name of table for saving
return_path (bool) – If True, return only the path to the processed table. If False, return the full table.

Returns:

A dictionary of aligned kinematics, where the keys are the names of the experimental conditions, and the values are the aligned kinematics for each condition.

deepof.post_hoc.chunk_summary_statistics(chunked_dataset: ndarray, body_part_names: list)

Extract summary statistics from a chunked dataset using seglearn.

Parameters:

chunked_dataset (np.ndarray) – Preprocessed training set (of shape chunks x time x features), where each entry corresponds to a time chunk of data.
body_part_names (list) – A list of the names of the body parts.

Returns:

A dataframe of kinematic features, of shape chunks by features.

deepof.post_hoc.annotate_time_chunks(deepof_project: deepof_coordinates, soft_counts: deepof_table_dict, supervised_annotations: deepof_table_dict | None = None, window_size: int | None = None, window_step: int = 1, animal_id: str | None = None, samples: int = 10000, min_confidence: float = 0.0, kin_derivative: int = 1, include_distances: bool = True, include_angles: bool = True, include_areas: bool = True, aggregate: str = 'mean')

Annotate time chunks produced after change-point detection using the unsupervised pipeline.

Uses a set of summary statistics coming from kinematics, distances, angles, and supervised labels when provided.

Parameters:

deepof_project (coordinates) – Project object.
soft_counts (table_dict) – matrix with soft cluster assignments produced by the unsupervised pipeline.
supervised_annotations (table_dict) – set of supervised annotations produced by the supervised pipeline within deepof.
window_size (int) – Minimum size of the applied ruptures. If automatic_changepoints is False, specifies the size of the sliding window to pass through the data to generate training instances. None defaults to video frame-rate.
window_step (int) – Specifies the minimum jump for the rupture algorithms. If automatic_changepoints is False, specifies the step to take when sliding the aforementioned window. In this case, a value of 1 indicates a true sliding window, and a value equal to window_size splits the data into non-overlapping chunks.
animal_id (str) – The animal ID to use, in case of multi-animal projects.
samples (int) – Number of time chunks to sample in order to reduce computation time. Defaults to the minimum between 10000 and the number of available chunks.
min_confidence (float) – minimum confidence in cluster assignments used for quality control filtering.
kin_derivative (int) – The order of the derivative to use for the kinematics. 1 = speed, 2 = acceleration, etc.
include_distances (bool) – Whether to include distances in the alignment. kin_derivative is taken into account.
include_angles (bool) – Whether to include angles in the alignment. kin_derivative is taken into account.
include_areas (bool) – Whether to include areas in the alignment. kin_derivative is taken into account.
aggregate (str) – Aggregation mode. Can be either “mean” (computationally cheapest), which uses the average per feature, or “seglearn”, which runs a thorough feature extraction and selection pipeline on each time series.

Returns:

A dataframe of kinematic features, of shape chunks by features.
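The interplay between window_size, window_step, and the “mean” aggregation can be sketched with plain NumPy. This is an illustration under assumptions, not the library's pipeline: the helper name windowed_means is hypothetical, and the input is taken to be a simple frames x features array.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def windowed_means(features, window_size, window_step=1):
    """Average each feature over sliding windows of frames.

    features has shape (frames, n_features); the result has shape
    (n_windows, n_features), i.e. chunks by features.
    """
    features = np.asarray(features, dtype=float)
    # Shape (frames - window_size + 1, n_features, window_size).
    windows = sliding_window_view(features, window_size, axis=0)
    return windows[::window_step].mean(axis=-1)


toy_features = np.arange(10, dtype=float).reshape(5, 2)
sliding = windowed_means(toy_features, window_size=3, window_step=1)
non_overlapping = windowed_means(toy_features, window_size=3, window_step=3)
```

A step of 1 yields one chunk per possible window start, while a step equal to window_size keeps only windows that do not overlap.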

deepof.post_hoc.chunk_cv_splitter(chunk_stats: DataFrame, bin_info: dict, n_folds: int | None = None)

Split a dataset into training and testing sets, grouped by video.

Given a matrix with extracted features per chunk, returns a list containing a set of cross-validation folds, grouped by experimental video. This makes sure that chunks coming from the same experiment will never be leaked between training and testing sets.

Parameters:

chunk_stats (pd.DataFrame) – matrix with statistics per chunk, sorted by experiment.
bin_info (dict) – A dictionary containing start and end positions or indices of all sections for given embeddings
n_folds (int) – number of cross-validation folds to compute.

Returns:

list containing a training and testing set per CV fold.
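The grouping idea can be sketched as follows. It is a simplified stand-in, not the library's splitter: it assumes a per-chunk video label is already available (the real function derives grouping from bin_info), treats n_folds=None as leave-one-video-out, and uses the hypothetical name grouped_cv_folds.

```python
import numpy as np


def grouped_cv_folds(groups, n_folds=None):
    """Build (train_idx, test_idx) pairs in which no group spans both sets."""
    groups = np.asarray(groups)
    unique_groups = np.unique(groups)
    if n_folds is None:
        n_folds = len(unique_groups)  # assumption: leave-one-group-out
    folds = []
    for held_out in np.array_split(unique_groups, n_folds):
        test_mask = np.isin(groups, held_out)
        folds.append((np.flatnonzero(~test_mask), np.flatnonzero(test_mask)))
    return folds


# One label per chunk, naming the video it came from (toy data).
video_of_chunk = ["vid_a", "vid_a", "vid_b", "vid_b", "vid_b", "vid_c"]
folds = grouped_cv_folds(video_of_chunk)
```

Because whole videos are held out together, chunks from one recording cannot appear on both sides of a fold.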

deepof.post_hoc.train_supervised_cluster_detectors(chunk_stats: DataFrame, hard_counts: ndarray, bin_info: dict, n_folds: int | None = None, verbose: int = 1)

Train supervised models to detect clusters from kinematic features.

Parameters:

chunk_stats (pd.DataFrame) – table with descriptive statistics for a series of sequences (‘chunks’).
hard_counts (np.ndarray) – cluster assignments for the corresponding ‘chunk_stats’ table.
bin_info (dict) – A dictionary containing start and end positions or indices of all sections for given embeddings
n_folds (int) – number of folds for cross validation. If None (default) leave-one-experiment-out CV is used.
verbose (int) – verbosity level. Must be an integer between 0 (nothing printed) and 3 (all is printed).

Returns:

full_cluster_clf (imblearn.pipeline.Pipeline): trained supervised model on the full dataset, mapping chunk stats to cluster assignments. Useful to run the SHAP explainability pipeline.
cluster_gbm_performance (dict): cross-validated dictionary containing trained estimators and performance metrics.
groups (list): cross-validation indices. Data from the same animal are never shared between train and test sets.

deepof.post_hoc module

Data structures and functions for analyzing supervised and unsupervised model results.

deepof.post_hoc.get_contrastive_soft_counts(coordinates, embeddings: Dict[str, ndarray], states: str | int = 'bic', min_states: int = 2, max_states: int = 25, reg_covar: float = 1e-05, sample_size: int = 500000, random_state: int = 0, p_stay: float = 0.95, soft_counts: Dict[str, ndarray] | None = None, min_confidence: float | None = 0.75, prior_weight: float = 1.0)

Extract soft counts for contrastive model.

If soft_counts is provided, it is used as a per-frame prior over states (clusters), biasing the forward–backward posteriors (HMM smoothing) without running EM training.

Notes

If soft_counts is provided, K is taken from its second dimension (and AIC/BIC search is skipped).
Priors are applied as: log_emiss += prior_weight * log(soft_counts).
If min_confidence is not None, frames with max prior <= min_confidence are replaced by uniform priors.
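The last two notes can be sketched together. The function name apply_soft_count_prior and the small eps added before taking the logarithm (to avoid log(0)) are assumptions that do not come from the source.

```python
import numpy as np


def apply_soft_count_prior(log_emiss, soft_counts, prior_weight=1.0,
                           min_confidence=0.75, eps=1e-12):
    """Bias per-frame log emissions with a per-frame prior over states."""
    prior = np.asarray(soft_counts, dtype=float).copy()
    n_states = prior.shape[1]
    if min_confidence is not None:
        # Frames whose strongest prior is not confident enough fall back to uniform.
        low_confidence = prior.max(axis=1) <= min_confidence
        prior[low_confidence] = 1.0 / n_states
    # eps is an assumption to keep the logarithm finite for zero priors.
    return np.asarray(log_emiss, dtype=float) + prior_weight * np.log(prior + eps)


toy_log_emiss = np.zeros((2, 2))
toy_soft_counts = np.array([[0.9, 0.1], [0.6, 0.4]])
biased = apply_soft_count_prior(toy_log_emiss, toy_soft_counts)
```

In the toy input, the first frame keeps its confident prior, while the second (maximum 0.6, below the 0.75 threshold) receives a uniform prior and therefore shifts both states equally.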

deepof.post_hoc.get_supervised_chaos(coordinates, quality_threshold: float = 0.75, frac_bps_below: float = 0.5, chaos_suffix: str = 'chaos')

Create a supervised-annotations-like table dict containing only quality-based chaos labels.

Parameters:

coordinates (coordinates) – deepof.Coordinates object for the project at hand.
quality_threshold (float) – Per-bodypart quality threshold below which a bodypart is counted as low quality.
frac_bps_below (float) – Fraction of bodyparts that need to fall below quality_threshold for a frame to be flagged as chaotic for a given animal.
chaos_suffix (str) – Suffix used for per-animal chaos columns. Resulting columns are of the form "{animal_id}_{chaos_suffix}".

Returns:

table dict with one table per experiment containing: per-animal chaos columns and one additional "any_chaos" column.

Return type:

supervised_chaos (table_dict)
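The labeling rule described by quality_threshold and frac_bps_below can be sketched as below. Several details are assumptions: the input is taken to be a frames x bodyparts likelihood array per animal, the fraction comparison is inclusive, and any_chaos is computed as the logical OR across animals. The helper name chaos_flags is hypothetical.

```python
import numpy as np


def chaos_flags(likelihoods, quality_threshold=0.75, frac_bps_below=0.5):
    """Flag frames where enough bodyparts fall below the quality threshold."""
    low_quality = np.asarray(likelihoods, dtype=float) < quality_threshold
    # Assumption: a frame is chaotic when the fraction is at least frac_bps_below.
    return low_quality.mean(axis=1) >= frac_bps_below


# Toy likelihoods: 3 frames x 4 bodyparts per animal.
likelihoods = {
    "A": np.array([[0.9, 0.9, 0.9, 0.9], [0.9, 0.2, 0.1, 0.9], [0.1, 0.2, 0.3, 0.9]]),
    "B": np.full((3, 4), 0.9),
}
columns = {f"{animal}_chaos": chaos_flags(lik) for animal, lik in likelihoods.items()}
# Assumption: any_chaos is true when at least one animal is flagged.
columns["any_chaos"] = np.logical_or.reduce(list(columns.values()))
```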

deepof.post_hoc.add_chaos_gates(coordinates, soft_counts_dict, soft_counts_chaos_dict, supervised_chaos, window_size: int)

Combine regular and chaos-specific soft counts gate-wise.

Parameters:

soft_counts_dict (dict) – Dictionary mapping gate -> TableDict with regular soft counts.
soft_counts_chaos_dict (dict) – Dictionary mapping gate -> TableDict with chaos soft counts. Typically this contains a single gate generated from embedding_gates="any_chaos".
supervised_chaos (table_dict) – Table dict with frame-wise chaos annotations. Expected columns are "{animal_id}_chaos" and "any_chaos".
extract_pair (list) – Tuple of animal ids to extract.
window_size (int) – Window size used to produce the soft counts from frame-wise annotations.

Returns:

Dictionary mapping gate -> TableDict with regular and chaos-specific soft counts concatenated along the feature axis.

Return type:

soft_counts_out (dict)

deepof.post_hoc.compute_gate_edges(coordinates, animal_ids: list, *, keys: list | None = None, window_size: int = 12, supervised_annotations=None, M_gates: int = 3, embedding_gates: Any = 'Center', fixed_edges: list | None = None) → Dict[Any, ndarray] | None

Precompute bin edges for distance-gated extraction.

Behavior is intentionally identical to the original in-function logic: - supervised gating -> return None - fixed_edges provided -> validate and use them - otherwise -> compute quantile edges from the full gating series

deepof.post_hoc.get_pairwise_distances(coordinates, window_len: int, supervised_annotations=None, embedding_gates: Any = 'Nose', behavior_combinations: bool = True) → Dict[str, Dict]

Per-window gating series: pairwise distances OR behavior-combination codes.

Fixes vs original:

deterministic behavior ordering (sorted, not set)
guards against all-NaN distance columns
reports which behaviors were dropped
validates bodypart existence in distance mode

deepof.post_hoc.get_contrastive_soft_counts_gmm(coordinates, embeddings: Dict[str, ndarray], animal_ids: list, window_size: int = 12, supervised_annotations=None, N_clusters_per_gate: int = 8, M_gates: int = 3, gate_edges: Dict[Any, ndarray] | None = None, reg_covar: float = 1e-05, sample_size: int = 200000, random_state: int = 0, embedding_gates: Any = 'Center', temporal_smooth_win: int | None = 3)

Distance/behavior-gated GMM decoder.

Returns:: one soft-count TableDict per gate. For pairwise distance gating, keys are animal pairs like (“A”, “B”).
Return type:: Dict[Any, TableDict]

deepof.post_hoc.get_contrastive_soft_counts_msm_pcca(coordinates, embeddings: Dict[str, ndarray], animal_ids: list, window_size: int = 12, supervised_annotations=None, N_clusters_per_gate: int = 10, M_gates: int = 3, gate_edges: Dict[Any, ndarray] | None = None, sample_size: int = 200000, random_state: int = 0, embedding_gates: Any = 'Center', temporal_smooth_win: int | None = 3, n_micro: int = 400, min_micro_per_macro: int = 3, lagtime: int = 3)

Distance/behavior-gated MSM + PCCA with k-means microstates.

Returns:: one soft-count TableDict per gate. For pairwise distance gating, keys are animal pairs like (“A”, “B”).
Return type:: Dict[Any, TableDict]

deepof.post_hoc.recluster(coordinates: deepof_coordinates, embeddings: deepof_table_dict, soft_counts: deepof_table_dict | None = None, min_confidence: float = 0.75, states: str | int = 'aic', pretrained: bool | str = False, covariance_type: str = 'diag', min_states: int = 2, max_states: int = 12, save: bool = True)

Recluster the data using a HMM-based approach. If soft_counts is provided, the model will use the soft cluster assignments as priors for a semi-supervised HMM.

Parameters:

coordinates – deepOF project where the data is stored.
embeddings (table_dict) – table dict with neural embeddings per animal experiment across time.
soft_counts (table_dict) – table dict with soft cluster assignments per animal experiment across time.
min_confidence (float) – minimum confidence the model should assign to a data point for the model to avoid resorting to a uniform prior around it.
states – Number of states to use for the HMM. If “aic” or “bic”, the number of states is chosen by minimizing the AIC or BIC criteria (respectively) over a predefined range of states.
pretrained – Whether to use a pretrained model or not. If True, DeepOF will search for an existing file with the provided parameters. If a string, DeepOF will search for a file with the provided name.
covariance_type – Type of covariance matrix to use for the HMM. Can be either “full”, “diag”, or “sphere”.
min_states – Minimum number of states to use for the HMM if automatic search is enabled.
max_states – Maximum number of states to use for the HMM if automatic search is enabled.
exclude_keys (list) – list of keys to exclude
save – Whether to save the trained model or not.

Returns:

table dict with soft cluster assignments per animal experiment across time, using the new HMM-based segmentation on the embedding space.

Return type:

soft_counts (table_dict)

deepof.post_hoc.get_time_on_cluster(soft_counts: deepof_table_dict, normalize: bool = True, reduce_dim: bool = False, bin_info: dict | ndarray | None = None, roi_number: int | None = None, animals_in_roi: list | None = None)

Compute how much each animal spends on each cluster.

Requires a set of cluster assignments.

Parameters:

soft_counts (TableDict) – A dictionary of soft counts, where the keys are the names of the experimental conditions, and the values are the soft counts for each condition.
normalize (bool) – Whether to normalize the time by the total number of frames in each condition.
reduce_dim (bool) – Whether to reduce the dimensionality of the embeddings to 2D. If False, the embeddings are kept in their original dimensionality.
bin_info (Union[dict,np.ndarray]) – A dictionary or single array containing start and end positions of all sections for given embeddings and ROIs
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded

Returns:

A dataframe with the time spent on each cluster for each experiment.

deepof.post_hoc.condition_distance_binning(embedding: deepof_table_dict, soft_counts: deepof_table_dict, exp_conditions: dict, start_bin: int | None = None, end_bin: int | None = None, step_bin: int | None = None, scan_mode: str = 'growing_window', precomputed_bins: ndarray | None = None, agg: str = 'mean', metric: str = 'auc', n_jobs: int = 2)

Compute the distance between the embeddings of two conditions, using the specified aggregation method.

Parameters:

embedding (TableDict) – A dictionary of embeddings, where the keys are the names of the experimental conditions, and the values are the embeddings for each condition.
soft_counts (TableDict) – A dictionary of soft counts, where the keys are the names of the experimental conditions, and the values are the soft counts for each condition.
exp_conditions (dict) – A dictionary of experimental conditions, where the keys are the names of the experiments, and the values are the names of their corresponding experimental conditions.
start_bin (int) – The index of the first bin to compute the distance for.
end_bin (int) – The index of the last bin to compute the distance for.
step_bin (int) – The step size of the bins to compute the distance for.
scan_mode (str) – The mode to use for computing the distance. Can be one of “growing-window” (used to select optimal binning), “per-bin” (used to evaluate how discriminability evolves in subsequent bins of a specified size) or “precomputed”, which requires a numpy ndarray with bin IDs to be passed to precomputed_bins.
precomputed_bins (np.ndarray) – numpy array with integer bin sizes in frames, do not necessarily need to have the same size. Difference across conditions for each of these bins will be reported.
agg (str) – The aggregation method to use. Can be either “mean”, “median”, or “time_on_cluster”.
metric (str) – The distance metric to use. Can be either “auc” (where the reported ‘distance’ is based on performance of a classifier when separating aggregated embeddings), or “wasserstein” (which computes distances based on optimal transport).
n_jobs (int) – The number of jobs to use for parallel processing.

Returns:

An array with distances between conditions across the resulting time bins

deepof.post_hoc.separation_between_conditions(cur_embedding: deepof_table_dict, cur_soft_counts: deepof_table_dict, bin_info: dict | ndarray, exp_conditions: dict, agg: str, metric: str)

Compute the distance between the embeddings of two conditions, using the specified aggregation method.

Parameters:

cur_embedding (TableDict) – A dictionary of embeddings, where the keys are the names of the experimental conditions, and the values are the embeddings for each condition.
cur_soft_counts (TableDict) – A dictionary of soft counts, where the keys are the names of the experimental conditions, and the values are the soft counts for each condition.
bin_info (Union[dict,np.ndarray]) – A dictionary or single array containing start and end positions or indices of all sections for given embeddings
exp_conditions (dict) – A dictionary of experimental conditions, where the keys are the names of the experiments, and the values are the names of their corresponding experimental conditions.
agg (str) – The aggregation method to use. Can be one of “time on cluster”, “mean”, or “median”.
metric (str) – The distance metric to use. Can be either “auc” (where the reported ‘distance’ is based on performance of a classifier when separating aggregated embeddings), or “wasserstein” (which computes distances based on optimal transport).

Returns:

The distance between the embeddings of the two conditions.

deepof.post_hoc.fit_normative_global_model(global_normal_embeddings: DataFrame)

Fit a global model to the normal embeddings.

Parameters:: global_normal_embeddings (pd.DataFrame) – A dictionary of embeddings, where the keys are the names of the experimental conditions, and the values are the embeddings for each condition.
Returns:: A fitted global model.

deepof.post_hoc.enrichment_across_conditions(soft_counts: deepof_table_dict | None = None, supervised_annotations: deepof_table_dict | None = None, exp_conditions: dict | None = None, plot_speed: bool = False, bin_info: dict | None = None, roi_number: int | None = None, animals_in_roi: list | None = None, roi_mode: str = 'mousewise', normalize: bool = False, custom_continuous_behavior_names: list = [])

Compute the population of each cluster across conditions.

Parameters:

soft_counts (TableDict) – A dictionary of soft counts, where the keys are the names of the experimental conditions, and the values are the soft counts for each condition.
supervised_annotations (tableDict) – table dict with supervised annotations per animal experiment across time.
exp_conditions (dict) – A dictionary of experimental conditions, where the keys are the names of the experiments, and the values are the names of their corresponding experimental conditions.
plot_speed (bool) – plot “speed” behavior
bin_info (dict) – A dictionary containing start and end positions or indices of all sections for given embeddings and ROIs
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
roi_mode (str) – Determines how the ROIs should be applied to different behaviors. Options are “mousewise” (default, selected mice need to be inside the ROI) and “behaviorwise” (only mice involved in a behavior need to be inside of the ROI, only for supervised behaviors)
normalize (bool) – Whether to normalize the population of each cluster across conditions.
custom_continuous_behavior_names (list) – list of potentially added names of custom continuous behaviors (should get sorted out)

Returns:

A long format dataframe with the population of each cluster across conditions.

deepof.post_hoc.get_transitions(state_sequence: list, n_states: int, index_sequence: list | None = None)

Compute the transitions between states in a state sequence.

Parameters:

state_sequence (list) – A list of states.
n_states (int) – The number of states.
index_sequence (list) – An optional list of index positions for the states. Will ensure that state transitions between non-neighboring sequence entries are skipped

Returns:

The resulting transition matrix.
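The counting described above can be sketched in plain Python. This is only an illustration under stated assumptions, not deepof's implementation: the helper name `transition_counts` is hypothetical, states are assumed to be integers in `range(n_states)`, and "non-neighboring" is assumed to mean index positions that differ by more than one.

```python
def transition_counts(state_sequence, n_states, index_sequence=None):
    """Count transitions between consecutive states (hypothetical helper)."""
    # counts[i][j] holds the number of observed steps from state i to state j
    counts = [[0] * n_states for _ in range(n_states)]
    for t in range(len(state_sequence) - 1):
        # Assumption: entries are neighbors only if their indices differ by 1
        if index_sequence is not None and index_sequence[t + 1] - index_sequence[t] != 1:
            continue
        counts[state_sequence[t]][state_sequence[t + 1]] += 1
    return counts


states = [0, 0, 1, 2, 2]
frames = [0, 1, 2, 5, 6]  # frames 2 and 5 are not adjacent

print(transition_counts(states, 3))          # [[1, 1, 0], [0, 0, 1], [0, 0, 1]]
print(transition_counts(states, 3, frames))  # [[1, 1, 0], [0, 0, 0], [0, 0, 1]]
```

In the second call the step from state 1 to state 2 is dropped because it spans a gap in the frame indices.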

deepof.post_hoc.compute_transition_matrix_per_condition(soft_counts: deepof_table_dict, exp_conditions: dict, silence_diagonal: bool = False, bin_info: dict | None = None, roi_number: int | None = None, animals_in_roi: list | None = None, aggregate: str = True, normalize: str = True)

Compute the transition matrices specific to each condition.

Parameters:

soft_counts (TableDict) – A dictionary of soft counts, where the keys are the names of the experimental conditions, and the values are the soft counts for each condition.
exp_conditions (dict) – A dictionary of experimental conditions, where the keys are the names of the experiments, and the values are the names of their corresponding experimental conditions.
silence_diagonal (bool) – If True, diagonal elements on the transition matrix are set to zero.
bin_info (dict) – A dictionary containing start and end positions or indices of all sections for given embeddings and ROI information
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
aggregate (str) – Whether to aggregate the embeddings across time.
normalize (str) – Whether to normalize the population of each cluster across conditions.

Returns:

A dictionary of transition matrices, where the keys are the names of the experimental conditions, and the values are the transition matrices for each condition.

deepof.post_hoc.compute_steady_state(transition_matrices: dict, return_entropy: bool = False, n_iters: int = 100000)

Compute the steady state of each transition matrix provided in a dictionary.

Parameters:

transition_matrices (dict) – A dictionary of transition matrices, where the keys are the names of the experimental conditions, and the values are the transition matrices for each condition.
return_entropy (bool) – Whether to return the entropy of the steady state. If False, the steady states themselves are returned.
n_iters (int) – The number of iterations to use for the Markov chain.

Returns:

A dictionary of steady states, where the keys are the names of the experimental conditions, and the values are the steady states for each condition. If return_entropy is True, values correspond to the entropy of each steady state.
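As a rough illustration of what a steady state and its entropy are, the sketch below applies power iteration to a single row-stochastic matrix. Whether deepof uses power iteration internally is an assumption here; the helper names `steady_state` and `shannon_entropy` are hypothetical, and the entropy is computed with the natural logarithm.

```python
import math


def steady_state(P, n_iters=1000):
    """Approximate the stationary distribution of a row-stochastic matrix P."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(n_iters):
        # One Markov chain step: pi <- pi @ P
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def shannon_entropy(pi):
    """Entropy (in nats) of a probability vector, skipping zero entries."""
    return -sum(p * math.log(p) for p in pi if p > 0)


P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = steady_state(P)
print(pi)                   # approximately [0.8333, 0.1667]
print(shannon_entropy(pi))
```

For this matrix the balance condition pi[0] * 0.1 = pi[1] * 0.5 gives the stationary distribution (5/6, 1/6), which the iteration approaches.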

deepof.post_hoc.compute_UMAP(embeddings, cluster_assignments, random_state=0): Compute UMAP embeddings for visualization purposes.

deepof.post_hoc.align_deepof_kinematics_with_unsupervised_labels(deepof_project: deepof_coordinates, kin_derivative: int = 1, center: str = 'Center', align: str = 'Spine_1', include_feature_derivatives: bool = False, include_distances: bool = True, include_angles: bool = True, include_areas: bool = True, animal_id: str | None = None, file_name: str = 'kinematics', return_path: bool = False)

Align kinematics with unsupervised labels.

In order to annotate time chunks with as many relevant features as possible, this function aligns the kinematics of a deepof project (speed and acceleration of body parts, distances, and angles) with the hard cluster assignments obtained from the unsupervised pipeline.

Parameters:

deepof_project (coordinates) – A deepof.Project object.
kin_derivative (int) – The order of the derivative to use for the kinematics. 1 = speed, 2 = acceleration, etc.
center (str) – Body part to center coordinates on. “Center” by default.
align (str) – Body part to rotationally align the body parts with. “Spine_1” by default.
include_feature_derivatives (bool) – Whether to compute speed on distances, angles, and areas, if they are included.
include_distances (bool) – Whether to include distances in the alignment.
include_angles (bool) – Whether to include angles in the alignment.
include_areas (bool) – Whether to include areas in the alignment.
animal_id (str) – The animal ID to use, in case of multi-animal projects.
file_name (str) – Name of table for saving
return_path (bool) – If True, return only the path to the processed table; if False, return the full table.

Returns:

A dictionary of aligned kinematics, where the keys are the names of the experimental conditions, and the values are the aligned kinematics for each condition.

deepof.post_hoc.chunk_summary_statistics(chunked_dataset: ndarray, body_part_names: list)

Extract summary statistics from a chunked dataset using seglearn.

Parameters:

chunked_dataset (np.ndarray) – Preprocessed training set (of shape chunks x time x features), where each entry corresponds to a time chunk of data.
body_part_names (list) – A list of the names of the body parts.

Returns:

A dataframe of kinematic features, of shape chunks by features.

deepof.post_hoc.annotate_time_chunks(deepof_project: deepof_coordinates, soft_counts: deepof_table_dict, supervised_annotations: deepof_table_dict | None = None, window_size: int | None = None, window_step: int = 1, animal_id: str | None = None, samples: int = 10000, min_confidence: float = 0.0, kin_derivative: int = 1, include_distances: bool = True, include_angles: bool = True, include_areas: bool = True, aggregate: str = 'mean')

Annotate time chunks produced after change-point detection using the unsupervised pipeline.

Uses a set of summary statistics coming from kinematics, distances, angles, and supervised labels when provided.

Parameters:

deepof_project (coordinates) – Project object.
soft_counts (table_dict) – matrix with soft cluster assignments produced by the unsupervised pipeline.
supervised_annotations (table_dict) – set of supervised annotations produced by the supervised pipeline within deepof.
window_size (int) – Minimum size of the applied ruptures. If automatic_changepoints is False, specifies the size of the sliding window to pass through the data to generate training instances. None defaults to video frame-rate.
window_step (int) – Specifies the minimum jump for the rupture algorithms. If automatic_changepoints is False, specifies the step to take when sliding the aforementioned window. In this case, a value of 1 indicates a true sliding window, and a value equal to window_size splits the data into non-overlapping chunks.
animal_id (str) – The animal ID to use, in case of multi-animal projects.
samples (int) – Time chunks samples to take to reduce computational time. Defaults to the minimum between 10000 and the number of available chunks.
min_confidence (float) – minimum confidence in cluster assignments used for quality control filtering.
kin_derivative (int) – The order of the derivative to use for the kinematics. 1 = speed, 2 = acceleration, etc.
include_distances (bool) – Whether to include distances in the alignment. kin_derivative is taken into account.
include_angles (bool) – Whether to include angles in the alignment. kin_derivative is taken into account.
include_areas (bool) – Whether to include areas in the alignment. kin_derivative is taken into account.
aggregate (str) – aggregation mode. Can be either “mean” (computationally cheapest; uses the average per feature) or “seglearn”, which runs a thorough feature extraction and selection pipeline on each time series.

Returns:

A dataframe of kinematic features, of shape chunks by features.

deepof.post_hoc.chunk_cv_splitter(chunk_stats: DataFrame, bin_info: dict, n_folds: int | None = None)

Split a dataset into training and testing sets, grouped by video.

Given a matrix with extracted features per chunk, returns a list containing a set of cross-validation folds, grouped by experimental video. This makes sure that chunks coming from the same experiment will never be leaked between training and testing sets.

Parameters:

chunk_stats (pd.DataFrame) – matrix with statistics per chunk, sorted by experiment.
bin_info (dict) – A dictionary containing start and end positions or indices of all sections for given embeddings
n_folds (int) – number of cross-validation folds to compute.

Returns:

list containing a training and testing set per CV fold.
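The grouping idea can be sketched as a leave-one-group-out split, where each chunk carries a label naming the video it came from. This is a sketch under assumptions, not the library's code: the helper `leave_one_group_out` and the `groups` list are hypothetical, and deepof derives the grouping from bin_info rather than from an explicit label list.

```python
def leave_one_group_out(groups):
    """Return (train_indices, test_indices) pairs, holding out one group per fold."""
    folds = []
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        folds.append((train, test))
    return folds


# One label per chunk, naming the (hypothetical) source video
groups = ["vid1", "vid1", "vid2", "vid3", "vid3"]
for train, test in leave_one_group_out(groups):
    print(train, test)
# [2, 3, 4] [0, 1]
# [0, 1, 3, 4] [2]
# [0, 1, 2] [3, 4]
```

Because every fold holds out all chunks of one video at once, no video contributes chunks to both sides of a split.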

deepof.post_hoc.train_supervised_cluster_detectors(chunk_stats: DataFrame, hard_counts: ndarray, bin_info: dict, n_folds: int | None = None, verbose: int = 1)

Train supervised models to detect clusters from kinematic features.

Parameters:

chunk_stats (pd.DataFrame) – table with descriptive statistics for a series of sequences (‘chunks’).
hard_counts (np.ndarray) – cluster assignments for the corresponding ‘chunk_stats’ table.
bin_info (dict) – A dictionary containing start and end positions or indices of all sections for given embeddings
n_folds (int) – number of folds for cross validation. If None (default) leave-one-experiment-out CV is used.
verbose (int) – verbosity level. Must be an integer between 0 (nothing printed) and 3 (all is printed).

Returns:

full_cluster_clf (imblearn.pipeline.Pipeline) – Supervised model trained on the full dataset, mapping chunk stats to cluster assignments. Useful to run the SHAP explainability pipeline.
cluster_gbm_performance (dict) – Cross-validated dictionary containing trained estimators and performance metrics.
groups (list) – Cross-validation indices. Data from the same animal are never shared between train and test sets.

deepof.utils module

Functions and general utilities for the deepof package.

class deepof.utils.KeyErrorMessage: Bases: str

deepof.utils.rts_smoother_numba(measurements, F, H, Q, R)

Implements the Rauch-Tung-Striebel (RTS) smoother for state estimation.

This function performs both forward and backward passes to estimate the optimal state sequence given a set of noisy measurements. It first applies the Kalman filter in a forward pass and then refines the estimates using the RTS smoother in a backward pass.

Parameters:

measurements (np.ndarray) – Array of measurements, shape (n_timesteps, n_dim_measurement).
F (np.ndarray) – State transition matrix, shape (n_dim_state, n_dim_state).
H (np.ndarray) – Observation matrix, shape (n_dim_measurement, n_dim_state).
Q (np.ndarray) – Process noise covariance matrix, shape (n_dim_state, n_dim_state).
R (np.ndarray) – Measurement noise covariance matrix, shape (n_dim_measurement, n_dim_measurement).

Returns:

Smoothed state estimates, shape (n_timesteps, n_dim_state).

Return type:

smoothed_states (np.ndarray)
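To make the forward and backward passes concrete, here is a scalar (one-dimensional) sketch in which F, H, Q and R are plain floats instead of matrices. It is not the numba implementation documented above: the helper name `rts_smooth_1d` and the initial state `x0` and variance `P0` are assumptions added for illustration.

```python
def rts_smooth_1d(measurements, F, H, Q, R, x0=0.0, P0=1.0):
    """Scalar Kalman filter followed by an RTS backward pass (illustrative only)."""
    x_pred, P_pred, x_filt, P_filt = [], [], [], []
    x, P = x0, P0
    # Forward pass: standard Kalman predict/update recursion
    for z in measurements:
        xp = F * x                       # predicted state
        Pp = F * P * F + Q               # predicted variance
        K = Pp * H / (H * Pp * H + R)    # Kalman gain
        x = xp + K * (z - H * xp)        # filtered state
        P = (1.0 - K * H) * Pp           # filtered variance
        x_pred.append(xp)
        P_pred.append(Pp)
        x_filt.append(x)
        P_filt.append(P)
    # Backward pass: refine each estimate using its already smoothed successor
    x_smooth = list(x_filt)
    for t in range(len(measurements) - 2, -1, -1):
        C = P_filt[t] * F / P_pred[t + 1]  # smoother gain
        x_smooth[t] = x_filt[t] + C * (x_smooth[t + 1] - x_pred[t + 1])
    return x_smooth


noisy = [1.0, 1.4, 0.7, 1.2, 0.9, 1.1]
print(rts_smooth_1d(noisy, F=1.0, H=1.0, Q=0.01, R=0.5, x0=1.0))
```

The last smoothed value equals the last filtered value; all earlier values are corrected using information from later measurements.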

deepof.utils.enforce_skeleton_constraints_numba(data, skeleton_constraints, original_pos, tolerance=0.1, correction_factor=0.5)

Adjusts the positions of body parts in each frame to ensure that the distances between connected parts adhere to predefined skeleton constraints within a specified tolerance.

Parameters:

data (np.ndarray) – Motion capture data, shape (n_frames, n_body_parts, 2).
skeleton_constraints (list) – List of tuples (part1, part2, dist) defining the constraints between body parts and their expected distances.
original_pos (np.ndarray) – Boolean array indicating original (non-interpolated) positions, shape (n_frames, n_body_parts, 2).
tolerance (float) – Allowable deviation from the constraint distance (default: 0.1).
correction_factor (float) – Factor to control the strength of position adjustments (default: 0.5).

Returns:

Adjusted motion capture data with enforced skeleton constraints.

Return type:

np.ndarray

class deepof.utils.MouseTrackingImputer(n_iterations=10, connectivity=None, full_imputation=False)

Bases: object

A class for imputing and processing mouse tracking data.

This class provides methods for interpolating missing data points, enforcing skeleton constraints, and smoothing trajectories in mouse tracking experiments.

n_iterations

Number of iterations for imputation (default: 10).

Type:: int

connectivity

Connectivity information for body parts.

Type:: object

full_imputation

Whether to perform full imputation or only a partial linear imputation (default: False).

Type:: bool

body_part_indices

Mapping of body part names to indices.

Type:: OrderedDict

skeleton_constraints

List of skeleton constraints.

Type:: list

mouse_body_estimation_samples

Number of sample frames with non-nan data to estimate valid mouse shapes (default: 100).

Type:: int

lin_interp_limit

Limit for linear interpolation (default: 3).

Type:: int

__init__(n_iterations=10, connectivity=None, full_imputation=False)

fit_transform(**kwargs)

deepof.utils.connect_mouse(animal_ids=None, exclude_bodyparts: list | None = None, graph_preset: str = 'deepof_14') → Graph

Create a nx.Graph object with the connectivity of the bodyparts in the DLC topview model for a single mouse.

Used later for angle computing, among others.

Parameters:

animal_ids (str) – if more than one animal is tagged, specify the animal identifier as a string.
exclude_bodyparts (list) – Remove the specified nodes from the graph.
graph_preset (str) – Connectivity preset to use. Currently supported: “deepof_14”, “deepof_11” and “deepof_8”.

Returns:

connectivity (nx.Graph)

deepof.utils.edges_to_weighted_adj(adj: ndarray, edges: ndarray)

Convert an edge feature matrix to a weighted adjacency matrix.

Parameters:

adj (-) – binary adjacency matrix of the current graph.
edges (-) – edge feature matrix. Last two axes should be of shape nodes x features.

deepof.utils.enumerate_all_bridges(G: nx.Graph) → list

Enumerate all 3-node connected sequences in the given graph.

Parameters:: G (-) – Animal connectivity graph.
Returns:: List with all 3-node connected sequences in the provided graph.
Return type:: bridges (list)

deepof.utils.str2bool(v: str) → bool

Return the passed string as a boolean.

Parameters:: v (str) – String to transform to boolean value.
Returns:: The boolean value represented by the string. An error is raised if conversion is not possible.
Return type:: bool

deepof.utils.compute_animal_presence_mask(quality: deepof_table_dict, threshold: float = 0.5) → deepof_table_dict

Compute a mask of the animal presence in the video.

Parameters:

quality (table_dict) – Dictionary with the quality of the tracking for each body part and animal.
threshold (float) – Threshold for the quality of the tracking. If the quality is below this threshold, the animal is considered to be absent.

Returns:

Dictionary with the animal presence mask for each bodypart and animal.

Return type:

animal_presence_mask (table_dict)

deepof.utils.iterative_imputation(project: deepof_project, tab_dict: dict, lik_dict: dict, full_imputation: bool = False)

Perform iterative imputation on occluded body parts. Run per animal and experiment.

Parameters:

project (project) – Project object.
tab_dict (dict) – Dictionary with the coordinates of the body parts.
lik_dict (dict) – Dictionary with the likelihood of the tracking for each body part and animal.
full_imputation (bool) – Determines if only small gaps get linearly imputed (False) or if IterativeImputer and a few other steps are additionally executed to close all gaps (True)

Returns:

Dictionary with the coordinates of the body parts after imputation.

Return type:

tab_dict (dict)

deepof.utils.set_missing_animals(coordinates: deepof_project, tab_dict: dict, lik_dict: dict, animal_ids: list | None = None)

Set the coordinates of the missing animals to NaN.

Parameters:

coordinates (project) – Project object.
tab_dict (dict) – Dictionary with the coordinates of the body parts.
lik_dict (dict) – Dictionary with the likelihood of the tracking for each body part and animal.
animal_ids (list) – List with the animal ids to remove. If None, all the animals with missing data are processed.

Returns:

Dictionary with the coordinates of the body parts after removing missing animals.

Return type:

tab_dict (dict)

deepof.utils.time_to_seconds(time_string: str) → float

Compute seconds as float based on a time string.

Parameters:: time_string (str) – time string as input (format HH:MM:SS or HH:MM:SS.SSS…).
Returns:: time in seconds
Return type:: seconds (float)

deepof.utils.seconds_to_time(seconds: float, cut_milliseconds: bool = True) → str

Compute a time string based on seconds as float.

Parameters:

seconds (float) – time in seconds
cut_milliseconds (bool) – decides if milliseconds should be part of the output, defaults to True

Returns:

time string (format HH:MM:SS or HH:MM:SS.SSS…)

Return type:

time_string (str)
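The two conversions are inverses of each other up to the dropped fraction. A minimal sketch, assuming strictly the HH:MM:SS or HH:MM:SS.SSS format and truncation (not rounding) when milliseconds are cut; the helper names `hms_to_seconds` and `seconds_to_hms` are hypothetical and edge cases may be handled differently in deepof:

```python
def hms_to_seconds(time_string):
    """Convert 'HH:MM:SS' or 'HH:MM:SS.SSS' to seconds as a float."""
    hours, minutes, seconds = time_string.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def seconds_to_hms(seconds, cut_milliseconds=True):
    """Convert seconds to 'HH:MM:SS', or 'HH:MM:SS.SSS' if milliseconds are kept."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    if cut_milliseconds:
        # Assumption: the fractional part is truncated rather than rounded
        return f"{int(hours):02d}:{int(minutes):02d}:{int(secs):02d}"
    return f"{int(hours):02d}:{int(minutes):02d}:{secs:06.3f}"


print(hms_to_seconds("01:02:03.5"))                    # 3723.5
print(seconds_to_hms(3723.5))                          # 01:02:03
print(seconds_to_hms(3723.5, cut_milliseconds=False))  # 01:02:03.500
```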

deepof.utils.load_exp_conditions(filepath: str)

deepof.utils.load_start_markers(filepath, frame_rate): Load start markers analogous to experimental conditions and do some checks

deepof.utils.bp2polar(tab: DataFrame) → DataFrame

Return the DataFrame in polar coordinates.

Parameters:: tab (pandas.DataFrame) – Table with cartesian coordinates.
Returns:: Equivalent to input, but with values in polar coordinates.
Return type:: polar (pandas.DataFrame)
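For a single point, the Cartesian-to-polar conversion can be sketched as below. The helper name `to_polar` is hypothetical, and the convention used here (radius plus angle in radians measured counter-clockwise from the positive x-axis) is an assumption that may differ from the one deepof applies to whole tables.

```python
import math


def to_polar(x, y):
    """Return (rho, phi) for a Cartesian point: radius and angle in radians."""
    rho = math.hypot(x, y)   # distance from the origin
    phi = math.atan2(y, x)   # angle in (-pi, pi], counter-clockwise from +x
    return rho, phi


print(to_polar(3.0, 4.0))  # radius 5.0, angle about 0.927 rad
```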

deepof.utils.tab2polar(cartesian_df: DataFrame) → DataFrame

Return a pandas.DataFrame in which all the coordinates are polar.

Parameters:: cartesian_df (pandas.DataFrame) – DataFrame containing tables with cartesian coordinates.
Returns:: Equivalent to input, but with values in polar coordinates.
Return type:: result (pandas.DataFrame)

deepof.utils.compute_dist(pair_array: array) → DataFrame

Return a pandas.DataFrame with the scaled distances between a pair of body parts.

Parameters:: pair_array (numpy.array) – np.array of shape N * 4 containing X, y positions over time for a given pair of body parts.
Returns:: pandas.DataFrame with the absolute distances between a pair of body parts.
Return type:: result (pd.DataFrame)

deepof.utils.bpart_distance(dataframe: DataFrame, bit_precision: BitPrecision = BitPrecision.f64) → DataFrame

Return a pandas.DataFrame with the scaled distances between all pairs of body parts.

Parameters:: dataframe (pandas.DataFrame) – pd.DataFrame of shape N*(2*bp) containing X,y positions over time for a given set of bp body parts.
Returns:: pandas.DataFrame with the absolute distances between all pairs of body parts.
Return type:: result (pd.DataFrame)

deepof.utils.angle(bpart_array: array, bit_precision: BitPrecision = BitPrecision.f64) → array

Return a numpy.ndarray with the angles between the provided instances.

Parameters:: bpart_array (numpy.array) – 2D positions over time for a bodypart.
Returns:: 1D angles between the three-point-instances.
Return type:: ang (np.array)
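For one three-point instance, an unsigned angle can be computed from the dot product of the two vectors that meet at the middle point. This is a sketch, assuming the angle is measured at the second point and returned in radians; the helper name `three_point_angle` is hypothetical.

```python
import math


def three_point_angle(a, b, c):
    """Unsigned angle (radians) at point b, formed by the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norms = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to [-1, 1] to guard against floating-point overshoot
    cosine = max(-1.0, min(1.0, dot / norms))
    return math.acos(cosine)


print(three_point_angle((1, 0), (0, 0), (0, 1)))  # about 1.5708 (a right angle)
```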

deepof.utils.signed_angle(bpart_array: array) → array

Return a numpy.ndarray with the signed angles between the provided instances.

Parameters:: bpart_array (numpy.array) – 2D positions over time for a bodypart.
Returns:: 1D angles between the three-point-instances.
Return type:: ang (np.array)

deepof.utils.compute_areas(polygon_xy_stack: array) → array

Compute polygon areas for the provided stack of sets of data point-xy coordinates.

Parameters:: polygon_xy_stack – 3D numpy array [NPolygons (i.e. NFrames), Npoints, NDim (x,y)]
Returns:: areas for the provided xy coordinates.
Return type:: areas (np.ndarray)

deepof.utils.compute_areas_numba(polygon_xy_stack: array) → array

Compute polygon areas for the provided stack of sets of data point-xy coordinates.

Parameters:: polygon_xy_stack (np.ndarray) – 3D numpy array [NPolygons (i.e. NFrames), Npoints, NDim (x,y)]
Returns:: areas for the provided xy coordinates.
Return type:: areas (np.ndarray)

deepof.utils.polygon_area_numba(vertices: ndarray) → float

Calculate the area of a single polygon given its vertices.

Parameters:: vertices (np.ndarray) – Array of shape [Npoints, 2] containing the (x, y) coordinates of the polygon’s vertices.
Returns:: Area of the polygon.
Return type:: float
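The area of a simple polygon can be computed from its vertices with the shoelace formula. Whether the numba function uses exactly this formula is an assumption; the helper name `shoelace_area` is hypothetical, and vertices are assumed to be ordered along the polygon boundary.

```python
def shoelace_area(vertices):
    """Area of a simple polygon given ordered (x, y) vertices."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    # The sign depends on vertex orientation, so take the absolute value
    return abs(twice_area) / 2.0


print(shoelace_area([(0, 0), (1, 0), (1, 1), (0, 1)]))  # 1.0
print(shoelace_area([(0, 0), (4, 0), (0, 3)]))          # 6.0
```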

deepof.utils.extend_behaviors_numba(behaviors: ndarray, delta_T: float = 2.0, frame_rate: float = 1) → ndarray

Takes a boolean array of behavior detections and extends each behavior detection by delta_T.

Parameters:

behaviors (np.ndarray) – Boolean array of shape [N_behaviors, N_frames] containing the detection results (True / False) of each behavior for each frame.
delta_T – Time by which each behavior should be expanded
frame_rate (float) – Frame rate of the corresponding project

Returns:

Boolean array of shape [N_behaviors, N_frames] containing the detection results (True / False) of each behavior for each frame after extension.

Return type:

extended_behaviors (np.ndarray)
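One plausible reading of the extension, shown for a single behavior row: after every detected frame, the following delta_T * frame_rate frames are also marked as active. Both the direction of the extension (forward in time only) and the helper name `extend_detections` are assumptions made for this sketch.

```python
def extend_detections(behavior, delta_T=2.0, frame_rate=1.0):
    """Keep a detection active for delta_T seconds after its last True frame."""
    n_extra = int(delta_T * frame_rate)  # extension length in frames
    extended = list(behavior)
    remaining = 0
    for i, active in enumerate(behavior):
        if active:
            remaining = n_extra          # restart the countdown on every detection
        elif remaining > 0:
            extended[i] = True           # fill the gap after a detection
            remaining -= 1
    return extended


print(extend_detections([False, True, False, False, False, False]))
# [False, True, True, True, False, False]
```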

deepof.utils.count_transitions(tab_dict: deepof_table_dict, exp_conditions: dict, bin_info: dict | None = None, animals_in_roi: list | None = None, delta_T: float = 0.5, frame_rate: float = 1, silence_diagonal: bool = False, aggregate: str = True, normalize: str = True, diagonal_behavior_counting: str = 'Transitions', custom_continuous_behavior_names: list = [])

Count transitions between successive behaviors for all experiments in tab_dict.

Parameters:

tab_dict (table_dict) – Dictionary with behavior data (supervised or unsupervised soft_counts)
exp_conditions (dict) – Dictionary containing the experiment conditions for each experiment.
bin_info (dict) – dictionary containing indices to plot for all experiments
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
delta_T – Time after the offset of one behavior during which the onset of the next behavior counts as a transition
frame_rate (float) – Frame rate of the corresponding project
silence_diagonal (bool) – If True, diagonals are set to zero.
aggregate (bool) – If True, sums matrices per experimental condition; else per experiment.
normalize (bool) – Row-normalizes transition probabilities if True. Default=True.
diagonal_behavior_counting (str) – How to count diagonals (self-transitions). Options:
- “Frames”: Total frames where behavior is active (after extension)
- “Time”: Total time where behavior is active
- “Events”: Number of instances of the behavior occurring
- “Transitions”: Number of frame-wise internal behavior transitions, e.g. a behavior of 4 frames in length would have 3 transitions.
custom_continuous_behavior_names (list) – list of potentially added names of custom continuous behaviors (should get sorted out)

Returns:

Dictionary of transition matrices. Keys:

If aggregate=True: Condition labels (e.g., {‘control’: array(…)})
If aggregate=False: Experiment IDs (e.g., {‘exp1’: array(…)})

columns (list): Behavior names (columns after dropping non-binary features).
combined_columns (list): All possible behavior transition pairs (e.g., [‘BehaviorA-x-BehaviorB’, …]).

Return type:

transitions_dict (dict)

deepof.utils.count_events(binary_behavior: ndarray, counting_mode: str = 'Events', frame_rate: int = 1) → int

Counts the number of continuous blocks of 1s in a binary behavior vector in different ways

Parameters:

binary_behavior (numpy.ndarray) – Binary 1D Array containing behavior detections.
counting_mode (str) – Counting mode. Options are:
- "Frames": Counts total number of frames in all events
- "Time": Counts total time duration of all events (requires frame_rate input)
- "Events": Counts number of continuous blocks of 1s
- "Transitions": Counts number of frame-to-frame transitions within the events, e.g. an event of 10 frames in length would have 9 transitions.
frame_rate (float) – Frame rate of the recording.

Returns:

counted events

Return type:

num_events (float)
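The four counting modes are closely related: an event of length L contributes L frames and L - 1 internal transitions, so the total number of transitions equals frames minus events. A sketch under the assumption that "Time" means frames divided by frame rate; the helper name `count_blocks` is hypothetical.

```python
def count_blocks(binary_behavior, counting_mode="Events", frame_rate=1.0):
    """Summarize runs of 1s in a binary vector according to counting_mode."""
    frames = sum(1 for v in binary_behavior if v)
    # An event starts wherever a 1 is not preceded by another 1
    events = sum(
        1
        for i, v in enumerate(binary_behavior)
        if v and (i == 0 or not binary_behavior[i - 1])
    )
    if counting_mode == "Frames":
        return frames
    if counting_mode == "Time":
        return frames / frame_rate  # assumption: duration = frames / frame rate
    if counting_mode == "Events":
        return events
    if counting_mode == "Transitions":
        return frames - events      # each event of length L has L - 1 transitions
    raise ValueError(f"Unknown counting mode: {counting_mode}")


behavior = [0, 1, 1, 1, 0, 1, 1]
print(count_blocks(behavior, "Frames"))       # 5
print(count_blocks(behavior, "Events"))       # 2
print(count_blocks(behavior, "Transitions"))  # 3
```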

deepof.utils.rotate(p: array, angles: array, origin: array = array([0, 0])) → array

Return a 2D numpy.ndarray with the initial values rotated by angles radians.

Parameters:

p (numpy.ndarray) – 2D Array containing positions of bodyparts over time.
angles (numpy.ndarray) – Set of angles (in radians) to rotate p with.
origin (numpy.ndarray) – Rotation axis (zero vector by default).

Returns:

rotated positions over time

Return type:

rotated (numpy.ndarray)
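For a single point and a single angle, the rotation can be sketched with the standard 2D rotation matrix. The counter-clockwise sign convention and the helper name `rotate_point` are assumptions; the library function operates on whole arrays and may use the opposite sign.

```python
import math


def rotate_point(p, angle, origin=(0.0, 0.0)):
    """Rotate point p by angle (radians, counter-clockwise) around origin."""
    dx, dy = p[0] - origin[0], p[1] - origin[1]
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return (
        origin[0] + cos_a * dx - sin_a * dy,
        origin[1] + sin_a * dx + cos_a * dy,
    )


x, y = rotate_point((1.0, 0.0), math.pi / 2)
print(round(x, 6), round(y, 6))  # 0.0 1.0
```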

deepof.utils.rotate_all_numba(data: array, angles: array) → array

Return a 2D numpy.ndarray with the initial values rotated by angles radians.

Parameters:

data (numpy.ndarray) – 2D Array containing positions of bodyparts over time.
angles (numpy.ndarray) – Set of angles (in radians) to rotate data with.

Returns:

rotated positions over time

Return type:

rotated (numpy.ndarray)

deepof.utils.rotate_numba(p: array, angles: array, origin: array = array([0, 0])) → array

Return a 2D numpy.ndarray with the initial values rotated by angles radians.

Parameters:

p (numpy.ndarray) – 2D Array containing positions of bodyparts over time.
angles (numpy.ndarray) – Set of angles (in radians) to rotate p with.
origin (numpy.ndarray) – Rotation axis (zero vector by default).

Returns:

rotated positions over time

Return type:

rotated (numpy.ndarray)

deepof.utils.point_in_polygon(points: array, polygon: Polygon) → array

Check if a set of points is inside a polygon.

Parameters:

points (np.ndarray) – An array of shape (M, 2) containing the coordinates of the points.
polygon (shapely.geometry.polygon.Polygon) – Shapely polygon.

Returns:

A boolean array of shape (M,) indicating whether each point is inside the polygon.

Return type:

np.ndarray

deepof.utils.point_in_polygon_numba(points: array, polygon: array) → array

Check if a set of points is inside a polygon. This function was generated by Perplexity.ai.

Parameters:

points (np.ndarray) – An array of shape (M, 2) containing the coordinates of the points.
polygon (np.ndarray) – An array of shape (N, 2) containing the coordinates of the polygon vertices.

Returns:

A boolean array of shape (M,) indicating whether each point is inside the polygon.

Return type:

np.ndarray
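A common way to implement this test without shapely is ray casting: count how many polygon edges a horizontal ray from the point crosses, and treat an odd count as "inside". Whether the numba function uses this exact algorithm is an assumption, and the helper name `inside_polygon` is hypothetical. Points lying exactly on an edge are not handled consistently by this sketch.

```python
def inside_polygon(point, polygon):
    """Ray-casting test: True if point lies inside the polygon (list of (x, y))."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print([inside_polygon(p, square) for p in [(1, 1), (3, 1)]])  # [True, False]
```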

deepof.utils.get_point_polygon_distance(points: ndarray, polygon: Polygon) → ndarray: Calculates array of distances between 2D points and a polygon (roi)

deepof.utils.get_point_polygon_distance_numba(points, poly_xy)

deepof.utils.in_field_of_view(mouse_pts: ndarray, fov_angle_deg: float, roi: Polygon, plot: bool = True, eps: float = 1e-10) → ndarray

Parameters:

mouse_pts – Array of shape (N, 3, 2) or (3, 2), with points in the order [left_ear, nose, right_ear].

Returns:

Float array of shape (N,), with values:

1.0 -> ROI intersects FOV
0.0 -> ROI does not intersect FOV
np.nan -> cannot be calculated (invalid/degenerate geometry or non-finite points)

The apex of the FOV triangle is the midpoint between the ears.

deepof.utils.in_field_of_view_numba(mouse_pts, fov_angle_deg, roi_poly, eps=1e-10)

Numba version of in_field_of_view (no plotting, no shapely).

mouse_pts: (N,3,2) float64
roi_poly: (M,2) float64 (not closed)
returns: (N,) float64 in {1.0, 0.0, nan}

deepof.utils.mouse_in_roi(tab, aid, in_roi_criterion, roi_polygon, invert_roi, run_numba=False)

Checks if a given animal for a given table is in a given roi by given criterion.

Parameters:

tab (dataTable) – Datatable containing mouse tracking data.
aid (str) – animal id of the mouse to check
in_roi_criterion (str) – Criterion for the in-ROI check. Can be a single bodypart, a list of bodyparts or “all” bodyparts of a mouse
roi_polygon (np.ndarray) – 2D numpy array containing the coordinates of the ROI
run_numba (bool) – Determines if numba versions of functions should be used (run faster but require initial compilation time on first run)

Returns:

A boolean array indicating whether the mouse is inside the ROI.

Return type:

mouse_in_polygon (np.ndarray)

deepof.utils.get_supervised_behaviors_in_roi(cur_supervised: DataFrame, local_bin_info: dict, animal_ids: str | list, roi_mode: str = 'mousewise')

Filter supervised behaviors based on rois given by animal_ids.

Parameters:

cur_supervised (pd.DataFrame) – data frame with supervised behaviors.
local_bin_info (dict) – bin_info dictionary for one experiment, containing field “time” with array of included frames and fields “animal_id” with boolean arrays that denote which mice were within the selected ROI for these frames
animal_ids (Union[str, list]) – single or multiple animal ids
roi_mode (str) – Determines how the ROIs should be applied to different behaviors. Options are “mousewise” (default, selected mice need to be inside the ROI) and “behaviorwise” (only mice involved in a behavior need to be inside of the ROI, only for supervised behaviors)

Returns:

data frame with supervised behaviors with detections outside of the ROI set to NaN

Return type:

cur_supervised (pd.DataFrame)

deepof.utils.get_unsupervised_behaviors_in_roi(cur_unsupervised: array, local_bin_info: dict, animal_ids: str)

Filter unsupervised behaviors based on rois given by animal_ids.

Parameters:

cur_unsupervised (np.array) – 1D or 2D array with unsupervised behaviors (can be soft or hard counts).
local_bin_info (dict) – bin_info dictionary for one experiment, containing field “time” with array of included frames and fields “animal_id” with boolean arrays that denote which mice were within the selected ROI for these frames
animal_ids (Union[str, list]) – single or multiple animal ids

Returns:

1D or 2D array with unsupervised behaviors with detections outside of the ROI set to NaN (2D) or -1 (1D)

Return type:

cur_unsupervised (np.array)

deepof.utils.get_behavior_frames_in_roi(behavior: str, local_bin_info: dict, animal_ids: str | list)

Filter unsupervised behaviors based on rois given by animal_ids.

Parameters:

behavior (str) – Behavior for which frames in ROI get determined.
local_bin_info (dict) – bin_info dictionary for one experiment, containing field “time” with array of included frames and fields “animal_id” with boolean arrays that denote which mice were within the selected ROI for these frames
animal_ids (Union[str, list]) – single or multiple animal ids

Returns:

1D array containing all frames for which the animal is (animals are) within the ROI

Return type:

frames (np.array)

deepof.utils.align_trajectories(data: array, mode: str = 'all', run_numba: bool = False) → array

Remove rotational variance on the trajectories.

Returns a numpy.array with the positions rotated in a way that the center (0 vector), and body part in the first column of data are aligned with the y-axis.

Parameters:

data (numpy.ndarray) – 3D array containing positions of body parts over time, where shape is N (sliding window instances) * m (sliding window size) * l (features)
mode (string) – Specifies if all instances of each sliding window get aligned, or only the center
run_numba (bool) – Determines if numba versions of functions should be used (run faster but require initial compilation time on first run)

Returns:

2D aligned positions over time.

Return type:

aligned_trajs (np.ndarray)
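
The rotation idea can be illustrated for a single frame. The sketch below is only an illustration and makes several assumptions that the description above does not confirm: features are stored as interleaved (x, y) pairs, the data is already centered on the origin, and the first body part is rotated onto the positive y-axis. The name align_frame_sketch is hypothetical.

```python
import numpy as np

def align_frame_sketch(frame):
    """Rotate one frame so that its first body part lies on the positive y-axis."""
    pts = np.asarray(frame, dtype=float).reshape(-1, 2)  # one (x, y) row per body part
    # Angle that takes the first body part onto the positive y-axis
    angle = np.pi / 2 - np.arctan2(pts[0, 1], pts[0, 0])
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s], [s, c]])                    # counter-clockwise rotation
    return (pts @ rot.T).reshape(np.shape(frame))

# First body part at (1, 0), second at (0, 1)
aligned = align_frame_sketch([1.0, 0.0, 0.0, 1.0])
```

After the rotation the first body part sits at (0, 1) and every other body part has been turned by the same angle, so relative posture is preserved while heading is removed.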

deepof.utils.align_embeddings_at_key(embeddings, supervised_annotations, key, window_size=None, alignment_mode='center'): returns mid-sections of current embedding and supervised_annotations at key

deepof.utils.load_table(tab: str, table_path: str, table_format: str, rename_bodyparts_dict: dict | None = None, animal_ids: list | None = None)

Loads a table into a structured pandas data frame.

Supports inputs from both DeepLabCut and (S)LEAP.

Parameters:

tab (str) – Name of the file containing the tracks.
table_path (string) – Full path to the file containing the tracks.
table_format (str) – type of the files to load, coming from either DeepLabCut (CSV and H5) or (S)LEAP (NPY).
rename_bodyparts_dict (dict) – dictionary of bodypart names given in the table corresponding to DeepOF's bodypart names.
animal_ids (list) – List with the animal ids in case of multiple tracked animals. Is expected to be None if there is only a single animal getting tracked.

Returns:

Data frame containing the loaded tracks. Likelihood for (S)LEAP files is imputed as 1.0 (tracked values) or 0.0 (missing values).

Return type:

loaded_tab (pd.DataFrame)

deepof.utils.rename_track_bps(loaded_tab: DataFrame, rename_bodyparts_dict: list, animal_ids: list)

Renames all body parts in the provided dataframe.

Parameters:

loaded_tab (pd.DataFrame) – Data frame containing the loaded tracks. Likelihood for (S)LEAP files is imputed as 1.0 (tracked values) or 0.0 (missing values).
rename_bodyparts_dict (dict) – dictionary of bodypart names given in the table corresponding to DeepOF's bodypart names.
animal_ids (list) – list of IDs to use for the animals present in the provided tracking files.
bodypart_graph (str) – DeepOF bodypart graph that is going to be used

Returns:

Data frame with renamed body parts

Return type:

renamed_tab (pd.DataFrame)

Bases: object

Container for global scalers fitted across videos.

Notes

We keep the legacy dict format at the boundary (return value) to guarantee backward compatibility and make preprocess_new output identical to preprocess.

kind: str

speed_mode: str | None

dist_mode: str | None

coord_mode: str | None

log_distances: bool

speed: Any = None

dist: Any = None

dist_inner: Any = None

dist_intra: Any = None

coord: Any = None

to_legacy_dict() → Dict[str, Any]

is_effectively_empty() → bool

__init__(kind: str, speed_mode: str | None, dist_mode: str | None, coord_mode: str | None, log_distances: bool, speed: Any | None = None, dist: Any | None = None, dist_inner: Any | None = None, dist_intra: Any | None = None, coord: Any | None = None) → None

deepof.utils.infer_scalar_cols(df: DataFrame)

deepof.utils.infer_column_types(df): Identify coord, speed, distance, and angle columns from a pose table.

deepof.utils.scale_table(df: DataFrame, scale: str = 'standard', animal_ids=None, size_ref=('Nose', 'Tail_base'), inter_scale: str = 'mean', standardize: bool = True, dist_standardize: str = 'per_column', speed_standardize: str = 'per_column', coord_standardize: str = 'per_column', log_distances: bool = True) → DataFrame

deepof.utils.kleinberg(offsets: list, s: float = 2.0, gamma: float = 1.0, n=None, T=None, k=None)

Apply Kleinberg’s algorithm (described in ‘Bursty and Hierarchical Structure in Streams’).

The algorithm models activity bursts in a time series as an infinite hidden Markov model.

Taken from pybursts (https://github.com/romain-fontugne/pybursts/blob/master/pybursts/pybursts.py) and adapted for dependency compatibility reasons.

Parameters:

offsets (list) – a list of time offsets (numeric)
s (float) – the base of the exponential distribution that is used for modeling the event frequencies
gamma (float) – coefficient for the transition costs between states
n – used to adjust the fixed cost function (independent of the given offsets), which is needed if you want to compare bursts for different inputs.
T – used to adjust the fixed cost function (independent of the given offsets), which is needed if you want to compare bursts for different inputs.
k – maximum burst level

deepof.utils.kleinberg_core_numba(gaps: array, s: float64, gamma: float64, n: int, T: float64, k: int) → array

Computation intensive core part of Kleinberg’s algorithm (described in ‘Bursty and Hierarchical Structure in Streams’).

The algorithm models activity bursts in a time series as an infinite hidden Markov model.

Taken from pybursts (https://github.com/romain-fontugne/pybursts/blob/master/pybursts/pybursts.py) and rewritten for compatibility with numba.

Parameters:

gaps (np.array) – an array of gap sizes between time offsets (numeric)
s (float) – the base of the exponential distribution that is used for modeling the event frequencies
gamma (float) – coefficient for the transition costs between states
n – used to adjust the fixed cost function (independent of the given offsets), which is needed if you want to compare bursts for different inputs.
T – used to adjust the fixed cost function (independent of the given offsets), which is needed if you want to compare bursts for different inputs.
k – maximum burst level / number of hidden states

deepof.utils.smooth_boolean_array(a: array, scale: int = 1, sigma=2.0, batch_size: int = 50000) → array

LEGACY FILTER FOR BEHAVIORAL ANALYSIS. REPLACED BY multi_step_paired_smoothing.

Return a boolean array in which isolated appearances of a feature are smoothed.

Parameters:

a (numpy.ndarray) – Boolean instances.
scale (int) – Kleinberg scale parameter. Higher values result in stricter smoothing.
batch_size (int) – Batch size for input processing.

Returns:

Smoothed boolean instances.

Return type:

a (numpy.ndarray)

deepof.utils.multi_step_paired_smoothing(behavior_in: array, not_behavior: array | None = None, exclude: array | None = None, min_length: int = 6, get_both: bool = False) → array

This filtering approach first gradually merges very close behavioral instances (how close is regulated by min_length), then filters out the remaining short instances. In this way, multiple instances close to each other are kept and united, while isolated, very short bursts are filtered out. It replaces the Kleinberg filtering approach with a similar idea, as Kleinberg was too prone to merging events that were relatively distant on the time scale.

Parameters:

behavior_in (numpy.ndarray) – Boolean instances of detected raw behavior.
not_behavior (numpy.ndarray) – Boolean instances of raw behavior not occurring.
exclude (numpy.ndarray) – Additional boolean instances that will always be rated as “no behavior”.
min_length (int) – Determines the degree of smoothing. The smaller, the more short behavioral instances are kept and the sharper the behavioral edges remain.
get_both (bool) – If True, will also return the not_behavior instances that get smoothed along with the behavior instances.

Returns:

behavior (numpy.ndarray) – Smoothed boolean instances.
not_behavior (numpy.ndarray) – Smoothed boolean not-behavior instances.
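
The merge-then-filter idea can be sketched as follows. This is a deliberately simplified, single-pass illustration and not the function's actual multi-step procedure: it ignores not_behavior and exclude, and the names _runs and merge_then_filter are hypothetical.

```python
import numpy as np

def _runs(mask):
    """Return (start, end) pairs (end exclusive) of consecutive True runs."""
    padded = np.concatenate(([False], mask, [False])).astype(int)
    edges = np.diff(padded)
    return list(zip(np.flatnonzero(edges == 1), np.flatnonzero(edges == -1)))

def merge_then_filter(behavior, min_length=6):
    out = np.asarray(behavior, dtype=bool).copy()
    # Step 1: close gaps shorter than min_length that lie between two bouts
    for start, end in _runs(~out):
        if start > 0 and end < len(out) and end - start < min_length:
            out[start:end] = True
    # Step 2: drop bouts that are still shorter than min_length
    for start, end in _runs(out):
        if end - start < min_length:
            out[start:end] = False
    return out

raw = np.array([0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0], dtype=bool)
smoothed = merge_then_filter(raw, min_length=3)
```

In the example, the two neighbouring two-frame bouts are united into one five-frame bout, while the isolated single-frame detection at index 11 is removed.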

deepof.utils.rolling_window(a: ndarray, window_size: int, window_step: int) → ndarray

Return a 3D numpy.array with a sliding-window extra dimension.

Parameters:

a (np.ndarray) – N (instances) * m (features) shape
window_size (int) – Size of the window to apply
window_step (int) – Step of the window to apply

Returns:

N (sliding window instances) * l (sliding window size) * m (features)

Return type:

rolled_a (np.ndarray)
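
The shape transformation can be reproduced with plain NumPy indexing. The sketch below is an assumption-laden illustration, not the library code: it keeps only complete windows and does no padding, and the name rolling_window_sketch is hypothetical.

```python
import numpy as np

def rolling_window_sketch(a, window_size, window_step):
    """Stack sliding windows of a 2D array along a new leading axis."""
    a = np.asarray(a)
    starts = np.arange(0, a.shape[0] - window_size + 1, window_step)
    # Each row of idx holds the frame indices belonging to one window
    idx = starts[:, None] + np.arange(window_size)[None, :]
    return a[idx]

a = np.arange(12).reshape(6, 2)  # 6 instances, 2 features
rolled = rolling_window_sketch(a, window_size=3, window_step=1)
```

Six instances with a window of three and a step of one give four windows, so rolled has shape (4, 3, 2).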

deepof.utils.extract_windows(to_window: deepof_table_dict, window_size: int, window_step: int, save_as_paths: bool = False, shuffle: bool = False, aggregate: str | None = None, windows_desc: str = 'Get windows') → ndarray

Apply the rupture method independently to each experiment, and concatenate into a single dataset at the end.

Returns a dataset and the rupture indices, adapted to be used in a concatenated version of the labels.

Parameters:

to_window (table_dict) – table_dict with all experiments.
window_size (int) – specifies the length of the sliding window.
window_step (int) – specifies the stride of the sliding window.
save_as_paths (bool) – save result as paths in dictionary instead of keeping it in RAM
shuffle (bool) – Whether to shuffle the data for each dataset. Defaults to False.
aggregate (str) – Aggregate instead of extracting full windows. Extracts full windows if None (default); otherwise the options are:
“mean”: average windows to one value
“mid”: take the middle of each window as the window value
“wta”: winner takes all; whatever behavior or behavior combination is the most frequent is set as the window value
“lta”: loser takes all; whatever behavior or behavior combination is the rarest is set as the window value
windows_desc (str) – Progress bar label

Returns:

Dictionary containing stacks of windowed data samples for each table. Shape of the stacks: [N_samples, window_size, N_features].
output_shape (Tuple) – shape of the output array (N_samples, window_size, N_features).

Return type:

to_window (dict)
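
The four aggregate options can be illustrated on a single window. This is a sketch under assumptions, not the library's code: the helper name aggregate_window is hypothetical, a window is taken to be a (frames, behaviors) array, and ties between equally frequent combinations are broken arbitrarily by sort order.

```python
import numpy as np

def aggregate_window(window, mode):
    """Collapse one window of shape (frames, behaviors) into a single row."""
    window = np.asarray(window)
    if mode == "mean":
        return window.mean(axis=0)
    if mode == "mid":
        return window[len(window) // 2]
    # Count how often each distinct row (behavior combination) occurs
    values, counts = np.unique(window, axis=0, return_counts=True)
    if mode == "wta":
        return values[np.argmax(counts)]  # most frequent combination
    if mode == "lta":
        return values[np.argmin(counts)]  # rarest combination
    raise ValueError(f"Unknown aggregate mode: {mode}")

# Five frames, two binary behaviors
window = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [1, 0]])
results = {m: aggregate_window(window, m) for m in ("mean", "mid", "wta", "lta")}
```

Here “mean” yields [0.8, 0.2], “mid” and “wta” both yield [1, 0], and “lta” picks the single [0, 1] frame.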

deepof.utils.smooth_mult_trajectory(series: array, alpha: int = 0, w_length: int = 15) → ndarray

Return a smoothed trajectory using a Savitzky-Golay 1D filter.

Parameters:

series (numpy.ndarray) – 1D trajectory array with N (instances)
alpha (int) – 0 <= alpha < w_length; indicates the difference between the degree of the polynomial and the window length for the Savitzky-Golay filter used for smoothing. Higher values produce a worse fit, hence more smoothing.
w_length (int) – Length of the sliding window to which the filter is fit. Higher values yield a coarser fit, hence more smoothing.

Returns:

smoothed version of the input, with equal shape

Return type:

smoothed_series (np.ndarray)

deepof.utils.moving_average(time_series: Series, lag: int = 5) → Series

Fast implementation of a moving average function.

Parameters:

time_series (pd.Series) – Uni-variate time series to take the moving average of.
lag (int) – size of the convolution window used to compute the moving average.

Returns:

Uni-variate moving average over time_series.

Return type:

moving_avg (pd.Series)
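
A convolution-based moving average can be sketched with NumPy alone. The edge handling shown here (zero padding via mode="same") is an assumption, and moving_average_sketch is a hypothetical name; the library function operates on a pd.Series instead.

```python
import numpy as np

def moving_average_sketch(time_series, lag=5):
    """Average each sample with its neighbours inside a window of size lag."""
    x = np.asarray(time_series, dtype=float)
    kernel = np.ones(lag) / lag          # uniform convolution window
    return np.convolve(x, kernel, mode="same")

smoothed = moving_average_sketch([1, 1, 1, 1, 1, 1, 1], lag=3)
```

Interior samples of the constant signal stay at 1, while the first and last samples are pulled towards zero by the padding, which is why edge behavior should be checked before relying on the first and last lag // 2 frames.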

deepof.utils.binary_moving_median_numba(time_series, lag): Applies a moving-median-like filter to a binary signal, i.e. if a window of size lag has more 1s than 0s, the frame is set to 1 for that window, and to 0 otherwise. Only works for windows of odd length N, i.e. returns the same result for lag=N and lag=N+1.
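
The majority-vote rule can be sketched without numba. This is an illustration under assumptions rather than the library routine: the window is taken to be centered on each frame, the signal is zero-padded at both ends, and the name binary_moving_median_sketch is hypothetical.

```python
import numpy as np

def binary_moving_median_sketch(time_series, lag):
    """Set a frame to 1 if its centered window contains more 1s than 0s."""
    signal = np.asarray(time_series, dtype=int)
    half = (lag - 1) // 2                 # even lags fall back to the next lower odd window
    padded = np.pad(signal, half)         # zero padding at both ends (assumption)
    window_sums = np.convolve(padded, np.ones(2 * half + 1, dtype=int), mode="valid")
    return (window_sums > half).astype(int)

signal = [0, 1, 0, 0, 1, 1, 1, 0, 1, 1]
filtered_3 = binary_moving_median_sketch(signal, lag=3)
filtered_4 = binary_moving_median_sketch(signal, lag=4)
```

The isolated 1 at index 1 is removed and the single 0 at index 7 is filled, and lag=3 and lag=4 give identical results, matching the odd-window behavior described above.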

deepof.utils.mask_outliers(time_series: DataFrame, likelihood: DataFrame, likelihood_tolerance: float, lag: int, n_std: int, mode: str) → DataFrame

Return a mask over the bivariate trajectory of a body part, identifying as True all detected outliers.

An outlier can be marked with one of two criteria: 1) the likelihood reported by DLC is below likelihood_tolerance, and/or 2) the deviation from a moving average model is greater than n_std.

Parameters:

time_series (pd.DataFrame) – Bi-variate time series representing the x, y positions of a single body part
likelihood (pd.DataFrame) – Data frame with likelihood data per body part as extracted from deeplabcut
likelihood_tolerance (float) – Minimum tolerated likelihood, below which an outlier is called
lag (int) – Size of the convolution window used to compute the moving average
n_std (int) – Number of standard deviations over the moving average to be considered an outlier
mode (str) – If “and” (default) both x and y have to be marked in order to call an outlier. If “or”, one is enough.

Returns:

Bi-variate mask over time_series. True indicates an outlier.

Return type:

mask (pd.DataFrame)
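
The two criteria can be combined as in the sketch below. Several details are assumptions and may differ from the library: the standard deviation is estimated from the residuals around the moving average, the moving average is zero-padded at the edges, NumPy arrays replace the data frames, and mask_outliers_sketch is a hypothetical name.

```python
import numpy as np

def mask_outliers_sketch(xy, likelihood, likelihood_tolerance, lag, n_std, mode="or"):
    """Flag frames with low likelihood or a large deviation from a moving average."""
    xy = np.asarray(xy, dtype=float)                  # shape (frames, 2)
    likelihood = np.asarray(likelihood, dtype=float)  # shape (frames,)
    kernel = np.ones(lag) / lag
    deviates = np.zeros(xy.shape, dtype=bool)
    for col in range(xy.shape[1]):
        residual = xy[:, col] - np.convolve(xy[:, col], kernel, mode="same")
        deviates[:, col] = np.abs(residual) > n_std * residual.std()
    # "and": both x and y must deviate; "or": one of them is enough
    combined = deviates.all(axis=1) if mode == "and" else deviates.any(axis=1)
    outlier = combined | (likelihood < likelihood_tolerance)
    return np.repeat(outlier[:, None], xy.shape[1], axis=1)

xy = np.zeros((20, 2))
xy[10, 0] = 100.0              # jump in x only
lik = np.ones(20)
lik[3] = 0.1                   # poorly tracked frame
mask_or = mask_outliers_sketch(xy, lik, 0.5, lag=5, n_std=3, mode="or")
mask_and = mask_outliers_sketch(xy, lik, 0.5, lag=5, n_std=3, mode="and")
```

With mode="or" both the jump at frame 10 and the low-likelihood frame 3 are flagged; with mode="and" the jump is ignored because only x deviates.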

deepof.utils.full_outlier_mask(experiment: DataFrame, likelihood: DataFrame, likelihood_tolerance: float, exclude: str, lag: int, n_std: int, mode: str) → DataFrame

Iterate over all body parts of experiment and output a data frame in which all x, y positions are replaced by a boolean mask, where True indicates an outlier.

Parameters:

experiment (pd.DataFrame) – Data frame with time series representing the x, y positions of every body part
likelihood (pd.DataFrame) – Data frame with likelihood data per body part as extracted from deeplabcut
likelihood_tolerance (float) – Minimum tolerated likelihood, below which an outlier is called
exclude (str) – Body part to exclude from the analysis (to concatenate with bpart alignment)
lag (int) – Size of the convolution window used to compute the moving average
n_std (int) – Number of standard deviations over the moving average to be considered an outlier
mode (str) – If “and” (default) both x and y have to be marked in order to call an outlier. If “or”, one is enough.

Returns:

Mask over all body parts in experiment. True indicates an outlier

Return type:

full_mask (pd.DataFrame)

deepof.utils.remove_outliers(experiment: DataFrame, likelihood: DataFrame, likelihood_tolerance: float, exclude: str = '', lag: int = 5, n_std: int = 3, mode: str = 'or') → DataFrame

Mark all outliers in experiment and replace them using a uni-variate linear interpolation approach.

Note that this approach only works for equally spaced data (constant camera acquisition rates).

Parameters:

experiment (pd.DataFrame) – Data frame with time series representing the x, y positions of every body part.
likelihood (pd.DataFrame) – Data frame with likelihood data per body part as extracted from deeplabcut.
likelihood_tolerance (float) – Minimum tolerated likelihood, below which an outlier is called.
exclude (str) – Body part to exclude from the analysis (to concatenate with bpart alignment).
lag (int) – Size of the convolution window used to compute the moving average.
n_std (int) – Number of standard deviations over the moving average to be considered an outlier.
mode (str) – If “and” both x and y have to be marked in order to call an outlier. If “or” (default), one is enough.

Returns:

Interpolated version of experiment.

Return type:

interpolated_exp (pd.DataFrame)
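
The interpolation step can be sketched for a single coordinate column. This is an illustration, not the library code: interpolate_masked is a hypothetical helper, and the frame index is used as the time axis, which is exactly why equally spaced frames are required.

```python
import numpy as np

def interpolate_masked(values, mask):
    """Replace masked samples by linear interpolation between their valid neighbours."""
    values = np.asarray(values, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    frames = np.arange(len(values))       # frame index doubles as time axis
    out = values.copy()
    out[mask] = np.interp(frames[mask], frames[~mask], values[~mask])
    return out

x = [0.0, 1.0, 50.0, 3.0, 4.0]
fixed = interpolate_masked(x, [False, False, True, False, False])
```

The outlier at frame 2 is replaced by 2.0, the value halfway between its neighbours.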

deepof.utils.filter_animal_id_in_table(table: DataFrame, selected_id: str | None = None, table_type: str | None = None)

Filter a DataFrame to keep only those columns related to the selected id.

Leave labels untouched if present.

Parameters:

table (pd.DataFrame) – a dataFrame to be filtered
selected_id (str) – select a single animal on multi animal settings. Defaults to None (all animals are processed).
table_type (str) – type of the tableDict

Returns:

Filtered dataFrame, keeping only the selected animal.

Return type:

pd.DataFrame

deepof.utils.filter_columns(columns: list, selected_id: str, table_type: str | None = None) → list

Given a set of TableDict columns, returns those that correspond to a given animal, specified in selected_id.

Parameters:

columns (list) – List of columns to filter.
selected_id (str) – Animal ID to filter for.
table_type (str) – Type of the table (relevant if “supervised”)

Returns:

List of filtered columns.

Return type:

filtered_columns (list)

deepof.utils.rolling_speed(dframe: DatetimeIndex, frame_rate: int = 1, window: int = 3, rounds: int = 3, deriv: int = 1, shift: int = 2, typ: str = 'coords') → DataFrame

Return the average speed over n frames in millimeters per second.

Parameters:

dframe (pandas.DataFrame) – Position over time dataframe.
frame_rate (int) – Number of frames per second.
window (int) – Number of frames to average over.
rounds (int) – Float rounding decimals.
deriv (int) – Position derivative order; 1 for speed, 2 for acceleration, 3 for jerk, etc.
shift (int) – Window shift for rolling speed calculation.
typ (str) – Type of dataset. Intended for internal usage only.

Returns:

Data frame containing 2D speeds for each body part in the original data or their consequent derivatives.

Return type:

speeds (pd.DataFrame)

deepof.utils.get_behavior_mask_and_confidence(tab: DataFrame, behaviors: List[str], supervised_export: bool) → Tuple[DataFrame, DataFrame]

Generates a boolean mask and a confidence dataframe for given behaviors.

Parameters:

tab (Union[pd.DataFrame]) – Table with supervised or unsupervised behaviors, converted to a data frame.
behaviors (List(str)) – List of behavior names.
supervised_export (bool) – Does the given table contain supervised or unsupervised behaviors?

Returns:

Mask of confidence indices to keep.

Return type:

np.ndarray

deepof.utils.row_nanargmax(arr): argmax per row, ignoring NaNs. Returns NaN for all-NaN rows.
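
One way to obtain this behavior in NumPy is shown below. It is a sketch of the described semantics, not necessarily how the function is written; row_nanargmax_sketch is a hypothetical name.

```python
import numpy as np

def row_nanargmax_sketch(arr):
    """Row-wise argmax that skips NaNs and yields NaN for all-NaN rows."""
    arr = np.asarray(arr, dtype=float)
    all_nan = np.isnan(arr).all(axis=1)
    # Replace NaNs by -inf so that they can never win the argmax
    filled = np.where(np.isnan(arr), -np.inf, arr)
    result = filled.argmax(axis=1).astype(float)
    result[all_nan] = np.nan
    return result

soft_counts = np.array([[0.1, np.nan, 0.7],
                        [np.nan, np.nan, np.nan],
                        [0.5, 0.2, np.nan]])
hard_counts = row_nanargmax_sketch(soft_counts)
```

The result is a float array because NaN cannot be stored in an integer array.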

deepof.utils.filter_short_bouts(cluster_assignments: ndarray, cluster_confidence: ndarray, confidence_indices: ndarray, min_confidence: float = 0.0, min_bout_duration: int | None = None)

Filter out cluster assignment bouts shorter than min_bout_duration.

Parameters:

cluster_assignments (np.ndarray) – Array of cluster assignments.
cluster_confidence (np.ndarray) – Array of cluster confidence values.
confidence_indices (np.ndarray) – Array of confidence indices.
min_confidence (float) – Minimum confidence value.
min_bout_duration (int) – Minimum bout duration in frames.

Returns:

Mask of confidence indices to keep.

Return type:

np.ndarray

deepof.utils.filter_short_true_segments(array: ndarray, min_length: int)

Filters out short “True” sections from boolean array “array”.

Parameters:

array (np.ndarray) – Boolean array
min_length (int) – Minimum length of “true” sections within array.

Returns:

Mask of confidence indices to keep.

Return type:

np.ndarray
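
Removing short True runs can be sketched with run-length detection via np.diff. The sketch assumes that runs strictly shorter than min_length are removed, which the description does not state explicitly; the function name is hypothetical.

```python
import numpy as np

def filter_short_true_segments_sketch(array, min_length):
    """Set every run of True values shorter than min_length to False."""
    array = np.asarray(array, dtype=bool)
    out = array.copy()
    # Pad with False so that every True run has a detectable start and end
    padded = np.concatenate(([False], array, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)    # exclusive end indices
    for start, end in zip(starts, ends):
        if end - start < min_length:
            out[start:end] = False
    return out

mask = [True, False, True, True, True, False, True, True]
kept = filter_short_true_segments_sketch(mask, min_length=3)
```

Only the three-frame run in the middle survives; the one-frame and two-frame runs are cleared.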

deepof.utils.filter_short_true_segments_numba(array: ndarray, min_length: int)

Filters out short “True” sections from boolean array “array”.

Parameters:

array (np.ndarray) – Boolean array
min_length (int) – Minimum length of “true” sections within array.

Returns:

Mask of confidence indices to keep.

Return type:

np.ndarray

deepof.utils.gmm_compute(x: array, n_components: int, cv_type: str) → list

Fit a Gaussian Mixture Model to the provided data and returns evaluation metrics.

Parameters:

x (numpy.ndarray) – Data matrix to train the model
n_components (int) – Number of Gaussian components to use
cv_type (str) – Covariance matrix type to use. Must be one of “spherical”, “tied”, “diag”, “full”.

Returns:

model and associated BIC for downstream selection.

Return type:

gmm_eval (list)

deepof.utils.gmm_model_selection(x: DataFrame, n_components_range: range, part_size: int, n_runs: int = 100, n_cores: int = False, cv_types: Tuple = ('spherical', 'tied', 'diag', 'full')) → Tuple[List[list], List[ndarray], int | Any]

Run GMM clustering model selection on the specified X dataframe.

Outputs the bic distribution per model, a vector with the median BICs and an object with the overall best model.

Parameters:

x (pandas.DataFrame) – Data matrix to train the models
n_components_range (range) – Generator with numbers of components to evaluate
part_size (int) – Size of bootstrap samples for each model
n_runs (int) – Number of bootstraps for each model
n_cores (int) – Number of cores to use for computation
cv_types (tuple) – Covariance Matrices to try. All four available by default

Returns:

All recorded BIC values for all attempted parameter combinations (useful for plotting).
m_bic (list) – All minimum BIC values recorded throughout the process (useful for plotting).
best_bic_gmm (sklearn.GMM) – Unfitted version of the best found model.

Return type:

bic (list)

deepof.utils.compute_compactness(Z_pos: ndarray, Z_all: ndarray, eps: float = 1e-12) → Dict[str, float]

Compute compactness of positive-class embeddings relative to all embeddings.

Uses the trace of the covariance matrix as a spread measure. Lower values indicate tighter clustering of positive samples.

Parameters:

Z_pos – Positive-class embeddings, shape (N_pos, D).
Z_all – All embeddings, shape (N_all, D).
eps – Guard against division by zero.

Returns:

trace_cov_pos – absolute trace covariance of positives; trace_cov_pos_norm_global – ratio of positive to global trace.

Return type:

dict
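
Since the trace of a covariance matrix equals the sum of the per-dimension variances, the two quantities can be sketched as below. The use of the sample variance (ddof=1) is an assumption, and compactness_sketch is a hypothetical name.

```python
import numpy as np

def compactness_sketch(Z_pos, Z_all, eps=1e-12):
    """Trace of the covariance of the positives, absolute and relative to all points."""
    Z_pos = np.asarray(Z_pos, dtype=float)
    Z_all = np.asarray(Z_all, dtype=float)
    # trace(cov) is the sum of the variances along each embedding dimension
    trace_pos = float(Z_pos.var(axis=0, ddof=1).sum())
    trace_all = float(Z_all.var(axis=0, ddof=1).sum())
    return {
        "trace_cov_pos": trace_pos,
        "trace_cov_pos_norm_global": trace_pos / (trace_all + eps),
    }

Z_pos = np.array([[0, 0], [2, 0], [0, 2], [2, 2]])
Z_all = np.vstack([Z_pos, [[10, 10], [12, 12]]])
compactness = compactness_sketch(Z_pos, Z_all)
```

A normalized value below 1 means the positive samples occupy a tighter region than the embedding as a whole.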

deepof.utils.compute_separability_logreg(X: ndarray, y: ndarray, n_splits: int = 5, seed: int = 0, C: float = 1.0, max_train: int = 100000) → Dict[str, float]

Compute class separability via logistic-regression average precision (AP).

Performs stratified k-fold cross-validation with a balanced logistic regression classifier and returns mean ± std of the AP score.

Parameters:

X – Feature matrix, shape (N, D).
y – Binary labels in {0, 1}, shape (N,).
n_splits – Number of CV folds.
seed – Random seed for reproducibility.
C – Inverse regularisation strength.
max_train – Maximum samples used (balanced subsample).

Returns:

ap_mean, ap_std, n_used. Values are NaN when only one class is present.

Return type:

dict

deepof.utils.compute_knn_agreement(X: ndarray, y: ndarray, k: int = 25, seed: int = 0, max_points: int = 50000, max_pos_queries: int = 10000, metric: str = 'cosine') → Dict[str, float]

Compute kNN label agreement for positive-class samples.

For each positive sample, reports the fraction of its k nearest neighbours that are also positive. Higher values indicate better clustering.

Parameters:

X – Feature matrix, shape (N, D).
y – Binary labels in {0, 1}, shape (N,).
k – Number of nearest neighbours.
seed – Random seed for subsampling.
max_points – Maximum reference-set size.
max_pos_queries – Maximum positive query points.
metric – Distance metric for kNN.

Returns:

k, pos_knn_agree_mean, pos_knn_agree_std, n_ref, n_pos_queries.

Return type:

dict
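
The agreement score can be sketched with a brute-force cosine kNN in NumPy. This illustration omits the subsampling controlled by seed, max_points and max_pos_queries, excludes each point from its own neighbourhood (an assumption), and uses the hypothetical name knn_agreement_sketch.

```python
import numpy as np

def knn_agreement_sketch(X, y, k=2):
    """Fraction of positive neighbours among the k nearest neighbours of each positive."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y).astype(bool)
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    dist = 1.0 - unit @ unit.T            # cosine distance matrix
    np.fill_diagonal(dist, np.inf)        # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    agree = y[neighbours][y].mean(axis=1) # one agreement value per positive query
    return {
        "k": k,
        "pos_knn_agree_mean": float(agree.mean()),
        "pos_knn_agree_std": float(agree.std()),
    }

X = [[1, 0], [1, 0.1], [1, -0.1], [0, 1], [0.1, 1], [-0.1, 1]]
y = [1, 1, 1, 0, 0, 0]
agreement = knn_agreement_sketch(X, y, k=2)
```

In this toy data the positives all point along the x-axis, so every positive has only positive neighbours and the mean agreement is 1.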

deepof.utils.cluster_transition_matrix(cluster_sequence: array, nclusts: int, autocorrelation: bool = True, return_graph: bool = False) → Tuple[Graph | Any, ndarray]

Compute the transition matrix between clusters and the autocorrelation in the sequence.

Parameters:

cluster_sequence (numpy.array) – Sequence of cluster assignments.
nclusts (int) – Number of clusters in the sequence.
autocorrelation (bool) – Whether to compute the autocorrelation of the sequence.
return_graph (bool) – Whether to return the transition matrix as a networkx.DiGraph object.

Returns:

Transition matrix as numpy.ndarray or networkx.DiGraph.
autocorr (numpy.array) – If autocorrelation is True, returns a numpy.ndarray with all autocorrelation values on cluster assignment.

Return type:

trans_normed (numpy.ndarray / networkx.Graph)
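
Counting and normalizing transitions can be sketched as follows. Row normalization is an assumption suggested by the name trans_normed; autocorrelation and graph conversion are left out, and transition_matrix_sketch is a hypothetical name.

```python
import numpy as np

def transition_matrix_sketch(cluster_sequence, nclusts):
    """Row-normalized matrix of transition frequencies between consecutive clusters."""
    seq = np.asarray(cluster_sequence, dtype=int)
    counts = np.zeros((nclusts, nclusts))
    # Count each consecutive (from, to) pair
    np.add.at(counts, (seq[:-1], seq[1:]), 1)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows without outgoing transitions stay all zero
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

trans = transition_matrix_sketch([0, 0, 1, 2, 1, 0], nclusts=3)
```

Entry (i, j) is the fraction of frames in cluster i that are followed by cluster j.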

deepof.utils.get_total_Frames(video_paths: dict) → int

Get the number of all frames in all videos listed in the input dictionary.

Parameters:

video_paths (dict) – Paths to all videos in a dictionary

Returns:

Total number of all video frames

Return type:

total_frames (int)

deepof.utils.validate_parameter(param_name: str, param_value: Any, valid_options: List[Any], is_list: bool = False, custom_error_if_empty: str | None = None, only_one_of_many: bool | None = True, can_be_dict: bool | None = False)

A generic helper to validate a single parameter against a list of valid options.

Parameters:

param_name (str) – The name of the parameter being checked (for error messages).
param_value (Any) – The value of the parameter provided by the user.
valid_options (List[Any]) – The list of allowed values.
is_list (bool) – If True, checks if param_value is a subset of valid_options. Otherwise, checks if it is a member of valid_options.
custom_error_if_empty (Optional[str]) – A specific error to raise if the parameter is provided but the list of valid options is empty.
only_one_of_many (Optional[bool]) – True if only one of the valid options is allowed; False if a subset of the valid options is allowed.
can_be_dict (Optional[bool]) – Parameter can also be given as a dict (e.g. allowed for experiment_id)
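
The membership versus subset check can be sketched in a few lines. This is a simplified, hypothetical re-implementation covering only param_name, param_value, valid_options and is_list; raising ValueError and skipping None values are assumptions about behavior that the description does not specify.

```python
def validate_parameter_sketch(param_name, param_value, valid_options, is_list=False):
    """Raise if param_value is not a member (or subset) of valid_options."""
    if param_value is None:
        return  # assumption: parameters that were not provided are not checked
    values = list(param_value) if is_list else [param_value]
    invalid = [v for v in values if v not in valid_options]
    if invalid:
        raise ValueError(
            f"Invalid value(s) {invalid} for '{param_name}'. Valid options: {valid_options}"
        )

# A valid single value passes silently
validate_parameter_sketch("roi_mode", "mousewise", ["mousewise", "behaviorwise"])

# A list containing an unknown entry is rejected
try:
    validate_parameter_sketch("animals_in_roi", ["A", "C"], ["A", "B"], is_list=True)
    error_message = None
except ValueError as err:
    error_message = str(err)
```

Naming the offending parameter and the allowed values in the message lets plotting functions with many keyword arguments report exactly which input was wrong.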

deepof.visuals module

General plotting functions for the deepof package.

deepof.visuals.plot_heatmaps(coordinates: deepof_coordinates, bodyparts: list, center: str = 'arena', align: str | None = None, exp_condition: str | None = None, condition_value: str | None = None, experiment_id: int = 'average', bin_size: int | str | None = None, bin_index: int | str | None = None, precomputed_bins: ndarray | None = None, start_marker: str | None = None, samples_max: int = 20000, roi_number: int | None = None, animals_in_roi: list | None = None, display_rois: bool = True, in_roi_criterion: str = 'Center', invert_roi: bool = False, display_arena: bool = True, xlim: float | None = None, ylim: float | None = None, extrapolate_heatmap: bool = True, save: bool = False, dpi: int = 100, ax: Any | None = None, show: bool = True, **kwargs) → figure

Plot heatmaps of the specified body parts (bodyparts) of the specified animal (i).

Parameters:

coordinates (coordinates) – deepof Coordinates object.
bodyparts (list) – list of body parts to plot.
center (str) – Name of the body part to which the positions will be centered. If false, the raw data is returned; if ‘arena’ (default), coordinates are centered in the pitch.
align (str) – Selects the body part to which later processes will align the frames with (see preprocess in table_dict documentation).
exp_condition (str) – Experimental condition to plot base filters on.
condition_value (str) – Experimental condition value to plot. If available, it filters the experiments to keep only those whose condition value matches the given string in the provided exp_condition.
experiment_id (str) – Name of the experiment to display. When given as “average”, positions of all animals are averaged.
bin_size (Union[int,str]) – bin size for time filtering.
bin_index (Union[int,str]) – index of the bin of size bin_size to select along the time dimension. Denotes exact start position in the time domain if given as string.
precomputed_bins (np.ndarray) – precomputed time bins. If provided, bin_size and bin_index are ignored. Note: providing precomputed bins with gaps will result in an incorrect time vector depiction.
samples_max (int) – Maximum number of samples taken for plotting to avoid excessive computation times. If the number of rows in a data set exceeds this number the data is downsampled accordingly.
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
in_roi_criterion (str) – Criterion for in roi check, can be a single bodypart, a list of bodyparts or “all” bodyparts of a mouse
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
display_rois (bool) – Display the active ROI, if a ROI was selected. Defaults to True.
display_arena (bool) – whether to plot a dashed line with an overlying arena perimeter. Defaults to True.
xlim (float) – x-axis limits.
ylim (float) – y-axis limits.
extrapolate_heatmap (bool) – show full heatmap including extrapolated parts (default=True)
save (str) – if provided, the figure is saved to the specified path.
dpi (int) – resolution of the figure.
ax (plt.AxesSubplot) – axes where to plot the current figure. If not provided, a new figure will be created.
show (bool) – whether to show the created figure. If False, returns all axes.
kwargs – additional arguments to pass to the seaborn kdeplot function.

Returns:

figure with the specified characteristics

Return type:

heatmaps (plt.figure)

deepof.visuals.plot_gantt(coordinates: deepof_project, instance_id: str, supervised_annotations: deepof_table_dict | None = None, soft_counts: deepof_table_dict | None = None, bin_index: int | str | None = None, bin_size: int | str | None = None, precomputed_bins: ndarray | None = None, start_marker: str | None = None, samples_max=20000, roi_number: int | None = None, animals_in_roi: list | None = None, roi_mode: str = 'mousewise', in_roi_criterion: str = 'Center', additional_checkpoints: DataFrame | None = None, signal_overlay: Series | None = None, instances_to_plot: list | None = None, ax: Any | None = None, save: bool = False)

Return a scatter plot of the passed projection. Allows for temporal and quality filtering, animal aggregation, and changepoint detection size visualization.

Parameters:

coordinates (project) – deepOF project where the data is stored.
instance_id (str) – Name of the instance to display (can either be an experiment or a behavior).
supervised_annotations (table_dict) – table dict with supervised annotations per video.
soft_counts (table_dict) – table dict with soft cluster assignments per animal experiment across time.
bin_index (Union[int,str]) – index of the bin of size bin_size to select along the time dimension. Denotes exact start position in the time domain if given as string.
bin_size (Union[int,str]) – bin size for time filtering.
precomputed_bins (np.ndarray) – precomputed time bins. If provided, bin_size and bin_index are ignored. Note: providing precomputed bins with gaps will result in an incorrect time vector depiction.
start_marker (str) – name of start marker to be used for binning. Defaults to None, which leads to all signals starting at the actual 0.
samples_max (int) – Maximum number of samples taken for plotting to avoid excessive computation times. If the number of rows in a data set exceeds this number the data is downsampled accordingly.
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
in_roi_criterion (str) – Criterion for in roi check, can be a single bodypart, a list of bodyparts or “all” bodyparts of a mouse
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
roi_mode (str) – Determines how the ROIs should be applied to different behaviors. Options are “mousewise” (default, selected mice need to be inside the ROI) and “behaviorwise” (only mice involved in a behavior need to be inside of the ROI)
additional_checkpoints (pd.DataFrame) – table with additional checkpoints to plot.
signal_overlay (pd.Series) – overlays a continuous signal with all selected behaviors. None by default.
instances_to_plot (list) – list of either behaviors or experiments to plot. If instance_id is an experiment this needs to be a list of behaviors and vice versa. If None, all options are plotted.
ax (plt.AxesSubplot) – axes where to plot the current figure. If not provided, new figure will be created.
save (bool) – Saves a time-stamped vectorized version of the figure if True.

deepof.visuals.gantt_plotter(coordinates: deepof_project, gantt_matrix: ndarray, plot_type: str, instance_id: str, n_available_instances: int, instances_to_plot: list, colors: list, behavior_mode: bool, bin_info: dict, bin_indices: ndarray, additional_checkpoints: DataFrame | None = None, signal_overlay: Series | None = None, ax: Any | None = None, save: bool = False)

Return a scatter plot of the passed projection. Allows for temporal and quality filtering, animal aggregation, and changepoint detection size visualization.

Parameters:

coordinates (project) – deepOF project where the data is stored.
gantt_matrix (np.ndarray) – 2D integer matrix denoting time sections with present or absent behavior
plot_type (str) – type of plot, either “supervised” or “unsupervised”
instance_id (str) – Name of the experiment or behavior to display.
n_available_instances (int) – number of all possibly available instances (may be behaviors or experiments)
instances_to_plot (list) – selected instances for plotting as a list (may be behaviors or experiments)
colors (list) – list of color hexcodes for plotting
bin_info (dict) – A dictionary containing start and end positions or indices of all sections for given embeddings and ROIs
bin_indices (np.ndarray) – indices to plot
additional_checkpoints (pd.DataFrame) – table with additional checkpoints to plot.
signal_overlay (pd.Series) – overlays a continuous signal with all selected behaviors. None by default.
ax (plt.AxesSubplot) – axes where to plot the current figure. If not provided, new figure will be created.
save (bool) – Saves a time-stamped vectorized version of the figure if True.

deepof.visuals.plot_enrichment(coordinates: deepof_coordinates, embeddings: deepof_table_dict | None = None, soft_counts: deepof_table_dict | None = None, supervised_annotations: deepof_table_dict | None = None, bin_index: int | str | None = None, bin_size: int | str | None = None, precomputed_bins: ndarray | None = None, start_marker: str | None = None, samples_max: int = 100000, roi_number: int | None = None, animals_in_roi: list | None = None, roi_mode: str = 'mousewise', in_roi_criterion: str = 'Center', invert_roi: bool = False, polar_depiction: bool = False, plot_speed: bool = False, add_stats: str = 'Mann-Whitney', exp_condition: str | None = None, exp_condition_order: list | None = None, normalize: bool = False, verbose: bool = False, unit_time: str = 's', unit_distance: str = 'm', ax: Any | None = None, save: bool = False)

Violin plots per cluster per condition.

Parameters:

coordinates (coordinates) – deepOF project where the data is stored.
embeddings (table_dict) – table dict with neural embeddings per animal experiment across time.
soft_counts (table_dict) – table dict with soft cluster assignments per animal experiment across time.
supervised_annotations (table_dict) – table dict with supervised annotations per animal experiment across time.
bin_index (Union[int,str]) – index of the bin of size bin_size to select along the time dimension. Denotes exact start position in the time domain if given as string.
bin_size (Union[int,str]) – bin size for time filtering.
precomputed_bins (np.ndarray) – precomputed time bins. If provided, bin_size and bin_index are ignored.
samples_max (int) – Maximum number of samples taken for plotting to avoid excessive computation times. If the number of rows in a data set exceeds this number the data is downsampled accordingly.
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
roi_mode (str) – Determines how the ROIs should be applied to different behaviors. Options are “mousewise” (default, selected mice need to be inside the ROI) and “behaviorwise” (only mice involved in a behavior need to be inside of the ROI, only for supervised behaviors)
in_roi_criterion (str) – Criterion for in roi check, can be a single bodypart, a list of bodyparts or “all” bodyparts of a mouse
polar_depiction (bool) – if True, display as polar plot.
plot_speed (bool) – if supervised annotations are provided, display only speed. Useful to visualize speed.
add_stats (str) – test to use. Mann-Whitney (non-parametric) by default. See statannotations documentation for details.
exp_condition (str) – Name of the experimental condition to use when plotting. If None (default) the first one available is used.
exp_condition_order (list) – Order in which to plot experimental conditions. If None (default), the order is determined by the order of the keys in the table dict.
normalize (bool) – whether to represent time fractions or actual time in seconds on the y axis.
verbose (bool) – if True, prints test results and p-value cutoffs. False by default.
unit_time (str) – Time unit (frames, seconds, minutes, hours) to display the result in the given unit
unit_distance (str) – Distance unit (millimeters, centimeters, meters) to display the result in the given unit
ax (plt.AxesSubplot) – axes where to plot the current figure. If not provided, new figure will be created.
save (bool) – Saves a time-stamped vectorized version of the figure if True.
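As a rough illustration of how bin_index and bin_size pick a slice of the time axis, the sketch below assumes both are integers given in frames and that bin k covers frames [k * bin_size, (k + 1) * bin_size). The helper name select_bin is hypothetical; deepOF's actual handling (including time-string inputs, start_marker and precomputed_bins) may differ.

```python
def select_bin(n_frames, bin_size, bin_index):
    """Return the frame indices covered by one time bin (hypothetical helper)."""
    start = bin_index * bin_size
    if start >= n_frames:
        raise ValueError("bin_index points past the end of the recording")
    # The last bin may be shorter than bin_size if the recording ends early.
    stop = min(start + bin_size, n_frames)
    return list(range(start, stop))


frames = select_bin(n_frames=100, bin_size=30, bin_index=3)
print(frames[0], frames[-1], len(frames))  # 90 99 10
```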

deepof.visuals.return_transitions(coordinates: deepof_coordinates, supervised_annotations: deepof_table_dict | None = None, soft_counts: deepof_table_dict | None = None, bin_size: int | str | None = None, bin_index: int | str | None = None, precomputed_bins: ndarray | None = None, start_marker: str | None = None, samples_max: int = 20000, roi_number: int | None = None, animals_in_roi: list | None = None, in_roi_criterion: str = 'Center', invert_roi: bool = False, exp_condition: str | None = None, delta_T: float = 0.0, silence_diagonal: bool = False, diagonal_behavior_counting: str = 'Transitions', normalize: bool = True, visualization='networks'): Returns the data underlying plot_transitions, using the same input options.

deepof.visuals.plot_transitions(coordinates: deepof_coordinates, supervised_annotations: deepof_table_dict | None = None, soft_counts: deepof_table_dict | None = None, bin_size: int | str | None = None, bin_index: int | str | None = None, precomputed_bins: ndarray | None = None, start_marker: str | None = None, samples_max: int = 20000, roi_number: int | None = None, animals_in_roi: list | None = None, in_roi_criterion: str = 'Center', invert_roi: bool = False, exp_condition: str | None = None, delta_T: float = 0.0, silence_diagonal: bool = False, diagonal_behavior_counting: str = 'Transitions', normalize: bool = True, visualization='networks', ax: list | None = None, save: bool = False, **kwargs)

Compute and plot transition matrices for all data or per condition. Plots can be heatmaps or networks.

Parameters:

coordinates (coordinates) – deepOF project where the data is stored.
supervised_annotations (table_dict) – table dict with supervised annotations.
soft_counts (table_dict) – table dict with soft cluster assignments per animal experiment across time.
bin_size (Union[int,str]) – bin size for time filtering.
bin_index (Union[int,str]) – index of the bin of size bin_size to select along the time dimension. Denotes exact start position in the time domain if given as string.
precomputed_bins (np.ndarray) – precomputed time bins. If provided, bin_size and bin_index are ignored.
samples_max (int) – Maximum number of samples taken for plotting to avoid excessive computation times. If the number of rows in a data set exceeds this number the data is downsampled accordingly.
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
exp_condition (str) – Name of the experimental condition to use when plotting. If None (default) the first one available is used.
delta_T – Time after the offset of one behavior during which the onset of the next behavior counts as a transition
silence_diagonal (bool) – If True, diagonals are set to zero.
diagonal_behavior_counting (str) – How to count diagonals (self-transitions). Options: “Frames” (total frames where the behavior is active, after extension), “Time” (total time where the behavior is active), “Events” (number of instances of the behavior occurring), “Transitions” (number of frame-wise internal behavior transitions; e.g. a behavior of 4 frames in length has 3 transitions).
normalize (bool) – Row-normalizes transition probabilities if True. Default=True.
visualization (str) – visualization mode. Can be either ‘networks’, or ‘heatmaps’.
ax (list) – axes where to plot the current figure. If not provided, a new figure will be created.
save (bool) – Saves a time-stamped vectorized version of the figure if True.
kwargs – additional arguments to pass to the seaborn kdeplot function.
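To make the normalize and silence_diagonal options concrete, here is a minimal sketch of a frame-to-frame transition matrix computed from a sequence of integer state labels. It is written in plain Python to stay dependency-free and is not deepOF's implementation: the function name transition_matrix is hypothetical, and delta_T and diagonal_behavior_counting are not modeled.

```python
def transition_matrix(labels, n_states, normalize=True, silence_diagonal=False):
    """Count frame-to-frame transitions between integer state labels."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for src, dst in zip(labels[:-1], labels[1:]):
        counts[src][dst] += 1.0
    if silence_diagonal:
        # Drop self-transitions before normalizing.
        for i in range(n_states):
            counts[i][i] = 0.0
    if normalize:
        # Row-normalize so each non-empty row sums to 1.
        for row in counts:
            total = sum(row)
            if total > 0:
                row[:] = [value / total for value in row]
    return counts


labels = [0, 0, 1, 1, 1, 2, 0, 0]
matrix = transition_matrix(labels, n_states=3)
off_diagonal = transition_matrix(labels, n_states=3, silence_diagonal=True)
```

With silence_diagonal=True, only switches between different states remain, so each row describes where the animal goes next once it leaves a state.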

deepof.visuals.count_all_events(coordinates: deepof_coordinates, supervised_annotations: deepof_table_dict | None = None, soft_counts: deepof_table_dict | None = None, bin_size: int | str | None = None, bin_index: int | str | None = None, precomputed_bins: ndarray | None = None, start_marker: str | None = None, samples_max: int = 20000, roi_number: int | None = None, animals_in_roi: list | None = None, in_roi_criterion: str = 'Center', invert_roi: bool = False, counting_mode='Events')

Counts all events in supervised or soft_counts dataset and returns a data table.

Parameters:

coordinates (coordinates) – deepOF project where the data is stored.
supervised_annotations (table_dict) – table dict with supervised annotations.
soft_counts (table_dict) – table dict with soft cluster assignments per animal experiment across time.
bin_size (Union[int,str]) – bin size for time filtering.
bin_index (Union[int,str]) – index of the bin of size bin_size to select along the time dimension. Denotes exact start position in the time domain if given as string.
precomputed_bins (np.ndarray) – precomputed time bins. If provided, bin_size and bin_index are ignored.
samples_max (int) – Maximum number of samples taken for plotting to avoid excessive computation times. If the number of rows in a data set exceeds this number the data is downsampled accordingly.
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
in_roi_criterion (str) – Criterion for in roi check, can be a single bodypart, a list of bodyparts or “all” bodyparts of a mouse
counting_mode (str) – How to count behaviors. Options: “Frames” (total frames where the behavior is active, after extension), “Time” (total time where the behavior is active), “Events” (number of instances of the behavior occurring), “Transitions” (number of frame-wise internal behavior transitions; e.g. a behavior of 4 frames in length has 3 transitions).
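The four counting modes can be sketched for a single binary per-frame behavior vector as follows. This is an illustrative approximation only: count_behavior and its frame_rate argument are hypothetical names, and the behavior "extension" mentioned above is not modeled.

```python
def count_behavior(active, mode="Events", frame_rate=25.0):
    """Summarize a binary per-frame behavior vector under one counting mode."""
    frames = sum(active)
    # An event starts wherever an active frame is not preceded by an active frame.
    events = sum(
        1 for i, a in enumerate(active) if a and (i == 0 or not active[i - 1])
    )
    if mode == "Frames":
        return frames
    if mode == "Time":
        return frames / frame_rate  # seconds, given frames per second
    if mode == "Events":
        return events
    if mode == "Transitions":
        # A bout of n frames contains n - 1 frame-to-frame self-transitions.
        return frames - events
    raise ValueError(f"unknown counting mode: {mode}")


bout = [0, 1, 1, 1, 1, 0, 0, 1, 1, 0]
print(count_behavior(bout, "Frames"))       # 6
print(count_behavior(bout, "Events"))       # 2
print(count_behavior(bout, "Transitions"))  # 4
```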

deepof.visuals.plot_stationary_entropy(coordinates: deepof_coordinates, embeddings: deepof_table_dict, soft_counts: deepof_table_dict, bin_size: int | str | None = None, bin_index: int | str | None = None, precomputed_bins: ndarray | None = None, start_marker: str | None = None, samples_max=20000, roi_number: int | None = None, animals_in_roi: list | None = None, in_roi_criterion: str = 'Center', invert_roi: bool = False, add_stats: str = 'Mann-Whitney', exp_condition: str | None = None, verbose: bool = False, ax: Any | None = None, save: bool = False)

Compute and plot transition stationary distribution entropy per condition.

Parameters:

coordinates (coordinates) – deepOF project where the data is stored.
embeddings (table_dict) – table dict with neural embeddings per animal experiment across time.
soft_counts (table_dict) – table dict with soft cluster assignments per animal experiment across time.
bin_size (Union[int,str]) – bin size for time filtering.
bin_index (Union[int,str]) – index of the bin of size bin_size to select along the time dimension. Denotes exact start position in the time domain if given as string.
precomputed_bins (np.ndarray) – precomputed time bins. If provided, bin_size and bin_index are ignored.
samples_max (int) – Maximum number of samples taken for plotting to avoid excessive computation times. If the number of rows in a data set exceeds this number the data is downsampled accordingly.
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
in_roi_criterion (str) – Criterion for in roi check, can be a single bodypart, a list of bodyparts or “all” bodyparts of a mouse
add_stats (str) – test to use. Mann-Whitney (non-parametric) by default. See statannotations documentation for details.
exp_condition (str) – Name of the experimental condition to use when plotting. If None (default) the first one available is used.
verbose (bool) – if True, prints test results and p-value cutoffs. False by default.
ax (plt.AxesSubplot) – axes where to plot the current figure. If not provided, new figure will be created.
save (bool) – Saves a time-stamped vectorized version of the figure if True.
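For readers unfamiliar with the quantity being plotted, the sketch below shows one common way to obtain the entropy of a stationary distribution: approximate the stationary distribution of a row-stochastic transition matrix by power iteration, then take its Shannon entropy. Both helper names are hypothetical and the approach is an assumption, not necessarily how deepOF computes it; power iteration may also fail to converge for periodic chains.

```python
import math


def stationary_distribution(transitions, n_iter=1000, tol=1e-12):
    """Approximate the stationary distribution of a row-stochastic matrix."""
    n = len(transitions)
    dist = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(n_iter):
        # One step of dist <- dist @ transitions.
        new = [sum(dist[i] * transitions[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, dist)) < tol:
            return new
        dist = new
    return dist


def shannon_entropy(dist):
    """Shannon entropy in nats, skipping zero-probability states."""
    return -sum(p * math.log(p) for p in dist if p > 0)


transitions = [[0.9, 0.1], [0.5, 0.5]]
pi = stationary_distribution(transitions)
entropy = shannon_entropy(pi)
```

Higher entropy means the long-run time is spread more evenly across clusters; an animal that spends nearly all of its time in one cluster yields an entropy close to zero.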

deepof.visuals.plot_normative_log_likelihood(embeddings: deepof_table_dict, exp_condition: str, embedding_dataset: DataFrame, normative_model: str, ax: Any, add_stats: str, verbose: bool)

Plot a bar chart with normative log likelihoods per experimental condition, and compute statistics.

Parameters:

embeddings (table_dict) – table dictionary containing supervised annotations or unsupervised embeddings per animal.
exp_condition (str) – Name of the experimental condition to use when plotting. If None (default) the first one available is used.
embedding_dataset (pd.DataFrame) – global animal embeddings, alongside their respective experimental conditions
normative_model (str) – Name of the cohort to use as controls. If provided, fits a Gaussian density to the control global animal embeddings, and reports the difference in likelihood across all instances of the provided experimental condition. Statistical parameters can be controlled via **kwargs (see full documentation for details).
ax (plt.AxesSubplot) – matplotlib axes where to render the plot
add_stats (str) – test to use. Mann-Whitney (non-parametric) by default. See statannotations documentation for details.
verbose (bool) – if True, prints test results and p-value cutoffs. False by default.

Returns:

embedding data frame with added normative scores per sample

Return type:

embedding_dataset (pd.DataFrame)
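The normative scoring idea (fit a Gaussian density to control embeddings, then score other animals by their log likelihood under it) can be sketched as below. As a simplifying assumption, the example uses a diagonal covariance and plain Python; the function names are hypothetical and the density model deepOF actually fits may differ.

```python
import math
from statistics import mean, pvariance


def fit_diagonal_gaussian(points):
    """Fit per-dimension mean and variance to a list of equal-length vectors."""
    dims = list(zip(*points))
    return [mean(d) for d in dims], [pvariance(d) for d in dims]


def log_likelihood(point, means, variances):
    """Log density of one vector under the fitted diagonal Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
        for x, mu, var in zip(point, means, variances)
    )


# Toy 2D "global embeddings" for a control cohort (made-up illustration values).
controls = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0), (0.1, 0.2)]
means, variances = fit_diagonal_gaussian(controls)
near = log_likelihood((0.05, 0.05), means, variances)
far = log_likelihood((2.0, 2.0), means, variances)
```

An animal whose embedding lies close to the control cloud (near) receives a higher log likelihood than one far from it (far).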

deepof.visuals.plot_embeddings(coordinates: deepof_coordinates, embeddings: deepof_table_dict | None = None, soft_counts: deepof_table_dict | None = None, supervised_annotations: deepof_table_dict | None = None, bin_size: int | str | None = None, bin_index: int | str | None = None, precomputed_bins: ndarray | None = None, start_marker: str | None = None, samples_max=20000, roi_number: int | None = None, animals_in_roi: str | list | None = None, roi_mode: str = 'mousewise', in_roi_criterion: str = 'Center', invert_roi: bool = False, min_confidence: float = 0.0, normative_model: str | None = None, add_stats: str = 'Mann-Whitney', verbose: bool = False, exp_condition: str | None = None, aggregate_experiments: str | None = None, samples: int = 500, show_aggregated_density: bool = True, colour_by: str = 'cluster', umap_random_state: int = 0, ax: Any | None = None, save: bool = False)

Return a scatter plot of the passed projection. Allows for temporal and quality filtering, animal aggregation, and changepoint detection size visualization.

Parameters:

coordinates (coordinates) – deepOF project where the data is stored.
embeddings (table_dict) – table dict with neural embeddings per animal experiment across time.
soft_counts (table_dict) – table dict with soft cluster assignments per animal experiment across time.
supervised_annotations (table_dict) – table dict with supervised annotations per experiment.
bin_size (Union[int,str]) – bin size for time filtering.
bin_index (Union[int,str]) – index of the bin of size bin_size to select along the time dimension. Denotes exact start position in the time domain if given as string.
precomputed_bins (np.ndarray) – precomputed time bins. If provided, bin_size and bin_index are ignored.
samples_max (int) – Maximum number of samples taken for plotting to avoid excessive computation times. If the number of rows in a data set exceeds this number the data is downsampled accordingly.
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
roi_mode (str) – Determines how the ROIs should be applied to different behaviors. Options are “mousewise” (default, selected mice need to be inside the ROI) and “behaviorwise” (only mice involved in a behavior need to be inside of the ROI, only for supervised behaviors)
in_roi_criterion (str) – Criterion for in roi check, can be a single bodypart, a list of bodyparts or “all” bodyparts of a mouse
min_confidence (float) – minimum confidence in cluster assignments used for quality control filtering.
normative_model (str) – Name of the cohort to use as controls. If provided, fits a Gaussian density to the control global animal embeddings, and reports the difference in likelihood across all instances of the provided experimental condition. Statistical parameters can be controlled via **kwargs (see full documentation for details).
add_stats (str) – test to use. Mann-Whitney (non-parametric) by default. See statannotations documentation for details.
verbose (bool) – if True, prints test results and p-value cutoffs. False by default.
exp_condition (str) – Name of the experimental condition to use when plotting. If None (default) the first one available is used.
aggregate_experiments (str) – Whether to aggregate embeddings by experiment (by time on cluster, mean, or median) or not (default).
samples (int) – Number of samples to take from the time embeddings. None leads to plotting all time-points, which may hurt performance.
show_aggregated_density (bool) – if True, a density plot is added to the aggregated embeddings.
colour_by (str) – hue by which to colour the embeddings. Can be one of ‘cluster’ (default), ‘exp_condition’, ‘exp_id’ or, if supervised behaviors are given, also any supervised behavior.
umap_random_state (int) – Random state used for UMAP and for drawing the samples shown in the UMAP plot, default 0. If None, no fixed random state is set (different UMAP representation every time)
ax (plt.AxesSubplot) – axes where to plot the current figure. If not provided, new figure will be created.
save (bool) – Saves a time-stamped vectorized version of the figure if True.
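The min_confidence filter can be pictured as keeping only frames whose most likely cluster reaches the threshold. The sketch below assumes soft counts are per-frame probability vectors and that the comparison is inclusive (>=); confident_frames is a hypothetical helper, not part of deepOF.

```python
def confident_frames(soft_counts, min_confidence=0.0):
    """Return (frame_index, cluster) pairs whose top probability passes the threshold."""
    kept = []
    for frame, probs in enumerate(soft_counts):
        # Hard cluster assignment: index of the largest soft count.
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= min_confidence:
            kept.append((frame, best))
    return kept


soft_counts = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]]
print(confident_frames(soft_counts, min_confidence=0.75))  # [(0, 0), (2, 1)]
```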

deepof.visuals.return_embedding_evaluation(coordinates: deepof_coordinates, embeddings: deepof_table_dict, supervised_annotations: deepof_table_dict, include_behaviors: list | None = None, window_size: int | None = None, alignment_mode: str = 'any', minimum_number_of_positives: int = 200, normalize: bool = True, random_state: int = 0) → DataFrame

Return embedding-quality metrics for all detected binary behaviors.

Computes compactness, logistic-regression separability (AP), and kNN label agreement for each binary behavior column found in supervised_annotations.

Parameters:

coordinates (coordinates) – deepOF project.
embeddings (table_dict) – Experiment ID → embedding array (T, D).
supervised_annotations (table_dict) – Experiment ID → annotation DataFrame.
include_behaviors (list) – list of behaviors to include in evaluation, if None, defaults to a subset of up behaviors
window_size (int) – window size used for the model. If None, the size gets estimated from the difference in size between embeddings and supervised annotations.
alignment_mode (str) – How embedding windows and supervised detections should be aligned. Can be “center” (embedding window is labeled as the behavior that occurs in its central frame) or “any” (embedding window is labeled as the behavior(s) that occur in any of its frames).
minimum_number_of_positives (int) – minimum number of frame-wise occurrences of a behavior to perform analysis.
normalize (bool) – Normalizes ap and knn based on positive rate.
random_state (int) – random state used for computations for reproducibility. Default is 0

Returns:

One row per behavior with metric columns.

Return type:

pd.DataFrame
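The two alignment modes can be illustrated with a sliding window over binary per-frame annotations. The sketch assumes a stride of 1, so that T annotation frames yield T - window_size + 1 embedding rows; under that assumption the window size can be recovered from the size difference. Both helper names are hypothetical.

```python
def estimate_window_size(n_annotation_frames, n_embedding_rows):
    """Infer the window size from the size difference, assuming stride 1."""
    return n_annotation_frames - n_embedding_rows + 1


def window_labels(frame_labels, window_size, alignment_mode="any"):
    """Label each sliding window (stride 1) from binary per-frame annotations."""
    n_windows = len(frame_labels) - window_size + 1
    labels = []
    for start in range(n_windows):
        window = frame_labels[start:start + window_size]
        if alignment_mode == "center":
            # The window inherits the label of its central frame.
            labels.append(window[window_size // 2])
        elif alignment_mode == "any":
            # The window is positive if the behavior occurs in any frame.
            labels.append(int(any(window)))
        else:
            raise ValueError(f"unknown alignment_mode: {alignment_mode}")
    return labels


annotations = [0, 0, 1, 0, 0, 0]
print(window_labels(annotations, 3, "center"))  # [0, 1, 0, 0]
print(window_labels(annotations, 3, "any"))     # [1, 1, 1, 0]
```

“any” produces more positive windows than “center”, which helps rare behaviors reach minimum_number_of_positives at the cost of less precise labels.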

deepof.visuals.plot_embedding_evaluation(coordinates: deepof_coordinates, embeddings: deepof_table_dict, supervised_annotations: deepof_table_dict, include_behaviors: list | None = None, window_size: int | None = None, alignment_mode: str = 'any', minimum_number_of_positives: int = 200, normalize: bool = True, random_state: int = 0) → Figure

Plot embedding-quality scores for all detected binary behaviors.

Creates a grid of bar plots (one per behavior) showing compactness, average precision, and kNN agreement, all normalised to [0, 1].

Parameters:

coordinates (coordinates) – deepOF project.
embeddings (table_dict) – Experiment ID → embedding array (T, D).
supervised_annotations (table_dict) – Experiment ID → annotation DataFrame.
include_behaviors (list) – list of behaviors to include in evaluation, if None, defaults to a subset of up behaviors
window_size (int) – window size used for the model. If None, the size gets estimated from the difference in size between embeddings and supervised annotations.
alignment_mode (str) – How embedding windows and supervised detections should be aligned. Can be “center” (embedding window is labeled as the behavior that occurs in its central frame) or “any” (embedding window is labeled as the behavior(s) that occur in any of its frames).
minimum_number_of_positives (int) – minimum number of frame-wise occurrences of a behavior to perform analysis.
normalize (bool) – Normalizes ap and knn based on positive rate.
random_state (int) – random state used for computations for reproducibility. Default is 0

Returns:

The figure containing the plots.

Return type:

matplotlib.figure.Figure

deepof.visuals.plot_training_metrics(log_summary: dict) → dict

Plot training curves from a log_summary dict.

Plots model dependent metrics

Parameters: log_summary (dict) – log info dictionary output from model fitting.

Render a FuncAnimation object with embeddings and/or motion trajectories over time.

Parameters:

coordinates (coordinates) – deepof Coordinates object.
experiment_id (str) – Name of the experiment to display.
embeddings (table_dict) – UMAP or latent embedding for each experiment. If not None, a second animation shows the embedding, colored by cluster if available.
soft_counts (table_dict) – soft cluster assignments for all instances in data. If provided together with selected_cluster, only instances of the specified component are rendered. Defaults to None.
bin_size – bin size for time filtering.
bin_index (Union[int, str, None]) – index of the bin of size bin_size to select along the time dimension. Denotes exact start position in the time domain if given as string.
precomputed_bins (np.ndarray, optional) – precomputed time bins. If provided, bin_size and bin_index are ignored.
samples_max (int) – Maximum number of samples taken for plotting to avoid excessive computation times. If the number of rows in a data set exceeds this number the data is downsampled accordingly.
roi_number (int, optional) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded).
animals_in_roi (str or list of str, optional) – IDs of animals that need to be inside the active ROI. All frames in which any of the given animals are not inside the ROI get excluded.
in_roi_criterion (str or list of str) – Criterion for in-roi check: a single bodypart, a list of bodyparts or “all” bodyparts of a mouse.
animal_id (str or list of str, optional) – ID list of animals to display. If None (default) it shows all animals.
center (str or bool) – Name of the body part to which the positions will be centered. If False, the raw data is returned; if ‘arena’ (default), coordinates are centered in the pitch.
align (str, optional) – Body part to which later processes will align the frames.
sampling_rate (float, optional) – Sampling rate for the video. If None is given, the same one as in the video recordings will be used.
min_confidence (float) – Minimum confidence threshold to render a cluster assignment bout.
min_bout_duration (int, optional) – Minimum number of frames to render a cluster assignment bout.
selected_cluster (np.ndarray, optional) – Cluster to filter.
display_arena (bool) – Whether to plot a dashed line with an overlying arena perimeter.
legend (bool) – Whether to add a color-coded legend to multi-animal plots.
umap_random_state (int) – Random state of UMAP, default 0. If None, no fixed random state is set (different UMAP representation every time)
save (bool or str, optional) – If not None, save the animation. If a string is provided, it is added as a suffix in the auto-generated file name.
dpi (int) – Dots per inch of the figure to create.

deepof.visuals.export_annotated_video(coordinates: deepof_coordinates, soft_counts: dict | None = None, supervised_annotations: deepof_table_dict | None = None, bin_size: int | str | None = None, bin_index: int | str | None = None, precomputed_bins: ndarray | None = None, start_marker: str | None = None, frame_limit_per_video: int | None = None, roi_number: int | None = None, animals_in_roi: list | None = None, roi_mode: str = 'mousewise', in_roi_criterion: str = 'Center', invert_roi: bool = False, behaviors: list | None = None, experiment_id: str | None = None, min_confidence: float = 0.75, min_bout_duration: int | None = None, display_time: bool = False, display_counter: bool = False, display_arena: bool = False, display_markers: bool = False, display_mouse_labels: bool = False, display_roi: int | None = None, exp_conditions: dict = {}, cluster_names: str | None = None)

Export annotated videos from both supervised and unsupervised pipelines.

Parameters:

coordinates (coordinates) – coordinates object for the current project. Used to get video paths.
soft_counts (dict) – dictionary with soft_counts per experiment.
supervised_annotations (table_dict) – table dict with supervised annotations per experiment.
bin_size (Union[int,str]) – bin size for time filtering.
bin_index (Union[int,str]) – index of the bin of size bin_size to select along the time dimension. Denotes exact start position in the time domain if given as string.
precomputed_bins (np.ndarray) – precomputed time bins. If provided, bin_size and bin_index are ignored.
frame_limit_per_video (int) – number of frames to render per video. If None, all frames are included for all videos.
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded)
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded
roi_mode (str) – Determines how the ROIs should be applied to different behaviors. Options are “mousewise” (default, selected mice need to be inside the ROI) and “behaviorwise” (only mice involved in a behavior need to be inside of the ROI, only for supervised behaviors)
in_roi_criterion (str) – Criterion for in roi check, can be a single bodypart, a list of bodyparts or “all” bodyparts of a mouse
behaviors (list) – Behaviors or clusters that get exported. If None is given, all are exported for soft_counts and only nose2nose is exported for supervised annotations. If multiple behaviors are given as a list, one video can get annotated with multiple different behaviors.
experiment_id (str) – if provided, data coming from a particular experiment is used. If not, all experiments are exported.
min_confidence (float) – minimum confidence threshold for a frame to be considered part of a cluster.
min_bout_duration (int) – Minimum number of frames to render a cluster assignment bout.
display_time (bool) – Displays current time in top left corner of the video frame
display_counter (bool) – Displays event counter for each displayed event.
display_arena (bool) – Displays arena for each video.
display_markers (bool) – Displays mouse body parts on top of the mice.
display_mouse_labels (bool) – Displays identities of the mice
display_roi (int) – roi to display, default is None
exp_conditions (dict) – if provided, data coming from a particular condition is used. If not, all conditions are exported. If a dictionary with more than one entry is provided, the intersection of all conditions (i.e. male, stressed) is used.
cluster_names (dict) – dictionary with user-defined names for each cluster (useful for interpreting the output).
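The effect of min_bout_duration can be sketched as removing runs of active frames that are shorter than the threshold. The helper drop_short_bouts is hypothetical and only illustrates the idea on a binary per-frame vector; how deepOF combines this with min_confidence is not shown.

```python
from itertools import groupby


def drop_short_bouts(active, min_bout_duration):
    """Zero out runs of active frames shorter than min_bout_duration."""
    filtered = []
    for value, run in groupby(active):
        run = list(run)
        if value and len(run) < min_bout_duration:
            # Too short to count as a bout: mark these frames inactive.
            filtered.extend([0] * len(run))
        else:
            filtered.extend(run)
    return filtered


active = [1, 0, 1, 1, 1, 0, 1, 1]
print(drop_short_bouts(active, min_bout_duration=3))  # [0, 0, 1, 1, 1, 0, 0, 0]
```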

deepof.visuals.plot_distance_between_conditions(coordinates: deepof_coordinates, embedding: dict, soft_counts: dict, exp_condition: str, embedding_aggregation_method: str = 'median', distance_metric: str = 'wasserstein', n_jobs: int = -1, save: bool = False, ax: Any | None = None)

Plot the distance between conditions across a growing time window.

Finds an optimal separation binning based on the distance between conditions, and plots it across all non-overlapping bins. Useful, for example, to measure habituation over time.

Parameters:

coordinates (coordinates) – coordinates object for the current project. Used to get video paths.
embedding (dict) – embedding object for the current project. Used to get video paths.
soft_counts (dict) – dictionary with soft_counts per experiment.
exp_condition (str) – experimental condition to use for the distance calculation.
embedding_aggregation_method (str) – method to use for aggregating the embedding. Options are ‘time_on_cluster’ and ‘mean’.
distance_metric (str) – distance metric to use for the distance calculation. Options are ‘wasserstein’ and ‘euclidean’.
n_jobs (int) – number of jobs to use for the distance calculation.
save (bool) – if True, saves the figure to the project directory.
ax (plt.AxesSubplot) – axes where to plot the current figure. If not provided, new figure will be created.

deepof.visuals.return_mouse_roi_interaction(coordinates: deepof_coordinates, bodyparts: list | None = None, animal_id: str | None = None, N_time_bins: int = 24, custom_time_bins: List[List[int | str]] | None = None, start_marker: str | None = None, samples_max=20000, roi_number: int | None = None, hide_time_bins: list[bool] | None = None, experiment_ids: list | None = None, exp_condition: str | None = None, condition_values: str | List[str] | None = None, mode: str = 'distance', add_stats: str = 'Mann-Whitney', error_bars: str = 'sem', unit_distance: str = 'm', fov_angle_deg: int = 90, get_raw_data: bool = False)

Return binned statistics and effect sizes for mouse-ROI interaction over time.

Computes either the distance of selected bodyparts to a ROI/arena boundary or the fraction of time a ROI/arena falls within a mouse’s field of view, aggregated into time bins. When get_raw_data=True the raw per-frame interaction values are returned instead.

Parameters:

coordinates (coordinates) – deepOF project containing the stored data.
bodyparts (list) – List of bodyparts whose distance to the ROI/arena is measured. Used in “distance” mode.
animal_id (str) – ID of the animal to use. Used in “fov” mode to construct the required bodypart triplet (Left_ear, Nose, Right_ear).
N_time_bins (int) – Number of time bins for data separation. Defaults to 24.
custom_time_bins (List[List[Union[int, str]]]) – Custom time bins array consisting of pairs of start- and stop positions given as integers or time strings. Overrides N_time_bins if provided.
samples_max (int) – Maximum number of samples taken per bin to avoid excessive computation times. Defaults to 20000.
roi_number (int) – Number of the ROI to measure interaction with. If None, the arena boundary is used.
hide_time_bins (list[bool]) – List of booleans denoting which bins should be visible (False) or hidden (True). Defaults to displaying all time bins.
experiment_ids (list) – List of experiment IDs to include. If None, all experiments are used. Ignored when a valid exp_condition/condition_values combination is provided.
exp_condition (str) – Experimental condition to compare.
condition_values (str) – Condition values to compare. If a string is provided it is wrapped in a list.
mode (str) – Interaction measure to compute. Must be one of “distance” (bodypart-ROI distance) or “fov” (field-of-view overlap). Defaults to “distance”.
add_stats (str) – Statistical test to use for pairwise comparisons. Mann-Whitney (non-parametric) by default. See statannotations documentation for details.
error_bars (str) – Type of error bars to compute (either standard deviation (“std”) or standard error (“sem”)). Defaults to standard error.
unit_distance (str) – Distance unit (m, cm, mm, …) used when mode is “distance”.
fov_angle_deg (int) – Angle of the field of view of the mouse, defaults to 90 deg.
get_raw_data (bool) – If True, returns the raw per-frame interaction DataFrame instead of binned statistics. Defaults to False.

Returns:

If get_raw_data=True, a DataFrame with raw per-frame interaction values per experiment. Otherwise, a tuple of (binned_effect_sizes_df, binned_group_df) containing binned statistics, means, error values and effect sizes.

Return type:

pd.DataFrame or tuple
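To illustrate the binned statistics described above, the following sketch splits a single per-frame signal into N equal-width time bins and reports the mean plus either the standard deviation (“std”) or the standard error (“sem”) per bin. It is a simplification under stated assumptions: bin_statistics is a hypothetical helper, it works on one signal rather than on groups of experiments, and it does not compute effect sizes.

```python
import math
from statistics import mean, stdev


def bin_statistics(values, n_time_bins, error_bars="sem"):
    """Split a per-frame signal into equal-width bins and summarize each bin."""
    if n_time_bins > len(values):
        raise ValueError("more time bins than frames")
    bin_width = len(values) / n_time_bins
    stats = []
    for k in range(n_time_bins):
        chunk = values[round(k * bin_width):round((k + 1) * bin_width)]
        # Sample standard deviation; undefined for a single value, so use 0.
        spread = stdev(chunk) if len(chunk) > 1 else 0.0
        if error_bars == "sem":
            spread /= math.sqrt(len(chunk))
        stats.append((mean(chunk), spread))
    return stats


# Made-up distances to a ROI, in arbitrary units.
distances = [1.0, 2.0, 3.0, 4.0, 10.0, 10.0, 10.0, 10.0]
stats = bin_statistics(distances, n_time_bins=2)
```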

deepof.visuals.plot_mouse_roi_interaction(coordinates: deepof_coordinates, bodyparts: list | None = None, animal_id: str | None = None, N_time_bins: int = 24, custom_time_bins: List[List[int | str]] | None = None, start_marker: str | None = None, samples_max=20000, roi_number: int | None = None, hide_time_bins: list[bool] | None = None, experiment_ids: list | None = None, exp_condition: str | None = None, condition_values: str | List[str] | None = None, mode: str = 'distance', add_stats: str = 'Mann-Whitney', error_bars: str = 'sem', unit_distance: str = 'm', fov_angle_deg: int = 90, ax: Any | None = None, polar_depiction: bool = False, show_histogram: bool = True)

Plot mouse-ROI interaction over time as a polar plot or line chart with optional effect-size histogram.

Visualises either the distance of selected bodyparts to a ROI/arena boundary or the fraction of time a ROI/arena falls within a mouse’s field of view, aggregated into time bins. Supports statistical annotations and effect-size overlays when exactly two experimental conditions are compared.

Parameters:

coordinates (coordinates) – deepOF project containing the stored data.
bodyparts (list) – List of bodyparts whose distance to the ROI/arena is measured. Used in “distance” mode.
animal_id (str) – ID of the animal to use. Used in “fov” mode to construct the required bodypart triplet (Left_ear, Nose, Right_ear).
N_time_bins (int) – Number of time bins for data separation. Defaults to 24.
custom_time_bins (List[List[Union[int, str]]]) – Custom time bins array consisting of pairs of start- and stop positions given as integers or time strings. Overrides N_time_bins if provided.
samples_max (int) – Maximum number of samples taken per bin to avoid excessive computation times. Defaults to 20000.
roi_number (int) – Number of the ROI to measure interaction with. If None, the arena boundary is used.
hide_time_bins (list[bool]) – List of booleans denoting which bins should be visible (False) or hidden (True). Defaults to displaying all time bins.
experiment_ids (list) – List of experiment IDs to include. If None, all experiments are used. Ignored when a valid exp_condition/condition_values combination is provided.
exp_condition (str) – Experimental condition to compare.
condition_values (str or List[str]) – Condition values to compare. If a single string is provided, it is wrapped in a list.
mode (str) – Interaction measure to compute. Must be one of “distance” (bodypart-ROI distance) or “fov” (field-of-view overlap). Defaults to “distance”.
add_stats (str) – Statistical test to use for pairwise comparisons. Mann-Whitney (non-parametric) by default. See the statannotations documentation for details.
error_bars (str) – Type of error bars to display (either standard deviation (“std”) or standard error (“sem”)). Defaults to standard error.
unit_distance (str) – Distance unit (m, cm, mm, …) used when mode is “distance”.
fov_angle_deg (int) – Angle of the field of view of the mouse, defaults to 90 deg.
ax (Any) – Matplotlib axis for plotting. If None, creates a new figure.
polar_depiction (bool) – If True, display as polar plot. Defaults to False.
show_histogram (bool) – If True, displays histogram with rough effect size estimations. Defaults to True.

Returns:

The Matplotlib axis containing the plot.

Return type:

ax

deepof.visuals.get_roi_data(coordinates: deepof_coordinates, table_dict: deepof_table_dict, roi_number: int, animals_in_roi: list | None = None, roi_mode: str = 'mousewise', in_roi_criterion: str = 'Center', invert_roi: bool = False, bin_index: int | str | None = None, bin_size: int | str | None = None, precomputed_bins: ndarray | None = None, start_marker: str | None = None, samples_max: int = 100000, experiment_id: str | None = None)

Get data in ROIs.

Parameters:

coordinates (coordinates) – deepOF project where the data is stored.
table_dict (table_dict) – table dict with information for ROI extraction. Can be supervised or unsupervised data.
roi_number (int) – Number of the ROI that should be used for the plot (all behavior that occurs outside of the ROI gets excluded).
animals_in_roi (list) – List of ids of the animals that need to be inside of the active ROI. All frames in which any of the given animals are not inside of the ROI get excluded.
roi_mode (str) – Determines how the ROIs should be applied to different behaviors. Options are “mousewise” (default, selected mice need to be inside the ROI) and “behaviorwise” (only mice involved in a behavior need to be inside of the ROI, only for supervised behaviors).
in_roi_criterion (str) – Criterion for the in-ROI check. Can be a single bodypart, a list of bodyparts or “all” bodyparts of a mouse.
bin_index (Union[int,str]) – index of the bin of size bin_size to select along the time dimension. Denotes exact start position in the time domain if given as string.
bin_size (Union[int,str]) – bin size for time filtering.
precomputed_bins (np.ndarray) – precomputed time bins. If provided, bin_size and bin_index are ignored.
samples_max (int) – Maximum number of samples taken for plotting to avoid excessive computation times. If the number of rows in a data set exceeds this number the data is downsampled accordingly.
experiment_id (str) – Name of the experiment id to extract. If None (default) a dictionary of all entries will be exported.

deepof.visuals.return_supervised_summary(coordinates: deepof_coordinates, supervised_annotations: deepof_table_dict, roi_number: int | None = None, animals_in_roi: list | None = None, roi_mode: str = 'mousewise', in_roi_criterion: str = 'Center', invert_roi: bool = False, N_time_bins: int = 10, start_marker: str | None = None, custom_time_bins: List[List[int | str]] | None = None, hide_time_bins: List[bool] | None = None, samples_max=20000, unit_time: str = 's', unit_distance: str = 'm', save_table=True)

Returns a summary of supervised information.

Parameters:

N_time_bins (int) – Number of time bins for data separation. Defaults to 10.
custom_time_bins (List[List[Union[int, str]]]) – Custom time bins array consisting of pairs of start- and stop positions given as integers or time strings. Overrides N_time_bins if provided.
unit_time (str) – Time unit (frames, seconds, minutes, hours) in which to display the result.
unit_distance (str) – Distance unit (millimeters, centimeters, meters) in which to display the result.

deepof.visuals_utils module

Plotting utility functions for the deepof package.

deepof.visuals_utils.hex_to_BGR(hex_color)

deepof.visuals_utils.BGR_to_hex(bgr_color)

deepof.visuals_utils.RGB_to_hex(rgb_color)

deepof.visuals_utils.RGB_to_BGR(rgb_color)

deepof.visuals_utils.BGR_to_RGB(bgr_color)

deepof.visuals_utils.get_behavior_colors(behaviors: list, animal_ids: list | DataFrame | None = None, custom_behaviors: List[DeepOF_behavior] | None = None)

Gets corresponding colors for all supervised behaviors or clusters within behaviors list.

Parameters:

behaviors (list) – List of strings containing behaviors
animal_ids (Union[list, pd.DataFrame]) – Either list of strings representing animal ids or supervised dataframe from which said list can be automatically extracted.

Returns:

A list of strings that contain hex color codes for each behavior. Will return None and display a warning for unknown behaviors.

Return type:

list

deepof.visuals_utils.generate_behavior_combinations(animal_ids, symmetric_behaviors: bool | List = True, asymmetric_behaviors: bool | List = True, single_behaviors: bool | List = True, continuous_behaviors: bool | List = True, custom_behaviors: List[DeepOF_behavior] | None = None): Return (result_list, color_dict) with full column names and their colors.

deepof.visuals_utils.calculate_average_arena(all_vertices: dict[List[Tuple[float, float]]], num_points: int = 10000) → array

Calculates the average arena from a collection of polygon vertex lists representing arenas. The vertex lists can have different lengths and start at different positions.

Parameters:

all_vertices (dict[List[Tuple[float, float]]]) – A dictionary of lists of 2D tuples representing the vertices of the arenas.
num_points (int) – number of points in the averaged arena.

Returns:

A 2D NumPy array containing the averaged arena.

Return type:

numpy.ndarray

deepof.visuals_utils.create_bin_pairs(L_array: int, N_time_bins: int)

Creates a list of bin_index and bin_size pairs when splitting a list into N_time_bins bins.

Parameters:

L_array (int) – Length of the array to index.
N_time_bins (int) – number of time bins to create.

Returns:

A 2D list containing start and end positions of each bin.

Return type:

bin_pairs (list(tuple))
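
As an illustration of the binning idea described above, the following is a minimal sketch that splits an array of a given length into contiguous bins. The helper name make_bin_pairs, the exclusive end index, and the way the remainder is distributed are assumptions made for this example and may differ from what create_bin_pairs actually does.

```python
def make_bin_pairs(length: int, n_bins: int) -> list[tuple[int, int]]:
    """Split range(length) into n_bins contiguous (start, end) pairs.

    Hypothetical helper: `end` is exclusive, and any remainder is spread
    over the first bins so that bin sizes differ by at most one.
    """
    if n_bins < 1 or length < n_bins:
        raise ValueError("need 1 <= n_bins <= length")
    base, remainder = divmod(length, n_bins)
    pairs = []
    start = 0
    for i in range(n_bins):
        # The first `remainder` bins get one extra element
        size = base + (1 if i < remainder else 0)
        pairs.append((start, start + size))
        start += size
    return pairs


print(make_bin_pairs(10, 3))  # [(0, 4), (4, 7), (7, 10)]
```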

deepof.visuals_utils.build_valid_multibins(coordinates, N_time_bins, L_shortest, custom_time_bins=None, hide_time_bins=None, min_bins_required=4, start_marker=None)

deepof.visuals_utils.postprocess_df_bins(df: DataFrame, bin_lengths, hide_time_bins)

deepof.visuals_utils.cohend(array_a: array, array_b: array)

Calculates Cohen’s d effect size. Does not assume equal population standard deviations and can be used with unequal sample sizes.

Parameters:

array_a (np.array) – First array of values to compare.
array_b (np.array) – Second array of values to compare.

Return type:

Cohens d (int)

Cohen’s d measures the standardized difference between two categories, e.g. the difference between their means. It assumes two Gaussian-distributed variables and is a standard score that expresses the difference in terms of the number of standard deviations. The magnitude of Cohen’s d ranges from 0 to infinity, and in general its sign indicates the direction of the difference. It complements a hypothesis test: the test reports the likelihood of observing the data under the null hypothesis (p-value), whereas the effect size quantifies the size of the effect, assuming that the effect is present. Because the score is standardized, conventional thresholds exist for interpreting the result:

Small effect size: d = 0.20
Medium effect size: d = 0.50
Large effect size: d = 0.80
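
For orientation, here is a minimal sketch of one common formulation of Cohen’s d, which divides the difference in means by a pooled standard deviation weighted by sample size. The function name cohens_d and the exact pooling formula are assumptions for this example; the formula used by cohend itself is not stated in this documentation.

```python
import math
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d with a sample-size-weighted pooled standard deviation.

    Hypothetical stand-in for illustration only. `variance` is the
    sample variance (ddof=1), so each input needs at least two values.
    """
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (
        n_a + n_b - 2
    )
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


d = cohens_d([2, 4, 6, 8], [1, 2, 3, 4])
print(round(d, 4))  # 1.2247
```

Swapping the two inputs flips the sign of the result, which is why absolute values are often reported when only the magnitude matters.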

deepof.visuals_utils.cohend_effect_size(d: float)

Categorizes Cohen’s d effect size.

Parameters:

d (float) – Cohen’s d.

Returns:

Categorized effect size.

Return type:

int
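
A categorization along the conventional thresholds listed for cohend (0.20, 0.50, 0.80) could be sketched as follows. The integer coding 0 to 3 and the helper name effect_size_category are assumptions for this example; the codes actually returned by cohend_effect_size are not documented here.

```python
def effect_size_category(d: float) -> int:
    """Map Cohen's d to a category using the conventional thresholds.

    Hypothetical coding: 0 = negligible, 1 = small, 2 = medium, 3 = large.
    """
    magnitude = abs(d)  # the category depends on magnitude, not direction
    if magnitude >= 0.8:
        return 3
    if magnitude >= 0.5:
        return 2
    if magnitude >= 0.2:
        return 1
    return 0


print([effect_size_category(d) for d in (0.1, -0.3, 0.6, 1.2)])  # [0, 1, 2, 3]
```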

deepof.visuals_utils.calculate_FSTTC(preceding_behavior: Series, proximate_behavior: Series, frame_rate: float, delta_T: float = 2.0): Calculates the association measure FSTTC between two behaviors given as boolean series

deepof.visuals_utils.calculate_simple_association(preceding_behavior: ndarray, proximate_behavior: ndarray, frame_rate: float, min_T: float = 10.0): Calculates Yule’s coefficient Q between two behaviors given as boolean arrays
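
As background, Yule’s Q is computed from the 2x2 contingency table of two binary variables as Q = (ad - bc) / (ad + bc). The sketch below shows only this textbook formula; the helper name yules_q is hypothetical, and it ignores the frame_rate and min_T arguments, whose role in calculate_simple_association is not described here.

```python
def yules_q(x, y):
    """Yule's Q for two equally long boolean sequences.

    Hypothetical illustration of the textbook formula only.
    """
    a = sum(1 for p, q in zip(x, y) if p and q)          # both present
    b = sum(1 for p, q in zip(x, y) if p and not q)      # only x present
    c = sum(1 for p, q in zip(x, y) if not p and q)      # only y present
    d = sum(1 for p, q in zip(x, y) if not p and not q)  # both absent
    denominator = a * d + b * c
    if denominator == 0:
        return float("nan")  # undefined for degenerate tables
    return (a * d - b * c) / denominator


x = [True, True, False, False, True, False]
y = [True, True, False, False, False, True]
print(yules_q(x, y))  # 0.6
```

Q ranges from -1 (the behaviors never co-occur) to 1 (they always co-occur), with 0 indicating no association.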

deepof.visuals_utils.contiguous_segments(mask: ndarray)

deepof.visuals_utils.scale_units(coordinates, key, data, unit: str, target_distance: str | None = None, target_time: str | None = None): Scale data from unit to requested target units and return (scaled, new_unit). unit can be “<u>” or “<u_num>/<u_den>”, where each u is in TimeUnit or DistanceUnit.

deepof.visuals_utils.get_square_shape_for_gridlike_plot(N): get best number of rows and columns for grid like plots
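
One simple way to pick a near-square grid for N subplots is to take the ceiling of the square root of N as the number of columns and derive the rows from it. This is a sketch under that assumption; the helper name grid_shape is hypothetical, and get_square_shape_for_gridlike_plot may choose rows and columns differently.

```python
import math


def grid_shape(n: int) -> tuple[int, int]:
    """Return (rows, cols) of a near-square grid with at least n cells."""
    if n < 1:
        raise ValueError("n must be at least 1")
    cols = math.isqrt(n - 1) + 1  # integer ceiling of sqrt(n)
    rows = math.ceil(n / cols)    # fewest rows that still fit n cells
    return rows, cols


print(grid_shape(6), grid_shape(7))  # (2, 3) (3, 3)
```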

deepof.visuals_utils.plot_arena(coordinates: deepof_coordinates, center: str, color: str, ax: Any, key: str, roi_number: int | None = None)

Plot the arena in the given canvas.

Parameters:

coordinates (coordinates) – deepof Coordinates object.
center (str) – Name of the body part to which the positions will be centered. If false, the raw data is returned; if ‘arena’ (default), coordinates are centered in the pitch.
color (str) – color of the displayed arena.
ax (Any) – axes where to plot the arena.
key (str) – key of the animal to plot with optional “all of them” (if key==”average”).
roi_number (int) – number of a roi, if given.

deepof.visuals_utils.heatmap(dframe: DataFrame, bodyparts: List, xlim: tuple | None = None, ylim: tuple | None = None, title: str | None = None, mask: ndarray | None = None, extrapolate_heatmap: bool = True, save: str = False, dpi: int = 200, ax: Any | None = None, **kwargs) → figure

Return a heatmap of the movement of a specific bodypart in the arena.

If more than one bodypart is passed, it returns one subplot for each.

Parameters:

dframe (pandas.DataFrame) – table_dict value with info to plot.
bodyparts (list) – list of body parts to plot (at least 1).
xlim (tuple) – limits of the x-axis.
ylim (tuple) – limits of the y-axis.
title (str) – title of the figure.
mask (np.ndarray) – mask to apply to the heatmap across time.
extrapolate_heatmap (bool) – Show full heatmap including extrapolated parts (default = True)
save (str) – if provided, saves the figure to the specified file.
dpi (int) – dots per inch of the figure to create.
ax (plt.AxesSubplot) – axes where to plot the current figure. If not provided, new figure will be created.
kwargs – additional arguments to pass to the seaborn kdeplot function.

Returns:

figure with the specified characteristics

Return type:

heatmaps (plt.figure)

deepof.visuals_utils.process_df(df: DataFrame, error_bars: str = 'sem')

Process binned behavioral DF independent of number of exp conditions.

Returns:

mean_values (dict[str, np.ndarray]) – Mapping condition -> array of mean values per time_bin (sorted by time_bin).
error_values (dict[str, np.ndarray]) – Mapping condition -> array of error values per time_bin (sorted by time_bin).
binned_effect_sizes_df (pd.DataFrame) – Pairwise effect sizes (Cohen’s d) for all condition pairs per time_bin. Columns: [“time_bin”,”cond_a”,”cond_b”,”Absolute_Cohens_d”,”Effect_Size_Category”] Empty if <2 conditions.
time_bins (np.ndarray) – Sorted unique time_bin values used for the arrays.
conditions (list[str]) – Sorted unique exp_condition values (keys of the dicts).

deepof.visuals_utils.plot_binned_line(ax, x, y, yerr=None, hide_time_bins=None, color='C0', label=None, smooth_points_per_interval: int = 10, mean_linewidth: float = 3.0, mean_alpha: float = 0.8, err_linewidth: float = 1.0, err_alpha: float = 0.15, marker: str = 'o')

Plot a binned mean line with interpolation + markers + error band, leaving gaps for hidden bins and NaNs.

Parameters:

ax (matplotlib axis)
x (array-like, shape (n_bins,)) – X positions (must be strictly increasing).
y (array-like, shape (n_bins,)) – Mean values per bin.
yerr (array-like or None, shape (n_bins,)) – Error values per bin (sem/std). If None, no error band is drawn.
hide_time_bins (array-like of bool or None, shape (n_bins,)) – True bins will be hidden (gaps in line, no marker/error there).
color (str)
label (str)
smooth_points_per_interval (int) – Number of points per bin-to-bin interval for mean interpolation (>=2).

deepof.visuals_utils.ensure_axis(ax=None, polar_depiction=False, figsize=(12, 4))

If ax is None: creates a suitable axis and returns (fig, ax, show=True).

If ax is given: converts it to a polar axis in-place when polar_depiction=True and ax is not polar, and returns (ax.figure, ax, show=False).

deepof.visuals_utils.get_binned_geometry(bin_lengths): Returns a dict with centers/widths/edges in radians (0..2π) + labels “1..N”.

deepof.visuals_utils.format_time_binned_axis(ax, geom, polar_depiction, max_value, title=None, xlabel=None, ylabel=None)

deepof.visuals_utils.add_polar_bin_labels(ax, geom, radius_factor=1.05): Call after histogram so rmax is final.

deepof.visuals_utils.plot_binned_groups(ax, x_radians, mean_values, error_values, condition_values, hide_time_bins, colors, plot_binned_line_func): Plots mean +/- error for each condition using your existing plot_binned_line. Returns (handles, max_value).

deepof.visuals_utils.plot_effectsize_histogram(ax, geom, effect_size_categories, hide_time_bins, max_value, bottom, show_histogram=True, cmap=('#9370DB', '#6A5ACD', '#4B0082'), hidden_color='#C0C0C0', alpha=0.8): Draws effect size histogram bars. Returns (legend_handles, stat_text_col).

deepof.visuals_utils.annotate_binwise_stats(ax, test_dict, geom, polar_depiction, text_color='k')

deepof.visuals_utils.add_binned_legends(ax, condition_handles, condition_labels, effect_handles=None, polar_depiction=False, show_histogram=True, first_plot=True): Adds condition legend + effect-size legend with consistent placement. Only adds legends if first_plot=True.

deepof.clustering.censNetConv_pt module

Translation of the CensNetConv module (from spektral, pip install spektral, version 1.3.1) into PyTorch.

Original module can be imported with

from spektral.layers import CensNetConv

if the library “spektral” is installed. Based on the paper

“Co-embedding of Nodes and Edges with Graph Neural Networks” by Xiaodong Jiang, Ronghang Zhu, Pengsheng Ji, and Sheng Li 2020 arXiv:2010.13242v1 [cs.LG] 25 Oct 2020

class deepof.clustering.censNetConv_pt.CensNetConvPT(node_channels: int, edge_channels: int, activation: str | None = None, use_bias: bool = True)

Bases: Module

A PyTorch implementation of the CensNet convolutional layer.

__init__(node_channels: int, edge_channels: int, activation: str | None = None, use_bias: bool = True): Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(inputs)

The forward pass of the CensNet layer.

Parameters:

inputs – A tuple containing:

- node_features (torch.Tensor): Shape [B, N, F_n]
- graph_ops (tuple): (laplacian, edge_laplacian, incidence)
- edge_features (torch.Tensor): Shape [B, E, F_e]

static preprocess(adjacency: Tensor)

Computes the graph operators needed for the forward pass.

Parameters:

adjacency (torch.Tensor) – Adjacency matrix, shape [B, N, N] or [N, N].

Returns:

A tuple (laplacian, edge_laplacian, incidence).

Return type:

tuple

deepof.clustering.censNetConv_pt.degree_power_pt(A: Tensor, k: float) → Tensor

Computes D^k from the given adjacency matrix A.

Parameters:

A (torch.Tensor) – A dense adjacency matrix of shape [N, N].
k (float) – The exponent.

Returns:

A diagonal matrix D^k of shape [N, N].

Return type:

torch.Tensor

deepof.clustering.censNetConv_pt.normalized_adjacency_pt(A: Tensor, symmetric: bool = True) → Tensor

Normalizes the given adjacency matrix.

Parameters:

A (torch.Tensor) – A dense adjacency matrix of shape [N, N].
symmetric (bool) – If True, computes D^-0.5 * A * D^-0.5. Otherwise, computes D^-1 * A.

Returns:

The normalized adjacency matrix.

Return type:

torch.Tensor
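
The two normalizations named above, D^-0.5 * A * D^-0.5 and D^-1 * A, can be sketched as follows. The example uses NumPy instead of PyTorch to stay dependency-light, the function names are hypothetical, and treating isolated nodes (degree 0) as contributing zero is an assumption rather than documented behavior of degree_power_pt or normalized_adjacency_pt.

```python
import numpy as np


def degree_power(A: np.ndarray, k: float) -> np.ndarray:
    """Return the diagonal matrix D^k for a dense [N, N] adjacency matrix."""
    degrees = A.sum(axis=1).astype(float)
    with np.errstate(divide="ignore"):
        powered = np.power(degrees, k)
    powered[np.isinf(powered)] = 0.0  # assumption: isolated nodes map to 0
    return np.diag(powered)


def normalized_adjacency(A: np.ndarray, symmetric: bool = True) -> np.ndarray:
    """Symmetric (D^-0.5 A D^-0.5) or random-walk (D^-1 A) normalization."""
    if symmetric:
        d_inv_sqrt = degree_power(A, -0.5)
        return d_inv_sqrt @ A @ d_inv_sqrt
    return degree_power(A, -1.0) @ A


# Path graph 0 - 1 - 2
A = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
sym = normalized_adjacency(A)
rw = normalized_adjacency(A, symmetric=False)
```

In the random-walk variant every row of a connected graph sums to 1, whereas the symmetric variant preserves the symmetry of A.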

deepof.clustering.censNetConv_pt.gcn_filter_pt(A: Tensor, symmetric: bool = True) → Tensor

Computes the GCN filter (I + D^-0.5 * A * D^-0.5).

Parameters:

A (torch.Tensor) – A dense adjacency matrix or a batch of them. Shape [N, N] or [B, N, N].
symmetric (bool) – Whether to use symmetric normalization.

Returns:

The GCN-normalized adjacency matrix.

Return type:

torch.Tensor

deepof.clustering.censNetConv_pt.line_graph_pt(incidence: Tensor) → Tensor

Creates the line graph adjacency matrix.

Parameters:

incidence (torch.Tensor) – The incidence matrix of shape […, N, E].

Returns:

The line graph adjacency matrix of shape […, E, E].

Return type:

torch.Tensor

deepof.clustering.censNetConv_pt.incidence_matrix_pt(adjacency: Tensor) → Tensor

Creates incidence matrices for a batch of graphs.

Parameters:

adjacency (torch.Tensor) – Adjacency matrix, shape [N, N] or [B, N, N].

Returns:

Incidence matrices, shape [N, E] or [B, N, E].

Return type:

torch.Tensor

deepof.clustering.censNetConv_pt.transpose_pt(a: Tensor, perm=None) → Tensor: Transposes a tensor, handling sparsity.

deepof.clustering.censNetConv_pt.reshape_pt(a: Tensor, shape: tuple) → Tensor: Reshapes a tensor, handling sparsity.

deepof.clustering.censNetConv_pt.dot_pt(a: Tensor, b: Tensor) → Tensor: Computes a @ b, handling sparsity. torch.matmul is a powerful replacement that handles most cases: - dense @ dense (rank 2 and 3/batch) - sparse @ dense - dense @ sparse

deepof.clustering.censNetConv_pt.mixed_mode_dot_pt(a: Tensor, b: Tensor) → Tensor: Computes einsum(‘ij,bjk->bik’, a, b), handling sparsity. PyTorch equivalent of ops.mixed_mode_dot.
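
The contraction 'ij,bjk->bik' applies one shared matrix to every element of a batch. The sketch below demonstrates this with NumPy rather than PyTorch and with dense arrays only; the sparsity handling mentioned for mixed_mode_dot_pt is not reproduced.

```python
import numpy as np

a = np.arange(6, dtype=float).reshape(2, 3)      # shared matrix, indices [i, j]
b = np.arange(24, dtype=float).reshape(2, 3, 4)  # batch, indices [b, j, k]

# Explicit einsum form, as named in the description above
out_einsum = np.einsum("ij,bjk->bik", a, b)

# Equivalent: matmul broadcasts the 2D matrix over the batch dimension
out_matmul = a @ b

print(out_einsum.shape)  # (2, 2, 4)
```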

deepof.clustering.censNetConv_pt.modal_dot_pt(a: Tensor, b: Tensor, transpose_a=False, transpose_b=False): Computes matrix multiplication, handling different data modes automatically. PyTorch equivalent of ops.modal_dot.

deepof.clustering.dataset module

deepof.clustering.dataset.reorder_and_reshape(data: ndarray) → ndarray

class deepof.clustering.dataset.BatchDictDataset(preprocessed_dict: Dict, dataset_folder: str, dataset_name: str, force_rebuild: bool = False, h5_chunk_len: int | None = None, return_angles: bool | None = False, supervised_dict: Dict | None = None, read_only: bool = False)

Bases: object

__init__(preprocessed_dict: Dict, dataset_folder: str, dataset_name: str, force_rebuild: bool = False, h5_chunk_len: int | None = None, return_angles: bool | None = False, supervised_dict: Dict | None = None, read_only: bool = False)

make_loader(batch_size: int, shuffle: bool = True, drop_last: bool = False, num_workers: int = 0, pin_memory: bool = True, iterable_for_h5: bool = True, rdcc_nbytes: int = 67108864, rdcc_nslots: int = 1000000, block_shuffle: bool = True, permute_within_block: bool = False, prefetch_factor: int = 4, persistent_workers: bool | None = None, seed: int | None = None, ddp_shard: bool = True, bootstrap_training: bool = False, bootstrap_block_len: int = 250) → DataLoader

deepof.clustering.logging module

Logging and user-feedback functionality for models.

deepof.clustering.logging.get_q_vade(model: Module, x: Tensor, a: Tensor) → Tensor

Extracts soft cluster assignments q(c|z) from VaDE.

Assumes model(x, a, return_gmm_params=True) returns a tuple whose third element is the cluster responsibility matrix of shape [B, K].

deepof.clustering.logging.get_q_vqvae(model: Module, x: Tensor, a: Tensor, *, distill_head: Module) → Tensor: Extracts soft cluster assignments for VQVAE by applying the distillation head to the pre-quantization encoder output.

deepof.clustering.logging.get_q_contrastive(model: Module, x_full: Tensor, a_full: Tensor, *, distill_head: Module, edge_index: Tensor) → Tensor: Extracts soft cluster assignments for the contrastive model by reproducing the main-window embedding path used for distillation during training.

deepof.clustering.logging.compute_vade_specific_diagnostics(model: Module) → Dict[str, float]

Computes VaDE-specific diagnostics related to the latent GMM.

Returns an empty dict for models without a latent_space GMM.

deepof.clustering.logging.compute_diagnostics(model: Module, dataloader: DataLoader, q_fn: Callable[[Module, Tensor, Tensor], Tensor], device: device, n_components: int, tau_star: Tensor | None = None, distill_sharpen_T: float = 0.5, distill_conf_weight: bool = False, distill_conf_thresh: float = 0.55, max_batches: int = 4, extra_stats_fn: Callable[[Module], Dict[str, float]] | None = None) → Dict[str, float]

Computes clustering diagnostics and alignment score from model soft assignments.

The function is model-agnostic: it only requires a q_fn that extracts soft assignments q of shape [B, K] from a batch.

Parameters:

model (nn.Module) – Model to evaluate.
dataloader (DataLoader) – Validation or analysis loader.
q_fn (Callable) – Function that maps (model, x, a) -> q of shape [B, K].
device (torch.device) – Device for inference.
n_components (int) – Number of clusters/components K.
tau_star (Optional[torch.Tensor]) – Teacher assignments of shape [N, K]. If provided, the balance term is based on KL(q_marginal || tau_marginal). If None, the balance term falls back to normalized marginal entropy of q.
distill_sharpen_T (float) – Temperature used when computing teacher-confidence diagnostics.
distill_conf_weight (bool) – Whether teacher confidence weighting is enabled.
distill_conf_thresh (float) – Threshold used for teacher confidence weighting diagnostics.
max_batches (int) – Maximum number of dataloader batches to inspect.
extra_stats_fn (Optional[Callable]) – Optional model-specific diagnostics helper.

Returns:

Diagnostics dictionary including conf_norm, bal_norm, and alignment_score.

Return type:

Dict[str, float]
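
To make the fallback balance term more concrete, the following sketch computes the normalized marginal entropy of a soft-assignment matrix q of shape [B, K]: the entropy of the column-wise mean of q, divided by log K. The helper name and the use of plain Python lists are assumptions for this example; how compute_diagnostics derives bal_norm internally is not specified here.

```python
import math


def normalized_marginal_entropy(q):
    """Entropy of the mean assignment over a batch, scaled to [0, 1].

    `q` is a list of B rows, each a length-K probability vector (K >= 2).
    Hypothetical illustration only.
    """
    n_rows, n_clusters = len(q), len(q[0])
    # Marginal cluster usage: average assignment per cluster
    marginal = [sum(row[j] for row in q) / n_rows for j in range(n_clusters)]
    entropy = -sum(p * math.log(p) for p in marginal if p > 0)
    return entropy / math.log(n_clusters)


balanced = normalized_marginal_entropy([[1.0, 0.0], [0.0, 1.0]])
collapsed = normalized_marginal_entropy([[1.0, 0.0], [1.0, 0.0]])
```

A value near 1 indicates that clusters are used evenly across the batch, while a value near 0 indicates that assignments collapse onto a single cluster.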

deepof.clustering.logging.init_log_summary(model_name: str): Initialize dictionary structure of losses to collect

deepof.clustering.logging.print_losses(model_name: str, log_summary: dict, epoch: int, n_epochs: int, train_logs: dict, val_logs: dict, klw: float = 0.0, lambda_d: float = 0.0, is_main: bool = True): Print losses neatly aligned and append them to the log summary.

deepof.clustering.logging.average_logs(logs_list: Iterable[Dict[str, float]]) → Dict[str, float]

deepof.clustering.logging.log_epoch_to_tensorboard(writer: object | None, train_logs: Dict[str, float], val_logs: Dict[str, float], epoch: int, score_value: float = nan, lambda_d: float = 0.0)

Writes per-epoch training and validation metrics to TensorBoard.

Parameters:

writer (Optional[SummaryWriter]) – TensorBoard writer. If None, this is a no-op.
train_logs (Dict[str, float]) – Training metrics for the current epoch.
val_logs (Dict[str, float]) – Validation metrics for the current epoch.
epoch (int) – Current epoch index.
score_value (float) – Current alignment score. Only logged if finite.
lambda_d (float) – Current distillation lambda weight.

deepof.clustering.losses module

Model losses

deepof.clustering.losses.select_contrastive_loss_pt(history: Tensor, future: Tensor, similarity: str, loss_fn: str = 'nce', temperature: float = 0.1, tau: float = 0.1, beta: float = 0.1, elimination_topk: float = 0.1) → Tuple[Tensor, Tensor, Tensor]

deepof.clustering.losses.nce_loss_pt_old(history: Tensor, future: Tensor, similarity: Callable[[Tensor, Tensor], Tensor], temperature: float = 0.1) → Tuple[Tensor, Tensor, Tensor]: Compute the NCE loss function, as described in the paper “A Simple Framework for Contrastive Learning of Visual Representations” (https://arxiv.org/abs/2002.05709).

deepof.clustering.losses.nce_loss_pt(history, future, similarity, temperature=0.1): Standard NCE loss

deepof.clustering.losses.dcl_loss_pt(history: Tensor, future: Tensor, similarity: Callable[[Tensor, Tensor], Tensor], temperature: float = 0.1, debiased: bool = True, tau_plus: float = 0.1) → Tuple[Tensor, Tensor, Tensor]: Compute the DCL loss function, as described in the paper “Debiased Contrastive Learning” (https://github.com/chingyaoc/DCL/).

deepof.clustering.losses.fc_loss_pt(history: Tensor, future: Tensor, similarity: Callable[[Tensor, Tensor], Tensor], temperature: float = 0.1, elimination_topk: float = 0.1) → Tuple[Tensor, Tensor, Tensor]: Compute the FC loss function, as described in the paper “Fully-Contrastive Learning of Visual Representations” (https://arxiv.org/abs/2004.11362).

deepof.clustering.losses.hard_loss_pt(history: Tensor, future: Tensor, similarity: Callable[[Tensor, Tensor], Tensor], temperature: float, beta: float = 0.0, debiased: bool = True, tau_plus: float = 0.1) → Tuple[Tensor, Tensor, Tensor]: Compute the Hard loss function, as described in the paper “Contrastive Learning with Hard Negative Samples” (https://arxiv.org/abs/2011.03343).

deepof.clustering.losses.compute_kmeans_loss_pt(latent_means: Tensor, weight: float) → Tensor

Computes a loss based on the singular values of the Gram matrix of the latent vectors, encouraging orthogonality. Based on https://arxiv.org/pdf/1610.04794.pdf, and https://www.biorxiv.org/content/10.1101/2020.05.14.095430v3.

Parameters:

latent_means – The latent vectors from the model (batch_size, latent_dim).
weight – The weight to apply to this loss component.

Returns:

The calculated scalar loss tensor.

class deepof.clustering.losses.Dynamic_weight_manager(n_batches_per_epoch: int, mode: str = 'sigmoid', warmup_epochs: int = 15, max_weight: float = 1.0, at_max_epochs: int = 0, cooldown_epochs: int = 15, end_weight: float = 1.0)

Bases: object

Handles KL and lambda weights over epochs.

__init__(n_batches_per_epoch: int, mode: str = 'sigmoid', warmup_epochs: int = 15, max_weight: float = 1.0, at_max_epochs: int = 0, cooldown_epochs: int = 15, end_weight: float = 1.0)

get_weight() → float

step()

deepof.clustering.losses.cluster_frequencies_regularizer(soft_counts: Tensor) → Tensor

class deepof.clustering.losses.VadeLoss(common_cfg: CommonFitCfg, vade_cfg: VaDECfg, teacher_cfg: TurtleTeacherCfg, kl_scheduler: Dynamic_weight_manager | None = None, l1_activity_weight: float = 0.1, gmm_logvar_clamp: Tuple[float, float] = (-8.0, 8.0))

Bases: Module

VaDE loss function combining reconstruction, KL divergence, clustering, and optional teacher distillation terms.

Supports two training phases (pretrain and main) with separate parameter sets for phase-dependent regularization terms. Call set_mode() to switch between them.

Phase-dependent parameters (stored separately for pretrain and main):

- repel_weight, repel_length_scale
- nonempty_weight, nonempty_floor, nonempty_p

Phase-independent parameters are set once at construction and do not change between phases.

Parameters:

common_cfg (CommonFitCfg) – Common training configuration providing n_components, latent_dim, and kmeans_loss.
vade_cfg (VaDECfg) – VaDE-specific configuration including both pretrain and main parameter values.
teacher_cfg (TurtleTeacherCfg) – Teacher distillation configuration.
kl_scheduler (Optional[Dynamic_weight_manager]) – KL weight scheduler. Can be replaced later via set_kl_scheduler().
l1_activity_weight (float) – L1 regularization weight on z_log_var. Defaults to 0.1.
gmm_logvar_clamp (Tuple[float, float]) – Min and max clamp values for GMM log-variances. Defaults to (-8.0, 8.0).

__init__(common_cfg: CommonFitCfg, vade_cfg: VaDECfg, teacher_cfg: TurtleTeacherCfg, kl_scheduler: Dynamic_weight_manager | None = None, l1_activity_weight: float = 0.1, gmm_logvar_clamp: Tuple[float, float] = (-8.0, 8.0)): Initialize internal Module state, shared by both nn.Module and ScriptModule.

set_mode(mode: str): Copies the parameter set for the given phase into the active attributes.

set_teacher(tau_star: Tensor, lambda_distill: float = 1.0, lambda_scheduler: Dynamic_weight_manager | None = None)

Sets teacher assignments and distillation parameters.

Computes inverse-marginal class reweighting from the teacher distribution if distill_use_class_reweight is enabled.

Parameters:

tau_star (torch.Tensor) – Teacher soft assignments of shape [N, K].
lambda_distill (float) – Distillation loss weight. Defaults to 1.0.
lambda_scheduler (Optional[Dynamic_weight_manager]) – Optional scheduler for lambda_distill.

set_kl_scheduler(kl_scheduler: Dynamic_weight_manager | None = None): Replaces the KL weight scheduler and resets its iteration counter.

forward(model_outputs, x_original, batch_indices: Tensor | None = None)

Computes the full VaDE loss from model outputs and original inputs.

Parameters:

model_outputs – Tuple of (recon_dist, latent_z, q, kmeans_loss, z_mean, z_log_var, gmm_params).
x_original (torch.Tensor) – Original input tensor of shape [B, T, N, F].
batch_indices (Optional[torch.Tensor]) – Sample indices into tau_star for distillation.

Returns:

Dictionary of all individual loss terms and the total loss.

Return type:

Dict[str, torch.Tensor]

deepof.clustering.losses.build_optimizer_generic(model: Module, distill_head: Module | None = None, base_lr: float = 0.0003, weight_decay: float = 0.0001) → Optimizer

deepof.clustering.losses.build_optimizer_vade(model: Module, base_lr: float = 0.0003, gmm_lr: float = 0.0001) → Optimizer

deepof.clustering.model_utils_new module

Utility functions for training autoencoder models for functionality within deepof.clustering.

class deepof.clustering.model_utils_new.CommonFitCfg(learning_rate: float = 0.0003, model_name: str = 'VaDE', encoder_type: str = 'recurrent', batch_size: int = 1024, latent_dim: int = 6, epochs: int = 10, n_components: int = 10, output_path: str = '.', data_path: str = '.', log_history: bool = True, pretrained: str | None = None, save_weights: bool = True, run: int = 0, num_workers: int = 0, prefetch_factor: int = 0, use_amp: bool = False, interaction_regularization: float = 0.0, kmeans_loss: float = 0.0, diag_max_batches: int = 4, seed: int = None, limit_train_batches: int | None = 1000, limit_val_batches: int | None = 1000)

Bases: object

learning_rate: float = 0.0003

model_name: str = 'VaDE'

encoder_type: str = 'recurrent'

batch_size: int = 1024

latent_dim: int = 6

epochs: int = 10

n_components: int = 10

output_path: str = '.'

data_path: str = '.'

log_history: bool = True

pretrained: str | None = None

save_weights: bool = True

run: int = 0

num_workers: int = 0

prefetch_factor: int = 0

use_amp: bool = False

interaction_regularization: float = 0.0

kmeans_loss: float = 0.0

diag_max_batches: int = 4

seed: int = None

limit_train_batches: int | None = 1000

limit_val_batches: int | None = 1000

__init__(learning_rate: float = 0.0003, model_name: str = 'VaDE', encoder_type: str = 'recurrent', batch_size: int = 1024, latent_dim: int = 6, epochs: int = 10, n_components: int = 10, output_path: str = '.', data_path: str = '.', log_history: bool = True, pretrained: str | None = None, save_weights: bool = True, run: int = 0, num_workers: int = 0, prefetch_factor: int = 0, use_amp: bool = False, interaction_regularization: float = 0.0, kmeans_loss: float = 0.0, diag_max_batches: int = 4, seed: int | None = None, limit_train_batches: int | None = 1000, limit_val_batches: int | None = 1000) → None
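The fields above read like an ordinary dataclass-style configuration. The following is a minimal sketch, assuming CommonFitCfg behaves like a standard Python dataclass. `FitCfgSketch` is a hypothetical stand-in (not part of deepof) that mirrors only four of the documented fields and defaults. It shows how selected values might be overridden while the rest keep their defaults.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class FitCfgSketch:
    """Hypothetical stand-in mirroring a few documented CommonFitCfg fields."""
    learning_rate: float = 0.0003
    batch_size: int = 1024
    latent_dim: int = 6
    seed: Optional[int] = None


# Override selected fields; the others keep their documented defaults.
cfg = FitCfgSketch(batch_size=256, seed=42)

# dataclasses.replace returns a modified copy and leaves cfg untouched.
cfg_small = replace(cfg, latent_dim=4)
```

Keeping a base configuration and deriving variants with `replace` is one way to run small sweeps without mutating shared state.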

class deepof.clustering.model_utils_new.TurtleTeacherCfg(use_turtle_teacher: bool = False, teacher_gamma: float = 8.0, teacher_outer_steps: int = 500, teacher_inner_steps: int = 100, teacher_normalize_feats: bool = True, teacher_head_temp: float = 0.35, teacher_task_temp: float = 0.35, teacher_alpha_sample_entropy: float = 2.0, lambda_distill: float = 4.0, lambda_decay_start: int = 10, lambda_end_weight: float = 0.2, lambda_cooldown: int = 10, distill_sharpen_T: float = 0.5, distill_conf_weight: bool = False, distill_conf_thresh: float = 0.3, generic_lambda_distill: float = 2.0, generic_distill_sharpen_T: float = 0.5, generic_distill_conf_weight: bool = True, generic_distill_conf_thresh: float = 0.6, generic_distill_warmup_epochs: int = 1, distill_class_reweight_beta: float = 1.0, distill_class_reweight_cap: float = 3.0, include_latent_view: bool = (True,), include_edges_view: bool = False, include_nodes_view: bool = True, include_angles_view: bool = False, pca_nodes_dim: int = 32, pca_edges_dim: int = 32, pca_angles_dim: int = 32, batch_size_nodes: int = 4096, batch_size_edges: int = 8192, batch_size_angles: int = 8192, teacher_refresh_every: int | None = None, teacher_freeze_at: int | None = 10, reinit_gmm_on_refresh: bool = False, teacher_batch_size: int = 2048)

Bases: object

use_turtle_teacher: bool = False

teacher_gamma: float = 8.0

teacher_outer_steps: int = 500

teacher_inner_steps: int = 100

teacher_normalize_feats: bool = True

teacher_head_temp: float = 0.35

teacher_task_temp: float = 0.35

teacher_alpha_sample_entropy: float = 2.0

lambda_distill: float = 4.0

lambda_decay_start: int = 10

lambda_end_weight: float = 0.2

lambda_cooldown: int = 10

distill_sharpen_T: float = 0.5

distill_conf_weight: bool = False

distill_conf_thresh: float = 0.3

generic_lambda_distill: float = 2.0

generic_distill_sharpen_T: float = 0.5

generic_distill_conf_weight: bool = True

generic_distill_conf_thresh: float = 0.6

generic_distill_warmup_epochs: int = 1

distill_class_reweight_beta: float = 1.0

distill_class_reweight_cap: float = 3.0

include_latent_view: bool = (True,)

include_edges_view: bool = False

include_nodes_view: bool = True

include_angles_view: bool = False

pca_nodes_dim: int = 32

pca_edges_dim: int = 32

pca_angles_dim: int = 32

batch_size_nodes: int = 4096

batch_size_edges: int = 8192

batch_size_angles: int = 8192

teacher_refresh_every: int | None = None

teacher_freeze_at: int | None = 10

reinit_gmm_on_refresh: bool = False

teacher_batch_size: int = 2048

__init__(use_turtle_teacher: bool = False, teacher_gamma: float = 8.0, teacher_outer_steps: int = 500, teacher_inner_steps: int = 100, teacher_normalize_feats: bool = True, teacher_head_temp: float = 0.35, teacher_task_temp: float = 0.35, teacher_alpha_sample_entropy: float = 2.0, lambda_distill: float = 4.0, lambda_decay_start: int = 10, lambda_end_weight: float = 0.2, lambda_cooldown: int = 10, distill_sharpen_T: float = 0.5, distill_conf_weight: bool = False, distill_conf_thresh: float = 0.3, generic_lambda_distill: float = 2.0, generic_distill_sharpen_T: float = 0.5, generic_distill_conf_weight: bool = True, generic_distill_conf_thresh: float = 0.6, generic_distill_warmup_epochs: int = 1, distill_class_reweight_beta: float = 1.0, distill_class_reweight_cap: float = 3.0, include_latent_view: bool = (True,), include_edges_view: bool = False, include_nodes_view: bool = True, include_angles_view: bool = False, pca_nodes_dim: int = 32, pca_edges_dim: int = 32, pca_angles_dim: int = 32, batch_size_nodes: int = 4096, batch_size_edges: int = 8192, batch_size_angles: int = 8192, teacher_refresh_every: int | None = None, teacher_freeze_at: int | None = 10, reinit_gmm_on_refresh: bool = False, teacher_batch_size: int = 2048) → None
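The documented default of include_latent_view is printed as (True,), a one-element tuple rather than the bool its annotation suggests. The plain-Python sketch below (no deepof imports; all variable names are illustrative) shows how such a value behaves under truthiness, identity, and equality checks. This may matter if calling code compares the field against True explicitly.

```python
default_value = (True,)  # the default exactly as printed in the signature above

# A non-empty tuple is truthy, so `if value:` behaves like True.
behaves_truthy = bool(default_value)

# Identity and equality checks against True do not match a tuple.
is_true = default_value is True
equals_true = default_value == True  # noqa: E712 (deliberate comparison)

# Even a tuple holding False is truthy, since only emptiness matters.
tuple_with_false_is_truthy = bool((False,))
```

Passing include_latent_view=True or include_latent_view=False explicitly avoids relying on the tuple's truthiness.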

class deepof.clustering.model_utils_new.VaDECfg(learning_rate_pretrain: float = 0.001, gmm_learning_rate: float = 0.001, pretrain_epochs: int = 10, reg_cat_clusters: float = 0.0, recluster: bool = False, freeze_gmm_epochs: int = 0.0, freeze_decoder_epochs: int = 0.0, prior_loss_weight: float = 0.0, reg_scatter_weight: float = 0.0, temporal_cohesion_weight: float = 0.0, reg_scatter_beta: float = 1.0, repel_weight: float = 0.0, repel_length_scale: float = 1.0, tf_cluster_weight: float = 0.0, nonempty_weight: float = 0.02, nonempty_p: float = 2.0, nonempty_floor_percent: float = 0.05, kmeans_loss_pretrain: float = 1.0, repel_weight_pretrain: float = 0.5, repel_length_scale_pretrain: float = 0.5, nonempty_weight_pretrain: float = 0.02, nonempty_p_pretrain: float = 2.0, nonempty_floor_percent_pretrain: float = 0.05, kl_annealing_mode: str = 'tf_sigmoid', kl_max_weight: float = 1.0, kl_warmup: int = 5, kl_end_weight: float = 0.2, kl_cooldown: int = 5, kl_annealing_mode_pretrain: str = 'tf_sigmoid', kl_max_weight_pretrain: float = 0.2, kl_warmup_pretrain: int = 15, kl_end_weight_pretrain: float = 0.2, kl_cooldown_pretrain: int = 10)

Bases: object

learning_rate_pretrain: float = 0.001

gmm_learning_rate: float = 0.001

pretrain_epochs: int = 10

reg_cat_clusters: float = 0.0

recluster: bool = False

freeze_gmm_epochs: int = 0.0

freeze_decoder_epochs: int = 0.0

prior_loss_weight: float = 0.0

reg_scatter_weight: float = 0.0

temporal_cohesion_weight: float = 0.0

reg_scatter_beta: float = 1.0

repel_weight: float = 0.0

repel_length_scale: float = 1.0

tf_cluster_weight: float = 0.0

nonempty_weight: float = 0.02

nonempty_p: float = 2.0

nonempty_floor_percent: float = 0.05

kmeans_loss_pretrain: float = 1.0

repel_weight_pretrain: float = 0.5

repel_length_scale_pretrain: float = 0.5

nonempty_weight_pretrain: float = 0.02

nonempty_p_pretrain: float = 2.0

nonempty_floor_percent_pretrain: float = 0.05

kl_annealing_mode: str = 'tf_sigmoid'

kl_max_weight: float = 1.0

kl_warmup: int = 5

kl_end_weight: float = 0.2

kl_cooldown: int = 5

kl_annealing_mode_pretrain: str = 'tf_sigmoid'

kl_max_weight_pretrain: float = 0.2

kl_warmup_pretrain: int = 15

kl_end_weight_pretrain: float = 0.2

kl_cooldown_pretrain: int = 10

__init__(learning_rate_pretrain: float = 0.001, gmm_learning_rate: float = 0.001, pretrain_epochs: int = 10, reg_cat_clusters: float = 0.0, recluster: bool = False, freeze_gmm_epochs: int = 0.0, freeze_decoder_epochs: int = 0.0, prior_loss_weight: float = 0.0, reg_scatter_weight: float = 0.0, temporal_cohesion_weight: float = 0.0, reg_scatter_beta: float = 1.0, repel_weight: float = 0.0, repel_length_scale: float = 1.0, tf_cluster_weight: float = 0.0, nonempty_weight: float = 0.02, nonempty_p: float = 2.0, nonempty_floor_percent: float = 0.05, kmeans_loss_pretrain: float = 1.0, repel_weight_pretrain: float = 0.5, repel_length_scale_pretrain: float = 0.5, nonempty_weight_pretrain: float = 0.02, nonempty_p_pretrain: float = 2.0, nonempty_floor_percent_pretrain: float = 0.05, kl_annealing_mode: str = 'tf_sigmoid', kl_max_weight: float = 1.0, kl_warmup: int = 5, kl_end_weight: float = 0.2, kl_cooldown: int = 5, kl_annealing_mode_pretrain: str = 'tf_sigmoid', kl_max_weight_pretrain: float = 0.2, kl_warmup_pretrain: int = 15, kl_end_weight_pretrain: float = 0.2, kl_cooldown_pretrain: int = 10) → None

class deepof.clustering.model_utils_new.ContrastiveCfg(temperature: float = 0.1, contrastive_similarity_function: str = 'cosine', contrastive_loss_function: str = 'nce', beta: float = 0.1, tau: float = 0.1, aug_min_shift: int = 1, aug_max_shift: int = 6, aug_p_shift: float = 0.8, aug_max_rot: int = 30, aug_n_rot: int = 4, aug_p_rot: float = 0.0, aug_max_interp: int = 8, aug_min_interp: int = 3, aug_p_interp: float = 0.3, aug_noise_sigma: float = 0.03, aug_p_noise: float = 0.0)

Bases: object

temperature: float = 0.1

contrastive_similarity_function: str = 'cosine'

contrastive_loss_function: str = 'nce'

beta: float = 0.1

tau: float = 0.1

aug_min_shift: int = 1

aug_max_shift: int = 6

aug_p_shift: float = 0.8

aug_max_rot: int = 30

aug_n_rot: int = 4

aug_p_rot: float = 0.0

aug_max_interp: int = 8

aug_min_interp: int = 3

aug_p_interp: float = 0.3

aug_noise_sigma: float = 0.03

aug_p_noise: float = 0.0

__init__(temperature: float = 0.1, contrastive_similarity_function: str = 'cosine', contrastive_loss_function: str = 'nce', beta: float = 0.1, tau: float = 0.1, aug_min_shift: int = 1, aug_max_shift: int = 6, aug_p_shift: float = 0.8, aug_max_rot: int = 30, aug_n_rot: int = 4, aug_p_rot: float = 0.0, aug_max_interp: int = 8, aug_min_interp: int = 3, aug_p_interp: float = 0.3, aug_noise_sigma: float = 0.03, aug_p_noise: float = 0.0) → None

deepof.clustering.model_utils_new.ddp_init_if_needed(backend: str = 'nccl')

deepof.clustering.model_utils_new.unwrap_dp(model: Module) → Module

deepof.clustering.model_utils_new.move_to(x, device)

deepof.clustering.model_utils_new.save_model_info(ckpt_path: str, *, stage: str, epoch: int | None = None, train_steps: int | None = None, val_total: float | None = None, score_value: float | None = None, extra: dict | None = None, common_cfg=None, teacher_cfg=None, vade_cfg=None, contrastive_cfg=None, model: Module | None = None, log_summary: Dict[str, Any] | None = None, rebuild_spec: Dict[str, Any] | None = None, save_weights: bool = True) → None: Saves all config and training information for a freshly trained model (+ optionally the model weights).

deepof.clustering.model_utils_new.recompute_edges(x: Tensor, edge_index: Tensor) → Tensor

Recompute edge distances from node coordinates.

Returns:

a – tensor of shape (B, T, E, 1), where a[…, e, 0] is the Euclidean distance between the two nodes specified by edge_index[e].

Return type:

Tensor
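The edge recomputation described above can be sketched without any dependencies. This is an illustration only, not the library's implementation: nested Python lists stand in for tensors, `recompute_edges_sketch` is a hypothetical name, and the layout of edge_index as a list of (source, target) index pairs is an assumption.

```python
import math


def recompute_edges_sketch(x, edge_index):
    """Return per-edge Euclidean distances with shape (B, T, E, 1).

    x          -- nested lists of shape (B, T, N, D): node coordinates.
    edge_index -- list of E (source, target) node-index pairs (assumed layout).
    """
    return [
        [
            [[math.dist(frame[i], frame[j])] for i, j in edge_index]
            for frame in sample  # one frame = N coordinate vectors
        ]
        for sample in x
    ]


# One sample, one frame, three 2-D nodes, two edges.
x = [[[[0.0, 0.0], [3.0, 4.0], [3.0, 0.0]]]]
edges = [(0, 1), (1, 2)]
a = recompute_edges_sketch(x, edges)
```

The trailing singleton axis keeps the result in the same (B, T, E, F_per_edge) layout that the encoders expect for edge features.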

deepof.clustering.model_utils_new.ckpt_paths(model_name: str, common_cfg: CommonFitCfg)

Checks and validates main input parameters for model training.

Parameters:

model_name (str) – Name of the model
encoder_type (str) – Type of encoder-decoder pair being used
kl_annealing_mode (str) – Which function should be used to increase and decrease the KL weight
contrastive_similarity_function (str) – Which function should be used to calculate similarity between samples for the contrastive model
contrastive_loss_function (str) – Which function should be used to calculate the loss for the contrastive model

deepof.clustering.model_utils_new.embedding_per_video(coordinates: deepof_coordinates, to_preprocess: deepof_table_dict, model: str, meta_info: dict, supervised_annotations: deepof_table_dict | None = None, scale: str = 'standard', animal_id: str | None = None, extract_pair: list | None = None, global_scaler: Any | None = None, softcounts_extraction_method=None, embedding_gates: str = 'Center', states_per_gate: int = 8, quality_threshold: float = 0.75, frac_bps_below: float = 0.5, samples_max: int = 227272, M_gates: int = 3, n_micro: int = 200, lagtime: int = 3)

Use a previously trained model to produce embeddings and soft_counts per experiment in table_dict format.

Parameters:

coordinates (coordinates) – deepof.Coordinates object for the project at hand.
to_preprocess (table_dict) – dictionary with (merged) features to process.
model (tf.keras.models.Model) – trained deepof unsupervised model to run inference with.
meta_info (dict) – meta_info dictionary containing information regarding dataset preprocessing.
supervised_annotations (table_dict) – table dict with supervised annotations per experiment.
pretrained (bool) – whether to use the specified pretrained model to recluster the data.
scale (str) – The type of scaler to use within animals. Defaults to ‘standard’, but can be changed to ‘minmax’, ‘robust’, or False. Use the same that was used when training the original model.
animal_id (str) – if more than one animal is present, provide the ID(s) of the animal(s) to include.
global_scaler (Any) – trained global scaler produced when processing the original dataset.
softcounts_extraction_method (str) – Method used for softcounts extraction. Can be None, “gmm”, “msm” (for msm-pcca), or “combined”, which applies msm-pcca first, then filters out all samples with high tracking uncertainty and uses a gmm approach to predict separate clusters on the uncertain sample fraction. If None, the decoder of the model is used. If the model has no decoder, “msm” is used as a default.
distance_bp (str) – The mouse body part that will be used for distance binning during softcounts extraction. Only relevant for experiments with 2+ mice that use a non-None softcounts_extraction_method.
samples_max (int) – Maximum number of samples taken for plotting to avoid excessive computation times. If the number of rows in a data set exceeds this number the data is downsampled accordingly.

Returns:

embeddings (table_dict) – embeddings per experiment.
soft_counts (table_dict) – soft_counts per experiment.

deepof.clustering.model_utils_new.slice_time_per_sample(x: Tensor, start: Tensor, length: int) → Tensor: Slice a per-sample contiguous window along time dim=1. Returns shape (B, length, …)
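Per-sample windowing means each sample in the batch gets its own start index while all windows share one length. A minimal dependency-free sketch, with nested lists standing in for tensors and `slice_time_per_sample_sketch` as a hypothetical name (the library's tensor-based implementation may differ):

```python
def slice_time_per_sample_sketch(x, start, length):
    """Take x[b][start[b] : start[b] + length] for every sample b.

    x      -- nested lists of shape (B, T, ...).
    start  -- list of B start indices, one per sample.
    length -- window length shared by all samples.
    """
    return [sample[s:s + length] for sample, s in zip(x, start)]


# Two samples with five time steps each (scalar features for brevity).
x = [[0, 1, 2, 3, 4], [10, 11, 12, 13, 14]]
windows = slice_time_per_sample_sketch(x, start=[1, 3], length=2)
```

Here the first sample is cut at time steps 1 and 2, and the second at time steps 3 and 4, giving a result of shape (B, length).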

deepof.clustering.model_utils_new.print_model_info(ckpt_path: str) → None

If present, print the contents of a sibling info file: [same_name_as_model_file]_info.txt

Example

/path/to/model.pt -> /path/to/model_info.txt
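The path mapping in the example above can be sketched with the standard library. The helper names `info_path_for` and `read_model_info` are hypothetical and only illustrate the naming rule. How deepof itself resolves and prints the file is not shown in this reference.

```python
import os


def info_path_for(ckpt_path):
    """Map a checkpoint path to its sibling '<name>_info.txt' path."""
    stem, _ext = os.path.splitext(ckpt_path)
    return stem + "_info.txt"


def read_model_info(ckpt_path):
    """Return the info file's text, or None when no sibling file exists."""
    info_path = info_path_for(ckpt_path)
    if not os.path.isfile(info_path):
        return None
    with open(info_path, encoding="utf-8") as handle:
        return handle.read()
```

Returning None for a missing file mirrors the "if present" wording: the absence of an info file is treated as a normal case rather than an error.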

deepof.clustering.model_utils_new.load_model_from_ckpt(path: str, device=None, strict: bool = False): Load a single model checkpoint saved via save_model_info(…, save_bundle=True) using only the checkpoint path. Returns: model, log_summary, rebuild_spec, load_report

deepof.clustering.model_utils_new.load_best_checkpoints(model: Module, best_path_val: str, best_path_score: str, device: device, save_weights: bool) → Tuple[Module, Module]

Loads the best-val and best-score checkpoints into two separate model copies.

Returns the best-val model and best-score model, both unwrapped from DataParallel. If a checkpoint does not exist, the corresponding model retains its current weights.

Parameters:

model (nn.Module) – Current model (possibly DataParallel-wrapped).
best_path_val (str) – Path to the best-validation checkpoint.
best_path_score (str) – Path to the best-score checkpoint.
device (torch.device) – Device for loading weights.
save_weights (bool) – Whether weight saving was enabled during training.

Returns: Tuple[nn.Module, nn.Module]: (best_val_model, best_score_model), both unwrapped.

deepof.clustering.models_new module

Deep autoencoder models for unsupervised pose detection.

VQ-VAE: a variational autoencoder with a vector quantization latent-space (https://arxiv.org/abs/1711.00937).
VaDE: a variational autoencoder with a Gaussian mixture latent-space.
Contrastive: an embedding model consisting of a single encoder, trained using a contrastive loss.

Models were translated from the original TensorFlow implementations to PyTorch using LLMs.

class deepof.clustering.models_new.RecurrentEncoderPT(input_shape: tuple, edge_feature_shape: tuple, adjacency_matrix: ndarray, latent_dim: int, use_gnn: bool = True, interaction_regularization: float = 0.0)

Bases: Module

PyTorch translation of the TF recurrent encoder.

Expected shapes:

use_gnn=True:
x: (B, T, N_nodes, F_per_node)
a: (B, T, E_edges, F_per_edge)
Internally:
TF-style grouping reshape -> (B, N_nodes, T, F_per_node) / (B, E_edges, T, F_per_edge)
Recurrent blocks -> (B, N_nodes/E_edges, 2*latent_dim)
CensNetConvPT -> (B, N_nodes/E_edges, latent_dim)
ReLU (to match TF activation=’relu’)
Flatten+concat -> Linear(latent_dim)

use_gnn=False:
x: (B, T, N_nodes, F_per_node) -> flatten to (B, 1, T, N_nodes*F_per_node)
Recurrent block -> (B, 1, 2*latent_dim) -> squeeze -> Linear(latent_dim)

__init__(input_shape: tuple, edge_feature_shape: tuple, adjacency_matrix: ndarray, latent_dim: int, use_gnn: bool = True, interaction_regularization: float = 0.0): Initialize internal Module state, shared by both nn.Module and ScriptModule.

static tf_style_group_reshape(x: Tensor, groups: int, feat_per_group: int) → Tensor

Exact TF mapping used in the encoder for the GNN path: x: (B, T, groups, feat_per_group) -> (B, groups, T, feat_per_group)

Derived from the TF sequence: transpose -> reshape -> transpose.
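The documented input and output shapes suggest that the net effect is a swap of the time and group axes. The sketch below illustrates that shape mapping with nested lists. It assumes a plain axis swap, which this reference does not confirm element by element, and `group_reshape_sketch` is a hypothetical name.

```python
def group_reshape_sketch(x):
    """Swap the time and group axes: (B, T, G, F) -> (B, G, T, F)."""
    return [
        [list(per_group) for per_group in zip(*sample)]  # transpose T and G
        for sample in x
    ]


# B=1, T=2, G=3, F=1: each frame lists one feature vector per group.
x = [[[[1], [2], [3]], [[4], [5], [6]]]]
y = group_reshape_sketch(x)
```

After the swap, each group carries its own time series, which is the layout a per-node or per-edge recurrent block would consume.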

forward(x: Tensor, a: Tensor) → Tensor: x: (B, T, N_nodes, F_per_node); a: (B, T, E_edges, F_per_edge)

class deepof.clustering.models_new.RecurrentBlockPT(input_features: int, latent_dim: int, bidirectional_merge: str = 'concat')

Bases: Module

__init__(input_features: int, latent_dim: int, bidirectional_merge: str = 'concat'): Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor) → Tensor

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class deepof.clustering.models_new.RecurrentDecoderPT(output_shape: tuple, latent_dim: int, bidirectional_merge: str = 'concat')

Bases: Module

A full PyTorch implementation of the recurrent decoder.

__init__(output_shape: tuple, latent_dim: int, bidirectional_merge: str = 'concat'): Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(g: Tensor, x: Tensor) → TransformedDistribution

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class deepof.clustering.models_new.TemporalBlockPT(in_channels: int, out_channels: int, kernel_size: int, dilation: int, padding: str = 'causal', dropout_rate: float = 0.0, activation: str = 'relu', use_batch_norm: bool = True, conv_init_std: float = 0.05)

Bases: Module

Residual TCN block compatible with keras-tcn:

Conv1d -> BN(eps=1e-3) -> Act -> Drop
Conv1d -> BN(eps=1e-3) -> Act -> Drop
Residual add (with 1x1 projection if channels differ) -> Act

Returns:

out – post-residual activation.
skip – post-second-conv activation (summed across blocks when skip connections are used).

__init__(in_channels: int, out_channels: int, kernel_size: int, dilation: int, padding: str = 'causal', dropout_rate: float = 0.0, activation: str = 'relu', use_batch_norm: bool = True, conv_init_std: float = 0.05): Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor) → Tuple[Tensor, Tensor]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class deepof.clustering.models_new.TCN1DPT(in_channels: int, conv_filters: int = 32, kernel_size: int = 4, conv_stacks: int = 2, conv_dilations: Iterable[int] = (1, 2, 4, 8), padding: str = 'causal', use_skip_connections: bool = True, dropout_rate: float = 0.0, activation: str = 'relu', use_batch_norm: bool = True, return_sequences: bool = False)

Bases: Module

Temporal Convolutional Network over sequences (B, T, C_in).

When use_skip_connections=True: sum per-block skip outputs, then apply a final activation.
Otherwise: use the last block’s residual output.
return_sequences=False: returns last timestep features (B, C_out).
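When choosing kernel_size, conv_stacks, and conv_dilations, it can help to estimate how many time steps the last output position can see. The sketch below uses the receptive-field formula commonly quoted for keras-tcn-style residual blocks with two dilated convolutions each. Whether it applies exactly to this PyTorch port is an assumption, and `receptive_field_sketch` is a hypothetical helper.

```python
def receptive_field_sketch(kernel_size, conv_stacks, conv_dilations):
    """Time steps seen by the last output of a stack of residual TCN blocks.

    Assumes two dilated convolutions per block, so each block adds
    2 * (kernel_size - 1) * dilation steps of context.
    """
    return 1 + 2 * (kernel_size - 1) * conv_stacks * sum(conv_dilations)


# Documented defaults: kernel_size=4, conv_stacks=2, conv_dilations=(1, 2, 4, 8).
default_field = receptive_field_sketch(4, 2, (1, 2, 4, 8))
```

If the input window is shorter than this estimate, the deepest dilations mostly see padding, so a smaller dilation set may be sufficient.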

__init__(in_channels: int, conv_filters: int = 32, kernel_size: int = 4, conv_stacks: int = 2, conv_dilations: Iterable[int] = (1, 2, 4, 8), padding: str = 'causal', use_skip_connections: bool = True, dropout_rate: float = 0.0, activation: str = 'relu', use_batch_norm: bool = True, return_sequences: bool = False): Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor) → Tensor

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class deepof.clustering.models_new.BatchNorm1dKerasFP32(num_features, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)

Bases: BatchNorm1d

__init__(num_features, eps=0.001, momentum=0.01, affine=True, track_running_stats=True): Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor) → Tensor

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class deepof.clustering.models_new.TCNEncoderPT(input_shape: Tuple[int, int, int], edge_feature_shape: Tuple[int, int, int], adjacency_matrix: ndarray, latent_dim: int, use_gnn: bool = True, conv_filters: int = 32, kernel_size: int = 4, conv_stacks: int = 2, conv_dilations: Iterable[int] = (1, 2, 4, 8), padding: str = 'causal', use_skip_connections: bool = True, dropout_rate: float = 0.0, activation: str = 'relu', interaction_regularization: float = 0.0, use_batch_norm: bool = True)

Bases: Module

Builds a neural network that can be used to encode motion tracking instances into a vector. Each layer contains a residual block with a convolutional layer and a skip connection. See the following paper for more details: https://arxiv.org/pdf/1803.01271.pdf

Inputs:
x: (B, W, N, NF) node features
a: (B, W, E, EF) edge features

use_gnn=True:
TimeDistributed(TCN) over nodes/edges -> (B, N, C) and (B, E, C)
CensNetConvPT([node, (lap, edge_lap, inc), edge]) -> (B, N, latent), (B, E, latent)
Flatten and MLP head

use_gnn=False:
Flatten nodes+features -> TCN -> MLP head

__init__(input_shape: Tuple[int, int, int], edge_feature_shape: Tuple[int, int, int], adjacency_matrix: ndarray, latent_dim: int, use_gnn: bool = True, conv_filters: int = 32, kernel_size: int = 4, conv_stacks: int = 2, conv_dilations: Iterable[int] = (1, 2, 4, 8), padding: str = 'causal', use_skip_connections: bool = True, dropout_rate: float = 0.0, activation: str = 'relu', interaction_regularization: float = 0.0, use_batch_norm: bool = True): Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor, a: Tensor) → Tensor: x: (B, W, N, NF); a: (B, W, E, EF) -> returns (B, latent_dim)

class deepof.clustering.models_new.AffineTransformedDistribution(base_distribution, transform)

Bases: TransformedDistribution

A specific TransformedDistribution for Affine transforms that implements .mean.

__init__(base_distribution, transform)

property mean: Computes the mean of the transformed distribution. E[loc + scale * X] = loc + scale * E[X]

class deepof.clustering.models_new.ProbabilisticDecoderPT(hidden_dim: int, data_dim: int)

Bases: Module

PyTorch translation of the ProbabilisticDecoder, including scaling transform. AMP-safe version: do distribution math in float32, sanitize NaNs/Infs.

__init__(hidden_dim: int, data_dim: int): Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(hidden: Tensor, validity_mask: Tensor) → AffineTransformedDistribution

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class deepof.clustering.models_new.TCNDecoderPT(output_shape: Tuple[int, int], latent_dim: int, conv_filters: int = 64, kernel_size: int = 4, conv_stacks: int = 1, conv_dilations: Iterable[int] = (8, 4, 2, 1), padding: str = 'causal', use_skip_connections: bool = True, dropout_rate: float = 0.0, activation: str = 'relu', use_batch_norm: bool = True)

Bases: Module

Builds a neural network that can be used to decode a latent space into a sequence of motion tracking instances. Each layer contains a residual block with a convolutional layer and a skip connection. See the following paper for more details: https://arxiv.org/pdf/1803.01271.pdf

g: (B, latent_dim)

x: (B, W, NNF) or (B, W, N, NF) for mask computation

Pipeline:
Dense(latent) -> BN -> Dense(2*latent, relu) -> BN -> Dense(4*latent, relu) -> BN -> RepeatVector(W) -> TCN(return_sequences=True) -> ProbabilisticDecoderPT(hidden_dim=conv_filters, data_dim=NNF)

Returns: a distribution whose .mean is (B, W, NNF)

__init__(output_shape: Tuple[int, int], latent_dim: int, conv_filters: int = 64, kernel_size: int = 4, conv_stacks: int = 1, conv_dilations: Iterable[int] = (8, 4, 2, 1), padding: str = 'causal', use_skip_connections: bool = True, dropout_rate: float = 0.0, activation: str = 'relu', use_batch_norm: bool = True): Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(g: Tensor, x: Tensor): g: (B, latent_dim) x: (B, W, NNF) or (B, W, N, NF) -> used only to compute validity mask returns: distribution with .mean of shape (B, W, NNF)

deepof.clustering.models_new.sinusoidal_positional_encoding(max_len: int, d_model: int, device=None, dtype=torch.float32) → Tensor: Compute positional encodings, as in https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf.
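The sinusoidal encoding from the cited paper can be sketched in plain Python. This sketch interleaves sine and cosine columns as in the paper's formula. Some implementations instead concatenate all sines followed by all cosines, and this reference does not state which layout deepof uses. `positional_encoding_sketch` is a hypothetical name.

```python
import math


def positional_encoding_sketch(max_len, d_model):
    """Return a max_len x d_model table of sinusoidal position encodings.

    Even columns hold sin(pos / 10000**(2i/d_model)) and odd columns the
    matching cosine (interleaved layout; an assumption of this sketch).
    """
    table = []
    for pos in range(max_len):
        row = []
        for col in range(d_model):
            pair = col // 2  # index i of the sin/cos pair
            angle = pos / (10000 ** (2 * pair / d_model))
            row.append(math.sin(angle) if col % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


pe = positional_encoding_sketch(max_len=4, d_model=6)
```

Position 0 encodes to alternating 0 and 1, and later positions rotate each sin/cos pair at a frequency that decreases with the pair index.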

class deepof.clustering.models_new.MultiHeadAttentionPT(in_dim: int, num_heads: int, key_dim: int, dropout: float = 0.0)

Bases: Module

Multi-head attention using PyTorch’s optimized scaled_dot_product_attention.

__init__(in_dim: int, num_heads: int, key_dim: int, dropout: float = 0.0): Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor, attn_mask: Tensor | None = None) → Tensor

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class deepof.clustering.models_new.TransformerEncoderLayerPT(key_dim: int, num_heads: int, dff: int, rate: float = 0.1)

Bases: Module

Transformer encoder layer with post-normalization. Based on https://www.tensorflow.org/text/tutorials/transformer.

__init__(key_dim: int, num_heads: int, dff: int, rate: float = 0.1): Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor, attn_mask: Tensor | None = None) → Tensor

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class deepof.clustering.models_new.TransformerCorePT(in_channels: int, key_dim: int, num_layers: int, num_heads: int, dff: int, max_pos: int, rate: float = 0.1)

Bases: Module

Core transformer: Linear embedding -> positional encoding -> transformer layers.

__init__(in_channels: int, key_dim: int, num_layers: int, num_heads: int, dff: int, max_pos: int, rate: float = 0.1): Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor) → Tensor

Parameters: x – (B, T, in_channels)
Returns: (B, T, key_dim) if return_sequences else (B, key_dim)

class deepof.clustering.models_new.TFMEncoderPT(input_shape: Tuple[int, int, int], edge_feature_shape: Tuple[int, int, int], adjacency_matrix: ndarray, latent_dim: int, use_gnn: bool = True, num_layers: int = 2, num_heads: int = 4, dff: int = 128, dropout_rate: float = 0.1, key_dim: int | None = None)

Bases: Module

Based on https://www.tensorflow.org/text/tutorials/transformer. Adapted according to https://academic.oup.com/gigascience/article/8/11/giz134/5626377 and https://arxiv.org/abs/1711.03905.

__init__(input_shape: Tuple[int, int, int], edge_feature_shape: Tuple[int, int, int], adjacency_matrix: ndarray, latent_dim: int, use_gnn: bool = True, num_layers: int = 2, num_heads: int = 4, dff: int = 128, dropout_rate: float = 0.1, key_dim: int | None = None): Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor, a: Tensor) → Tensor

Parameters:

x – (B, W, N, NF) - node features over time
a – (B, W, E, EF) - edge features over time

Returns:

(B, latent_dim) - encoded representation

class deepof.clustering.models_new.TFMDecoderPT(output_shape: Tuple[int, int], latent_dim: int, num_layers: int = 2, num_heads: int = 4, dff: int = 128, dropout_rate: float = 0.1)

Bases: Module

Based on https://www.tensorflow.org/text/tutorials/transformer. Adapted according to https://academic.oup.com/gigascience/article/8/11/giz134/5626377?login=true and https://arxiv.org/abs/1711.03905.

Transformer decoder that FORCES latent usage by concatenating latent to every timestep, not using cross-attention.

__init__(output_shape: Tuple[int, int], latent_dim: int, num_layers: int = 2, num_heads: int = 4, dff: int = 128, dropout_rate: float = 0.1): Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(g: Tensor, x_target: Tensor, training: bool | None = None)

Parameters:

g – (B, latent_dim) - latent code from encoder
x_target – (B, W, D_in) - target sequence (only used for validity mask)

class deepof.clustering.models_new.CausalSelfAttentionLayer(d_model: int, num_heads: int, dff: int, dropout: float = 0.1)

Bases: Module

Causal self-attention layer (no cross-attention).
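Causal self-attention restricts each time step to attend only to itself and earlier steps. The sketch below builds such a mask as nested booleans. The convention that True means "may attend" matches boolean masks for PyTorch's scaled_dot_product_attention, but whether this layer builds its mask that way is an assumption. `causal_mask_sketch` is a hypothetical name.

```python
def causal_mask_sketch(seq_len):
    """Return a seq_len x seq_len mask; True means 'query may attend to key'."""
    return [[key <= query for key in range(seq_len)] for query in range(seq_len)]


# Rows index the query position, columns index the key position.
mask = causal_mask_sketch(3)
```

The result is lower-triangular: the first step sees only itself, while the last step sees the whole sequence.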

__init__(d_model: int, num_heads: int, dff: int, dropout: float = 0.1): Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor) → Tensor

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class deepof.clustering.models_new.VectorQuantizerPT(n_components: int, embedding_dim: int, beta: float, kmeans_loss: float = 0.0)

Bases: Module

Quantizes the input vectors into a fixed number of clusters using L2 norm. Based on https://arxiv.org/pdf/1509.03700.pdf, and adapted for clustering using https://arxiv.org/abs/1806.02199. Implementation based on https://keras.io/examples/generative/vq_vae/.

__init__(n_components: int, embedding_dim: int, beta: float, kmeans_loss: float = 0.0): Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor, return_losses: bool = True)

Parameters:

x – Input tensor of shape (…, embedding_dim). Typically (batch_size, embedding_dim) from the encoder.

get_code_indices(flattened_inputs: Tensor, return_soft_counts: bool = False) → Tensor
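Quantization by L2 norm amounts to a nearest-neighbour lookup in the codebook. The following is a minimal sketch of that lookup under stated assumptions: `get_code_indices_sketch` is a hypothetical stand-in, it works on plain lists, and it ignores the `return_soft_counts` option.

```python
from typing import List


def get_code_indices_sketch(inputs: List[List[float]], codebook: List[List[float]]) -> List[int]:
    """Return, for each input vector, the index of the closest codebook entry (squared L2)."""
    indices = []
    for vec in inputs:
        dists = [sum((v - c) ** 2 for v, c in zip(vec, code)) for code in codebook]
        indices.append(min(range(len(codebook)), key=dists.__getitem__))
    return indices


codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]   # n_components=3, embedding_dim=2
inputs = [[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7]]
codes = get_code_indices_sketch(inputs, codebook)
quantized = [codebook[i] for i in codes]            # each input replaced by its code vector
```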

deepof.clustering.models_new.init_encoder_decoder(*, encoder_type: str, input_shape, edge_feature_shape, adjacency_matrix, latent_dim: int, use_gnn: bool, interaction_regularization: float, n_nodes: int, n_features_per_node: int, time_steps: int)

Initialize encoder/decoder modules based on encoder type.

Parameters:

encoder_type – can be either “recurrent”, “TCN”, or “transformer”.
input_shape – forwarded to encoders.
edge_feature_shape – forwarded to encoders.
adjacency_matrix – forwarded to encoders.
latent_dim – forwarded to encoders.
use_gnn – forwarded to encoders.
interaction_regularization – forwarded to encoders.
n_nodes – used to compute decoder output shape.
n_features_per_node – used to compute decoder output shape.
time_steps – used to compute decoder output shape.

Returns:

(encoder, decoder)

class deepof.clustering.models_new.VQVAEPT(input_shape: tuple, edge_feature_shape: tuple, adjacency_matrix: ndarray, latent_dim: int, n_components: int, encoder_type: str = 'recurrent', use_gnn: bool = True, kmeans_loss: float = 0.0, interaction_regularization: float = 0.0, beta: float = 1.0)

Bases: Module

PyTorch implementation of the VQ-VAE model adapted to the DeepOF setting.

Note: This version handles the actual DeepOF input format where:

x: (B, T, node_features) - flattened node features
a: (B, T, edge_features) - flattened edge features

__init__(input_shape: tuple, edge_feature_shape: tuple, adjacency_matrix: ndarray, latent_dim: int, n_components: int, encoder_type: str = 'recurrent', use_gnn: bool = True, kmeans_loss: float = 0.0, interaction_regularization: float = 0.0, beta: float = 1.0)

Initialize a VQ-VAE model.

Parameters:

input_shape (tuple) – Shape of the input (time_steps, node_features).
edge_feature_shape (tuple) – Shape of edge features (time_steps, edge_features).
adjacency_matrix (np.ndarray) – Adjacency matrix for GNN.
latent_dim (int) – Dimensionality of the latent space.
n_components (int) – Number of embeddings (clusters) in the codebook.
beta (float) – Beta parameter of the VQ loss.
kmeans_loss (float) – Regularization parameter for the Gram matrix.
use_gnn (bool) – Whether to use GNN in encoder.
encoder_type (str) – Type of encoder (“recurrent”, “TCN”, or “transformer”).
interaction_regularization (float) – Regularization parameter for interactions.

forward(x: Tensor, a: Tensor, return_losses: bool = True, return_all_outputs: bool = False)

Forward pass through the VQ-VAE model.

Parameters:

x – Input node features (B, T, N, F)
a – Input edge features (B, T, E, F_edge)
return_losses – Whether to compute and return VQ losses
return_all_outputs – Whether to return all intermediate outputs

encode(x: Tensor, a: Tensor) → Tensor: Inference-only: Get encoder output. Equivalent to TF ‘encoder’ model.

class deepof.clustering.models_new.ClusterControlPT

Bases: Module

Calculates clustering metrics. This is a pass-through layer for the main latent vector z, returning it unmodified alongside a dictionary of metrics.

__init__(): Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(z: Tensor, z_cat: Tensor) → Tuple[Tensor, Dict[str, Tensor]]

Calculates metrics and passes the latent vector z through.

Parameters:

z – The latent vector (batch_size, latent_dim).
z_cat – Cluster probabilities (batch_size, n_components).

Returns:

A tuple containing the unmodified z and a dictionary of metrics.
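The pass-through pattern can be sketched as follows. The two metrics shown (number of populated clusters and mean confidence in the selected cluster) are assumptions chosen for illustration; this page does not list which metrics the layer actually returns.

```python
from typing import Dict, List, Tuple


def cluster_metrics(z_cat: List[List[float]]) -> Dict[str, float]:
    """Compute two illustrative (assumed) metrics from cluster probabilities."""
    hard = [max(range(len(p)), key=p.__getitem__) for p in z_cat]  # argmax per sample
    confidence = [max(p) for p in z_cat]                           # probability of that argmax
    return {
        "number_of_populated_clusters": float(len(set(hard))),
        "confidence_in_selected_cluster": sum(confidence) / len(confidence),
    }


def cluster_control(z: List[List[float]], z_cat: List[List[float]]) -> Tuple[list, dict]:
    """Return z untouched together with the metrics dictionary."""
    return z, cluster_metrics(z_cat)


z = [[0.1, 0.2], [0.3, 0.4]]
z_cat = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]
out_z, metrics = cluster_control(z, z_cat)
```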

class deepof.clustering.models_new.GaussianMixtureLatentPT(input_dim: int, n_components: int, latent_dim: int, kmeans: float, lens_enabled: bool = False, **kwargs)

Bases: Module

PyTorch implementation of the Gaussian Mixture probabilistic latent space model. It embeds data into a latent space and models that space as a mixture of Gaussians. Implementation based on VaDE (https://arxiv.org/abs/1611.05148) and VaDE-SC (https://openreview.net/forum?id=RQ428ZptQfU)

__init__(input_dim: int, n_components: int, latent_dim: int, kmeans: float, lens_enabled: bool = False, **kwargs): Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor, epsilon: Tensor | None = None) → Tuple[Tensor, Tensor, Tensor, Tensor, Tensor]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
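For a mixture-of-Gaussians latent space, the cluster probabilities of an embedding are the posterior responsibilities p(c | z). The sketch below computes them for a diagonal-covariance mixture in log space. It is a generic illustration, not this module's forward pass, and the argument names are chosen for the example.

```python
import math
from typing import List


def gmm_responsibilities(
    z: List[float],
    means: List[List[float]],
    log_vars: List[List[float]],
    prior: List[float],
) -> List[float]:
    """p(c | z) for a diagonal Gaussian mixture, computed in log space."""
    log_joint = []
    for mu, lv, pi in zip(means, log_vars, prior):
        # Sum of per-dimension Gaussian log-densities for component c.
        log_pdf = sum(
            -0.5 * (math.log(2 * math.pi) + l + (x - m) ** 2 / math.exp(l))
            for x, m, l in zip(z, mu, lv)
        )
        log_joint.append(math.log(pi) + log_pdf)
    top = max(log_joint)  # log-sum-exp stabilisation
    exps = [math.exp(v - top) for v in log_joint]
    total = sum(exps)
    return [e / total for e in exps]


means = [[0.0, 0.0], [4.0, 4.0]]
log_vars = [[0.0, 0.0], [0.0, 0.0]]   # unit variances
prior = [0.5, 0.5]
resp = gmm_responsibilities([0.2, -0.1], means, log_vars, prior)
```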

class deepof.clustering.models_new.VaDEPT(input_shape: tuple, edge_feature_shape: tuple, adjacency_matrix: ndarray, latent_dim: int, n_components: int, encoder_type: str = 'recurrent', use_gnn: bool = True, kmeans_loss: float = 1.0, interaction_regularization: float = 0.0, lens_enabled=False)

Bases: Module

A self-contained PyTorch implementation of the VaDE model.

__init__(input_shape: tuple, edge_feature_shape: tuple, adjacency_matrix: ndarray, latent_dim: int, n_components: int, encoder_type: str = 'recurrent', use_gnn: bool = True, kmeans_loss: float = 1.0, interaction_regularization: float = 0.0, lens_enabled=False): Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor, a: Tensor, return_gmm_params: bool = False)

Returns:

reconstruction_dist
latent
categorical
kmeans_loss
z_mean
z_log_var
gmm_params (dict) [if return_gmm_params=True]

property get_gmm_params: dict: Returns the GMM parameters from the latent space.

set_pretrain_mode(pretrain_on: bool): Sets the pretrain flag in the latent space.

initialize_gmm_from_data(data_loader, n_samples=10000): Runs the autoencoder part of the model over the data to get embeddings, then fits a scikit-learn GMM to initialize the latent space.

embed(x: Tensor, a: Tensor) → Tensor

Inference-only method to get the latent embedding.

Parameters:

x (torch.Tensor) – Input node features tensor.
a (torch.Tensor) – Input edge features tensor.

Returns:

The latent representation z.

Return type:

torch.Tensor

group(x: Tensor, a: Tensor) → Tensor

Inference-only method to get cluster probabilities.

Parameters:

x (torch.Tensor) – Input node features tensor.
a (torch.Tensor) – Input edge features tensor.

Returns:

The soft cluster assignments (categorical probabilities).

Return type:

torch.Tensor

class deepof.clustering.models_new.ContrastivePT(input_shape: Tuple[int, int, int], edge_feature_shape: Tuple[int, int, int], adjacency_matrix, latent_dim: int = 8, encoder_type: str = 'TCN', use_gnn: bool = True, temperature: float = 0.1, similarity_function: str = 'cosine', loss_function: str = 'nce', beta: float = 0.1, tau: float = 0.1, interaction_regularization: float = 0.0)

Bases: Module

PyTorch port of the TF Contrastive model.

Inputs:

x: (B, T, N, F)
a: (B, T, E, F_edge)

Behavior:

Builds an encoder for sequences of length T//2.
forward(x_half, a_half) returns embeddings (B, D) for a given half-window.
compute_loss(x_full, a_full) slices pos/neg windows and returns (loss, pos_mean, neg_mean, debug).

__init__(input_shape: Tuple[int, int, int], edge_feature_shape: Tuple[int, int, int], adjacency_matrix, latent_dim: int = 8, encoder_type: str = 'TCN', use_gnn: bool = True, temperature: float = 0.1, similarity_function: str = 'cosine', loss_function: str = 'nce', beta: float = 0.1, tau: float = 0.1, interaction_regularization: float = 0.0): Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor, a: Tensor) → Tensor

Encode a half-window: x: (B, T_half, N, F), a: (B, T_half, E, Fe) -> (B, D)
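To make the roles of `temperature`, the 'cosine' similarity and the 'nce' loss more concrete, here is a minimal InfoNCE-style sketch on already-computed embeddings. It assumes that positive i belongs to anchor i and that all other positives act as negatives; how the model actually slices positive and negative windows is not specified on this page.

```python
import math
from typing import List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors: List[List[float]], positives: List[List[float]], temperature: float = 0.1) -> float:
    """Mean cross-entropy of identifying positive i among all positives for anchor i."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        top = max(logits)
        log_norm = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_norm - logits[i])
    return sum(losses) / len(losses)


anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(anchors, [[0.9, 0.1], [0.1, 0.9]])   # matching pairs are similar
swapped = info_nce(anchors, [[0.1, 0.9], [0.9, 0.1]])   # matching pairs are dissimilar
```

A lower temperature sharpens the softmax over similarities, so the loss penalises confusable negatives more strongly.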

deepof.clustering.teacher_model module

Turtle teacher model for training support

deepof.clustering.teacher_model.soft_cross_entropy_logits(logits, soft_targets, eps=1e-08, reduction='mean')
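The function has no description here. A common definition of cross-entropy between logits and soft targets is sketched below, as an assumption about its intent: the negative sum of target probabilities times log-softmax of the logits. The sketch omits `eps`, whose role is not documented, and the "none" branch is likewise an assumption.

```python
import math
from typing import List


def log_softmax(row: List[float]) -> List[float]:
    """Numerically stable log-softmax of one row of logits."""
    top = max(row)
    log_norm = top + math.log(sum(math.exp(v - top) for v in row))
    return [v - log_norm for v in row]


def soft_cross_entropy(logits: List[List[float]], soft_targets: List[List[float]], reduction: str = "mean"):
    """Cross-entropy where each target row is a probability distribution."""
    per_sample = [
        -sum(t * lp for t, lp in zip(targets, log_softmax(row)))
        for row, targets in zip(logits, soft_targets)
    ]
    if reduction == "mean":
        return sum(per_sample) / len(per_sample)
    return per_sample  # assumed behaviour for any other reduction value


uniform_loss = soft_cross_entropy([[0.0, 0.0]], [[0.5, 0.5]])
one_hot_loss = soft_cross_entropy([[2.0, 0.0]], [[1.0, 0.0]])
```

With one-hot targets this reduces to the standard cross-entropy, which is a quick sanity check for any implementation.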

class deepof.clustering.teacher_model.TurtleHeads(feature_dims: List[int], n_components: int, inner_lr: float = 0.1, M: int = 100, weight_decay: float = 0.0001, normalize_feats: bool = True, temperature: float = 0.7)

Bases: Module

Trains one linear head per view to resemble the target tau (the linear fit based on all views), i.e. each head tries to predict, from its own view alone, the separation of the full dataset at the current batch.

Parameters:

feature_dims (List[int]) – List containing the feature dimensionality of each active view.
n_components (int) – Number of output clusters.
inner_lr (float) – Learning rate used for the per-head inner optimization. Defaults to 0.1.
M (int) – Number of inner optimization steps performed for the heads at each outer step. Defaults to 100.
weight_decay (float) – Weight decay used for the head optimizers. Defaults to 1e-4.
normalize_feats (bool) – If True, L2-normalizes each input feature vector before fitting or inference. Defaults to True.
temperature (float) – Temperature used to scale the logits produced by each head. Defaults to 0.7.

__init__(feature_dims: List[int], n_components: int, inner_lr: float = 0.1, M: int = 100, weight_decay: float = 0.0001, normalize_feats: bool = True, temperature: float = 0.7): Initialize internal Module state, shared by both nn.Module and ScriptModule.

inner_fit(feats_list, soft_targets)

Trains one linear head per view to resemble the target tau (the linear fit based on all views).

Parameters:

feats_list (List[torch.Tensor]) – List of feature tensors, one per view, each of shape [B, D_i].
soft_targets (torch.Tensor) – Soft target assignments of shape [B, K], typically produced by the task encoder.

Returns:

None

logits_list(feats_list): Computes detached logits for each view-specific head.

class deepof.clustering.teacher_model.TaskEncoder(feature_dims: List[int], n_components: int, temperature: float = 1.0)

Bases: Module

Applies a linear projection (fully connected layer) to each individual view and combines the results; the forward pass averages the projected logits across views before applying softmax.

Parameters:

feature_dims (List[int]) – List containing the feature dimensionality of each active view.
n_components (int) – Number of output clusters.
temperature (float) – Temperature used to scale the per-view logits before averaging and softmax. Defaults to 1.0.

__init__(feature_dims: List[int], n_components: int, temperature: float = 1.0): Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(feats_list)

Computes soft cluster assignments from the available views.

Parameters:

feats_list (List[torch.Tensor]) – List of feature tensors, one per view, each of shape [B, D_i]; D_i may vary based on view type.

Returns:

Soft assignments of shape [B, n_clusters], obtained by averaging projected logits across views and applying softmax.

Return type:

torch.Tensor
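The forward computation described above (per-view linear projection, temperature scaling, averaging across views, softmax) can be sketched without any dependencies. The `weights` argument and the bias-free projection are assumptions made for the example; the real module holds its own learned layers.

```python
import math
from typing import List


def linear(x: List[float], weight: List[List[float]]) -> List[float]:
    """Bias-free linear projection; weight has shape (K, D)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def softmax(row: List[float]) -> List[float]:
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def task_encoder_forward(feats_list, weights, temperature: float = 1.0):
    """feats_list[v][b] is the feature vector of sample b in view v."""
    n_views, batch = len(feats_list), len(feats_list[0])
    tau = []
    for b in range(batch):
        per_view = [
            [l / temperature for l in linear(feats_list[v][b], weights[v])]
            for v in range(n_views)
        ]
        mean_logits = [sum(col) / n_views for col in zip(*per_view)]  # average across views
        tau.append(softmax(mean_logits))
    return tau


feats_list = [
    [[2.0, 0.0], [0.0, 2.0]],            # view 0: B=2, D_0=2
    [[1.0, 0.0, 5.0], [0.0, 1.0, 5.0]],  # view 1: B=2, D_1=3
]
weights = [
    [[1.0, 0.0], [0.0, 1.0]],            # (K=2, D_0=2)
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],  # (K=2, D_1=3)
]
tau = task_encoder_forward(feats_list, weights, temperature=1.0)
```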

class deepof.clustering.teacher_model.TurtleTeacher(feature_dims: List[int], n_components: int, gamma: float = 10.0, alpha_sample_entropy: float = 0.1, inner_lr: float = 0.1, inner_steps: int = 100, head_wd: float = 0.0001, head_temp: float = 0.5, task_temp: float = 0.5, normalize_feats: bool = True, lr_theta: float = 0.005, delta_death_barrier: float = 40.0, device: str = 'cpu')

Bases: Module

Teacher model that learns soft cluster assignments τ which are easy to predict from each individual view using linear heads, while regularizers encourage confident assignments and balanced cluster usage.

Based on “Let Go of Your Labels with Unsupervised Transfer” by Artyom Gadetsky et al., see https://arxiv.org/abs/2406.07236

Parameters:

feature_dims (List[int]) – List containing the feature dimensionality of each active view.
n_components (int) – Number of output clusters.
gamma (float) – Strength of the marginal entropy penalty that encourages balanced cluster usage. Defaults to 10.0.
alpha_sample_entropy (float) – Weight of the per-sample entropy term that encourages confident assignments. Defaults to 0.1.
inner_lr (float) – Learning rate for the per-view heads during inner optimization. Defaults to 0.1.
inner_steps (int) – Number of inner optimization steps used to fit the per-view heads at each outer step. Defaults to 100.
head_wd (float) – Weight decay used for the per-view head optimizers. Defaults to 1e-4.
head_temp (float) – Temperature used for the per-view head logits. Defaults to 0.5.
task_temp (float) – Temperature used for the task encoder logits. Defaults to 0.5.
normalize_feats (bool) – If True, L2-normalizes features before passing them to the per-view heads. Defaults to True.
lr_theta (float) – Learning rate for the task encoder optimizer. Defaults to 5e-3.
delta_death_barrier (float) – Strength of the penalty discouraging dead or weakly used clusters. Defaults to 40.0.
device (str) – Device on which the teacher should operate. Defaults to “cpu”.
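The two entropy quantities that `gamma` and `alpha_sample_entropy` refer to can be computed as in the sketch below. It only evaluates the quantities; how they are signed and combined into the teacher's loss is not stated here, so that part is deliberately left out.

```python
import math
from typing import List, Tuple


def entropy(p: List[float], eps: float = 1e-12) -> float:
    """Shannon entropy of a probability vector (eps guards against log(0))."""
    return -sum(q * math.log(q + eps) for q in p)


def assignment_entropies(tau: List[List[float]]) -> Tuple[float, float]:
    """Return (entropy of the mean assignment, mean per-sample entropy)."""
    n = len(tau)
    marginal = [sum(col) / n for col in zip(*tau)]   # average cluster usage
    return entropy(marginal), sum(entropy(row) for row in tau) / n


balanced_confident = [[1.0, 0.0], [0.0, 1.0]]
collapsed = [[1.0, 0.0], [1.0, 0.0]]
uncertain = [[0.5, 0.5], [0.5, 0.5]]

marg_bal, samp_bal = assignment_entropies(balanced_confident)
marg_col, samp_col = assignment_entropies(collapsed)
marg_unc, samp_unc = assignment_entropies(uncertain)
```

High marginal entropy indicates balanced cluster usage, while low per-sample entropy indicates confident assignments; only the first example achieves both.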

__init__(feature_dims: List[int], n_components: int, gamma: float = 10.0, alpha_sample_entropy: float = 0.1, inner_lr: float = 0.1, inner_steps: int = 100, head_wd: float = 0.0001, head_temp: float = 0.5, task_temp: float = 0.5, normalize_feats: bool = True, lr_theta: float = 0.005, delta_death_barrier: float = 40.0, device: str = 'cpu'): Initialize internal Module state, shared by both nn.Module and ScriptModule.

to(device: device)

Move and/or cast the parameters and buffers.

This can be called as

to(device=None, dtype=None, non_blocking=False)

to(dtype, non_blocking=False)

to(tensor, non_blocking=False)

to(memory_format=torch.channels_last)

Its signature is similar to torch.Tensor.to(), but only accepts floating point or complex dtypes. In addition, this method will only cast the floating point or complex parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.

See below for examples.

Note

This method modifies the module in-place.

Parameters:

device (torch.device) – the desired device of the parameters and buffers in this module
dtype (torch.dtype) – the desired floating point or complex dtype of the parameters and buffers in this module
tensor (torch.Tensor) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module
memory_format (torch.memory_format) – the desired memory format for 4D parameters and buffers in this module (keyword only argument)

Returns:

self

Return type:

Module

Examples:

>>> # xdoctest: +IGNORE_WANT("non-deterministic")
>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> # xdoctest: +REQUIRES(env:TORCH_DOCTEST_CUDA1)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)

>>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j,  0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)

predict(loader) → Tensor

Runs a sequential pass over the data to compute assignments for the whole dataset. Used to generate the final tau_star without loading everything into GPU RAM at once.

Parameters:

loader (DataLoader) – DataLoader yielding batches of views.

Returns:

Tensor of shape [N, K] containing the soft assignments for the full dataset.

Return type:

torch.Tensor

fit(loader, outer_steps: int = 200, rho: float = 0.04, verbose: bool = True)

Fits the teacher by alternating between per-view head updates and task encoder updates on mini-batches.

At each outer step, the task encoder produces a batch of soft assignments tau. The per-view heads are then fitted to match tau, after which the task encoder is updated so that tau becomes easier to predict from each view while remaining confident and well-distributed across clusters.

Parameters:

loader (DataLoader) – DataLoader yielding batches of feature tensors, one tensor per active view.
outer_steps (int) – Number of outer optimization steps for the task encoder. Defaults to 200.
rho (float) – Weight of the optional batch-local smoothness regularizer between neighboring rows in a batch. Defaults to 0.04.
verbose (bool) – If True, prints optimization statistics during fitting. Defaults to True.

Returns:

None

deepof.clustering.teacher_model.extract_latents(model: Module, dataset: BatchDictDataset, device: device, batch_size: int = 512, num_workers: int = 0) → Tensor

deepof.clustering.teacher_model.initialize_gmm_from_teacher(model: Module, z_all: Tensor, tau_star: Tensor, min_var: float = 0.0001, min_mass: float = 1e-06) → None

Compute GMM parameters from teacher assignments:

μ_c = sum_i τ*_ic u_i / sum_i τ*_ic
σ^2_c = sum_i τ*_ic (u_i - μ_c)^2 / sum_i τ*_ic
π_c = sum_i τ*_ic / N

Writes directly into model.latent_space.{gmm_means, gmm_log_vars, prior}.

Parameters:

model (nn.Module) – Model whose latent-space GMM parameters should be initialized.
z_all (torch.Tensor) – Latent representations of shape [N_samples, D], typically extracted from the training set.
tau_star (torch.Tensor) – Teacher soft assignments of shape [N_samples, C], where C is the number of mixture components.
min_var (float) – Minimum variance value used to clamp estimated cluster variances for numerical stability. Defaults to 1e-4.
min_mass (float) – Small constant added to cluster masses to avoid division by zero during estimation. Defaults to 1e-6.

Returns:

None
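The weighted estimates for μ_c, σ^2_c and π_c can be sketched in plain Python as follows. This is an illustration under assumptions: the function name is hypothetical, it returns the values instead of writing into a model, and the exact places where `min_mass` and `min_var` are applied are guesses based on their descriptions.

```python
import math
from typing import List, Tuple


def gmm_from_soft_assignments(
    z_all: List[List[float]],
    tau_star: List[List[float]],
    min_var: float = 1e-4,
    min_mass: float = 1e-6,
) -> Tuple[list, list, list]:
    """Return (means, log_vars, prior) estimated from soft assignments."""
    n, d = len(z_all), len(z_all[0])
    means, log_vars, prior = [], [], []
    for c in range(len(tau_star[0])):
        mass = sum(t[c] for t in tau_star) + min_mass   # avoids division by zero
        mu = [sum(t[c] * z[j] for t, z in zip(tau_star, z_all)) / mass for j in range(d)]
        var = [
            max(sum(t[c] * (z[j] - mu[j]) ** 2 for t, z in zip(tau_star, z_all)) / mass, min_var)
            for j in range(d)
        ]
        means.append(mu)
        log_vars.append([math.log(v) for v in var])     # stored as log-variances
        prior.append(mass / n)
    total = sum(prior)
    prior = [p / total for p in prior]                  # renormalise so the prior sums to 1
    return means, log_vars, prior


z_all = [[0.0], [2.0], [10.0], [12.0]]
tau_star = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
means, log_vars, prior = gmm_from_soft_assignments(z_all, tau_star)
```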

deepof.clustering.teacher_model.fit_nodes_pca(dataset: BatchDictDataset, n_components_pos: int = 32, n_components_spd: int = 32, batch_size: int = 4096, num_workers: int = 0, max_samples: int | None = None)

Fits two IncrementalPCAs:

one on positions (x,y) only
one on speeds (the 3rd channel per node)

Assumes dataset returns x with shape [B, T, N, F] where F>=3 and channel 0,1 are (x,y), channel 2 is speed.

Parameters:

dataset (BatchDictDataset) – Dataset providing node features and adjacency information.
n_components_pos (int) – Number of PCA components to retain for flattened position features. Defaults to 32.
n_components_spd (int) – Number of PCA components to retain for flattened speed features. Defaults to 32.
batch_size (int) – Batch size used for the two-pass IncrementalPCA fitting and transformation. Defaults to 4096.
num_workers (int) – Number of worker processes used by the dataset loader. Defaults to 0.
max_samples (Optional[int]) – Maximum number of samples to use during PCA fitting. If None, uses all samples. Defaults to None.

Returns:

ipca_pos: Fitted IncrementalPCA object for flattened position features.
feats_pos_all: Tensor of shape [N_samples, n_components_pos] containing PCA-transformed position features.
ipca_spd: Fitted IncrementalPCA object for flattened speed features.
feats_spd_all: Tensor of shape [N_samples, n_components_spd] containing PCA-transformed speed features.

Return type:

Tuple[IncrementalPCA, torch.Tensor, IncrementalPCA, torch.Tensor]
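The channel layout assumed above (channels 0 and 1 are x and y, channel 2 is speed) implies a split-and-flatten step before PCA. A minimal sketch of that step on nested lists is shown below; the helper name is hypothetical and the flattening order is an assumption.

```python
from typing import List, Tuple


def split_position_speed(x: List[list]) -> Tuple[List[list], List[list]]:
    """x has shape [B, T, N, F] with F >= 3; returns one flattened row per sample."""
    pos_rows, spd_rows = [], []
    for sample in x:
        pos, spd = [], []
        for frame in sample:
            for node in frame:
                pos.extend(node[:2])   # channels 0 and 1: (x, y)
                spd.append(node[2])    # channel 2: speed
        pos_rows.append(pos)
        spd_rows.append(spd)
    return pos_rows, spd_rows


# B=1, T=2, N=2, F=3
x = [[[[1.0, 2.0, 0.1], [3.0, 4.0, 0.2]],
      [[5.0, 6.0, 0.3], [7.0, 8.0, 0.4]]]]
pos_rows, spd_rows = split_position_speed(x)
```

Each set of rows would then be passed batch by batch to its own IncrementalPCA, first through `partial_fit` and then, in a second pass, through `transform`.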

deepof.clustering.teacher_model.fit_angles_pca(dataset_with_angles: BatchDictDataset, n_components: int = 32, batch_size: int = 8192, num_workers: int = 0) → Tuple[IncrementalPCA, Tensor]

Fits IncrementalPCA on angle tensors and returns both the fitted ipca and features. Mirrors fit_nodes_pca but for angle data.

Parameters:

dataset_with_angles (BatchDictDataset) – Dataset providing precomputed angle tensors.
n_components (int) – Number of PCA components to retain for the flattened angle features. Defaults to 32.
batch_size (int) – Batch size used for the two-pass IncrementalPCA fitting and transformation. Defaults to 8192.
num_workers (int) – Number of worker processes used by the dataset loader. Defaults to 0.

Returns:

ipca: Fitted IncrementalPCA object for the angle features.
feats_all: Tensor of shape [N_samples, n_components] containing PCA-transformed angle features.

Return type:

Tuple[IncrementalPCA, torch.Tensor]

deepof.clustering.teacher_model.extract_pca_edges_view(dataset: BatchDictDataset, n_components: int = 16, batch_size: int = 8192, num_workers: int = 0, max_samples: int | None = None) → Tensor

Returns PCA features [N, n_components] for all samples’ edge tensor ‘a’ (T, E, F_edge), in order (shuffle=False), using two passes: partial_fit, then transform.

Parameters:

dataset (BatchDictDataset) – Dataset providing node and edge features.
n_components (int) – Number of PCA components to retain for the flattened edge features. Defaults to 16.
batch_size (int) – Batch size used for the two-pass IncrementalPCA fitting and transformation. Defaults to 8192.
num_workers (int) – Number of worker processes used by the dataset loader. Defaults to 0.
max_samples (Optional[int]) – Maximum number of samples to use during PCA fitting. If None, uses all samples. Defaults to None.

Returns:

Tensor of shape [N_samples, n_components] containing PCA-transformed edge features.

Return type:

torch.Tensor

deepof.clustering.teacher_model.run_turtle_teacher_on_views(views_dict: dict, n_components: int, gamma: float = 6.0, alpha_sample_entropy: float = 1.0, outer_steps: int = 200, inner_steps: int = 200, normalize_feats: bool = True, verbose: bool = True, device: device | None = None, head_temp: float = 0.3, task_temp: float = 0.3, batch_size: int = 2048) → Tuple[Any, Tensor]

Fits a TURTLE teacher on a set of precomputed views and returns the final soft assignments.

The input views are packed into a TensorDataset, trained in shuffled mini-batches, and then evaluated sequentially to produce tau_star.

Parameters:

views_dict (dict) – Dictionary mapping view names to tensors of shape [N, D]. Entries with value None are ignored.
n_components (int) – Number of output components or clusters.
gamma (float) – Strength of the marginal entropy penalty encouraging balanced cluster usage. Defaults to 6.0.
alpha_sample_entropy (float) – Weight of the per-sample entropy term encouraging confident assignments. Defaults to 1.0.
outer_steps (int) – Number of outer optimization steps for the task encoder. Defaults to 200.
inner_steps (int) – Number of inner optimization steps for the per-view heads at each outer step. Defaults to 200.
normalize_feats (bool) – If True, L2-normalizes features before passing them to the per-view heads. Defaults to True.
verbose (bool) – If True, prints training progress during fitting. Defaults to True.
device (Optional[torch.device]) – Device on which the teacher should be trained. Defaults to CPU if None.
head_temp (float) – Temperature used for the per-view head logits. Defaults to 0.3.
task_temp (float) – Temperature used for the task encoder logits. Defaults to 0.3.
batch_size (int) – Batch size used for training and prediction. Defaults to 2048.

Returns:

teacher: Fitted TURTLE teacher object.
tau_star: Tensor of shape [N, K] containing the final soft assignments in dataset order.

Return type:

Tuple[Any, torch.Tensor]

class deepof.clustering.teacher_model.DiscriminativeHead(latent_dim: int, n_components: int)

Bases: Module

Simple linear head on top of z (latent) to predict C clusters (logits).

Parameters:

latent_dim (int) – Dimensionality of the latent input vectors.
n_components (int) – Number of output components or clusters.

__init__(latent_dim: int, n_components: int): Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(z: Tensor) → Tensor

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

deepof.clustering.teacher_model.maybe_build_turtle_teacher(*, teacher_cfg: TurtleTeacherCfg, common_cfg: CommonFitCfg, train_dataset: BatchDictDataset, preprocessed_train: dict, data_path: str, device: device, latent_view: Tensor | None = None) → Tuple[Any | None, Tensor | None, Dict[str, Any]]

Builds the requested teacher views, fits the TURTLE teacher if enabled, and returns the resulting teacher together with final soft assignments.

Depending on the configuration, this function can include a latent view, PCA-based node, edge, and angle views, and a supervised label view. All constructed views are returned so they can be reused later.

Parameters:

teacher_cfg (TurtleTeacherCfg) – Configuration controlling which views are used and how the teacher is trained.
common_cfg (CommonFitCfg) – Common training configuration, including the number of components.
train_dataset (BatchDictDataset) – Training dataset used to extract or derive teacher views.
preprocessed_train (dict) – Preprocessed training data used when rebuilding angle-based datasets.
data_path (str) – Path to the underlying dataset files.
device (torch.device) – Device on which the teacher should be trained.
latent_view (Optional[torch.Tensor]) – Optional latent representation computed externally, for example by another model. Required if include_latent_view is enabled.

Returns:

teacher: Fitted TURTLE teacher object, or None if the teacher is disabled.
tau_star: Final soft assignments of shape [N, K], or None if the teacher is disabled.
views: Dictionary containing the constructed views and any fitted PCA objects for reuse.

Return type:

Tuple[Optional[Any], Optional[torch.Tensor], Dict[str, Any]]

deepof.clustering.training module

Training functionality for all models

class deepof.clustering.training.StepResult(loss: torch.Tensor, logs: Dict[str, float])

Bases: object

loss: Tensor

logs: Dict[str, float]

__init__(loss: Tensor, logs: Dict[str, float]) → None

deepof.clustering.training.train_one_epoch_indexed(model: Module, model_name: str, dataloader: DataLoader, optimizer: Optimizer, step_fn: Callable[[Module, Any, SimpleNamespace], StepResult], device: device, epoch: int, num_epochs: int, scaler: GradScaler | None = None, use_amp: bool = True, grad_clip_value: float | None = 0.75, ctx: SimpleNamespace | None = None, show_progress: bool = True, leave: bool = False) → Dict[str, float]

deepof.clustering.training.validate_one_epoch_indexed(model: Module, dataloader: DataLoader, step_fn: Callable[[Module, Any, SimpleNamespace], StepResult], device: device, epoch: int, num_epochs: int, ctx: SimpleNamespace | None = None, show_progress: bool = True, leave: bool = False) → Dict[str, float]

deepof.clustering.training.step_vade(model: Module, batch: Tuple[Tensor, Tensor, Tensor | None], ctx: SimpleNamespace) → StepResult

deepof.clustering.training.step_vqvae_distill(model: Module, batch: Tuple[Tensor, Tensor, Tensor], ctx: SimpleNamespace) → StepResult

deepof.clustering.training.label_separation_score(embeddings: Tensor, labels: Tensor, pos_thr: float = 0.5, neg_thr: float = 0.5, min_pos: int = 2, min_neg: int = 2, normalize_embeddings: bool = True, eps: float = 1e-08) → Tensor

Returns ONE scalar score per batch (higher = better separation). [currently unused]

For each behavior/label l:

positives: y[:,l] >= pos_thr
negatives: y[:,l] <= neg_thr
ignore ambiguous values in between

Score_l = ||mu_pos - mu_neg||^2 / (within_dispersion + eps)

Final score = weighted average over valid labels (weighted by used sample count).

If no label has enough pos & neg samples, returns 0.0.
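The score described above can be sketched in plain Python. Two points are assumptions: `within_dispersion` is taken to be the mean squared distance of the used samples to their own class mean, and embedding normalization is skipped. The function name is hypothetical.

```python
from typing import List


def label_separation_sketch(
    embeddings: List[List[float]],
    labels: List[List[float]],
    pos_thr: float = 0.5,
    neg_thr: float = 0.5,
    min_pos: int = 2,
    min_neg: int = 2,
    eps: float = 1e-8,
) -> float:
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    weighted, total = 0.0, 0
    for l in range(len(labels[0])):
        pos = [e for e, y in zip(embeddings, labels) if y[l] >= pos_thr]
        neg = [e for e, y in zip(embeddings, labels) if y[l] <= neg_thr]
        if len(pos) < min_pos or len(neg) < min_neg:
            continue  # label is not valid for this batch
        mu_pos, mu_neg = mean(pos), mean(neg)
        used = len(pos) + len(neg)
        # Assumed definition: pooled mean squared distance to the own class mean.
        within = (sum(sq_dist(e, mu_pos) for e in pos)
                  + sum(sq_dist(e, mu_neg) for e in neg)) / used
        weighted += used * sq_dist(mu_pos, mu_neg) / (within + eps)
        total += used
    return weighted / total if total else 0.0


embeddings = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
score = label_separation_sketch(embeddings, [[1.0], [1.0], [0.0], [0.0]])
no_negatives = label_separation_sketch(embeddings, [[1.0], [1.0], [1.0], [1.0]])
```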

deepof.clustering.training.step_contrastive_distill(model: Module, batch: Tuple[Tensor, Tensor, Tensor], ctx: SimpleNamespace) → StepResult

deepof.clustering.training.train_deepof_model(preprocessed_object: Tuple[dict, dict] | None = None, adjacency_matrix: ndarray | None = None, meta_info: dict | None = None, encoder_type: str | None = None, batch_size: int | None = None, latent_dim: int | None = None, epochs: int | None = None, output_path: str | None = None, n_clusters: int = 10, learning_rate: float = 0.001, log_history: bool = True, data_path: str = '.', pretrained: str | None = None, save_weights: bool = True, run: int = 0, reg_cat_clusters: float = 0.0, recluster: bool = False, freeze_gmm_epochs: int = 0, freeze_decoder_epochs: int = 0, prior_loss_weight: float = 0.0, gmm_learning_rate: float = 0.001, learning_rate_pretrain: float = 0.001, interaction_regularization: float = 0.0003, kmeans_loss: float = 0.0, num_workers: int = 0, prefetch_factor: int = 0, use_amp: bool = False, use_turtle_teacher: bool = True, teacher_gamma: float = 8.0, teacher_outer_steps: int = 500, teacher_inner_steps: int = 100, teacher_normalize_feats: bool = True, lambda_distill: float = 4.0, lambda_decay_start: int = 10, lambda_end_weight: float = 0.2, lambda_cooldown: int = 10, teacher_refresh_every: int | None = False, teacher_freeze_at: int | None = 10, teacher_head_temp: float = 0.5, teacher_task_temp: float = 0.5, teacher_alpha_sample_entropy: float = 2.0, teacher_batch_size: int = 2048, pretrain_epochs: int = 10, kmeans_loss_pretrain: float = 1.0, repel_weight_pretrain: float = 0.5, repel_length_scale_pretrain: float = 0.5, nonempty_weight_pretrain: float = 0.02, nonempty_p_pretrain: float = 2.0, nonempty_floor_percent_pretrain: float = 0.05, kl_annealing_mode: str = 'tf_sigmoid', kl_max_weight: float = 1, kl_warmup: int = 5, kl_end_weight: float = 0.2, kl_cooldown: int = 5, kl_annealing_mode_pretrain: str = 'tf_sigmoid', kl_max_weight_pretrain: float = 0.2, kl_warmup_pretrain: int = 15, kl_end_weight_pretrain: float = 0.2, kl_cooldown_pretrain: int = 10, reg_scatter_weight: float = 0, temporal_cohesion_weight: 
float = 0, reg_scatter_beta: float = 1.0, repel_weight: float = 0, repel_length_scale: float = 1.0, main_clustering_loss: float = 0.0, nonempty_weight: float = 0.02, nonempty_floor_percent: float = 0.05, nonempty_p: float = 2.0, distill_conf_weight: bool = False, distill_conf_thresh: float = 0.3, distill_sharpen_T: float = 0.5, include_edges_view: bool = False, include_nodes_view: bool = True, pca_nodes_dim: int = 32, pca_edges_dim: int = 32, include_angles_view: bool = False, pca_angles_dim: int = 32, reinit_gmm_on_refresh: bool = False, diag_max_batches: int = 4, model_name: str = 'VaDE', generic_lambda_distill: float = 2.0, generic_distill_sharpen_T: float = 0.5, generic_distill_conf_weight: bool = True, generic_distill_conf_thresh: float = 0.6, generic_distill_warmup_epochs: int = 1, distill_class_reweight_beta: float = 1, distill_class_reweight_cap: float = 3, temperature: float = 0.1, contrastive_similarity_function: str = 'cosine', contrastive_loss_function: str = 'nce', beta: float = 0.1, tau: float = 0.1, aug_min_shift: int = 1, aug_max_shift: int = 3, aug_p_shift: int = 0.4, aug_max_rot: int = 30, aug_n_rot: int = 3, aug_p_rot: int = 0.8, aug_max_interp: int = 8, aug_min_interp: int = 3, aug_p_interp: float = 0.4, aug_noise_sigma: float = 0.03, aug_p_noise: float = 0.4, device: str | None = None, h5_dataset_folder: str | None = None, bootstrap_training: bool | None = False, bootstrap_block_len: int = 250, random_seed: int = 0) → Tuple[Module, Module, Module | None]

deepof.clustering.training.train_deepof_model_base(preprocessed_object: Tuple[dict, dict], adjacency_matrix: ndarray, meta_info: dict, common_cfg: CommonFitCfg, teacher_cfg: TurtleTeacherCfg, vade_cfg: VaDECfg, contrastive_cfg: ContrastiveCfg, h5_dataset_folder: str | None = None, shuffle: bool = True, device: str | None = None, bootstrap_training: bool = False, bootstrap_block_len: int = 250) → Tuple[Module, Module, Module | None]

deepof.clustering.training.fit_VQVAE(train_loader: DataLoader, val_loader: DataLoader, preprocessed_train: dict, adjacency_matrix: ndarray, common_cfg: CommonFitCfg, teacher_cfg: TurtleTeacherCfg, writer: SummaryWriter, device: device = device(type='cpu'), trial: Trial | None = None)

deepof.clustering.training.fit_contrastive(train_loader: DataLoader, val_loader: DataLoader, preprocessed_train: dict, adjacency_matrix: ndarray, meta_info: dict, common_cfg: CommonFitCfg, teacher_cfg: TurtleTeacherCfg, contrastive_cfg: ContrastiveCfg, writer: SummaryWriter, device: device = device(type='cpu'), trial: Trial | None = None)

deepof.clustering.training.fit_VADE(train_loader: DataLoader, val_loader: DataLoader, preprocessed_train: dict, adjacency_matrix: ndarray, common_cfg: CommonFitCfg, teacher_cfg: TurtleTeacherCfg, vade_cfg: VaDECfg, writer: SummaryWriter, device: device = device(type='cpu'), trial: Trial | None = None)

class deepof.clustering.training.RotationPrecomp(triplets: torch.Tensor, centers: torch.Tensor, branches_a: List[torch.Tensor], branches_c: List[torch.Tensor], prefer_side: torch.Tensor)

Bases: object

triplets: Tensor

centers: Tensor

branches_a: List[Tensor]

branches_c: List[Tensor]

prefer_side: Tensor

__init__(triplets: Tensor, centers: Tensor, branches_a: List[Tensor], branches_c: List[Tensor], prefer_side: Tensor) → None

deepof.clustering.training.build_rotation_precomp(edge_index: Tensor, n_nodes: int, device: device) → RotationPrecomp

Build the triplets and the per-triplet branch node sets once.

The graph logic runs a single time in Python on the CPU, and the results are stored as tensors on the given device (CUDA tensors if device is CUDA), so the augmentation step no longer has to run a BFS.
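The precompute-once idea described above can be illustrated with a minimal pure-Python sketch. It is not deepof's implementation: the helper names (build_adjacency, branch_nodes, precompute_triplets) are hypothetical, the definition of a triplet as (neighbor, center, neighbor) is an assumption, and the prefer_side field and the conversion of the results to tensors on a device are left out. The sketch also assumes a tree-like skeleton; on a graph with cycles, a branch could reach nodes on the other side of the center.

```python
from collections import deque
from itertools import combinations
from typing import List, Set, Tuple


def build_adjacency(edges: List[Tuple[int, int]], n_nodes: int) -> List[Set[int]]:
    """Build undirected adjacency sets from an edge list (self-loops ignored)."""
    adj: List[Set[int]] = [set() for _ in range(n_nodes)]
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def branch_nodes(adj: List[Set[int]], start: int, blocked: int) -> List[int]:
    """BFS from `start` that never enters `blocked`; returns the sorted reachable nodes."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt != blocked and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(seen)


def precompute_triplets(edges: List[Tuple[int, int]], n_nodes: int):
    """Hypothetical precompute step: enumerate (a, center, c) triplets and
    the node set hanging off each side of the center, all in one pass."""
    adj = build_adjacency(edges, n_nodes)
    triplets, branches_a, branches_c = [], [], []
    for center in range(n_nodes):
        # Every unordered pair of neighbors of `center` forms one triplet.
        for a, c in combinations(sorted(adj[center]), 2):
            triplets.append((a, center, c))
            branches_a.append(branch_nodes(adj, a, blocked=center))
            branches_c.append(branch_nodes(adj, c, blocked=center))
    return triplets, branches_a, branches_c


# Toy skeleton: a chain 0-1-2-3 with an extra limb 2-4.
edges = [(0, 1), (1, 2), (2, 3), (2, 4)]
triplets, branches_a, branches_c = precompute_triplets(edges, n_nodes=5)
```

Because the BFS results are cached per triplet, a later augmentation step could look up which nodes move with each side of a joint by index instead of traversing the graph again.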
