{ "cells": [ { "cell_type": "markdown", "id": "537ca473", "metadata": {}, "source": [ "# Working with large datasets in Deepof" ] }, { "cell_type": "markdown", "id": "cf37e3d4", "metadata": {}, "source": [ "(For the test version, this tutorial is currently attached to the custom labels tutorial, so that both run with our automated tests without major restructuring. It will stand on its own later.)" ] }, { "cell_type": "markdown", "id": "099eeb84", "metadata": {}, "source": [ "##### What we'll cover:\n", " \n", "* How to process large datasets (several hours of recording per video) with deepof\n", "* Things to consider when working with large quantities of data" ] }, { "cell_type": "code", "execution_count": 1, "id": "86dc41cc", "metadata": {}, "outputs": [], "source": [ "# # If using Google Colab, uncomment and run this cell and the one below to set up the environment\n", "# # Note: because of how Colab handles the installation of local packages, this cell will kill your runtime.\n", "# # This is not an error! 
Just continue with the cells below.\n", "# import os\n", "# !git clone -q https://github.com/mlfpm/deepof.git\n", "# !pip install -q -e deepof --progress-bar off\n", "# os.chdir(\"deepof\")\n", "# !curl --output tutorial_files.zip https://datashare.mpcdf.mpg.de/s/Hu1XjZkY9zml0mm/download\n", "# !unzip tutorial_files.zip" ] }, { "cell_type": "code", "execution_count": 2, "id": "9c6f7c77", "metadata": {}, "outputs": [], "source": [ "# import os\n", "# os.chdir(\"deepof\")\n", "# import warnings\n", "# warnings.filterwarnings('ignore')" ] }, { "cell_type": "markdown", "id": "3a5d1b7b", "metadata": {}, "source": [ "We start by importing the usual packages" ] }, { "cell_type": "code", "execution_count": 3, "id": "7fd307fa", "metadata": {}, "outputs": [], "source": [ "import copy\n", "import os\n", "import numpy as np\n", "import pickle\n", "import deepof.data" ] }, { "cell_type": "markdown", "id": "5938b6ee", "metadata": {}, "source": [ "And the plotting gear" ] }, { "cell_type": "code", "execution_count": 4, "id": "ea45a21d", "metadata": {}, "outputs": [], "source": [ "from IPython import display\n", "from networkx import Graph, draw\n", "import deepof.visuals\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns" ] }, { "cell_type": "markdown", "id": "0230a554", "metadata": {}, "source": [ "### Brief introduction to large dataset analysis" ] }, { "cell_type": "markdown", "id": "6d625e1a", "metadata": {}, "source": [ "In general, analyzing very large datasets works, at the user level, almost the same as analyzing small datasets. The main differences are:\n", "1. Things take longer\n", "2. Tables do not stay loaded in RAM, but instead get saved and loaded in the background as they are needed\n", "\n", "As a consequence of the second point, the table dictionaries used throughout deepof now carry only references to file locations instead of the tables themselves. 
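To make the idea of a dictionary that holds references rather than data more concrete, here is a minimal, self-contained sketch. It is not deepof's implementation: the helper names `save_table` and `load_table` are hypothetical, and the standard-library `sqlite3` module is used as a stand-in for whatever on-disk storage deepof uses internally.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def save_table(db_path, table_name, rows):
    # Hypothetical helper: write (frame, speed) rows to disk and
    # return only a small reference to where they were stored.
    with closing(sqlite3.connect(db_path)) as con:
        con.execute(f'CREATE TABLE {table_name} (frame INTEGER, speed REAL)')
        con.executemany(f'INSERT INTO {table_name} VALUES (?, ?)', rows)
        con.commit()
    return {'db_file': db_path, 'table': table_name}


def load_table(table_dict, key):
    # Hypothetical helper: follow the reference stored under key
    # and read the rows back from disk only when they are requested.
    ref = table_dict[key]
    table_name = ref['table']
    with closing(sqlite3.connect(ref['db_file'])) as con:
        query = f'SELECT frame, speed FROM {table_name} ORDER BY frame'
        return con.execute(query).fetchall()


tmp_dir = tempfile.mkdtemp()
table_dict = {
    'video_1': save_table(
        os.path.join(tmp_dir, 'video_1.db'), 't_video_1', [(0, 0.0), (1, 4.14)]
    ),
}

# Indexing the dictionary yields the reference, not the data
print(table_dict['video_1'])
# The rows are only read from disk on an explicit load
print(load_table(table_dict, 'video_1'))
```

The point of this layout is that the dictionary itself stays small no matter how long the recordings are; memory is only used for the one table that is currently loaded.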
" ] }, { "cell_type": "markdown", "id": "24fd106a", "metadata": {}, "source": [ "For this tutorial we do not provide an additional \"big\" dataset (as we do not want you to wait several hours for a download). Instead, we are going to load the same sample_project as in the unsupervised tutorial. We then manually set the \"_very_large_project\" project variable to True, which causes the project to behave just as it would if it contained videos with several hours of recordings. \n", "\n", "If you create your own project with a larger dataset (i.e. at least one video that is about 4 hours long or longer), this variable will be set to True automatically during project setup. " ] }, { "cell_type": "code", "execution_count": 5, "id": "5ace4fb1", "metadata": {}, "outputs": [], "source": [ "# We load our small sample project, then we pretend that it is a big one\n", "my_deepof_project = deepof.data.load_project(\"./tutorial_files/sample_project\")\n", "my_deepof_project.load_exp_conditions(\"./tutorial_files/tutorial_exp_conditions.csv\")\n", "my_deepof_project._very_large_project = True" ] }, { "cell_type": "markdown", "id": "0f0e6a84", "metadata": {}, "source": [ "Now let's create some supervised annotations" ] }, { "cell_type": "code", "execution_count": 6, "id": "799e648a", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "data preprocessing : 100%|██████████| 4/4 [00:11<00:00, 2.92s/step, step=Loading kinematics]\n", "supervised annotations : 100%|██████████| 6/6 [00:12<00:00, 2.10s/table, step=post processing] \n" ] } ], "source": [ "supervised_annotation = my_deepof_project.supervised_annotation()" ] }, { "cell_type": "markdown", "id": "769aa8cd", "metadata": {}, "source": [ "If we now look at one of the tables we just created, we notice a difference: instead of showing the table, deepof now only displays the path to the file in which this table was stored."
] }, { "cell_type": "code", "execution_count": 7, "id": "d25daf35", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'duckdb_file': './tutorial_files\\\\sample_project\\\\Tables\\\\20191204_Day2_SI_JB08_Test_54\\\\database.duckdb',\n", " 'table': 't_20191204_Day2_SI_JB08_Test_54_supervised_annotations'}" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "supervised_annotation['20191204_Day2_SI_JB08_Test_54']" ] }, { "cell_type": "markdown", "id": "631743fe", "metadata": {}, "source": [ "If we want to load this table, we can access it with the \"get_dt\" function, whose syntax is very similar to accessing a dictionary entry." ] }, { "cell_type": "code", "execution_count": 8, "id": "b6558c12", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | B_W_nose2nose | B_W_sidebyside | B_W_sidereside | B_W_nose2tail | W_B_nose2tail | B_W_nose2body | W_B_nose2body | B_W_following | W_B_following | B_climb-arena | ... | W_stat-lookaround | W_stat-active | W_stat-passive | W_moving | W_sniffing | W_distance | W_cum-distance | W_speed | B_missing | W_missing |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000 | 0.0000 | 0.00000 | 0 | 0 |
| 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000 | 0.0000 | 0.00000 | 0 | 0 |
| 2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000 | 0.0000 | 0.00000 | 0 | 0 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000 | 0.0000 | 0.00000 | 0 | 0 |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 4.140 | 4.1400 | 103.45860 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 14994 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.070 | 42787.8236 | 1.74930 | 0 | 0 |
| 14995 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.067 | 42787.8906 | 1.67433 | 0 | 0 |
| 14996 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.051 | 42787.9416 | 1.27449 | 0 | 0 |
| 14997 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.037 | 42787.9786 | 0.92463 | 0 | 0 |
| 14998 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.025 | 42788.0036 | 0.62475 | 0 | 0 |
14999 rows × 33 columns
\n", "