Supported Datasets

The supported datasets are listed above. We provide shell scripts for data preparation under the path $MMACTION2/tools/data/. Below are detailed tutorials on data preparation for each dataset.

ActivityNet

Introduction

@article{Heilbron2015ActivityNetAL,
  title={ActivityNet: A large-scale video benchmark for human activity understanding},
  author={Fabian Caba Heilbron and Victor Escorcia and Bernard Ghanem and Juan Carlos Niebles},
  journal={2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2015},
  pages={961-970}
}

For basic dataset information, please refer to the official website. For action detection, you can either use the ActivityNet rescaled features provided in this repo or extract features with MMAction2 (which yields better performance). We release both pipelines. Before we start, please make sure that the current working directory is $MMACTION2/tools/data/activitynet/.

Option 1: Use the ActivityNet rescaled feature provided in this repo

Step 1. Download Annotations

First of all, you can run the following script to download annotation files.

bash download_feature_annotations.sh

Step 2. Prepare Video Features

Then, you can run the following script to download the ActivityNet features.

bash download_features.sh

Step 3. Process Annotation Files

Next, you can run the following script to process the downloaded annotation files for training and testing. It first merges the two annotation files and then splits the annotations into train, val and test sets.

python process_annotations.py
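
As a rough illustration, the merge-and-split logic can be sketched as below, assuming the downloaded files follow the official ActivityNet schema; the function name and output naming here are illustrative, not the script's actual interface.

```python
import json

def merge_and_split(anno_paths, out_prefix):
    """Merge downloaded annotation files, then split them by subset."""
    merged = {}
    for path in anno_paths:
        with open(path) as f:
            merged.update(json.load(f)["database"])
    subsets = {"training": {}, "validation": {}, "testing": {}}
    for vid, info in merged.items():
        # Each entry carries a "subset" field in the official schema.
        subsets[info["subset"]][vid] = info
    for name, entries in subsets.items():
        with open(f"{out_prefix}_{name}.json", "w") as f:
            json.dump(entries, f)
    return {k: len(v) for k, v in subsets.items()}
```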

Option 2: Extract ActivityNet features using MMAction2 with all videos provided on the official website

Step 1. Download Annotations

First of all, you can run the following script to download annotation files.

bash download_annotations.sh

Step 2. Prepare Videos

Then, you can run the following script to prepare videos. The code is adapted from the official crawler. Note that this might take a long time.

bash download_videos.sh

Since some videos in the ActivityNet dataset may no longer be available on YouTube, the official website has made the full dataset available on Google and Baidu drives. To accommodate missing-data requests, you can fill in the request form provided on the official download page to obtain 7-day access to download the videos from the drive folders.

We also provide download steps for the videos listed in the BSN repo annotations.

bash download_bsn_videos.sh

In this case, the download script updates the annotation file after downloading to make sure every video in it exists.

Step 3. Extract RGB and Flow

Before extracting, please refer to install.md for installing denseflow.

Use the following script to extract both RGB and Flow frames.

bash extract_frames.sh

The command above generates images with the short edge resized to 256. If you want to generate images with short edge 320 (320p), or with a fixed size of 340x256, you can change the argument --new-short 256 to --new-short 320 or --new-width 340 --new-height 256. More details can be found in data_preparation.
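
To make the difference between the two resizing modes concrete, here is a small sketch of the resize arithmetic; the helper is illustrative, not part of MMAction2.

```python
def target_size(w, h, new_short=None, new_width=None, new_height=None):
    """Return the (width, height) a frame would be resized to."""
    if new_short is not None:
        # Scale so that the shorter edge becomes `new_short`,
        # preserving the aspect ratio.
        scale = new_short / min(w, h)
        return round(w * scale), round(h * scale)
    # Fixed-size mode ignores the aspect ratio.
    return new_width, new_height

# A 1280x720 frame with --new-short 256 becomes 455x256, while
# --new-width 340 --new-height 256 always yields 340x256.
```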

Step 4. Generate File List for ActivityNet Finetuning

With extracted frames, you can generate video-level or clip-level lists of rawframes, which can be used for ActivityNet Finetuning.

python generate_rawframes_filelist.py

Step 5. Finetune TSN models on ActivityNet

You can use ActivityNet configs in configs/recognition/tsn to finetune TSN models on ActivityNet. You need to use Kinetics models for pretraining. Both RGB models and Flow models are supported.

Step 6. Extract ActivityNet Feature with finetuned ckpts

After finetuning TSN on ActivityNet, you can use it to extract both RGB and Flow features.

python tsn_feature_extraction.py --data-prefix ../../../data/ActivityNet/rawframes --data-list ../../../data/ActivityNet/anet_train_video.txt --output-prefix ../../../data/ActivityNet/rgb_feat --modality RGB --ckpt /path/to/rgb_checkpoint.pth

python tsn_feature_extraction.py --data-prefix ../../../data/ActivityNet/rawframes --data-list ../../../data/ActivityNet/anet_val_video.txt --output-prefix ../../../data/ActivityNet/rgb_feat --modality RGB --ckpt /path/to/rgb_checkpoint.pth

python tsn_feature_extraction.py --data-prefix ../../../data/ActivityNet/rawframes --data-list ../../../data/ActivityNet/anet_train_video.txt --output-prefix ../../../data/ActivityNet/flow_feat --modality Flow --ckpt /path/to/flow_checkpoint.pth

python tsn_feature_extraction.py --data-prefix ../../../data/ActivityNet/rawframes --data-list ../../../data/ActivityNet/anet_val_video.txt --output-prefix ../../../data/ActivityNet/flow_feat --modality Flow --ckpt /path/to/flow_checkpoint.pth

After feature extraction, you can use our post-processing scripts to concatenate the RGB and Flow features and generate the 100-t x 400-d features for action detection.

python activitynet_feature_postprocessing.py --rgb ../../../data/ActivityNet/rgb_feat --flow ../../../data/ActivityNet/flow_feat --dest ../../../data/ActivityNet/mmaction_feat
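
Conceptually, the post-processing concatenates the 200-d RGB and 200-d Flow features of each snippet and resamples them onto 100 time steps. A minimal pure-Python sketch, assuming linear interpolation over time (the actual script's pooling may differ):

```python
def pool_to_100x400(rgb, flow, t_out=100):
    """rgb, flow: per-snippet feature lists of shape (T, 200) each.

    Returns a (t_out, 400) list: features concatenated along the
    channel axis, then linearly interpolated onto t_out time steps.
    """
    feat = [r + f for r, f in zip(rgb, flow)]  # concat -> (T, 400)
    t_in = len(feat)
    dims = len(feat[0])
    out = []
    for i in range(t_out):
        # Map output step i to a (possibly fractional) source position.
        pos = i * (t_in - 1) / (t_out - 1)
        lo = int(pos)
        hi = min(lo + 1, t_in - 1)
        w = pos - lo
        out.append([(1 - w) * feat[lo][d] + w * feat[hi][d]
                    for d in range(dims)])
    return out
```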

Final Step. Check Directory Structure

After the whole data pipeline for ActivityNet preparation, you will get the features, videos, frames and annotation files.

In the context of the whole project (for ActivityNet only), the folder structure will look like:

mmaction2
├── mmaction
├── tools
├── configs
├── data
│   ├── ActivityNet

(if Option 1 used)
│   │   ├── anet_anno_{train,val,test,full}.json
│   │   ├── anet_anno_action.json
│   │   ├── video_info_new.csv
│   │   ├── activitynet_feature_cuhk
│   │   │   ├── csv_mean_100
│   │   │   │   ├── v___c8enCfzqw.csv
│   │   │   │   ├── v___dXUJsj3yo.csv
│   │   │   │   ├── ..

(if Option 2 used)
│   │   ├── anet_train_video.txt
│   │   ├── anet_val_video.txt
│   │   ├── anet_train_clip.txt
│   │   ├── anet_val_clip.txt
│   │   ├── activity_net.v1-3.min.json
│   │   ├── mmaction_feat
│   │   │   ├── v___c8enCfzqw.csv
│   │   │   ├── v___dXUJsj3yo.csv
│   │   │   ├── ..
│   │   ├── rawframes
│   │   │   ├── v___c8enCfzqw
│   │   │   │   ├── img_00000.jpg
│   │   │   │   ├── flow_x_00000.jpg
│   │   │   │   ├── flow_y_00000.jpg
│   │   │   │   ├── ..
│   │   │   ├── ..

For training and evaluating on ActivityNet, please refer to getting_started.md.

AVA

Introduction

@inproceedings{gu2018ava,
  title={Ava: A video dataset of spatio-temporally localized atomic visual actions},
  author={Gu, Chunhui and Sun, Chen and Ross, David A and Vondrick, Carl and Pantofaru, Caroline and Li, Yeqing and Vijayanarasimhan, Sudheendra and Toderici, George and Ricco, Susanna and Sukthankar, Rahul and others},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={6047--6056},
  year={2018}
}

For basic dataset information, please refer to the official website. Before we start, please make sure that the directory is located at $MMACTION2/tools/data/ava/.

Step 1. Prepare Annotations

First of all, you can run the following script to prepare annotations.

bash download_annotations.sh

This command will download ava_v2.1.zip for AVA v2.1 annotation. If you need the AVA v2.2 annotation, you can try the following script.

VERSION=2.2 bash download_annotations.sh

Step 2. Prepare Videos

Then, use the following script to prepare videos. The code is adapted from the official crawler. Note that this might take a long time.

bash download_videos.sh

Alternatively, you can use the following command to download AVA videos in parallel using a Python script.

bash download_videos_parallel.sh

Note that if you have sudo privileges or GNU parallel installed on your machine, you can speed up the procedure by downloading in parallel.

## sudo apt-get install parallel
bash download_videos_gnu_parallel.sh

Step 3. Cut Videos

Cut each video from its 15th to its 30th minute and convert it to 30 FPS.

bash cut_videos.sh
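
As a sketch of what the cutting step does per file (assuming ffmpeg is installed; the real script's encoding options may differ):

```python
import subprocess

def cut_cmd(src, dst, start=900, duration=900, fps=30):
    """Build an ffmpeg command cutting [start, start+duration) at `fps`."""
    return ["ffmpeg", "-ss", str(start), "-t", str(duration),
            "-i", src, "-r", str(fps), dst]

def cut_video(src, dst):
    # Minutes 15-30 of the video, re-encoded at 30 FPS.
    subprocess.run(cut_cmd(src, dst), check=True)
```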

Step 4. Extract RGB and Flow

Before extracting, please refer to install.md for installing denseflow.

If you have plenty of SSD space, then we recommend extracting frames there for better I/O performance. And you can run the following script to soft link the extracted frames.

## execute these two lines (assuming the SSD is mounted at "/mnt/SSD/")
mkdir /mnt/SSD/ava_extracted/
ln -s /mnt/SSD/ava_extracted/ ../../../data/ava/rawframes/

If you only want to play with RGB frames (since extracting optical flow can be time-consuming), consider running the following script to extract RGB-only frames using denseflow.

bash extract_rgb_frames.sh

If you did not install denseflow, you can still extract RGB frames using ffmpeg with the following script.

bash extract_rgb_frames_ffmpeg.sh

If both are required, run the following script to extract frames.

bash extract_frames.sh

Step 5. Fetch Proposal Files

The scripts are adapted from FAIR’s Long-Term Feature Banks.

Run the following scripts to fetch the pre-computed proposal list.

bash fetch_ava_proposals.sh

Step 6. Folder Structure

After the whole data pipeline for AVA preparation, you will get the rawframes (RGB + Flow), videos and annotation files for AVA.

In the context of the whole project (for AVA only), the minimal folder structure will look like the following (minimal means that some data are not strictly necessary; for example, you may want to evaluate AVA using the original video format):

mmaction2
├── mmaction
├── tools
├── configs
├── data
│   ├── ava
│   │   ├── annotations
│   │   |   ├── ava_dense_proposals_train.FAIR.recall_93.9.pkl
│   │   |   ├── ava_dense_proposals_val.FAIR.recall_93.9.pkl
│   │   |   ├── ava_dense_proposals_test.FAIR.recall_93.9.pkl
│   │   |   ├── ava_train_v2.1.csv
│   │   |   ├── ava_val_v2.1.csv
│   │   |   ├── ava_train_excluded_timestamps_v2.1.csv
│   │   |   ├── ava_val_excluded_timestamps_v2.1.csv
│   │   |   ├── ava_action_list_v2.1_for_activitynet_2018.pbtxt
│   │   ├── videos
│   │   │   ├── 053oq2xB3oU.mkv
│   │   │   ├── 0f39OWEqJ24.mp4
│   │   │   ├── ...
│   │   ├── videos_15min
│   │   │   ├── 053oq2xB3oU.mkv
│   │   │   ├── 0f39OWEqJ24.mp4
│   │   │   ├── ...
│   │   ├── rawframes
│   │   │   ├── 053oq2xB3oU
|   │   │   │   ├── img_00001.jpg
|   │   │   │   ├── img_00002.jpg
|   │   │   │   ├── ...

For training and evaluating on AVA, please refer to getting_started.

Reference

  1. O. Tange (2018): GNU Parallel 2018, March 2018, https://doi.org/10.5281/zenodo.1146014

Diving48

Introduction

@inproceedings{li2018resound,
  title={Resound: Towards action recognition without representation bias},
  author={Li, Yingwei and Li, Yi and Vasconcelos, Nuno},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  pages={513--528},
  year={2018}
}

For basic dataset information, you can refer to the official dataset website. Before we start, please make sure that the directory is located at $MMACTION2/tools/data/diving48/.

Step 1. Prepare Annotations

You can run the following script to download annotations (considering the correctness of annotation files, we only download V2 version here).

bash download_annotations.sh

Step 2. Prepare Videos

You can run the following script to download videos.

bash download_videos.sh

Step 3. Prepare RGB and Flow

This part is optional if you only want to use the video loader.

The frames provided in the official compressed file are not complete. You may need to go through the following extraction steps to get complete frames.

Before extracting, please refer to install.md for installing denseflow.

If you have plenty of SSD space, then we recommend extracting frames there for better I/O performance.

You can run the following script to soft link SSD.

## execute these two lines (assuming the SSD is mounted at "/mnt/SSD/")
mkdir /mnt/SSD/diving48_extracted/
ln -s /mnt/SSD/diving48_extracted/ ../../../data/diving48/rawframes

If you only want to play with RGB frames (since extracting optical flow can be time-consuming), consider running the following script to extract RGB-only frames using denseflow.

cd $MMACTION2/tools/data/diving48/
bash extract_rgb_frames.sh

If you did not install denseflow, you can still extract RGB frames using OpenCV with the following script, but it will keep the original size of the images.

cd $MMACTION2/tools/data/diving48/
bash extract_rgb_frames_opencv.sh

If both are required, run the following script to extract frames.

cd $MMACTION2/tools/data/diving48/
bash extract_frames.sh

Step 4. Generate File List

You can run the following scripts to generate file lists in the formats of videos and rawframes.

bash generate_videos_filelist.sh
bash generate_rawframes_filelist.sh

Step 5. Check Directory Structure

After the whole data process for Diving48 preparation, you will get the rawframes (RGB + Flow), videos and annotation files for Diving48.

In the context of the whole project (for Diving48 only), the folder structure will look like:

mmaction2
├── mmaction
├── tools
├── configs
├── data
│   ├── diving48
│   │   ├── diving48_{train,val}_list_rawframes.txt
│   │   ├── diving48_{train,val}_list_videos.txt
│   │   ├── annotations
│   |   |   ├── Diving48_V2_train.json
│   |   |   ├── Diving48_V2_test.json
│   |   |   ├── Diving48_vocab.json
│   |   ├── videos
│   |   |   ├── _8Vy3dlHg2w_00000.mp4
│   |   |   ├── _8Vy3dlHg2w_00001.mp4
│   |   |   ├── ...
│   |   ├── rawframes
│   |   |   ├── 2x00lRzlTVQ_00000
│   |   |   |   ├── img_00001.jpg
│   |   |   |   ├── img_00002.jpg
│   |   |   |   ├── ...
│   |   |   |   ├── flow_x_00001.jpg
│   |   |   |   ├── flow_x_00002.jpg
│   |   |   |   ├── ...
│   |   |   |   ├── flow_y_00001.jpg
│   |   |   |   ├── flow_y_00002.jpg
│   |   |   |   ├── ...
│   |   |   ├── 2x00lRzlTVQ_00001
│   |   |   ├── ...

For training and evaluating on Diving48, please refer to getting_started.md.

GYM

Introduction

@inproceedings{shao2020finegym,
  title={Finegym: A hierarchical video dataset for fine-grained action understanding},
  author={Shao, Dian and Zhao, Yue and Dai, Bo and Lin, Dahua},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={2616--2625},
  year={2020}
}

For basic dataset information, please refer to the official project and the paper. We currently provide the data pre-processing pipeline for GYM99. Before we start, please make sure that the directory is located at $MMACTION2/tools/data/gym/.

Step 1. Prepare Annotations

First of all, you can run the following script to prepare annotations.

bash download_annotations.sh

Step 2. Prepare Videos

Then, you can run the following script to prepare videos. The code is adapted from the official crawler. Note that this might take a long time.

bash download_videos.sh

Step 3. Trim Videos into Events

First, you need to trim the long videos into events based on the GYM annotations with the following script.

python trim_event.py

Step 4. Trim Events into Subactions

Then, you need to trim events into subactions based on the GYM annotations with the following script. We use two-stage trimming for better efficiency (trimming multiple short clips directly from a long video can be extremely inefficient, since you need to go over the video many times).

python trim_subaction.py

Step 5. Extract RGB and Flow

This part is optional if you only want to use the video loader for RGB model training.

Before extracting, please refer to install.md for installing denseflow.

Run the following script to extract both RGB and Flow using the "tvl1" algorithm.

bash extract_frames.sh

Step 6. Generate file list for GYM99 based on extracted subactions

You can use the following script to generate train / val lists for GYM99.

python generate_file_list.py

Step 7. Folder Structure

After the whole data pipeline for GYM preparation, you will get the subaction clips, event clips, raw videos and the GYM99 train/val lists.

In the context of the whole project (for GYM only), the full folder structure will look like:

mmaction2
├── mmaction
├── tools
├── configs
├── data
│   ├── gym
|   |   ├── annotations
|   |   |   ├── gym99_train_org.txt
|   |   |   ├── gym99_val_org.txt
|   |   |   ├── gym99_train.txt
|   |   |   ├── gym99_val.txt
|   |   |   ├── annotation.json
|   |   |   └── event_annotation.json
│   │   ├── videos
|   |   |   ├── 0LtLS9wROrk.mp4
|   |   |   ├── ...
|   |   |   └── zfqS-wCJSsw.mp4
│   │   ├── events
|   |   |   ├── 0LtLS9wROrk_E_002407_002435.mp4
|   |   |   ├── ...
|   |   |   └── zfqS-wCJSsw_E_006732_006824.mp4
│   │   ├── subactions
|   |   |   ├── 0LtLS9wROrk_E_002407_002435_A_0003_0005.mp4
|   |   |   ├── ...
|   |   |   └── zfqS-wCJSsw_E_006244_006252_A_0000_0007.mp4
|   |   └── subaction_frames

For training and evaluating on GYM, please refer to getting_started.

HMDB51

Introduction

@article{Kuehne2011HMDBAL,
  title={HMDB: A large video database for human motion recognition},
  author={Hilde Kuehne and Hueihan Jhuang and E. Garrote and T. Poggio and Thomas Serre},
  journal={2011 International Conference on Computer Vision},
  year={2011},
  pages={2556-2563}
}

For basic dataset information, you can refer to the dataset website. Before we start, please make sure that the directory is located at $MMACTION2/tools/data/hmdb51/.

To run the bash scripts below, you need to install unrar. You can install it with sudo apt-get install unrar, or refer to this repo, following its usage and using the zzunrar.sh script for easy installation without sudo.

Step 1. Prepare Annotations

First of all, you can run the following script to prepare annotations.

bash download_annotations.sh

Step 2. Prepare Videos

Then, you can run the following script to prepare videos.

bash download_videos.sh

Step 3. Extract RGB and Flow

This part is optional if you only want to use the video loader.

Before extracting, please refer to install.md for installing denseflow.

If you have plenty of SSD space, then we recommend extracting frames there for better I/O performance.

You can run the following script to soft link SSD.

## execute these two lines (assuming the SSD is mounted at "/mnt/SSD/")
mkdir /mnt/SSD/hmdb51_extracted/
ln -s /mnt/SSD/hmdb51_extracted/ ../../../data/hmdb51/rawframes

If you only want to play with RGB frames (since extracting optical flow can be time-consuming), consider running the following script to extract RGB-only frames using denseflow.

bash extract_rgb_frames.sh

If you did not install denseflow, you can still extract RGB frames using OpenCV with the following script, but it will keep the original size of the images.

bash extract_rgb_frames_opencv.sh

If both are required, run the following script to extract frames using the "tvl1" algorithm.

bash extract_frames.sh

Step 4. Generate File List

You can run the following scripts to generate file lists in the formats of rawframes and videos.

bash generate_rawframes_filelist.sh
bash generate_videos_filelist.sh

Step 5. Check Directory Structure

After the whole data process for HMDB51 preparation, you will get the rawframes (RGB + Flow), videos and annotation files for HMDB51.

In the context of the whole project (for HMDB51 only), the folder structure will look like:

mmaction2
├── mmaction
├── tools
├── configs
├── data
│   ├── hmdb51
│   │   ├── hmdb51_{train,val}_split_{1,2,3}_rawframes.txt
│   │   ├── hmdb51_{train,val}_split_{1,2,3}_videos.txt
│   │   ├── annotations
│   │   ├── videos
│   │   │   ├── brush_hair
│   │   │   │   ├── April_09_brush_hair_u_nm_np1_ba_goo_0.avi

│   │   │   ├── wave
│   │   │   │   ├── 20060723sfjffbartsinger_wave_f_cm_np1_ba_med_0.avi
│   │   ├── rawframes
│   │   │   ├── brush_hair
│   │   │   │   ├── April_09_brush_hair_u_nm_np1_ba_goo_0
│   │   │   │   │   ├── img_00001.jpg
│   │   │   │   │   ├── img_00002.jpg
│   │   │   │   │   ├── ...
│   │   │   │   │   ├── flow_x_00001.jpg
│   │   │   │   │   ├── flow_x_00002.jpg
│   │   │   │   │   ├── ...
│   │   │   │   │   ├── flow_y_00001.jpg
│   │   │   │   │   ├── flow_y_00002.jpg
│   │   │   ├── ...
│   │   │   ├── wave
│   │   │   │   ├── 20060723sfjffbartsinger_wave_f_cm_np1_ba_med_0
│   │   │   │   ├── ...
│   │   │   │   ├── winKen_wave_u_cm_np1_ri_bad_1

For training and evaluating on HMDB51, please refer to getting_started.md.

HVU

Introduction

@article{Diba2019LargeSH,
  title={Large Scale Holistic Video Understanding},
  author={Ali Diba and M. Fayyaz and Vivek Sharma and Manohar Paluri and Jurgen Gall and R. Stiefelhagen and L. Gool},
  journal={arXiv: Computer Vision and Pattern Recognition},
  year={2019}
}

For basic dataset information, please refer to the official project and the paper. Before we start, please make sure that the directory is located at $MMACTION2/tools/data/hvu/.

Step 1. Prepare Annotations

First of all, you can run the following script to prepare annotations.

bash download_annotations.sh

Besides, you need to run the following command to parse the tag list of HVU.

python parse_tag_list.py

Step 2. Prepare Videos

Then, you can run the following script to prepare videos. The code is adapted from the official crawler. Note that this might take a long time.

bash download_videos.sh

Step 3. Extract RGB and Flow

This part is optional if you only want to use the video loader.

Before extracting, please refer to install.md for installing denseflow.

You can use the following script to extract both RGB and Flow frames.

bash extract_frames.sh

By default, we generate frames with the short edge resized to 256. More details can be found in data_preparation.

Step 4. Generate File List

You can run the following scripts to generate file lists in the formats of videos and rawframes, respectively.

bash generate_videos_filelist.sh
## execute the command below when rawframes are ready
bash generate_rawframes_filelist.sh

Step 5. Generate File List for Each Individual Tag Category

This part is optional if you don’t want to train models on HVU for a specific tag category.

The file list generated in step 4 contains labels of different categories. These file lists can only be handled with HVUDataset and used for multi-task learning of different tag categories. The component LoadHVULabel is needed to load the multi-category tags, and the HVULoss should be used to train the model.

If you only want to train video recognition models for a specific tag category, e.g. a recognition model on HVU that only handles tags in the category action, we recommend using the following command to generate file lists for that tag category. The new list, which only contains tags of a specific category, can be handled with VideoDataset or RawframeDataset. The recognition models can be trained with BCELossWithLogits.

The following command generates a file list for the tag category ${category}. Note that the tag category you specify should be one of the 6 tag categories available in HVU: ['action', 'attribute', 'concept', 'event', 'object', 'scene'].

python generate_sub_file_list.py path/to/filelist.json ${category}

The filename of the generated file list for ${category} is obtained by replacing hvu in the original filename with hvu_${category}. For example, if the original filename is hvu_train.json, the file list for action is named hvu_action_train.json.
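
The renaming rule above can be sketched as a small helper (illustrative, not part of MMAction2):

```python
import os.path as osp

def sub_list_name(filename, category):
    """Derive the per-category file list name from the original filename."""
    head, base = osp.split(filename)
    # Replace only the first "hvu" so the rest of the name is untouched.
    return osp.join(head, base.replace("hvu", f"hvu_{category}", 1))

# sub_list_name("hvu_train.json", "action") -> "hvu_action_train.json"
```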

Step 6. Folder Structure

After the whole data pipeline for HVU preparation, you will get the rawframes (RGB + Flow), videos and annotation files for HVU.

In the context of the whole project (for HVU only), the full folder structure will look like:

mmaction2
├── mmaction
├── tools
├── configs
├── data
│   ├── hvu
│   │   ├── hvu_train_video.json
│   │   ├── hvu_val_video.json
│   │   ├── hvu_train.json
│   │   ├── hvu_val.json
│   │   ├── annotations
│   │   ├── videos_train
│   │   │   ├── OLpWTpTC4P8_000570_000670.mp4
│   │   │   ├── xsPKW4tZZBc_002330_002430.mp4
│   │   │   ├── ...
│   │   ├── videos_val
│   │   ├── rawframes_train
│   │   ├── rawframes_val

For training and evaluating on HVU, please refer to getting_started.

Jester

Introduction

@InProceedings{Materzynska_2019_ICCV,
  author = {Materzynska, Joanna and Berger, Guillaume and Bax, Ingo and Memisevic, Roland},
  title = {The Jester Dataset: A Large-Scale Video Dataset of Human Gestures},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
  month = {Oct},
  year = {2019}
}

For basic dataset information, you can refer to the dataset website. Before we start, please make sure that the directory is located at $MMACTION2/tools/data/jester/.

Step 1. Prepare Annotations

First of all, you have to sign in and download annotations to $MMACTION2/data/jester/annotations on the official website.

Step 2. Prepare RGB Frames

Since the Jester website does not provide the original video data and only extracted RGB frames are available, you have to download the RGB frames directly from the Jester website.

You can download all RGB frame parts from the Jester website to $MMACTION2/data/jester/ and use the following command to extract them.

cd $MMACTION2/data/jester/
cat 20bn-jester-v1-?? | tar zx
cd $MMACTION2/tools/data/jester/

If you only want to use RGB frames, you can skip to Step 5 to generate file lists in the rawframes format. Since the filename template of the official JPGs is "%05d.jpg" (e.g., "00001.jpg"), we add filename_tmpl='{:05}.jpg' to the dicts of data.train, data.val and data.test in the Jester-related config files, like this:

data = dict(
    videos_per_gpu=16,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix=data_root,
        filename_tmpl='{:05}.jpg',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix=data_root_val,
        filename_tmpl='{:05}.jpg',
        pipeline=val_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=ann_file_test,
        data_prefix=data_root_val,
        filename_tmpl='{:05}.jpg',
        pipeline=test_pipeline))

Step 3. Extract Flow

This part is optional if you only want to use RGB frames.

Before extracting, please refer to install.md for installing denseflow.

If you have plenty of SSD space, then we recommend extracting frames there for better I/O performance.

You can run the following script to soft link SSD.

## execute these two lines (assuming the SSD is mounted at "/mnt/SSD/")
mkdir /mnt/SSD/jester_extracted/
ln -s /mnt/SSD/jester_extracted/ ../../../data/jester/rawframes

Then, you can run the following script to extract optical flow based on RGB frames.

cd $MMACTION2/tools/data/jester/
bash extract_flow.sh

Step 4. Encode Videos

This part is optional if you only want to use RGB frames.

You can run the following script to encode videos.

cd $MMACTION2/tools/data/jester/
bash encode_videos.sh

Step 5. Generate File List

You can run the following script to generate file lists in the formats of rawframes and videos.

cd $MMACTION2/tools/data/jester/
bash generate_{rawframes,videos}_filelist.sh

Step 6. Check Directory Structure

After the whole data process for Jester preparation, you will get the rawframes (RGB + Flow), videos and annotation files for Jester.

In the context of the whole project (for Jester only), the folder structure will look like:

mmaction2
├── mmaction
├── tools
├── configs
├── data
│   ├── jester
│   │   ├── jester_{train,val}_list_rawframes.txt
│   │   ├── jester_{train,val}_list_videos.txt
│   │   ├── annotations
│   |   ├── videos
│   |   |   ├── 1.mp4
│   |   |   ├── 2.mp4
│   |   |   ├──...
│   |   ├── rawframes
│   |   |   ├── 1
│   |   |   |   ├── 00001.jpg
│   |   |   |   ├── 00002.jpg
│   |   |   |   ├── ...
│   |   |   |   ├── flow_x_00001.jpg
│   |   |   |   ├── flow_x_00002.jpg
│   |   |   |   ├── ...
│   |   |   |   ├── flow_y_00001.jpg
│   |   |   |   ├── flow_y_00002.jpg
│   |   |   |   ├── ...
│   |   |   ├── 2
│   |   |   ├── ...

For training and evaluating on Jester, please refer to getting_started.md.

JHMDB

Introduction

@inproceedings{Jhuang:ICCV:2013,
    title = {Towards understanding action recognition},
    author = {H. Jhuang and J. Gall and S. Zuffi and C. Schmid and M. J. Black},
    booktitle = {International Conf. on Computer Vision (ICCV)},
    month = Dec,
    pages = {3192-3199},
    year = {2013}
}

For basic dataset information, you can refer to the dataset website. Before we start, please make sure that the directory is located at $MMACTION2/tools/data/jhmdb/.

Download and Extract

You can download the RGB frames, optical flow and ground truth annotations from Google Drive. The data are provided by MOC, which is adapted from act-detector.

After downloading the JHMDB.tar.gz file and putting it in $MMACTION2/tools/data/jhmdb/, you can run the following command to extract it.

tar -zxvf JHMDB.tar.gz

If you have plenty of SSD space, then we recommend extracting frames there for better I/O performance.

You can run the following script to soft link SSD.

## execute these two lines (assuming the SSD is mounted at "/mnt/SSD/")
mkdir /mnt/SSD/JHMDB/
ln -s /mnt/SSD/JHMDB/ ../../../data/jhmdb

Check Directory Structure

After extracting, you will get the FlowBrox04 directory, the Frames directory and the JHMDB-GT.pkl file for JHMDB.

In the context of the whole project (for JHMDB only), the folder structure will look like:

mmaction2
├── mmaction
├── tools
├── configs
├── data
│   ├── jhmdb
│   |   ├── FlowBrox04
│   |   |   ├── brush_hair
│   |   |   |   ├── April_09_brush_hair_u_nm_np1_ba_goo_0
│   |   |   |   |   ├── 00001.jpg
│   |   |   |   |   ├── 00002.jpg
│   |   |   |   |   ├── ...
│   |   |   |   |   ├── 00039.jpg
│   |   |   |   |   ├── 00040.jpg
│   |   |   |   ├── ...
│   |   |   |   ├── Trannydude___Brushing_SyntheticHair___OhNOES!__those_fukin_knots!_brush_hair_u_nm_np1_fr_goo_2
│   |   |   ├── ...
│   |   |   ├── wave
│   |   |   |   ├── 21_wave_u_nm_np1_fr_goo_5
│   |   |   |   ├── ...
│   |   |   |   ├── Wie_man_winkt!!_wave_u_cm_np1_fr_med_0
│   |   ├── Frames
│   |   |   ├── brush_hair
│   |   |   |   ├── April_09_brush_hair_u_nm_np1_ba_goo_0
│   |   |   |   |   ├── 00001.png
│   |   |   |   |   ├── 00002.png
│   |   |   |   |   ├── ...
│   |   |   |   |   ├── 00039.png
│   |   |   |   |   ├── 00040.png
│   |   |   |   ├── ...
│   |   |   |   ├── Trannydude___Brushing_SyntheticHair___OhNOES!__those_fukin_knots!_brush_hair_u_nm_np1_fr_goo_2
│   |   |   ├── ...
│   |   |   ├── wave
│   |   |   |   ├── 21_wave_u_nm_np1_fr_goo_5
│   |   |   |   ├── ...
│   |   |   |   ├── Wie_man_winkt!!_wave_u_cm_np1_fr_med_0
│   |   ├── JHMDB-GT.pkl

Note

JHMDB-GT.pkl exists as a cache; it contains the following 6 items:

  1. labels (list): List of the 21 labels.

  2. gttubes (dict): Dictionary that contains the ground truth tubes for each video. A gttube is a dictionary that associates each label index with a list of tubes. A tube is a numpy array with nframes rows and 5 columns, each row in the format <frame index> <x1> <y1> <x2> <y2>.

  3. nframes (dict): Dictionary that contains the number of frames for each video, like 'walk/Panic_in_the_Streets_walk_u_cm_np1_ba_med_5': 16.

  4. train_videos (list): A list with nsplits=1 elements, each one containing the list of training videos.

  5. test_videos (list): A list with nsplits=1 elements, each one containing the list of testing videos.

  6. resolution (dict): Dictionary that maps each video to a tuple (h, w) of its resolution, like 'pour/Bartender_School_Students_Practice_pour_u_cm_np1_fr_med_1': (240, 320).
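
Given the 6 items above, one might inspect the cache like this. This is a sketch: the path is illustrative, and the latin1 encoding is an assumption for a cache pickled under Python 2.

```python
import pickle

def load_jhmdb_gt(path="data/jhmdb/JHMDB-GT.pkl"):
    with open(path, "rb") as f:
        # latin1 lets Python 3 read caches pickled under Python 2.
        gt = pickle.load(f, encoding="latin1")
    assert len(gt["labels"]) == 21
    video = gt["train_videos"][0][0]      # first video of the only split
    height, width = gt["resolution"][video]
    print(video, gt["nframes"][video], (height, width))
    return gt
```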

Kinetics-[400/600/700]

Introduction

@inproceedings{inproceedings,
  author = {Carreira, J. and Zisserman, Andrew},
  year = {2017},
  month = {07},
  pages = {4724-4733},
  title = {Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset},
  doi = {10.1109/CVPR.2017.502}
}

For basic dataset information, please refer to the official website. The scripts can be used to prepare kinetics400, kinetics600 and kinetics700. To prepare a specific version of Kinetics, replace ${DATASET} in the following examples with the dataset name: one of kinetics400, kinetics600 and kinetics700. Before we start, please make sure that the directory is located at $MMACTION2/tools/data/${DATASET}/.

Note

Because some YouTube links have expired, the sizes of different Kinetics dataset copies may differ. Here are the sizes of our Kinetics dataset copies, which were used to train all checkpoints.

Dataset      training videos  validation videos
kinetics400  240436           19796

Step 1. Prepare Annotations

First of all, you can run the following script to prepare annotations by downloading from the official website.

bash download_annotations.sh ${DATASET}

Since some video URLs are invalid, the current official annotations contain fewer video items than the original ones. We therefore provide an alternative way to download the older annotations as a reference. Among these, the annotation files of Kinetics400 and Kinetics600 come from the official crawler, while the annotation files of Kinetics700 were downloaded from the website on 05/02/2021.

bash download_backup_annotations.sh ${DATASET}

Step 2. Prepare Videos

Then, you can run the following script to prepare videos. The codes are adapted from the official crawler. Note that this might take a long time.

bash download_videos.sh ${DATASET}

Important: If you have downloaded the video dataset using the download script above, you must replace all whitespaces in the class names for ease of processing by running

bash rename_classnames.sh ${DATASET}
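
The renaming rule is simple enough to state in one line; the Python sketch below illustrates it (the class name is a hypothetical example, and the actual script additionally renames the directories on disk):

```python
def rename_classname(name: str) -> str:
    """Replace whitespaces in a class name with underscores, a sketch of
    the rule applied by rename_classnames.sh."""
    return name.replace(' ', '_')

# e.g. a hypothetical Kinetics class name:
assert rename_classname('washing hands') == 'washing_hands'
```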

For better decoding speed, you can resize the original videos into a smaller-sized, densely encoded version by:

python ../resize_videos.py ../../../data/${DATASET}/videos_train/ ../../../data/${DATASET}/videos_train_256p_dense_cache --dense --level 2

You can also download the dataset from Academic Torrents (kinetics400 and kinetics700 with short edge 256 pixels are available) and cvdfoundation/kinetics-dataset (hosted by the Common Visual Data Foundation; Kinetics400, Kinetics600 and Kinetics-700-2020 are available).

Step 3. Extract RGB and Flow

This part is optional if you only want to use the video loader.

Before extracting, please refer to install.md for installing denseflow.

If you have plenty of SSD space, then we recommend extracting frames there for better I/O performance. You can run the following script to soft-link the extracted frames.

## execute these lines (assume the SSD is mounted at "/mnt/SSD/")
mkdir /mnt/SSD/${DATASET}_extracted_train/
ln -s /mnt/SSD/${DATASET}_extracted_train/ ../../../data/${DATASET}/rawframes_train/
mkdir /mnt/SSD/${DATASET}_extracted_val/
ln -s /mnt/SSD/${DATASET}_extracted_val/ ../../../data/${DATASET}/rawframes_val/

If you only want to play with RGB frames (since extracting optical flow can be time-consuming), consider running the following script to extract RGB-only frames using denseflow.

bash extract_rgb_frames.sh ${DATASET}

If you didn’t install denseflow, you can still extract RGB frames using OpenCV by the following script, but it will keep the original size of the images.

bash extract_rgb_frames_opencv.sh ${DATASET}

If both are required, run the following script to extract frames.

bash extract_frames.sh ${DATASET}

The commands above generate images with a new short edge of 256 pixels. If you want to generate images with a short edge of 320 pixels (320p), or with a fixed size of 340x256, you can change the args --new-short 256 to --new-short 320 or --new-width 340 --new-height 256. More details can be found in data_preparation.

Step 4. Generate File List

You can run the following scripts to generate file lists in the video and rawframes formats, respectively.

bash generate_videos_filelist.sh ${DATASET}
## execute the command below when rawframes are ready
bash generate_rawframes_filelist.sh ${DATASET}
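
For reference, a video file list line is `<relative path> <label index>`, while a rawframes list line is `<frame directory> <total frames> <label index>` (see data_preparation for details). The sketch below parses one line of each; the frame count 300 is made up for illustration:

```python
# Illustrative file list lines; the clip name is taken from the folder
# structure below, the frame count 300 is an assumed value.
video_line = 'abseiling/0wR5jVB-WPk_000417_000427.mp4 0'
rawframe_line = 'abseiling/0wR5jVB-WPk_000417_000427 300 0'

path, label = video_line.rsplit(' ', 1)
frame_dir, total_frames, rf_label = rawframe_line.rsplit(' ', 2)

assert label == rf_label == '0'
assert int(total_frames) == 300
```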

Step 5. Folder Structure

After the whole data pipeline for Kinetics preparation, you can get the rawframes (RGB + Flow), videos and annotation files for Kinetics.

In the context of the whole project (for Kinetics only), the minimal folder structure will look like this (minimal means that some data are not necessary; for example, you may want to evaluate Kinetics using only the original video format):

mmaction2
├── mmaction
├── tools
├── configs
├── data
│   ├── ${DATASET}
│   │   ├── ${DATASET}_train_list_videos.txt
│   │   ├── ${DATASET}_val_list_videos.txt
│   │   ├── annotations
│   │   ├── videos_train
│   │   ├── videos_val
│   │   │   ├── abseiling
│   │   │   │   ├── 0wR5jVB-WPk_000417_000427.mp4
│   │   │   │   ├── ...
│   │   │   ├── ...
│   │   │   ├── wrapping_present
│   │   │   ├── ...
│   │   │   ├── zumba
│   │   ├── rawframes_train
│   │   ├── rawframes_val

For training and evaluating on Kinetics, please refer to getting_started.md.

Moments in Time

Introduction

@article{monfortmoments,
    title={Moments in Time Dataset: one million videos for event understanding},
    author={Monfort, Mathew and Andonian, Alex and Zhou, Bolei and Ramakrishnan, Kandan and Bargal, Sarah Adel and Yan, Tom and Brown, Lisa and Fan, Quanfu and Gutfruend, Dan and Vondrick, Carl and others},
    journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
    year={2019},
    issn={0162-8828},
    pages={1--8},
    numpages={8},
    doi={10.1109/TPAMI.2019.2901464},
}

For basic dataset information, you can refer to the dataset website. Before we start, please make sure that the directory is located at $MMACTION2/tools/data/mit/.

Step 1. Prepare Annotations and Videos

First of all, you have to visit the official website and fill in an application form to download the dataset. Then you will get the download link. You can use bash preprocess_data.sh to prepare annotations and videos. However, the download command is missing in that script; remember to download the dataset to the proper place following the comments in the script.

For better decoding speed, you can resize the original videos into a smaller-sized, densely encoded version by:

python ../resize_videos.py ../../../data/mit/videos/ ../../../data/mit/videos_256p_dense_cache --dense --level 2

Step 2. Extract RGB and Flow

This part is optional if you only want to use the video loader.

Before extracting, please refer to install.md for installing denseflow.

If you have plenty of SSD space, then we recommend extracting frames there for better I/O performance. You can run the following script to soft-link the extracted frames.

## execute these two lines (assume the SSD is mounted at "/mnt/SSD/")
mkdir /mnt/SSD/mit_extracted/
ln -s /mnt/SSD/mit_extracted/ ../../../data/mit/rawframes

If you only want to play with RGB frames (since extracting optical flow can be time-consuming), consider running the following script to extract RGB-only frames using denseflow.

bash extract_rgb_frames.sh

If you didn’t install denseflow, you can still extract RGB frames using OpenCV by the following script, but it will keep the original size of the images.

bash extract_rgb_frames_opencv.sh

If both are required, run the following script to extract frames.

bash extract_frames.sh

Step 3. Generate File List

You can run the following script to generate file lists in the rawframes and video formats.

bash generate_{rawframes, videos}_filelist.sh

Step 4. Check Directory Structure

After the whole data process for Moments in Time preparation, you will get the rawframes (RGB + Flow), videos and annotation files for Moments in Time.

In the context of the whole project (for Moments in Time only), the folder structure will look like:

mmaction2
├── data
│   └── mit
│       ├── annotations
│       │   ├── license.txt
│       │   ├── moments_categories.txt
│       │   ├── README.txt
│       │   ├── trainingSet.csv
│       │   └── validationSet.csv
│       ├── mit_train_rawframe_anno.txt
│       ├── mit_train_video_anno.txt
│       ├── mit_val_rawframe_anno.txt
│       ├── mit_val_video_anno.txt
│       ├── rawframes
│       │   ├── training
│       │   │   ├── adult+female+singing
│       │   │   │   ├── 0P3XG_vf91c_35
│       │   │   │   │   ├── flow_x_00001.jpg
│       │   │   │   │   ├── flow_x_00002.jpg
│       │   │   │   │   ├── ...
│       │   │   │   │   ├── flow_y_00001.jpg
│       │   │   │   │   ├── flow_y_00002.jpg
│       │   │   │   │   ├── ...
│       │   │   │   │   ├── img_00001.jpg
│       │   │   │   │   └── img_00002.jpg
│       │   │   │   └── yt-zxQfALnTdfc_56
│       │   │   │   │   ├── ...
│       │   │   └── yawning
│       │   │       ├── _8zmP1e-EjU_2
│       │   │       │   ├── ...
│       │   └── validation
│       │   │       ├── ...
│       └── videos
│           ├── training
│           │   ├── adult+female+singing
│           │   │   ├── 0P3XG_vf91c_35.mp4
│           │   │   ├── ...
│           │   │   └── yt-zxQfALnTdfc_56.mp4
│           │   └── yawning
│           │       ├── ...
│           └── validation
│           │   ├── ...
└── mmaction
└── ...

For training and evaluating on Moments in Time, please refer to getting_started.md.

Multi-Moments in Time

Introduction

@misc{monfort2019multimoments,
    title={Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding},
    author={Mathew Monfort and Kandan Ramakrishnan and Alex Andonian and Barry A McNamara and Alex Lascelles and Bowen Pan and Quanfu Fan and Dan Gutfreund and Rogerio Feris and Aude Oliva},
    year={2019},
    eprint={1911.00232},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

For basic dataset information, you can refer to the dataset website. Before we start, please make sure that the directory is located at $MMACTION2/tools/data/mmit/.

Step 1. Prepare Annotations and Videos

First of all, you have to visit the official website and fill in an application form to download the dataset. Then you will get the download link. You can use bash preprocess_data.sh to prepare annotations and videos. However, the download command is missing in that script; remember to download the dataset to the proper place following the comments in the script.

For better decoding speed, you can resize the original videos into a smaller-sized, densely encoded version by:

python ../resize_videos.py ../../../data/mmit/videos/ ../../../data/mmit/videos_256p_dense_cache --dense --level 2

Step 2. Extract RGB and Flow

This part is optional if you only want to use the video loader.

Before extracting, please refer to install.md for installing denseflow.

First, you can run the following script to soft-link the SSD directory.

## execute these two lines (assume the SSD is mounted at "/mnt/SSD/")
mkdir /mnt/SSD/mmit_extracted/
ln -s /mnt/SSD/mmit_extracted/ ../../../data/mmit/rawframes

If you only want to play with RGB frames (since extracting optical flow can be time-consuming), consider running the following script to extract RGB-only frames using denseflow.

bash extract_rgb_frames.sh

If you didn’t install denseflow, you can still extract RGB frames using OpenCV by the following script, but it will keep the original size of the images.

bash extract_rgb_frames_opencv.sh

If both are required, run the following script to extract frames using the “tvl1” algorithm.

bash extract_frames.sh

Step 3. Generate File List

You can run the following scripts to generate file lists in the rawframes and video formats.

bash generate_rawframes_filelist.sh
bash generate_videos_filelist.sh

Step 4. Check Directory Structure

After the whole data process for Multi-Moments in Time preparation, you will get the rawframes (RGB + Flow), videos and annotation files for Multi-Moments in Time.

In the context of the whole project (for Multi-Moments in Time only), the folder structure will look like:

mmaction2/
└── data
    └── mmit
        ├── annotations
        │   ├── moments_categories.txt
        │   ├── trainingSet.txt
        │   └── validationSet.txt
        ├── mmit_train_rawframes.txt
        ├── mmit_train_videos.txt
        ├── mmit_val_rawframes.txt
        ├── mmit_val_videos.txt
        ├── rawframes
        │   ├── 0-3-6-2-9-1-2-6-14603629126_5
        │   │   ├── flow_x_00001.jpg
        │   │   ├── flow_x_00002.jpg
        │   │   ├── ...
        │   │   ├── flow_y_00001.jpg
        │   │   ├── flow_y_00002.jpg
        │   │   ├── ...
        │   │   ├── img_00001.jpg
        │   │   └── img_00002.jpg
        │   │   ├── ...
        │   └── yt-zxQfALnTdfc_56
        │   │   ├── ...
        │   └── ...

        └── videos
            └── adult+female+singing
                ├── 0-3-6-2-9-1-2-6-14603629126_5.mp4
                └── yt-zxQfALnTdfc_56.mp4
            └── ...

For training and evaluating on Multi-Moments in Time, please refer to getting_started.md.

OmniSource

Introduction

@article{duan2020omni,
  title={Omni-sourced Webly-supervised Learning for Video Recognition},
  author={Duan, Haodong and Zhao, Yue and Xiong, Yuanjun and Liu, Wentao and Lin, Dahua},
  journal={arXiv preprint arXiv:2003.13042},
  year={2020}
}

We release a subset of the OmniSource web dataset used in the paper Omni-sourced Webly-supervised Learning for Video Recognition. Since all web datasets in OmniSource are built on the Kinetics-400 taxonomy, we select the web data related to the 200 classes in the Mini-Kinetics subset (proposed in Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification).

We provide data from all sources related to the 200 classes in Mini-Kinetics (including Kinetics trimmed clips, Kinetics untrimmed videos, images from Google and Instagram, and video clips from Instagram). To obtain this dataset, please first fill in the request form. We will share the download link with you after your request is received. Since we release all data crawled from the web without any filtering, the dataset is large and may take some time to download. We describe the sizes of the datasets in the following table:

Dataset Name     #samples  Size     Teacher Model     #samples after filtering  #samples similar to k200_val
k200_train       76030     45.6G    N/A               N/A                       N/A
k200_val         4838      2.9G     N/A               N/A                       N/A
googleimage_200  3050880   265.5G   TSN-R50-8seg      1188695                   967
insimage_200     3654650   224.4G   TSN-R50-8seg      879726                    116
insvideo_200     732855    1487.6G  SlowOnly-8x8-R50  330680                    956
k200_raw_train   76027     963.5G   SlowOnly-8x8-R50  N/A                       N/A

The file structure of our uploaded OmniSource dataset looks like:

OmniSource/
├── annotations
│   ├── googleimage_200
│   │   ├── googleimage_200.txt                       File list of all valid images crawled from Google.
│   │   ├── tsn_8seg_googleimage_200_duplicate.txt    Positive file list of images crawled from Google that are similar to a validation example.
│   │   ├── tsn_8seg_googleimage_200.txt              Positive file list of images crawled from Google, filtered by the teacher model.
│   │   └── tsn_8seg_googleimage_200_wodup.txt        Positive file list of images crawled from Google, filtered by the teacher model, after de-duplication.
│   ├── insimage_200
│   │   ├── insimage_200.txt
│   │   ├── tsn_8seg_insimage_200_duplicate.txt
│   │   ├── tsn_8seg_insimage_200.txt
│   │   └── tsn_8seg_insimage_200_wodup.txt
│   ├── insvideo_200
│   │   ├── insvideo_200.txt
│   │   ├── slowonly_8x8_insvideo_200_duplicate.txt
│   │   ├── slowonly_8x8_insvideo_200.txt
│   │   └── slowonly_8x8_insvideo_200_wodup.txt
│   ├── k200_actions.txt                              The list of action names of the 200 classes in MiniKinetics.
│   ├── K400_to_MiniKinetics_classidx_mapping.json    The index mapping from Kinetics-400 to MiniKinetics.
│   ├── kinetics_200
│   │   ├── k200_train.txt
│   │   └── k200_val.txt
│   ├── kinetics_raw_200
│   │   └── slowonly_8x8_kinetics_raw_200.json        Kinetics Raw Clips filtered by the teacher model.
│   └── webimage_200
│       └── tsn_8seg_webimage_200_wodup.txt           The union of `tsn_8seg_googleimage_200_wodup.txt` and `tsn_8seg_insimage_200_wodup.txt`
├── googleimage_200                                   (10 volumes)
│   ├── vol_0.tar
│   ├── ...
│   └── vol_9.tar
├── insimage_200                                      (10 volumes)
│   ├── vol_0.tar
│   ├── ...
│   └── vol_9.tar
├── insvideo_200                                      (20 volumes)
│   ├── vol_00.tar
│   ├── ...
│   └── vol_19.tar
├── kinetics_200_train
│   └── kinetics_200_train.tar
├── kinetics_200_val
│   └── kinetics_200_val.tar
└── kinetics_raw_200_train                            (16 volumes)
    ├── vol_0.tar
    ├── ...
    └── vol_15.tar

Data Preparation

For data preparation, you first need to download the data. For kinetics_200 and the 3 web datasets (googleimage_200, insimage_200 and insvideo_200), you just need to extract each volume and merge their contents.

For Kinetics raw videos, since loading long videos is very heavy, you need to first trim them into clips. Here we provide a script named trim_raw_video.py, which trims a long video into 10-second clips and removes the original raw video. You can use it to trim the Kinetics raw videos.
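
Conceptually, the trimming boils down to splitting the timeline into consecutive 10-second windows. The sketch below computes those windows; the exact behaviour of trim_raw_video.py (e.g. how a trailing remainder shorter than 10 seconds is handled) may differ:

```python
def clip_segments(duration: float, clip_len: float = 10.0):
    """Split [0, duration) into consecutive clips of clip_len seconds.
    Returns (start, end) pairs; the last clip may be shorter.
    Illustrative sketch only, not the actual trim_raw_video.py logic."""
    segments = []
    start = 0.0
    while start < duration:
        segments.append((start, min(start + clip_len, duration)))
        start += clip_len
    return segments

# A 25-second raw video yields three clips (part_0, part_1, part_2).
assert clip_segments(25.0) == [(0.0, 10.0), (10.0, 20.0), (20.0, 25.0)]
```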

The data should be placed in data/OmniSource/. When data preparation is finished, the folder structure of data/OmniSource looks like this (we omit files not needed for training & testing for simplicity):

data/OmniSource/
├── annotations
│   ├── googleimage_200
│   │   └── tsn_8seg_googleimage_200_wodup.txt    Positive file list of images crawled from Google, filtered by the teacher model, after de-duplication.
│   ├── insimage_200
│   │   └── tsn_8seg_insimage_200_wodup.txt
│   ├── insvideo_200
│   │   └── slowonly_8x8_insvideo_200_wodup.txt
│   ├── kinetics_200
│   │   ├── k200_train.txt
│   │   └── k200_val.txt
│   ├── kinetics_raw_200
│   │   └── slowonly_8x8_kinetics_raw_200.json    Kinetics Raw Clips filtered by the teacher model.
│   └── webimage_200
│       └── tsn_8seg_webimage_200_wodup.txt       The union of `tsn_8seg_googleimage_200_wodup.txt` and `tsn_8seg_insimage_200_wodup.txt`
├── googleimage_200
│   ├── 000
|   │   ├── 00
|   │   │   ├── 000001.jpg
|   │   │   ├── ...
|   │   │   └── 000901.jpg
|   │   ├── ...
|   │   ├── 19
│   ├── ...
│   └── 199
├── insimage_200
│   ├── 000
|   │   ├── abseil
|   │   │   ├── 1J9tKWCNgV_0.jpg
|   │   │   ├── ...
|   │   │   └── 1J9tKWCNgV_0.jpg
|   │   ├── abseiling
│   ├── ...
│   └── 199
├── insvideo_200
│   ├── 000
|   │   ├── abseil
|   │   │   ├── B00arxogubl.mp4
|   │   │   ├── ...
|   │   │   └── BzYsP0HIvbt.mp4
|   │   ├── abseiling
│   ├── ...
│   └── 199
├── kinetics_200_train
│   ├── 0074cdXclLU.mp4
|   ├── ...
|   ├── zzzlyL61Fyo.mp4
├── kinetics_200_val
│   ├── 01fAWEHzudA.mp4
|   ├── ...
|   ├── zymA_6jZIz4.mp4
└── kinetics_raw_200_train
│   ├── pref_
│   |   ├── ___dTOdxzXY
|   │   │   ├── part_0.mp4
|   │   │   ├── ...
|   │   │   ├── part_6.mp4
│   |   ├── ...
│   |   └── _zygwGDE2EM
│   ├── ...
│   └── prefZ

Skeleton Dataset

@misc{duan2021revisiting,
      title={Revisiting Skeleton-based Action Recognition},
      author={Haodong Duan and Yue Zhao and Kai Chen and Dian Shao and Dahua Lin and Bo Dai},
      year={2021},
      eprint={2104.13586},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Introduction

We release the skeleton annotations used in Revisiting Skeleton-based Action Recognition. By default, we use Faster-RCNN with a ResNet50 backbone for human detection and HRNet-w32 for single-person pose estimation. For FineGYM, we use ground-truth bounding boxes for the athlete instead of detection bounding boxes. Currently, we release the skeleton annotations for FineGYM and the NTURGB+D Xsub split. Other annotations will be released soon.

Prepare Annotations

Currently, we support FineGYM and NTURGB+D. For FineGYM, you can execute the following script to prepare the annotations.

bash download_annotations.sh ${DATASET}

Due to the Conditions of Use of the NTURGB+D dataset, we cannot directly release the annotations used in our experiments. Instead, we provide a script to generate pose annotations for videos in the NTURGB+D datasets, which generates a dictionary per video and saves it as a single pickle file. You can create a list containing all annotation dictionaries of the corresponding videos and save it as a pickle file. Then you can get the ntu60_xsub_train.pkl, ntu60_xsub_val.pkl, ntu120_xsub_train.pkl and ntu120_xsub_val.pkl files that we used in training.

To generate 2D pose annotations for a single video, you first need to install mmdetection and mmpose from source. After that, replace the placeholders mmdet_root and mmpose_root in ntu_pose_extraction.py with your installation paths. Then you can use the following script for NTURGB+D video pose extraction:

python ntu_pose_extraction.py S001C001P001R001A001_rgb.avi S001C001P001R001A001.pkl

After you get pose annotations for all videos in a dataset split, like ntu60_xsub_val, you can gather them into a single list and save it as ntu60_xsub_val.pkl. You can then use these larger pickle files for training and testing.
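
A minimal sketch of this gathering step, assuming one .pkl per video in a single directory (the helper name gather_split and the file layout are illustrative, not part of MMAction2):

```python
import os
import pickle
import tempfile
from pathlib import Path

def gather_split(pkl_dir: str, out_path: str):
    """Gather per-video annotation dicts (one .pkl each, as produced by
    ntu_pose_extraction.py) into a single list and pickle it, e.g. to
    build ntu60_xsub_val.pkl. Illustrative helper, not part of MMAction2."""
    annos = [pickle.loads(p.read_bytes())
             for p in sorted(Path(pkl_dir).glob('*.pkl'))]
    with open(out_path, 'wb') as f:
        pickle.dump(annos, f)
    return annos

# Demo with two synthetic per-video annotations in a temp directory.
tmp = tempfile.mkdtemp()
for name in ('S001C001P001R001A001', 'S001C001P001R001A002'):
    with open(os.path.join(tmp, name + '.pkl'), 'wb') as f:
        pickle.dump({'frame_dir': name}, f)
split = gather_split(tmp, os.path.join(tmp, 'ntu_split.pkl'))
assert [a['frame_dir'] for a in split] == ['S001C001P001R001A001',
                                           'S001C001P001R001A002']
```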

The Format of PoseC3D Annotations

Here we briefly introduce the format of PoseC3D annotations, taking gym_train.pkl as an example: its content is a list of length 20484, where each item is a dictionary holding the skeleton annotation of one video. Each dictionary has the following fields:

  • keypoint: The keypoint coordinates, a numpy array of shape N (#persons) x T (temporal length) x K (#keypoints, 17 in our case) x 2 (x, y coordinates).

  • keypoint_score: The keypoint confidence scores, a numpy array of shape N (#persons) x T (temporal length) x K (#keypoints, 17 in our case).

  • frame_dir: The corresponding video name.

  • label: The action category.

  • img_shape: The image shape of each frame.

  • original_shape: Same as above.

  • total_frames: The temporal length of the video.
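
A toy annotation with these fields and mutually consistent shapes looks like the following (synthetic values with N=1 person, T=3 frames, K=17 keypoints; real entries come from files such as gym_train.pkl):

```python
import numpy as np

# Synthetic PoseC3D-style annotation; field names follow the list above,
# all values are made up for illustration.
anno = {
    'keypoint': np.zeros((1, 3, 17, 2), dtype=np.float32),
    'keypoint_score': np.ones((1, 3, 17), dtype=np.float32),
    'frame_dir': 'some_video',
    'label': 0,
    'img_shape': (480, 854),
    'original_shape': (480, 854),
    'total_frames': 3,
}

n, t, k, _ = anno['keypoint'].shape
assert anno['keypoint_score'].shape == (n, t, k)  # scores drop the (x, y) axis
assert anno['total_frames'] == t
```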

Visualization

For skeleton data visualization, you also need to prepare the RGB videos. Please refer to visualize_heatmap_volume for the detailed process. Here we provide some visualization examples from NTU-60 and FineGYM.

Pose Estimation Results


Keypoint Heatmap Volume Visualization


Limb Heatmap Volume Visualization


TODO:

  • [x] FineGYM

  • [x] NTU60_XSub

  • [x] NTU120_XSub

  • [x] NTU60_XView

  • [x] NTU120_XSet

  • [ ] Kinetics

Something-Something V1

Introduction

@misc{goyal2017something,
      title={The "something something" video database for learning and evaluating visual common sense},
      author={Raghav Goyal and Samira Ebrahimi Kahou and Vincent Michalski and Joanna Materzyńska and Susanne Westphal and Heuna Kim and Valentin Haenel and Ingo Fruend and Peter Yianilos and Moritz Mueller-Freitag and Florian Hoppe and Christian Thurau and Ingo Bax and Roland Memisevic},
      year={2017},
      eprint={1706.04261},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

For basic dataset information, you can refer to the dataset website. Before we start, please make sure that the directory is located at $MMACTION2/tools/data/sthv1/.

Step 1. Prepare Annotations

First of all, you have to sign in on the official website and download the annotations to $MMACTION2/data/sthv1/annotations.

Step 2. Prepare RGB Frames

Since the sthv1 website doesn’t provide the original video data and only extracted RGB frames are available, you have to download the RGB frames directly from the sthv1 website.

You can download all compressed file parts from the sthv1 website to $MMACTION2/data/sthv1/ and use the following command to uncompress.

cd $MMACTION2/data/sthv1/
cat 20bn-something-something-v1-?? | tar zx
cd $MMACTION2/tools/data/sthv1/

For users who only want to use RGB frames, you can skip to step 5 to generate file lists in the rawframes format. Since the filename template of the official JPGs is “%05d.jpg” (e.g., “00001.jpg”), users need to add "filename_tmpl='{:05}.jpg'" to the dict of data.train, data.val and data.test in the config files related to sthv1, like this:

data = dict(
    videos_per_gpu=16,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix=data_root,
        filename_tmpl='{:05}.jpg',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix=data_root_val,
        filename_tmpl='{:05}.jpg',
        pipeline=val_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=ann_file_test,
        data_prefix=data_root_val,
        filename_tmpl='{:05}.jpg',
        pipeline=test_pipeline))
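
The template '{:05}.jpg' zero-pads the frame index to five digits, matching the official frame names:

```python
# '{:05}' zero-pads an integer to width 5, so frame 1 maps to the
# official filename "00001.jpg".
tmpl = '{:05}.jpg'
assert tmpl.format(1) == '00001.jpg'
assert tmpl.format(123) == '00123.jpg'
```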

Step 3. Extract Flow

This part is optional if you only want to use RGB frames.

Before extracting, please refer to install.md for installing denseflow.

If you have plenty of SSD space, then we recommend extracting frames there for better I/O performance.

You can run the following script to soft-link the SSD directory.

## execute these two lines (assume the SSD is mounted at "/mnt/SSD/")
mkdir /mnt/SSD/sthv1_extracted/
ln -s /mnt/SSD/sthv1_extracted/ ../../../data/sthv1/rawframes

Then, you can run the following script to extract optical flow based on RGB frames.

cd $MMACTION2/tools/data/sthv1/
bash extract_flow.sh

Step 4. Encode Videos

This part is optional if you only want to use RGB frames.

You can run the following script to encode videos.

cd $MMACTION2/tools/data/sthv1/
bash encode_videos.sh

Step 5. Generate File List

You can run the following script to generate file lists in the rawframes and video formats.

cd $MMACTION2/tools/data/sthv1/
bash generate_{rawframes, videos}_filelist.sh

Step 6. Check Directory Structure

After the whole data process for Something-Something V1 preparation, you will get the rawframes (RGB + Flow) and annotation files for Something-Something V1.

In the context of the whole project (for Something-Something V1 only), the folder structure will look like:

mmaction2
├── mmaction
├── tools
├── configs
├── data
│   ├── sthv1
│   │   ├── sthv1_{train,val}_list_rawframes.txt
│   │   ├── sthv1_{train,val}_list_videos.txt
│   │   ├── annotations
│   |   ├── videos
│   |   |   ├── 1.mp4
│   |   |   ├── 2.mp4
│   |   |   ├──...
│   |   ├── rawframes
│   |   |   ├── 1
│   |   |   |   ├── 00001.jpg
│   |   |   |   ├── 00002.jpg
│   |   |   |   ├── ...
│   |   |   |   ├── flow_x_00001.jpg
│   |   |   |   ├── flow_x_00002.jpg
│   |   |   |   ├── ...
│   |   |   |   ├── flow_y_00001.jpg
│   |   |   |   ├── flow_y_00002.jpg
│   |   |   |   ├── ...
│   |   |   ├── 2
│   |   |   ├── ...

For training and evaluating on Something-Something V1, please refer to getting_started.md.

Something-Something V2

Introduction

@misc{goyal2017something,
      title={The "something something" video database for learning and evaluating visual common sense},
      author={Raghav Goyal and Samira Ebrahimi Kahou and Vincent Michalski and Joanna Materzyńska and Susanne Westphal and Heuna Kim and Valentin Haenel and Ingo Fruend and Peter Yianilos and Moritz Mueller-Freitag and Florian Hoppe and Christian Thurau and Ingo Bax and Roland Memisevic},
      year={2017},
      eprint={1706.04261},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

For basic dataset information, you can refer to the dataset website. Before we start, please make sure that the directory is located at $MMACTION2/tools/data/sthv2/.

Step 1. Prepare Annotations

First of all, you have to sign in on the official website and download the annotations to $MMACTION2/data/sthv2/annotations.

Step 2. Prepare Videos

Then, you can download all data parts to $MMACTION2/data/sthv2/ and use the following command to uncompress.

cd $MMACTION2/data/sthv2/
cat 20bn-something-something-v2-?? | tar zx
cd $MMACTION2/tools/data/sthv2/

Step 3. Extract RGB and Flow

This part is optional if you only want to use the video loader.

Before extracting, please refer to install.md for installing denseflow.

If you have plenty of SSD space, then we recommend extracting frames there for better I/O performance.

You can run the following script to soft-link the SSD directory.

## execute these two lines (assume the SSD is mounted at "/mnt/SSD/")
mkdir /mnt/SSD/sthv2_extracted/
ln -s /mnt/SSD/sthv2_extracted/ ../../../data/sthv2/rawframes

If you only want to play with RGB frames (since extracting optical flow can be time-consuming), consider running the following script to extract RGB-only frames using denseflow.

cd $MMACTION2/tools/data/sthv2/
bash extract_rgb_frames.sh

If you didn’t install denseflow, you can still extract RGB frames using OpenCV by the following script, but it will keep the original size of the images.

cd $MMACTION2/tools/data/sthv2/
bash extract_rgb_frames_opencv.sh

If both are required, run the following script to extract frames.

cd $MMACTION2/tools/data/sthv2/
bash extract_frames.sh

Step 4. Generate File List

You can run the following script to generate file lists in the rawframes and video formats.

cd $MMACTION2/tools/data/sthv2/
bash generate_{rawframes, videos}_filelist.sh

Step 5. Check Directory Structure

After the whole data process for Something-Something V2 preparation, you will get the rawframes (RGB + Flow), videos and annotation files for Something-Something V2.

In the context of the whole project (for Something-Something V2 only), the folder structure will look like:

mmaction2
├── mmaction
├── tools
├── configs
├── data
│   ├── sthv2
│   │   ├── sthv2_{train,val}_list_rawframes.txt
│   │   ├── sthv2_{train,val}_list_videos.txt
│   │   ├── annotations
│   |   ├── videos
│   |   |   ├── 1.mp4
│   |   |   ├── 2.mp4
│   |   |   ├──...
│   |   ├── rawframes
│   |   |   ├── 1
│   |   |   |   ├── img_00001.jpg
│   |   |   |   ├── img_00002.jpg
│   |   |   |   ├── ...
│   |   |   |   ├── flow_x_00001.jpg
│   |   |   |   ├── flow_x_00002.jpg
│   |   |   |   ├── ...
│   |   |   |   ├── flow_y_00001.jpg
│   |   |   |   ├── flow_y_00002.jpg
│   |   |   |   ├── ...
│   |   |   ├── 2
│   |   |   ├── ...

For training and evaluating on Something-Something V2, please refer to getting_started.md.

THUMOS’14

Introduction

@misc{THUMOS14,
    author = {Jiang, Y.-G. and Liu, J. and Roshan Zamir, A. and Toderici, G. and Laptev,
    I. and Shah, M. and Sukthankar, R.},
    title = {{THUMOS} Challenge: Action Recognition with a Large
    Number of Classes},
    howpublished = "\url{http://crcv.ucf.edu/THUMOS14/}",
    Year = {2014}
}

For basic dataset information, you can refer to the dataset website. Before we start, please make sure that the directory is located at $MMACTION2/tools/data/thumos14/.

Step 1. Prepare Annotations

First of all, run the following script to prepare annotations.

cd $MMACTION2/tools/data/thumos14/
bash download_annotations.sh

Step 2. Prepare Videos

Then, you can run the following script to prepare videos.

cd $MMACTION2/tools/data/thumos14/
bash download_videos.sh

Step 3. Extract RGB and Flow

This part is optional if you only want to use the video loader.

Before extracting, please refer to install.md for installing denseflow.

If you have plenty of SSD space, then we recommend extracting frames there for better I/O performance.

You can run the following script to soft-link the SSD directory.

## execute these two lines (assume the SSD is mounted at "/mnt/SSD/")
mkdir /mnt/SSD/thumos14_extracted/
ln -s /mnt/SSD/thumos14_extracted/ ../data/thumos14/rawframes/

If you only want to play with RGB frames (since extracting optical flow can be time-consuming), consider running the following script to extract RGB-only frames using denseflow.

cd $MMACTION2/tools/data/thumos14/
bash extract_rgb_frames.sh

If you didn’t install denseflow, you can still extract RGB frames using OpenCV by the following script, but it will keep the original size of the images.

cd $MMACTION2/tools/data/thumos14/
bash extract_rgb_frames_opencv.sh

If both are required, run the following script to extract frames.

cd $MMACTION2/tools/data/thumos14/
bash extract_frames.sh tvl1

Step 4. Fetch File List

This part is optional if you do not use the SSN model.

You can run the following script to fetch the pre-computed tag proposals.

cd $MMACTION2/tools/data/thumos14/
bash fetch_tag_proposals.sh

Step 5. Denormalize Proposal File

This part is optional if you do not use the SSN model.

You can run the following script to denormalize the pre-computed tag proposals according to the actual number of local rawframes.

cd $MMACTION2/tools/data/thumos14/
bash denormalize_proposal_file.sh
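
Conceptually, denormalization maps proposal boundaries expressed as fractions in [0, 1] to frame indices. The sketch below illustrates the idea only; denormalize_proposal_file.sh defines the exact rule and file format:

```python
def denormalize(start: float, end: float, total_frames: int):
    """Map a normalized proposal [start, end] in [0, 1] to frame indices
    for a video with total_frames extracted rawframes. Illustrative
    sketch; the actual script may round or clip differently."""
    return int(round(start * total_frames)), int(round(end * total_frames))

# e.g. a proposal covering the second quarter of a 400-frame video:
assert denormalize(0.25, 0.5, 400) == (100, 200)
```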

Step 6. Check Directory Structure

After the whole data pipeline for THUMOS’14 preparation, you will get the rawframes (RGB + Flow), videos, and annotation files for THUMOS’14.

In the context of the whole project (for THUMOS’14 only), the folder structure will look like:

mmaction2
├── mmaction
├── tools
├── configs
├── data
│   ├── thumos14
│   │   ├── proposals
│   │   │   ├── thumos14_tag_val_normalized_proposal_list.txt
│   │   │   ├── thumos14_tag_test_normalized_proposal_list.txt
│   │   ├── annotations_val
│   │   ├── annotations_test
│   │   ├── videos
│   │   │   ├── val
│   │   │   │   ├── video_validation_0000001.mp4
│   │   │   │   ├── ...
│   │   │   ├── test
│   │   │   │   ├── video_test_0000001.mp4
│   │   │   │   ├── ...
│   │   ├── rawframes
│   │   │   ├── val
│   │   │   │   ├── video_validation_0000001
│   │   │   │   │   ├── img_00001.jpg
│   │   │   │   │   ├── img_00002.jpg
│   │   │   │   │   ├── ...
│   │   │   │   │   ├── flow_x_00001.jpg
│   │   │   │   │   ├── flow_x_00002.jpg
│   │   │   │   │   ├── ...
│   │   │   │   │   ├── flow_y_00001.jpg
│   │   │   │   │   ├── flow_y_00002.jpg
│   │   │   │   │   ├── ...
│   │   │   │   ├── ...
│   │   │   ├── test
│   │   │   │   ├── video_test_0000001

For training and evaluating on THUMOS’14, please refer to getting_started.md.

UCF-101

Introduction

@article{Soomro2012UCF101AD,
  title={UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild},
  author={K. Soomro and A. Zamir and M. Shah},
  journal={ArXiv},
  year={2012},
  volume={abs/1212.0402}
}

For basic dataset information, you can refer to the dataset website. Before we start, please make sure that the directory is located at $MMACTION2/tools/data/ucf101/.

Step 1. Prepare Annotations

First of all, you can run the following script to prepare annotations.

bash download_annotations.sh

Step 2. Prepare Videos

Then, you can run the following script to prepare videos.

bash download_videos.sh

For better decoding speed, you can resize the original videos into a smaller, densely encoded version by:

python ../resize_videos.py ../../../data/ucf101/videos/ ../../../data/ucf101/videos_256p_dense_cache --dense --level 2 --ext avi

Step 3. Extract RGB and Flow

This part is optional if you only want to use the video loader.

Before extracting, please refer to install.md for installing denseflow.

If you have plenty of SSD space, then we recommend extracting frames there for better I/O performance. The extracted frames (RGB + Flow) will take up about 100GB.

You can run the following script to soft link SSD.

# execute these two lines (assuming the SSD is mounted at "/mnt/SSD/")
mkdir /mnt/SSD/ucf101_extracted/
ln -s /mnt/SSD/ucf101_extracted/ ../../../data/ucf101/rawframes

If you only want to play with RGB frames (since extracting optical flow can be time-consuming), consider running the following script to extract RGB-only frames using denseflow.

bash extract_rgb_frames.sh

If you didn’t install denseflow, you can still extract RGB frames using OpenCV by the following script, but it will keep the original size of the images.

bash extract_rgb_frames_opencv.sh

If both are required, run the following script to extract frames using “tvl1” algorithm.

bash extract_frames.sh

Step 4. Generate File List

You can run the following scripts to generate file lists in the formats of rawframes and videos.

bash generate_videos_filelist.sh
bash generate_rawframes_filelist.sh
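
The generated file lists contain one sample per line, matching the filenames shown in Step 5 below. For illustration, here is a minimal parser sketch with hypothetical example lines (assuming the rawframes format `<frame_dir> <total_frames> <label>` and the video format `<video_path> <label>`):

```python
# Minimal sketch: parse file-list lines of the two assumed formats.
# Rawframes lines: "<frame_dir> <total_frames> <label>"
# Video lines:     "<video_path> <label>"

def parse_rawframes_line(line):
    frame_dir, total_frames, label = line.rsplit(' ', 2)
    return frame_dir, int(total_frames), int(label)

def parse_video_line(line):
    video_path, label = line.rsplit(' ', 1)
    return video_path, int(label)

print(parse_rawframes_line('ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c01 120 0'))
print(parse_video_line('YoYo/v_YoYo_g25_c05.avi 100'))
```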

Step 5. Check Directory Structure

After the whole data pipeline for UCF-101 preparation, you will get the rawframes (RGB + Flow), videos, and annotation files for UCF-101.

In the context of the whole project (for UCF-101 only), the folder structure will look like:

mmaction2
├── mmaction
├── tools
├── configs
├── data
│   ├── ucf101
│   │   ├── ucf101_{train,val}_split_{1,2,3}_rawframes.txt
│   │   ├── ucf101_{train,val}_split_{1,2,3}_videos.txt
│   │   ├── annotations
│   │   ├── videos
│   │   │   ├── ApplyEyeMakeup
│   │   │   │   ├── v_ApplyEyeMakeup_g01_c01.avi
│   │   │   ├── ...
│   │   │   ├── YoYo
│   │   │   │   ├── v_YoYo_g25_c05.avi
│   │   ├── rawframes
│   │   │   ├── ApplyEyeMakeup
│   │   │   │   ├── v_ApplyEyeMakeup_g01_c01
│   │   │   │   │   ├── img_00001.jpg
│   │   │   │   │   ├── img_00002.jpg
│   │   │   │   │   ├── ...
│   │   │   │   │   ├── flow_x_00001.jpg
│   │   │   │   │   ├── flow_x_00002.jpg
│   │   │   │   │   ├── ...
│   │   │   │   │   ├── flow_y_00001.jpg
│   │   │   │   │   ├── flow_y_00002.jpg
│   │   │   ├── ...
│   │   │   ├── YoYo
│   │   │   │   ├── v_YoYo_g01_c01
│   │   │   │   ├── ...
│   │   │   │   ├── v_YoYo_g25_c05

For training and evaluating on UCF-101, please refer to getting_started.md.

UCF101-24

Introduction

@article{Soomro2012UCF101AD,
  title={UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild},
  author={K. Soomro and A. Zamir and M. Shah},
  journal={ArXiv},
  year={2012},
  volume={abs/1212.0402}
}

For basic dataset information, you can refer to the dataset website. Before we start, please make sure that the directory is located at $MMACTION2/tools/data/ucf101_24/.

Download and Extract

You can download the RGB frames, optical flow and ground truth annotations from Google Drive. The data are provided from MOC, which is adapted from act-detector and corrected-UCF101-Annots.

Note

The annotations of this UCF101-24 version are from here, which are more accurate.

After downloading the UCF101_v2.tar.gz file and putting it in $MMACTION2/tools/data/ucf101_24/, you can run the following command to uncompress it.

tar -zxvf UCF101_v2.tar.gz

Check Directory Structure

After uncompressing, you will get the rgb-images directory, the brox-images directory and the UCF101v2-GT.pkl file for UCF101-24.

In the context of the whole project (for UCF101-24 only), the folder structure will look like:

mmaction2
├── mmaction
├── tools
├── configs
├── data
│   ├── ucf101_24
│   │   ├── brox-images
│   │   │   ├── Basketball
│   │   │   │   ├── v_Basketball_g01_c01
│   │   │   │   │   ├── 00001.jpg
│   │   │   │   │   ├── 00002.jpg
│   │   │   │   │   ├── ...
│   │   │   │   │   ├── 00140.jpg
│   │   │   │   │   ├── 00141.jpg
│   │   │   ├── ...
│   │   │   ├── WalkingWithDog
│   │   │   │   ├── v_WalkingWithDog_g01_c01
│   │   │   │   ├── ...
│   │   │   │   ├── v_WalkingWithDog_g25_c04
│   │   ├── rgb-images
│   │   │   ├── Basketball
│   │   │   │   ├── v_Basketball_g01_c01
│   │   │   │   │   ├── 00001.jpg
│   │   │   │   │   ├── 00002.jpg
│   │   │   │   │   ├── ...
│   │   │   │   │   ├── 00140.jpg
│   │   │   │   │   ├── 00141.jpg
│   │   │   ├── ...
│   │   │   ├── WalkingWithDog
│   │   │   │   ├── v_WalkingWithDog_g01_c01
│   │   │   │   ├── ...
│   │   │   │   ├── v_WalkingWithDog_g25_c04
│   │   ├── UCF101v2-GT.pkl

Note

The UCF101v2-GT.pkl file exists as a cache; it contains the following 6 items:

  1. labels (list): List of the 24 labels.

  2. gttubes (dict): Dictionary that contains the ground truth tubes for each video. A gttube is a dictionary that maps each label index to a list of tubes. A tube is a NumPy array with nframes rows and 5 columns, where each row has the format <frame index> <x1> <y1> <x2> <y2>.

  3. nframes (dict): Dictionary that contains the number of frames for each video, like 'HorseRiding/v_HorseRiding_g05_c02': 151.

  4. train_videos (list): A list with nsplits=1 elements, each one containing the list of training videos.

  5. test_videos (list): A list with nsplits=1 elements, each one containing the list of testing videos.

  6. resolution (dict): Dictionary that maps each video to a tuple (h, w) of its resolution, like 'FloorGymnastics/v_FloorGymnastics_g09_c03': (240, 320).
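
To make the structure above concrete, here is a sketch that builds a toy dictionary with the same six keys and reads one tube from it; all values are made up for illustration and are not taken from the real file:

```python
import numpy as np

# Toy annotation dict mirroring the six items in UCF101v2-GT.pkl
# (all values below are illustrative, not real dataset values).
gt = {
    'labels': ['Basketball', 'WalkingWithDog'],  # the real file has 24 labels
    'gttubes': {
        'Basketball/v_Basketball_g01_c01': {
            # label index -> list of tubes; each tube row is
            # <frame index> <x1> <y1> <x2> <y2>
            0: [np.array([[1, 10.0, 20.0, 50.0, 80.0],
                          [2, 11.0, 21.0, 51.0, 81.0]])],
        },
    },
    'nframes': {'Basketball/v_Basketball_g01_c01': 141},
    'train_videos': [['Basketball/v_Basketball_g01_c01']],      # nsplits=1
    'test_videos': [['WalkingWithDog/v_WalkingWithDog_g01_c01']],
    'resolution': {'Basketball/v_Basketball_g01_c01': (240, 320)},
}

# Read the first tube of label 0 for one video:
tube = gt['gttubes']['Basketball/v_Basketball_g01_c01'][0][0]
print(tube.shape)  # (2, 5): 2 frames, 5 columns per row
```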
