Preparing HVU

Introduction

@article{Diba2019LargeSH,
  title={Large Scale Holistic Video Understanding},
  author={Ali Diba and M. Fayyaz and Vivek Sharma and Manohar Paluri and Jurgen Gall and R. Stiefelhagen and L. {Van Gool}},
  journal={arXiv: Computer Vision and Pattern Recognition},
  year={2019}
}

For basic dataset information, please refer to the official project and the paper. Before you start, make sure that your current working directory is $MMACTION2/tools/data/hvu/.

Step 1. Prepare Annotations

First, run the following script to download the annotations.

bash download_annotations.sh

Next, run the following command to parse the HVU tag list.

python parse_tag_list.py
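The input and output formats of parse_tag_list.py are not described here. As a hedged illustration only, the sketch below assumes a CSV tag list with a header row whose first two columns are the tag name and its category, and groups tag names by category, which is the kind of mapping a tag-list parser typically produces. The function name `group_tags_by_category` and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def group_tags_by_category(csv_text):
    """Group tag names by category.

    Assumes (hypothetically) a CSV with a header row whose first two
    columns are the tag name and its category.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    groups = defaultdict(list)
    for row in reader:
        if not row:
            continue  # ignore blank lines
        tag, category = row[0].strip(), row[1].strip()
        groups[category].append(tag)
    # Sort tags so each tag's index within its category is deterministic.
    return {cat: sorted(tags) for cat, tags in groups.items()}


sample = "Tag,Category\nsurfing,action\nbeach,scene\ndiving,action\n"
print(group_tags_by_category(sample))
# {'action': ['diving', 'surfing'], 'scene': ['beach']}
```

Sorting inside each category is a design choice of this sketch: it gives every tag a stable per-category index regardless of row order in the file.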

Step 2. Prepare Videos

Then run the following script to download the videos. The code is adapted from the official crawler. Note that this may take a long time.

bash download_videos.sh
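Judging from the example filenames in the folder structure of Step 6 (e.g. OLpWTpTC4P8_000570_000670.mp4), each downloaded clip appears to be named `<youtube_id>_<start>_<end>.mp4` with zero-padded six-digit timestamps. The sketch below parses such names under that assumption. Because YouTube IDs can themselves contain underscores, it splits from the right. The helper name `parse_clip_name` is hypothetical, and the unit of the timestamps is not stated in this document.

```python
import os


def parse_clip_name(filename):
    """Split '<youtube_id>_<start>_<end>.<ext>' into its parts.

    rsplit is used because YouTube IDs may contain '_' themselves.
    """
    stem, _ext = os.path.splitext(os.path.basename(filename))
    youtube_id, start, end = stem.rsplit("_", 2)
    return youtube_id, int(start), int(end)


print(parse_clip_name("videos_train/OLpWTpTC4P8_000570_000670.mp4"))
# ('OLpWTpTC4P8', 570, 670)
```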

Step 3. Extract RGB and Flow

This part is optional if you only want to use the video loader.

Before extracting, please refer to install.md to install denseflow.

You can use the following script to extract both RGB and Flow frames.

bash extract_frames.sh

By default, frames are generated with the short edge resized to 256 pixels. More details can be found in prepare_dataset.
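Resizing the short edge to 256 preserves the aspect ratio: the shorter side becomes 256 and the longer side is scaled by the same factor. The arithmetic is sketched below. Rounding the longer side to the nearest integer is an assumption of this sketch; the extraction tool may round differently.

```python
def short_edge_resize(width, height, short_edge=256):
    """Return (new_width, new_height) with the shorter side set to short_edge."""
    scale = short_edge / min(width, height)
    # The shorter side is exactly short_edge; the longer side is rounded.
    if width <= height:
        return short_edge, round(height * scale)
    return round(width * scale), short_edge


print(short_edge_resize(1280, 720))  # (455, 256)
```

For a 1280x720 video the scale factor is 256 / 720, so the width becomes 1280 * 256 / 720 ≈ 455.1, which rounds to 455.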

Step 4. Generate File List

Run the following scripts to generate file lists for videos and rawframes, respectively.

bash generate_videos_filelist.sh
# execute the command below when rawframes are ready
bash generate_rawframes_filelist.sh
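The generated lists are JSON files (e.g. hvu_train.json), but their exact schema is not spelled out in this document. The sketch below assumes each entry is a dict whose `label` field maps a tag category to a list of tag indices, which is consistent with the multi-category labels described in Step 5. It counts how many entries carry at least one tag in each category, a quick sanity check before training. The function name `count_entries_per_category` and the sample entries are hypothetical.

```python
import json
from collections import Counter


def count_entries_per_category(entries):
    """Count entries that have at least one tag in each category.

    Assumes each entry looks like {'label': {category: [tag indices]}, ...}.
    """
    counts = Counter()
    for entry in entries:
        for category, tags in entry.get("label", {}).items():
            if tags:
                counts[category] += 1
    return dict(counts)


# Hypothetical entries; a real list would be read with json.load on the file.
entries = json.loads("""[
  {"filename": "a.mp4", "label": {"action": [3], "scene": [1, 7]}},
  {"filename": "b.mp4", "label": {"action": [5], "object": []}}
]""")
print(count_entries_per_category(entries))  # {'action': 2, 'scene': 1}
```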

Step 5. Generate File Lists for Individual Tag Categories

This part is optional if you don’t want to train models on HVU for a specific tag category.

The file lists generated in Step 4 contain labels of different categories. These file lists can only be handled with HVUDataset and are used for multi-task learning across tag categories. The LoadHVULabel component is needed to load the multi-category tags, and HVULoss should be used to train the model.
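Conceptually, a multi-category label can be turned into one multi-hot target per category, together with a mask recording which categories are annotated for the clip, so that a multi-task loss only penalizes categories that are present. The sketch below illustrates that idea in plain Python. It is not the implementation of LoadHVULabel or HVULoss, and the category sizes are made-up toy values rather than the real HVU vocabulary.

```python
def encode_multi_category(label, num_tags_per_category):
    """Build per-category multi-hot targets and a category presence mask.

    label: {category: [tag indices within that category]}
    num_tags_per_category: {category: number of tags in that category}
    """
    targets, mask = {}, {}
    for category, num_tags in num_tags_per_category.items():
        vec = [0.0] * num_tags
        for idx in label.get(category, []):
            vec[idx] = 1.0
        targets[category] = vec
        # A category with no tags for this clip is masked out of the loss.
        mask[category] = 1.0 if label.get(category) else 0.0
    return targets, mask


sizes = {"action": 4, "scene": 3}  # toy sizes, not the real HVU vocabulary
targets, mask = encode_multi_category({"action": [0, 2]}, sizes)
print(targets)  # {'action': [1.0, 0.0, 1.0, 0.0], 'scene': [0.0, 0.0, 0.0]}
print(mask)     # {'action': 1.0, 'scene': 0.0}
```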

If you only want to train a video recognition model for a single tag category, e.g. a model on HVU that only handles tags in the action category, we recommend using the following command to generate file lists for that category. The new lists, which only contain tags of the specified category, can be handled with VideoDataset or RawframeDataset, and the recognition models can be trained with BCELossWithLogits.
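Within a single category a clip can carry several tags at once, so training is a multi-label problem: every tag gets an independent sigmoid output and a binary cross-entropy term. The sketch below computes the mean binary cross-entropy with logits in the numerically stable form max(x, 0) - x*y + log(1 + exp(-|x|)). It only illustrates the math and is not the code of mmaction2's BCELossWithLogits; the logits and targets are made-up values.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy between raw logits and 0/1 targets.

    Uses the stable identity
        -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))]
        = max(x, 0) - x*y + log(1 + exp(-|x|))
    so large |x| does not overflow exp().
    """
    assert len(logits) == len(targets)
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


# One clip, four tags in the category; tags 0 and 2 are present.
print(bce_with_logits([2.0, -1.0, 0.5, -3.0], [1, 0, 1, 0]))
```

Sigmoid outputs are used instead of a softmax because the tags of one category are not mutually exclusive.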

The following command generates the file list for the tag category ${category}. Note that ${category} must be one of the 6 tag categories available in HVU: ['action', 'attribute', 'concept', 'event', 'object', 'scene'].

python generate_sub_file_list.py path/to/filelist.json ${category}
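The core of this step can be pictured as a filter over the entries of the original list: keep only the tags of the requested category. The sketch below is a hedged illustration, not the code of generate_sub_file_list.py. It assumes each entry stores its tags in a `label` dict keyed by category, and dropping clips that have no tag in the requested category is also an assumption of this sketch. The names `filter_category` and `HVU_CATEGORIES` are hypothetical.

```python
import copy

HVU_CATEGORIES = ("action", "attribute", "concept", "event", "object", "scene")


def filter_category(entries, category):
    """Keep only the tags of one category, as a flat list of tag indices.

    Assumes entries look like {'filename': ..., 'label': {category: [tag ids]}}.
    Clips without any tag in `category` are dropped (an assumption).
    """
    if category not in HVU_CATEGORIES:
        raise ValueError(f"unknown tag category: {category!r}")
    result = []
    for entry in entries:
        tags = entry.get("label", {}).get(category)
        if tags:
            new_entry = copy.deepcopy(entry)  # leave the input untouched
            new_entry["label"] = list(tags)
            result.append(new_entry)
    return result


entries = [
    {"filename": "a.mp4", "label": {"action": [3], "scene": [1]}},
    {"filename": "b.mp4", "label": {"scene": [2]}},
]
print(filter_category(entries, "action"))
# [{'filename': 'a.mp4', 'label': [3]}]
```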

The generated file list for ${category} is named by replacing hvu in the original filename with hvu_${category}. For example, if the original filename is hvu_train.json, the file list for action is named hvu_action_train.json.
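A minimal sketch of this naming rule follows. It rewrites only the basename, so a directory named hvu in the path is left unchanged; that choice, and replacing only the first occurrence, are assumptions of this sketch rather than documented behavior of generate_sub_file_list.py. The function name `sub_list_path` is hypothetical.

```python
import os.path as osp


def sub_list_path(original_path, category):
    """Name the per-category list by replacing 'hvu' with 'hvu_<category>'.

    Only the basename is rewritten, so a directory such as data/hvu/ is kept.
    """
    dirname, basename = osp.split(original_path)
    return osp.join(dirname, basename.replace("hvu", f"hvu_{category}", 1))


print(sub_list_path("hvu_train.json", "action"))  # hvu_action_train.json
```

Applying a plain string replacement to the full path instead would also rewrite the data/hvu directory component, which is why the sketch splits off the basename first.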

Step 6. Folder Structure

After completing the whole HVU data preparation pipeline, you will have the rawframes (RGB + Flow), videos and annotation files for HVU.

In the context of the whole project (for HVU only), the full folder structure will look like:

mmaction2
├── mmaction
├── tools
├── configs
├── data
│   ├── hvu
│   │   ├── hvu_train_video.json
│   │   ├── hvu_val_video.json
│   │   ├── hvu_train.json
│   │   ├── hvu_val.json
│   │   ├── annotations
│   │   ├── videos_train
│   │   │   ├── OLpWTpTC4P8_000570_000670.mp4
│   │   │   ├── xsPKW4tZZBc_002330_002430.mp4
│   │   │   ├── ...
│   │   ├── videos_val
│   │   ├── rawframes_train
│   │   ├── rawframes_val
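To check that your layout matches the tree above, a small script can report which expected entries are missing. The sketch below hard-codes the names from the tree, relative to data/hvu. It does not know which steps you skipped, so entries you intentionally did not generate (for example the rawframes folders when Step 3 was skipped) will also be reported. The names `EXPECTED_ENTRIES` and `missing_entries` are hypothetical.

```python
from pathlib import Path

# Entries listed in the folder structure above, relative to data/hvu.
EXPECTED_ENTRIES = [
    "hvu_train_video.json",
    "hvu_val_video.json",
    "hvu_train.json",
    "hvu_val.json",
    "annotations",
    "videos_train",
    "videos_val",
    "rawframes_train",
    "rawframes_val",
]


def missing_entries(root, expected=EXPECTED_ENTRIES):
    """Return the expected names that do not exist under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    # Run from the mmaction2 root directory.
    print(missing_entries("data/hvu"))
```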

For training and evaluating on HVU, please refer to the Training and Test Tutorial.