Welcome to Projects of MMAction2

In this folder, we welcome all contributions of deep-learning video understanding models from the community.

The requirements here, e.g., code standards, are not as strict as in the core package, so developers from the community can implement their algorithms much more easily and efficiently in MMAction2. We appreciate all contributions from the community that help make MMAction2 better.

Here is an example project showing how to add your algorithm easily.

We also provide some documentation listed below:

Example Project

This is an example README for community projects/. You can write your own README for your project. Below are some recommended sections that help others understand and use your project; copy or modify them according to your needs.

Usage

Setup Environment

Please refer to Get Started to install MMAction2.

First, add the current folder to PYTHONPATH so that Python can find your code. Run the following command in the current directory to add it.

Please run it every time you open a new shell.

export PYTHONPATH=`pwd`:$PYTHONPATH

Data Preparation

Prepare the Kinetics400 dataset according to the instruction.

Training commands

To train with single GPU:

mim train mmaction configs/examplenet_r50-in1k-pre_8xb32-1x1x3-100e_kinetics400-rgb.py

To train with multiple GPUs:

mim train mmaction configs/examplenet_r50-in1k-pre_8xb32-1x1x3-100e_kinetics400-rgb.py --launcher pytorch --gpus 8

To train with multiple GPUs by slurm:

mim train mmaction configs/examplenet_r50-in1k-pre_8xb32-1x1x3-100e_kinetics400-rgb.py --launcher slurm \
    --gpus 8 --gpus-per-node 8 --partition $PARTITION

Testing commands

To test with single GPU:

mim test mmaction configs/examplenet_r50-in1k-pre_8xb32-1x1x3-100e_kinetics400-rgb.py --checkpoint $CHECKPOINT

To test with multiple GPUs:

mim test mmaction configs/examplenet_r50-in1k-pre_8xb32-1x1x3-100e_kinetics400-rgb.py --checkpoint $CHECKPOINT --launcher pytorch --gpus 8

To test with multiple GPUs by slurm:

mim test mmaction configs/examplenet_r50-in1k-pre_8xb32-1x1x3-100e_kinetics400-rgb.py --checkpoint $CHECKPOINT --launcher slurm \
    --gpus 8 --gpus-per-node 8 --partition $PARTITION

Results

| frame sampling strategy | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | testing protocol | config | ckpt | log |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1x1x3 | 224x224 | 8 | ResNet50 | ImageNet | 72.83 | 90.65 | 25 clips x 10 crop | config | ckpt | log |

Citation

@misc{2020mmaction2,
  title={OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark},
  author={MMAction2 Contributors},
  howpublished = {\url{https://github.com/open-mmlab/mmaction2}},
  year={2020}
}

Checklist

Here is a checklist of this project’s progress, and you can ignore this part if you don’t plan to contribute to MMAction2 projects.

  • [ ] Milestone 1: PR-ready, and acceptable to be one of the projects/.

    • [ ] Finish the code

    • [ ] Basic docstrings & proper citation

    • [ ] Converted checkpoint and results (Only for reproduction)

  • [ ] Milestone 2: Indicates a successful model implementation.

    • [ ] Training results

  • [ ] Milestone 3: Good to be a part of our core package!

    • [ ] Unit tests

    • [ ] Code style

    • [ ] metafile.yml and README.md

Knowledge Distillation Based on MMRazor

Knowledge Distillation is a classic model compression method. The core idea is to guide a lightweight student model to “imitate” a teacher model (or multi-model ensemble) with better performance and a more complex structure, improving the student's performance without changing its structure. MMRazor is a model compression toolkit for model slimming and AutoML, which supports several KD algorithms. In this project, we take TSM-MobileNetV2 as an example to show how to use MMRazor to perform knowledge distillation on action recognition models. You can refer to MMRazor for more model compression algorithms.
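
For illustration only, here is a minimal, framework-agnostic sketch of what logits-based distillation computes: a temperature-softened KL divergence between teacher and student predictions. This is not MMRazor's actual API, and the exact loss used by the cited DIST method differs in detail; the tensor names and temperature value are assumptions.

import torch
import torch.nn.functional as F


def kd_logits_loss(student_logits: torch.Tensor,
                   teacher_logits: torch.Tensor,
                   temperature: float = 4.0) -> torch.Tensor:
    """Classic KL-based logits distillation (illustrative, not MMRazor's loss)."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction='batchmean') * temperature ** 2


# Toy usage: a batch of 2 clips and 400 classes (Kinetics-400).
student_logits = torch.randn(2, 400, requires_grad=True)
teacher_logits = torch.randn(2, 400)
kd_logits_loss(student_logits, teacher_logits).backward()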

Description

This is an implementation of the MMRazor Knowledge Distillation application. We provide action recognition configs and models for MMRazor.

Usage

Prerequisites

There are two install modes:

Option (a). Install as a Python package

mim install "mmrazor>=1.0.0"

Option (b). Install from source

git clone https://github.com/open-mmlab/mmrazor.git
cd mmrazor
pip install -v -e .

Setup Environment

Please refer to Get Started to install MMAction2.

First, add the current folder to PYTHONPATH so that Python can find your code. Run the following command in the current directory to add it.

Please run it every time you open a new shell.

export PYTHONPATH=`pwd`:$PYTHONPATH

Data Preparation

Prepare the Kinetics400 dataset according to the instruction.

Create a symbolic link from $MMACTION2/data to ./data in the current directory, so that Python can locate your data. Run the following command in the current directory to create the symbolic link.

ln -s ../../data ./data

Training commands

To train with single GPU:

mim train mmrazor configs/kd_logits_tsm-res50_tsm-mobilenetv2_8xb16_k400.py

To train with multiple GPUs:

mim train mmrazor configs/kd_logits_tsm-res50_tsm-mobilenetv2_8xb16_k400.py --launcher pytorch --gpus 8

To train with multiple GPUs by slurm:

mim train mmrazor configs/kd_logits_tsm-res50_tsm-mobilenetv2_8xb16_k400.py --launcher slurm \
    --gpus 8 --gpus-per-node 8 --partition $PARTITION

Testing commands

Please convert the knowledge distillation checkpoint to a student-only checkpoint with the following command. You will get a checkpoint with a ‘_student.pth’ suffix in the same directory as the original checkpoint. Then use the student-only checkpoint for testing.

mim run mmrazor convert_kd_ckpt_to_student $CHECKPOINT

To test with single GPU:

mim test mmaction tsm_imagenet-pretrained-mobilenetv2_8xb16-1x1x8-100e_kinetics400-rgb.py --checkpoint $CHECKPOINT

To test with multiple GPUs:

mim test mmaction tsm_imagenet-pretrained-mobilenetv2_8xb16-1x1x8-100e_kinetics400-rgb.py --checkpoint $CHECKPOINT --launcher pytorch --gpus 8

To test with multiple GPUs by slurm:

mim test mmaction tsm_imagenet-pretrained-mobilenetv2_8xb16-1x1x8-100e_kinetics400-rgb.py --checkpoint $CHECKPOINT --launcher slurm \
    --gpus 8 --gpus-per-node 8 --partition $PARTITION

Results and models

| Location | Dataset | Teacher | Student | Acc | Acc(T) | Acc(S) | Config | Download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| logits | Kinetics-400 | [TSM-ResNet50] | [TSM-MobileNetV2] | 69.60(+0.9) | 73.22 | 68.71 | [config][distill_tsm] | [teacher][tsm_r50_pth] \| [model][distill_pth_tsm] \| [log][distill_log_tsm] |
| logits | Kinetics-400 | [TSN-Swin] | [TSN-ResNet50] | 75.54(+1.4) | 79.22 | 74.12 | [config][distill_tsn] | [teacher][tsn_swin_pth] \| [model][distill_pth_tsn] \| [log][distill_log_tsn] |

Citation

@article{huang2022knowledge,
  title={Knowledge Distillation from A Stronger Teacher},
  author={Huang, Tao and You, Shan and Wang, Fei and Qian, Chen and Xu, Chang},
  journal={arXiv preprint arXiv:2205.10536},
  year={2022}
}

UMT Project

Unmasked Teacher: Towards Training-Efficient Video Foundation Models

Abstract

Video Foundation Models (VFMs) have received limited exploration due to high computational costs and data scarcity. Previous VFMs rely on Image Foundation Models (IFMs), which face challenges in transferring to the video domain. Although VideoMAE has trained a robust ViT from limited data, its low-level reconstruction poses convergence difficulties and conflicts with high-level cross-modal alignment. This paper proposes a training-efficient method for temporal-sensitive VFMs that integrates the benefits of existing methods. To increase data efficiency, we mask out most of the low-semantics video tokens, but selectively align the unmasked tokens with IFM, which serves as the UnMasked Teacher (UMT). By providing semantic guidance, our method enables faster convergence and multimodal friendliness. With a progressive pre-training framework, our model can handle various tasks including scene-related, temporal-related, and complex video-language understanding. Using only public sources for pre-training in 6 days on 32 A100 GPUs, our scratch-built ViT-L/16 achieves state-of-the-art performances on various video tasks.
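
The paragraph above describes the key training signal: drop most video tokens and supervise only the kept ones with a frozen image foundation model. The snippet below is a rough, heavily simplified sketch of that alignment step under assumed shapes and an assumed cosine alignment loss; it is not the project's training code.

import torch
import torch.nn.functional as F

num_tokens, dim, keep_ratio = 1568, 768, 0.2     # e.g. 8 frames x 14x14 patches (assumed)

# Randomly keep a small fraction of token indices; the rest are masked out.
keep_idx = torch.rand(1, num_tokens).argsort(dim=1)[:, :int(num_tokens * keep_ratio)]

# Stand-ins for model outputs: student features on kept tokens,
# frozen teacher (IFM) features on all tokens.
student_feats = torch.randn(1, keep_idx.size(1), dim, requires_grad=True)
with torch.no_grad():
    teacher_all = torch.randn(1, num_tokens, dim)
    teacher_feats = torch.gather(teacher_all, 1, keep_idx.unsqueeze(-1).expand(-1, -1, dim))

# Align only the unmasked student tokens with the teacher's features.
loss = 1 - F.cosine_similarity(student_feats, teacher_feats, dim=-1).mean()
loss.backward()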

Usage

Setup Environment

Please refer to Installation to install MMAction2.

Assume that you are located at $MMACTION2/projects/umt.

Add the current folder to PYTHONPATH, so that Python can find your code. Run the following command in the current directory to add it.

Please run it every time you open a new shell.

export PYTHONPATH=`pwd`:$PYTHONPATH

Data Preparation

Prepare the Kinetics dataset according to the instruction.

Create a symbolic link from $MMACTION2/data to ./data in the current directory, so that Python can locate your data. Run the following command in the current directory to create the symbolic link.

ln -s ../../data ./data

Testing commands

To test with single GPU:

mim test mmaction configs/umt-base-p16-res224_kinetics710-pre-ft_u8_k400-rgb.py --checkpoint $CHECKPOINT

To test with multiple GPUs:

mim test mmaction configs/umt-base-p16-res224_kinetics710-pre-ft_u8_k400-rgb.py --checkpoint $CHECKPOINT --launcher pytorch --gpus 8

To test with multiple GPUs by slurm:

mim test mmaction configs/umt-base-p16-res224_kinetics710-pre-ft_u8_k400-rgb.py --checkpoint $CHECKPOINT --launcher slurm \
    --gpus 8 --gpus-per-node 8 --partition $PARTITION

Results

Kinetics400

| frame sampling strategy | resolution | backbone | pretrain | top1 acc | testing protocol | config | ckpt |
| --- | --- | --- | --- | --- | --- | --- | --- |
| uniform 8 | 224x224 | UMT-B | Kinetics710 | 87.33 | 4 clips x 3 crop | config | ckpt |
| uniform 8 | 224x224 | UMT-L | Kinetics710 | 90.21 | 4 clips x 3 crop | config | ckpt |

Kinetics700

| frame sampling strategy | resolution | backbone | pretrain | top1 acc | testing protocol | config | ckpt |
| --- | --- | --- | --- | --- | --- | --- | --- |
| uniform 8 | 224x224 | UMT-B | Kinetics710 | 77.95 | 4 clips x 3 crop | config | ckpt |
| uniform 8 | 224x224 | UMT-L | Kinetics710 | 82.79 | 4 clips x 3 crop | config | ckpt |

Citation

@article{li2023unmasked,
  title={Unmasked teacher: Towards training-efficient video foundation models},
  author={Li, Kunchang and Wang, Yali and Li, Yizhuo and Wang, Yi and He, Yinan and Wang, Limin and Qiao, Yu},
  journal={arXiv preprint arXiv:2303.16058},
  year={2023}
}

CTRGCN Project

Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition

Abstract

Graph convolutional networks (GCNs) have been widely used and achieved remarkable results in skeleton-based action recognition. In GCNs, graph topology dominates feature aggregation and therefore is the key to extracting representative features. In this work, we propose a novel Channel-wise Topology Refinement Graph Convolution (CTR-GC) to dynamically learn different topologies and effectively aggregate joint features in different channels for skeleton-based action recognition. The proposed CTR-GC models channel-wise topologies through learning a shared topology as a generic prior for all channels and refining it with channel-specific correlations for each channel. Our refinement method introduces few extra parameters and significantly reduces the difficulty of modeling channel-wise topologies. Furthermore, via reformulating graph convolutions into a unified form, we find that CTR-GC relaxes strict constraints of graph convolutions, leading to stronger representation capability. Combining CTR-GC with temporal modeling modules, we develop a powerful graph convolutional network named CTR-GCN which notably outperforms state-of-the-art methods on the NTU RGB+D, NTU RGB+D 120, and NW-UCLA datasets.
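
For readers who want a concrete picture, below is a simplified sketch of the channel-wise topology refinement described above: a learnable shared topology is refined per channel by joint-to-joint feature correlations before aggregating joint features. The layer sizes and exact refinement form are illustrative assumptions, not the reference CTR-GCN implementation.

import torch
import torch.nn as nn


class CTRGraphConvSketch(nn.Module):
    def __init__(self, in_channels, out_channels, num_joints, alpha=0.5):
        super().__init__()
        # Shared topology: a learnable generic prior used by all channels.
        self.shared_A = nn.Parameter(torch.eye(num_joints))
        self.theta = nn.Conv2d(in_channels, out_channels, 1)  # correlation branch 1
        self.phi = nn.Conv2d(in_channels, out_channels, 1)    # correlation branch 2
        self.proj = nn.Conv2d(in_channels, out_channels, 1)   # feature transform
        self.alpha = alpha

    def forward(self, x):                     # x: (N, C, T, V)
        # Channel-specific correlations between joints, pooled over time.
        a = self.theta(x).mean(dim=2)         # (N, C', V)
        b = self.phi(x).mean(dim=2)           # (N, C', V)
        refine = torch.tanh(a.unsqueeze(-1) - b.unsqueeze(-2))   # (N, C', V, V)
        # Refine the shared topology per channel, then aggregate joint features.
        A = self.shared_A + self.alpha * refine                  # (N, C', V, V)
        feat = self.proj(x)                                      # (N, C', T, V)
        return torch.einsum('nctv,ncvw->nctw', feat, A)


x = torch.randn(2, 3, 100, 17)                # batch, channels, frames, joints
out = CTRGraphConvSketch(3, 64, num_joints=17)(x)
print(out.shape)                              # torch.Size([2, 64, 100, 17])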

Usage

Setup Environment

Please refer to Installation to install MMAction2.

Assume that you are located at $MMACTION2/projects/ctrgcn.

Add the current folder to PYTHONPATH, so that Python can find your code. Run the following command in the current directory to add it.

Please run it every time you open a new shell.

export PYTHONPATH=`pwd`:$PYTHONPATH

Data Preparation

Prepare the NTU60 dataset according to the instruction.

Create a symbolic link from $MMACTION2/data to ./data in the current directory, so that Python can locate your data. Run the following command in the current directory to create the symbolic link.

ln -s ../../data ./data

Training commands

To train with single GPU:

mim train mmaction configs/ctrgcn_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d.py

To train with multiple GPUs:

mim train mmaction configs/ctrgcn_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d.py --launcher pytorch --gpus 8

To train with multiple GPUs by slurm:

mim train mmaction configs/ctrgcn_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d.py --launcher slurm \
    --gpus 8 --gpus-per-node 8 --partition $PARTITION

Testing commands

To test with single GPU:

mim test mmaction configs/ctrgcn_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d.py --checkpoint $CHECKPOINT

To test with multiple GPUs:

mim test mmaction configs/ctrgcn_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d.py --checkpoint $CHECKPOINT --launcher pytorch --gpus 8

To test with multiple GPUs by slurm:

mim test mmaction configs/ctrgcn_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d.py --checkpoint $CHECKPOINT --launcher slurm \
    --gpus 8 --gpus-per-node 8 --partition $PARTITION

Results

NTU60_XSub_2D

| frame sampling strategy | modality | gpus | backbone | top1 acc | testing protocol | config | ckpt | log |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| uniform 100 | joint | 8 | CTRGCN | 89.6 | 10 clips | config | ckpt | log |

NTU60_XSub_3D

| frame sampling strategy | modality | gpus | backbone | top1 acc | testing protocol | config | ckpt | log |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| uniform 100 | joint | 8 | CTRGCN | 89.0 | 10 clips | config | ckpt | log |

Citation

@inproceedings{chen2021channel,
  title={Channel-wise topology refinement graph convolution for skeleton-based action recognition},
  author={Chen, Yuxin and Zhang, Ziqi and Yuan, Chunfeng and Li, Bing and Deng, Ying and Hu, Weiming},
  booktitle={CVPR},
  pages={13359--13368},
  year={2021}
}

Gesture Recognition

Introduction

In this project, we present a skeleton-based pipeline for gesture recognition. The pipeline has three stages. The first stage consists of a hand detection module that outputs bounding boxes of human hands from video frames. Afterwards, the second stage employs a pose estimation module to generate keypoints of the detected hands. Finally, the third stage utilizes a skeleton-based gesture recognition module to classify hand actions based on the provided hand skeleton. The three-stage pipeline is lightweight and can achieve real-time performance on CPU devices. In this README, we provide the models and the inference demo for the project. Training data preparation and training scripts are described in TRAINING.md.
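
To make the data flow concrete, here is a structural sketch of the three-stage pipeline. The stage functions below are hypothetical stubs standing in for the actual detection, pose estimation, and recognition models; only the wiring between stages is meaningful.

import numpy as np


def detect_hands(frame):                       # stage 1 stub: hand bboxes, shape (num_hands, 4)
    return np.array([[10, 10, 120, 120]], dtype=np.float32)


def estimate_hand_pose(frame, bboxes):         # stage 2 stub: keypoints, shape (num_hands, 21, 3)
    return np.zeros((len(bboxes), 21, 3), dtype=np.float32)


def classify_gesture(keypoint_seq):            # stage 3 stub: gesture label from the skeleton sequence
    return 'swiping_left'


frames = [np.zeros((480, 640, 3), dtype=np.uint8) for _ in range(8)]
keypoint_seq = np.stack([estimate_hand_pose(f, detect_hands(f)) for f in frames])
print(classify_gesture(keypoint_seq))          # e.g. 'swiping_left'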

Hand detection stage

Hand detection results on OneHand10K validation dataset

| Config | Input Size | bbox mAP | bbox mAP 50 | bbox mAP 75 | ckpt | log |
| --- | --- | --- | --- | --- | --- | --- |
| rtmpose_nano | 320x320 | 0.8100 | 0.9870 | 0.9190 | ckpt | log |

Pose estimation stage

Pose estimation results on COCO-WholeBody-Hand validation set

| Config | Input Size | PCK@0.2 | AUC | EPE | ckpt |
| --- | --- | --- | --- | --- | --- |
| rtmpose_m | 256x256 | 0.815 | 0.837 | 4.51 | ckpt |

Gesture recognition stage

Skeleton-based gesture recognition results on the Jester validation set

| Config | Input Size | Top 1 accuracy | Top 5 accuracy | ckpt | log |
| --- | --- | --- | --- | --- | --- |
| STGCNPP | 100x17x3 | 89.22 | 97.52 | ckpt | log |

ActionCLIP Project

ActionCLIP: A New Paradigm for Video Action Recognition

Abstract

The canonical approach to video action recognition dictates a neural model to do a classic and standard 1-of-N majority vote task. They are trained to predict a fixed set of predefined categories, limiting their transferable ability on new datasets with unseen concepts. In this paper, we provide a new perspective on action recognition by attaching importance to the semantic information of label texts rather than simply mapping them into numbers. Specifically, we model this task as a video-text matching problem within a multimodal learning framework, which strengthens the video representation with more semantic language supervision and enables our model to do zero-shot action recognition without any further labeled data or parameters requirements. Moreover, to handle the deficiency of label texts and make use of tremendous web data, we propose a new paradigm based on this multimodal learning framework for action recognition, which we dub “pre-train, prompt and fine-tune”. This paradigm first learns powerful representations from pre-training on a large amount of web image-text or video-text data. Then it makes the action recognition task to act more like pre-training problems via prompt engineering. Finally, it end-to-end fine-tunes on target datasets to obtain strong performance. We give an instantiation of the new paradigm, ActionCLIP, which not only has superior and flexible zero-shot/few-shot transfer ability but also reaches a top performance on general action recognition task, achieving 83.8% top-1 accuracy on Kinetics-400 with a ViT-B/16 as the backbone.

Usage

Setup Environment

Please refer to Installation to install MMAction2. Run the following command to install clip.

pip install git+https://github.com/openai/CLIP.git

Assume that you are located at $MMACTION2/projects/actionclip.

Add the current folder to PYTHONPATH, so that Python can find your code. Run the following command in the current directory to add it.

Please run it every time you open a new shell.

export PYTHONPATH=`pwd`:$PYTHONPATH

Data Preparation

Prepare the Kinetics400 dataset according to the instruction.

Create a symbolic link from $MMACTION2/data to ./data in the current directory, so that Python can locate your data. Run the following command in the current directory to create the symbolic link.

ln -s ../../data ./data

Training commands

To train with single GPU:

mim train mmaction configs/actionclip_vit-base-p32-res224-clip-pre_g8xb16_1x1x8_k400-rgb.py

To train with multiple GPUs:

mim train mmaction configs/actionclip_vit-base-p32-res224-clip-pre_g8xb16_1x1x8_k400-rgb.py --launcher pytorch --gpus 8

To train with multiple GPUs by slurm:

mim train mmaction configs/actionclip_vit-base-p32-res224-clip-pre_g8xb16_1x1x8_k400-rgb.py --launcher slurm \
    --gpus 8 --gpus-per-node 8 --partition $PARTITION

Testing commands

To test with single GPU:

mim test mmaction configs/actionclip_vit-base-p32-res224-clip-pre_g8xb16_1x1x8_k400-rgb.py --checkpoint $CHECKPOINT

To test with multiple GPUs:

mim test mmaction configs/actionclip_vit-base-p32-res224-clip-pre_g8xb16_1x1x8_k400-rgb.py --checkpoint $CHECKPOINT --launcher pytorch --gpus 8

To test with multiple GPUs by slurm:

mim test mmaction configs/actionclip_vit-base-p32-res224-clip-pre_g8xb16_1x1x8_k400-rgb.py --checkpoint $CHECKPOINT --launcher slurm \
    --gpus 8 --gpus-per-node 8 --partition $PARTITION

Results

Kinetics400

| frame sampling strategy | backbone | top1 acc | top5 acc | testing protocol | config | ckpt |
| --- | --- | --- | --- | --- | --- |
| 1x1x8 | ViT-B/32 | 77.6 | 93.8 | 8 clips x 1 crop | config | ckpt[1] |
| 1x1x8 | ViT-B/16 | 80.3 | 95.2 | 8 clips x 1 crop | config | ckpt[1] |
| 1x1x16 | ViT-B/16 | 81.1 | 95.6 | 16 clips x 1 crop | config | ckpt[1] |
| 1x1x32 | ViT-B/16 | 81.3 | 95.8 | 32 clips x 1 crop | config | ckpt[1] |

[1] The models are ported from the repo ActionCLIP and tested on our data. Currently, we only support the testing of ActionCLIP models. Due to the variation in testing data, our reported test accuracy differs from that of the original repository (on average, it is lower by one point). Please refer to this issue for more details.

Kinetics400 (Trained on Our K400 dataset)

| frame sampling strategy | gpus | backbone | top1 acc | top5 acc | testing protocol | config | ckpt | log |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1x1x8 | 8 | ViT-B/32 | 77.5 | 93.2 | 8 clips x 1 crop | config | ckpt | log |
| 1x1x8 | 8 | ViT-B/16 | 81.3 | 95.2 | 8 clips x 1 crop | config | ckpt | log |

Zero-Shot Prediction

We offer two methods for zero-shot prediction as follows. The test.mp4 can be downloaded from here.

Using Naive PyTorch

import torch
import clip
from models.load import init_actionclip
from mmaction.utils import register_all_modules

register_all_modules(True)

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = init_actionclip('ViT-B/32-8', device=device)

video_anno = dict(filename='test.mp4', start_index=0)
video = preprocess(video_anno).unsqueeze(0).to(device)

template = 'The woman is {}'
labels = ['singing', 'dancing', 'performing']
text = clip.tokenize([template.format(label) for label in labels]).to(device)

with torch.no_grad():
    video_features = model.encode_video(video)
    text_features = model.encode_text(text)

video_features /= video_features.norm(dim=-1, keepdim=True)
text_features /= text_features.norm(dim=-1, keepdim=True)
similarity = (100 * video_features @ text_features.T).softmax(dim=-1)
probs = similarity.cpu().numpy()

print("Label probs:", probs)  # [[9.995e-01 5.364e-07 6.666e-04]]

Using MMAction2 APIs

import mmengine
import torch
from mmaction.utils import register_all_modules
from mmaction.apis import inference_recognizer, init_recognizer

register_all_modules(True)

config_path = 'configs/actionclip_vit-base-p32-res224-clip-pre_1x1x8_k400-rgb.py'
checkpoint_path = 'https://download.openmmlab.com/mmaction/v1.0/projects/actionclip/actionclip_vit-base-p32-res224-clip-pre_1x1x8_k400-rgb/vit-b-32-8f.pth'
template = 'The woman is {}'
labels = ['singing', 'dancing', 'performing']

# Update the labels; the default is the label list of K400.
config = mmengine.Config.fromfile(config_path)
config.model.labels_or_label_file = labels
config.model.template = template

device = "cuda" if torch.cuda.is_available() else "cpu"
model = init_recognizer(config=config, checkpoint=checkpoint_path, device=device)

pred_result = inference_recognizer(model, 'test.mp4')
probs = pred_result.pred_score.cpu().numpy()
print("Label probs:", probs)  # [9.995e-01 5.364e-07 6.666e-04]

Citation

@article{wang2021actionclip,
  title={Actionclip: A new paradigm for video action recognition},
  author={Wang, Mengmeng and Xing, Jiazheng and Liu, Yong},
  journal={arXiv preprint arXiv:2109.08472},
  year={2021}
}

MSG3D Project

Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition

Abstract

Spatial-temporal graphs have been widely used by skeleton-based action recognition algorithms to model human action dynamics. To capture robust movement patterns from these graphs, long-range and multi-scale context aggregation and spatial-temporal dependency modeling are critical aspects of a powerful feature extractor. However, existing methods have limitations in achieving (1) unbiased long-range joint relationship modeling under multi-scale operators and (2) unobstructed cross-spacetime information flow for capturing complex spatial-temporal dependencies. In this work, we present (1) a simple method to disentangle multi-scale graph convolutions and (2) a unified spatial-temporal graph convolutional operator named G3D. The proposed multi-scale aggregation scheme disentangles the importance of nodes in different neighborhoods for effective long-range modeling. The proposed G3D module leverages dense cross-spacetime edges as skip connections for direct information propagation across the spatial-temporal graph. By coupling these proposals, we develop a powerful feature extractor named MS-G3D based on which our model outperforms previous state-of-the-art methods on three large-scale datasets: NTU RGB+D 60, NTU RGB+D 120, and Kinetics Skeleton 400.
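
As a concrete illustration of the disentangled multi-scale aggregation mentioned above, the sketch below builds k-hop adjacency masks that keep only joint pairs at exactly distance k, instead of stacking adjacency powers that over-weight close neighbors. It is illustrative only, not the reference MS-G3D code.

import torch


def k_adjacency(A: torch.Tensor, k: int) -> torch.Tensor:
    """Binary adjacency keeping only node pairs whose shortest path is exactly k."""
    V = A.size(0)
    I = torch.eye(V, dtype=A.dtype)
    if k == 0:
        return I
    reach_k = (torch.matrix_power(A + I, k) > 0).float()        # reachable within k hops
    reach_k_1 = (torch.matrix_power(A + I, k - 1) > 0).float()  # reachable within k-1 hops
    return (reach_k - reach_k_1).clamp(min=0)


# Toy 5-joint chain graph: 0-1-2-3-4
A = torch.zeros(5, 5)
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1

scales = [k_adjacency(A, k) for k in range(3)]               # exact 0-, 1-, 2-hop neighborhoods
x = torch.randn(5, 8)                                        # per-joint features (V, C)
multi_scale = torch.cat([Ak @ x for Ak in scales], dim=-1)   # aggregate each scale separately
print(multi_scale.shape)                                     # torch.Size([5, 24])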

Usage

Setup Environment

Please refer to Installation to install MMAction2.

Assume that you are located at $MMACTION2/projects/msg3d.

Add the current folder to PYTHONPATH, so that Python can find your code. Run the following command in the current directory to add it.

Please run it every time you open a new shell.

export PYTHONPATH=`pwd`:$PYTHONPATH

Data Preparation

Prepare the NTU60 dataset according to the instruction.

Create a symbolic link from $MMACTION2/data to ./data in the current directory, so that Python can locate your data. Run the following command in the current directory to create the symbolic link.

ln -s ../../data ./data

Training commands

To train with single GPU:

mim train mmaction configs/msg3d_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d.py

To train with multiple GPUs:

mim train mmaction configs/msg3d_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d.py --launcher pytorch --gpus 8

To train with multiple GPUs by slurm:

mim train mmaction configs/msg3d_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d.py --launcher slurm \
    --gpus 8 --gpus-per-node 8 --partition $PARTITION

Testing commands

To test with single GPU:

mim test mmaction configs/msg3d_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d.py --checkpoint $CHECKPOINT

To test with multiple GPUs:

mim test mmaction configs/msg3d_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d.py --checkpoint $CHECKPOINT --launcher pytorch --gpus 8

To test with multiple GPUs by slurm:

mim test mmaction configs/msg3d_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d.py --checkpoint $CHECKPOINT --launcher slurm \
    --gpus 8 --gpus-per-node 8 --partition $PARTITION

Results

NTU60_XSub_2D

| frame sampling strategy | modality | gpus | backbone | top1 acc | testing protocol | config | ckpt | log |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| uniform 100 | joint | 8 | MSG3D | 92.3 | 10 clips | config | ckpt | log |

NTU60_XSub_3D

| frame sampling strategy | modality | gpus | backbone | top1 acc | testing protocol | config | ckpt | log |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| uniform 100 | joint | 8 | MSG3D | 89.6 | 10 clips | config | ckpt | log |

Citation

@inproceedings{liu2020disentangling,
  title={Disentangling and unifying graph convolutions for skeleton-based action recognition},
  author={Liu, Ziyu and Zhang, Hongwen and Chen, Zhenghao and Wang, Zhiyong and Ouyang, Wanli},
  booktitle={CVPR},
  pages={143--152},
  year={2020}
}