Welcome to Projects of MMAction2

In this folder, we welcome all contributions of deep-learning video understanding models from the community.

The requirements here, e.g., code standards, are not as strict as in the core package, so developers from the community can implement their algorithms much more easily and efficiently in MMAction2. We appreciate all contributions from the community that help make MMAction2 better.

Here is an example project showing how to add your algorithm easily.

We also provide some documentation listed below:

Example Project

This is an example README for community projects/. You can write your own README for your project. Below are some recommended sections that help others understand and use your project; copy or modify them according to your needs.

Usage

Setup Environment

Please refer to Get Started to install MMAction2.

First, add the current folder to PYTHONPATH so that Python can find your code. Run the following command in the current directory to add it.

Please run it every time you open a new shell.

export PYTHONPATH=`pwd`:$PYTHONPATH

Data Preparation

Prepare the Kinetics400 dataset according to the instruction.

Training commands

To train with single GPU:

mim train mmaction configs/examplenet_r50-in1k-pre_8xb32-1x1x3-100e_kinetics400-rgb.py

To train with multiple GPUs:

mim train mmaction configs/examplenet_r50-in1k-pre_8xb32-1x1x3-100e_kinetics400-rgb.py --launcher pytorch --gpus 8

To train with multiple GPUs by slurm:

mim train mmaction configs/examplenet_r50-in1k-pre_8xb32-1x1x3-100e_kinetics400-rgb.py --launcher slurm \
    --gpus 8 --gpus-per-node 8 --partition $PARTITION

Testing commands

To test with single GPU:

mim test mmaction configs/examplenet_r50-in1k-pre_8xb32-1x1x3-100e_kinetics400-rgb.py --checkpoint $CHECKPOINT

To test with multiple GPUs:

mim test mmaction configs/examplenet_r50-in1k-pre_8xb32-1x1x3-100e_kinetics400-rgb.py --checkpoint $CHECKPOINT --launcher pytorch --gpus 8

To test with multiple GPUs by slurm:

mim test mmaction configs/examplenet_r50-in1k-pre_8xb32-1x1x3-100e_kinetics400-rgb.py --checkpoint $CHECKPOINT --launcher slurm \
    --gpus 8 --gpus-per-node 8 --partition $PARTITION

Results

| frame sampling strategy | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | testing protocol | config | ckpt | log |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1x1x3 | 224x224 | 8 | ResNet50 | ImageNet | 72.83 | 90.65 | 25 clips x 10 crop | config | ckpt | log |

Citation

@misc{2020mmaction2,
  title={OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark},
  author={MMAction2 Contributors},
  howpublished = {\url{https://github.com/open-mmlab/mmaction2}},
  year={2020}
}

Checklist

Here is a checklist of this project’s progress, and you can ignore this part if you don’t plan to contribute to MMAction2 projects.

  • [ ] Milestone 1: PR-ready, and acceptable to be one of the projects/.

    • [ ] Finish the code

    • [ ] Basic docstrings & proper citation

    • [ ] Converted checkpoint and results (Only for reproduction)

  • [ ] Milestone 2: Indicates a successful model implementation.

    • [ ] Training results

  • [ ] Milestone 3: Good to be a part of our core package!

    • [ ] Unit tests

    • [ ] Code style

    • [ ] metafile.yml and README.md

Knowledge Distillation Based on MMRazor

Knowledge Distillation is a classic model compression method. The core idea is to guide a lightweight student model to “imitate” a teacher model (or multi-model ensemble) with better performance and a more complex structure, improving the student's performance without changing its structure. MMRazor is a model compression toolkit for model slimming and AutoML, which supports several KD algorithms. In this project, we take TSM-MobileNetV2 as an example to show how to use MMRazor to perform knowledge distillation on action recognition models. You can refer to MMRazor for more model compression algorithms.
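
For illustration only, here is a minimal, framework-agnostic sketch of what logits-based distillation computes: a temperature-softened KL divergence between teacher and student predictions. This is not MMRazor's actual API, and the exact loss used by the cited DIST method differs in detail; the tensor names and temperature value are assumptions.

import torch
import torch.nn.functional as F


def kd_logits_loss(student_logits: torch.Tensor,
                   teacher_logits: torch.Tensor,
                   temperature: float = 4.0) -> torch.Tensor:
    """Classic KL-based logits distillation (illustrative, not MMRazor's loss)."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction='batchmean') * temperature ** 2


# Toy usage: a batch of 2 clips and 400 classes (Kinetics-400).
student_logits = torch.randn(2, 400, requires_grad=True)
teacher_logits = torch.randn(2, 400)
kd_logits_loss(student_logits, teacher_logits).backward()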

Description

This is an implementation of the MMRazor Knowledge Distillation application. We provide action recognition configs and models for MMRazor.

Usage

Prerequisites

There are two install modes:

Option (a). Install as a Python package

mim install "mmrazor>=1.0.0"

Option (b). Install from source

git clone https://github.com/open-mmlab/mmrazor.git
cd mmrazor
pip install -v -e .

Setup Environment

Please refer to Get Started to install MMAction2.

First, add the current folder to PYTHONPATH so that Python can find your code. Run the following command in the current directory to add it.

Please run it every time you open a new shell.

export PYTHONPATH=`pwd`:$PYTHONPATH

Data Preparation

Prepare the Kinetics400 dataset according to the instruction.

Create a symbolic link from $MMACTION2/data to ./data in the current directory, so that Python can locate your data. Run the following command in the current directory to create the symbolic link.

ln -s ../../data ./data

Training commands

To train with single GPU:

mim train mmrazor configs/kd_logits_tsm-res50_tsm-mobilenetv2_8xb16_k400.py

To train with multiple GPUs:

mim train mmrazor configs/kd_logits_tsm-res50_tsm-mobilenetv2_8xb16_k400.py --launcher pytorch --gpus 8

To train with multiple GPUs by slurm:

mim train mmrazor configs/kd_logits_tsm-res50_tsm-mobilenetv2_8xb16_k400.py --launcher slurm \
    --gpus 8 --gpus-per-node 8 --partition $PARTITION

Testing commands

Please convert the knowledge distillation checkpoint to a student-only checkpoint with the following command. You will get a checkpoint with a ‘_student.pth’ suffix in the same directory as the original checkpoint. Then use the student-only checkpoint for testing.

mim run mmrazor convert_kd_ckpt_to_student $CHECKPOINT

To test with single GPU:

mim test mmaction tsm_imagenet-pretrained-mobilenetv2_8xb16-1x1x8-100e_kinetics400-rgb.py --checkpoint $CHECKPOINT

To test with multiple GPUs:

mim test mmaction tsm_imagenet-pretrained-mobilenetv2_8xb16-1x1x8-100e_kinetics400-rgb.py --checkpoint $CHECKPOINT --launcher pytorch --gpus 8

To test with multiple GPUs by slurm:

mim test mmaction tsm_imagenet-pretrained-mobilenetv2_8xb16-1x1x8-100e_kinetics400-rgb.py --checkpoint $CHECKPOINT --launcher slurm \
    --gpus 8 --gpus-per-node 8 --partition $PARTITION

Results and models

| Location | Dataset | Teacher | Student | Acc | Acc(T) | Acc(S) | Config | Download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| logits | Kinetics-400 | [TSM-ResNet50] | [TSM-MobileNetV2] | 69.60(+0.9) | 73.22 | 68.71 | [config][distill_tsm] | [teacher][tsm_r50_pth] \| [model][distill_pth_tsm] \| [log][distill_log_tsm] |
| logits | Kinetics-400 | [TSN-Swin] | [TSN-ResNet50] | 75.54(+1.4) | 79.22 | 74.12 | [config][distill_tsn] | [teacher][tsn_swin_pth] \| [model][distill_pth_tsn] \| [log][distill_log_tsn] |

Citation

@article{huang2022knowledge,
  title={Knowledge Distillation from A Stronger Teacher},
  author={Huang, Tao and You, Shan and Wang, Fei and Qian, Chen and Xu, Chang},
  journal={arXiv preprint arXiv:2205.10536},
  year={2022}
}

UMT Project

Unmasked Teacher: Towards Training-Efficient Video Foundation Models

Abstract

Video Foundation Models (VFMs) have received limited exploration due to high computational costs and data scarcity. Previous VFMs rely on Image Foundation Models (IFMs), which face challenges in transferring to the video domain. Although VideoMAE has trained a robust ViT from limited data, its low-level reconstruction poses convergence difficulties and conflicts with high-level cross-modal alignment. This paper proposes a training-efficient method for temporal-sensitive VFMs that integrates the benefits of existing methods. To increase data efficiency, we mask out most of the low-semantics video tokens, but selectively align the unmasked tokens with IFM, which serves as the UnMasked Teacher (UMT). By providing semantic guidance, our method enables faster convergence and multimodal friendliness. With a progressive pre-training framework, our model can handle various tasks including scene-related, temporal-related, and complex video-language understanding. Using only public sources for pre-training in 6 days on 32 A100 GPUs, our scratch-built ViT-L/16 achieves state-of-the-art performances on various video tasks.
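
The paragraph above describes the key training signal: drop most video tokens and supervise only the kept ones with a frozen image foundation model. The snippet below is a rough, heavily simplified sketch of that alignment step under assumed shapes and an assumed cosine alignment loss; it is not the project's training code.

import torch
import torch.nn.functional as F

num_tokens, dim, keep_ratio = 1568, 768, 0.2     # e.g. 8 frames x 14x14 patches (assumed)

# Randomly keep a small fraction of token indices; the rest are masked out.
keep_idx = torch.rand(1, num_tokens).argsort(dim=1)[:, :int(num_tokens * keep_ratio)]

# Stand-ins for model outputs: student features on kept tokens,
# frozen teacher (IFM) features on all tokens.
student_feats = torch.randn(1, keep_idx.size(1), dim, requires_grad=True)
with torch.no_grad():
    teacher_all = torch.randn(1, num_tokens, dim)
    teacher_feats = torch.gather(teacher_all, 1, keep_idx.unsqueeze(-1).expand(-1, -1, dim))

# Align only the unmasked student tokens with the teacher's features.
loss = 1 - F.cosine_similarity(student_feats, teacher_feats, dim=-1).mean()
loss.backward()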

Usage

Setup Environment

Please refer to Installation to install MMAction2.

Assume that you are located at $MMACTION2/projects/umt.

Add the current folder to PYTHONPATH, so that Python can find your code. Run the following command in the current directory to add it.

Please run it every time you open a new shell.

export PYTHONPATH=`pwd`:$PYTHONPATH

Data Preparation

Prepare the Kinetics dataset according to the instruction.

Create a symbolic link from $MMACTION2/data to ./data in the current directory, so that Python can locate your data. Run the following command in the current directory to create the symbolic link.

ln -s ../../data ./data

Testing commands

To test with single GPU:

mim test mmaction configs/umt-base-p16-res224_kinetics710-pre-ft_u8_k400-rgb.py --checkpoint $CHECKPOINT

To test with multiple GPUs:

mim test mmaction configs/umt-base-p16-res224_kinetics710-pre-ft_u8_k400-rgb.py --checkpoint $CHECKPOINT --launcher pytorch --gpus 8

To test with multiple GPUs by slurm:

mim test mmaction configs/umt-base-p16-res224_kinetics710-pre-ft_u8_k400-rgb.py --checkpoint $CHECKPOINT --launcher slurm \
    --gpus 8 --gpus-per-node 8 --partition $PARTITION

Results

Kinetics400

| frame sampling strategy | resolution | backbone | pretrain | top1 acc | testing protocol | config | ckpt |
| --- | --- | --- | --- | --- | --- | --- | --- |
| uniform 8 | 224x224 | UMT-B | Kinetics710 | 87.33 | 4 clips x 3 crop | config | ckpt |
| uniform 8 | 224x224 | UMT-L | Kinetics710 | 90.21 | 4 clips x 3 crop | config | ckpt |

Kinetics700

| frame sampling strategy | resolution | backbone | pretrain | top1 acc | testing protocol | config | ckpt |
| --- | --- | --- | --- | --- | --- | --- | --- |
| uniform 8 | 224x224 | UMT-B | Kinetics710 | 77.95 | 4 clips x 3 crop | config | ckpt |
| uniform 8 | 224x224 | UMT-L | Kinetics710 | 82.79 | 4 clips x 3 crop | config | ckpt |

Citation

@article{li2023unmasked,
  title={Unmasked teacher: Towards training-efficient video foundation models},
  author={Li, Kunchang and Wang, Yali and Li, Yizhuo and Wang, Yi and He, Yinan and Wang, Limin and Qiao, Yu},
  journal={arXiv preprint arXiv:2303.16058},
  year={2023}
}

CTRGCN Project

Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition

Abstract

Graph convolutional networks (GCNs) have been widely used and achieved remarkable results in skeleton-based action recognition. In GCNs, graph topology dominates feature aggregation and therefore is the key to extracting representative features. In this work, we propose a novel Channel-wise Topology Refinement Graph Convolution (CTR-GC) to dynamically learn different topologies and effectively aggregate joint features in different channels for skeleton-based action recognition. The proposed CTR-GC models channel-wise topologies through learning a shared topology as a generic prior for all channels and refining it with channel-specific correlations for each channel. Our refinement method introduces few extra parameters and significantly reduces the difficulty of modeling channel-wise topologies. Furthermore, via reformulating graph convolutions into a unified form, we find that CTR-GC relaxes strict constraints of graph convolutions, leading to stronger representation capability. Combining CTR-GC with temporal modeling modules, we develop a powerful graph convolutional network named CTR-GCN which notably outperforms state-of-the-art methods on the NTU RGB+D, NTU RGB+D 120, and NW-UCLA datasets.
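
For readers who want a concrete picture, below is a simplified sketch of the channel-wise topology refinement described above: a learnable shared topology is refined per channel by joint-to-joint feature correlations before aggregating joint features. The layer sizes and exact refinement form are illustrative assumptions, not the reference CTR-GCN implementation.

import torch
import torch.nn as nn


class CTRGraphConvSketch(nn.Module):
    def __init__(self, in_channels, out_channels, num_joints, alpha=0.5):
        super().__init__()
        # Shared topology: a learnable generic prior used by all channels.
        self.shared_A = nn.Parameter(torch.eye(num_joints))
        self.theta = nn.Conv2d(in_channels, out_channels, 1)  # correlation branch 1
        self.phi = nn.Conv2d(in_channels, out_channels, 1)    # correlation branch 2
        self.proj = nn.Conv2d(in_channels, out_channels, 1)   # feature transform
        self.alpha = alpha

    def forward(self, x):                     # x: (N, C, T, V)
        # Channel-specific correlations between joints, pooled over time.
        a = self.theta(x).mean(dim=2)         # (N, C', V)
        b = self.phi(x).mean(dim=2)           # (N, C', V)
        refine = torch.tanh(a.unsqueeze(-1) - b.unsqueeze(-2))   # (N, C', V, V)
        # Refine the shared topology per channel, then aggregate joint features.
        A = self.shared_A + self.alpha * refine                  # (N, C', V, V)
        feat = self.proj(x)                                      # (N, C', T, V)
        return torch.einsum('nctv,ncvw->nctw', feat, A)


x = torch.randn(2, 3, 100, 17)                # batch, channels, frames, joints
out = CTRGraphConvSketch(3, 64, num_joints=17)(x)
print(out.shape)                              # torch.Size([2, 64, 100, 17])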

Usage

Setup Environment

Please refer to Installation to install MMAction2.

Assume that you are located at $MMACTION2/projects/ctrgcn.

Add the current folder to PYTHONPATH, so that Python can find your code. Run the following command in the current directory to add it.

Please run it every time you open a new shell.

export PYTHONPATH=`pwd`:$PYTHONPATH

Data Preparation

Prepare the NTU60 dataset according to the instruction.

Create a symbolic link from $MMACTION2/data to ./data in the current directory, so that Python can locate your data. Run the following command in the current directory to create the symbolic link.

ln -s ../../data ./data

Training commands

To train with single GPU:

mim train mmaction configs/ctrgcn_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d.py

To train with multiple GPUs:

mim train mmaction configs/ctrgcn_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d.py --launcher pytorch --gpus 8

To train with multiple GPUs by slurm:

mim train mmaction configs/ctrgcn_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d.py --launcher slurm \
    --gpus 8 --gpus-per-node 8 --partition $PARTITION

Testing commands

To test with single GPU:

mim test mmaction configs/ctrgcn_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d.py --checkpoint $CHECKPOINT

To test with multiple GPUs:

mim test mmaction configs/ctrgcn_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d.py --checkpoint $CHECKPOINT --launcher pytorch --gpus 8

To test with multiple GPUs by slurm:

mim test mmaction configs/ctrgcn_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d.py --checkpoint $CHECKPOINT --launcher slurm \
    --gpus 8 --gpus-per-node 8 --partition $PARTITION

Results

NTU60_XSub_2D

| frame sampling strategy | modality | gpus | backbone | top1 acc | testing protocol | config | ckpt | log |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| uniform 100 | joint | 8 | CTRGCN | 89.6 | 10 clips | config | ckpt | log |

NTU60_XSub_3D

| frame sampling strategy | modality | gpus | backbone | top1 acc | testing protocol | config | ckpt | log |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| uniform 100 | joint | 8 | CTRGCN | 89.0 | 10 clips | config | ckpt | log |

Citation

@inproceedings{chen2021channel,
  title={Channel-wise topology refinement graph convolution for skeleton-based action recognition},
  author={Chen, Yuxin and Zhang, Ziqi and Yuan, Chunfeng and Li, Bing and Deng, Ying and Hu, Weiming},
  booktitle={CVPR},
  pages={13359--13368},
  year={2021}
}

Gesture Recognition

Introduction

In this project, we present a skeleton-based pipeline for gesture recognition. The pipeline has three stages. The first stage consists of a hand detection module that outputs bounding boxes of human hands from video frames. Afterwards, the second stage employs a pose estimation module to generate keypoints of the detected hands. Finally, the third stage utilizes a skeleton-based gesture recognition module to classify hand actions based on the provided hand skeleton. The three-stage pipeline is lightweight and can achieve real-time performance on CPU devices. In this README, we provide the models and the inference demo for the project. Training data preparation and training scripts are described in TRAINING.md.
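
To make the data flow concrete, here is a structural sketch of the three-stage pipeline. The stage functions below are hypothetical stubs standing in for the actual detection, pose estimation, and recognition models; only the wiring between stages is meaningful.

import numpy as np


def detect_hands(frame):                       # stage 1 stub: hand bboxes, shape (num_hands, 4)
    return np.array([[10, 10, 120, 120]], dtype=np.float32)


def estimate_hand_pose(frame, bboxes):         # stage 2 stub: keypoints, shape (num_hands, 21, 3)
    return np.zeros((len(bboxes), 21, 3), dtype=np.float32)


def classify_gesture(keypoint_seq):            # stage 3 stub: gesture label from the skeleton sequence
    return 'swiping_left'


frames = [np.zeros((480, 640, 3), dtype=np.uint8) for _ in range(8)]
keypoint_seq = np.stack([estimate_hand_pose(f, detect_hands(f)) for f in frames])
print(classify_gesture(keypoint_seq))          # e.g. 'swiping_left'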

Hand detection stage

Hand detection results on OneHand10K validation dataset

| Config | Input Size | bbox mAP | bbox mAP 50 | bbox mAP 75 | ckpt | log |
| --- | --- | --- | --- | --- | --- | --- |
| rtmpose_nano | 320x320 | 0.8100 | 0.9870 | 0.9190 | ckpt | log |

Pose estimation stage

Pose estimation results on COCO-WholeBody-Hand validation set

| Config | Input Size | PCK@0.2 | AUC | EPE | ckpt |
| --- | --- | --- | --- | --- | --- |
| rtmpose_m | 256x256 | 0.815 | 0.837 | 4.51 | ckpt |

Gesture recognition stage

Skeleton-based gesture recognition results on the Jester validation set

| Config | Input Size | Top 1 accuracy | Top 5 accuracy | ckpt | log |
| --- | --- | --- | --- | --- | --- |
| STGCNPP | 100x17x3 | 89.22 | 97.52 | ckpt | log |

ActionCLIP Project

ActionCLIP: A New Paradigm for Video Action Recognition

Abstract

The canonical approach to video action recognition dictates a neural model to do a classic and standard 1-of-N majority vote task. They are trained to predict a fixed set of predefined categories, limiting their transferable ability on new datasets with unseen concepts. In this paper, we provide a new perspective on action recognition by attaching importance to the semantic information of label texts rather than simply mapping them into numbers. Specifically, we model this task as a video-text matching problem within a multimodal learning framework, which strengthens the video representation with more semantic language supervision and enables our model to do zero-shot action recognition without any further labeled data or parameters requirements. Moreover, to handle the deficiency of label texts and make use of tremendous web data, we propose a new paradigm based on this multimodal learning framework for action recognition, which we dub “pre-train, prompt and fine-tune”. This paradigm first learns powerful representations from pre-training on a large amount of web image-text or video-text data. Then it makes the action recognition task to act more like pre-training problems via prompt engineering. Finally, it end-to-end fine-tunes on target datasets to obtain strong performance. We give an instantiation of the new paradigm, ActionCLIP, which not only has superior and flexible zero-shot/few-shot transfer ability but also reaches a top performance on general action recognition task, achieving 83.8% top-1 accuracy on Kinetics-400 with a ViT-B/16 as the backbone.

Usage

Setup Environment

Please refer to Installation to install MMAction2. Run the following command to install clip.

pip install git+https://github.com/openai/CLIP.git

Assume that you are located at $MMACTION2/projects/actionclip.

Add the current folder to PYTHONPATH, so that Python can find your code. Run the following command in the current directory to add it.

Please run it every time you open a new shell.

export PYTHONPATH=`pwd`:$PYTHONPATH

Data Preparation

Prepare the Kinetics400 dataset according to the instruction.

Create a symbolic link from $MMACTION2/data to ./data in the current directory, so that Python can locate your data. Run the following command in the current directory to create the symbolic link.

ln -s ../../data ./data

Training commands

To train with single GPU:

mim train mmaction configs/actionclip_vit-base-p32-res224-clip-pre_g8xb16_1x1x8_k400-rgb.py

To train with multiple GPUs:

mim train mmaction configs/actionclip_vit-base-p32-res224-clip-pre_g8xb16_1x1x8_k400-rgb.py --launcher pytorch --gpus 8

To train with multiple GPUs by slurm:

mim train mmaction configs/actionclip_vit-base-p32-res224-clip-pre_g8xb16_1x1x8_k400-rgb.py --launcher slurm \
    --gpus 8 --gpus-per-node 8 --partition $PARTITION

Testing commands

To test with single GPU:

mim test mmaction configs/actionclip_vit-base-p32-res224-clip-pre_g8xb16_1x1x8_k400-rgb.py --checkpoint $CHECKPOINT

To test with multiple GPUs:

mim test mmaction configs/actionclip_vit-base-p32-res224-clip-pre_g8xb16_1x1x8_k400-rgb.py --checkpoint $CHECKPOINT --launcher pytorch --gpus 8

To test with multiple GPUs by slurm:

mim test mmaction configs/actionclip_vit-base-p32-res224-clip-pre_g8xb16_1x1x8_k400-rgb.py --checkpoint $CHECKPOINT --launcher slurm \
    --gpus 8 --gpus-per-node 8 --partition $PARTITION

Results

Kinetics400

| frame sampling strategy | backbone | top1 acc | top5 acc | testing protocol | config | ckpt |
| --- | --- | --- | --- | --- | --- |
| 1x1x8 | ViT-B/32 | 77.6 | 93.8 | 8 clips x 1 crop | config | ckpt[1] |
| 1x1x8 | ViT-B/16 | 80.3 | 95.2 | 8 clips x 1 crop | config | ckpt[1] |
| 1x1x16 | ViT-B/16 | 81.1 | 95.6 | 16 clips x 1 crop | config | ckpt[1] |
| 1x1x32 | ViT-B/16 | 81.3 | 95.8 | 32 clips x 1 crop | config | ckpt[1] |

[1] The models are ported from the repo ActionCLIP and tested on our data. Currently, we only support the testing of ActionCLIP models. Due to the variation in testing data, our reported test accuracy differs from that of the original repository (on average, it is lower by one point). Please refer to this issue for more details.

Kinetics400 (Trained on Our K400 dataset)

| frame sampling strategy | gpus | backbone | top1 acc | top5 acc | testing protocol | config | ckpt | log |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1x1x8 | 8 | ViT-B/32 | 77.5 | 93.2 | 8 clips x 1 crop | config | ckpt | log |
| 1x1x8 | 8 | ViT-B/16 | 81.3 | 95.2 | 8 clips x 1 crop | config | ckpt | log |

Zero-Shot Prediction

We offer two methods for zero-shot prediction as follows. The test.mp4 can be downloaded from here.

Using Naive PyTorch

import torch
import clip
from models.load import init_actionclip
from mmaction.utils import register_all_modules

register_all_modules(True)

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = init_actionclip('ViT-B/32-8', device=device)

video_anno = dict(filename='test.mp4', start_index=0)
video = preprocess(video_anno).unsqueeze(0).to(device)

template = 'The woman is {}'
labels = ['singing', 'dancing', 'performing']
text = clip.tokenize([template.format(label) for label in labels]).to(device)

with torch.no_grad():
    video_features = model.encode_video(video)
    text_features = model.encode_text(text)

video_features /= video_features.norm(dim=-1, keepdim=True)
text_features /= text_features.norm(dim=-1, keepdim=True)
similarity = (100 * video_features @ text_features.T).softmax(dim=-1)
probs = similarity.cpu().numpy()

print("Label probs:", probs)  # [[9.995e-01 5.364e-07 6.666e-04]]

Using MMAction2 APIs

import mmengine
import torch
from mmaction.utils import register_all_modules
from mmaction.apis import inference_recognizer, init_recognizer

register_all_modules(True)

config_path = 'configs/actionclip_vit-base-p32-res224-clip-pre_1x1x8_k400-rgb.py'
checkpoint_path = 'https://download.openmmlab.com/mmaction/v1.0/projects/actionclip/actionclip_vit-base-p32-res224-clip-pre_1x1x8_k400-rgb/vit-b-32-8f.pth'
template = 'The woman is {}'
labels = ['singing', 'dancing', 'performing']

# Update the labels; the default is the label list of K400.
config = mmengine.Config.fromfile(config_path)
config.model.labels_or_label_file = labels
config.model.template = template

device = "cuda" if torch.cuda.is_available() else "cpu"
model = init_recognizer(config=config, checkpoint=checkpoint_path, device=device)

pred_result = inference_recognizer(model, 'test.mp4')
probs = pred_result.pred_score.cpu().numpy()
print("Label probs:", probs)  # [9.995e-01 5.364e-07 6.666e-04]

Citation

@article{wang2021actionclip,
  title={Actionclip: A new paradigm for video action recognition},
  author={Wang, Mengmeng and Xing, Jiazheng and Liu, Yong},
  journal={arXiv preprint arXiv:2109.08472},
  year={2021}
}

MSG3D Project

Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition

Abstract

Spatial-temporal graphs have been widely used by skeleton-based action recognition algorithms to model human action dynamics. To capture robust movement patterns from these graphs, long-range and multi-scale context aggregation and spatial-temporal dependency modeling are critical aspects of a powerful feature extractor. However, existing methods have limitations in achieving (1) unbiased long-range joint relationship modeling under multi-scale operators and (2) unobstructed cross-spacetime information flow for capturing complex spatial-temporal dependencies. In this work, we present (1) a simple method to disentangle multi-scale graph convolutions and (2) a unified spatial-temporal graph convolutional operator named G3D. The proposed multi-scale aggregation scheme disentangles the importance of nodes in different neighborhoods for effective long-range modeling. The proposed G3D module leverages dense cross-spacetime edges as skip connections for direct information propagation across the spatial-temporal graph. By coupling these proposals, we develop a powerful feature extractor named MS-G3D based on which our model outperforms previous state-of-the-art methods on three large-scale datasets: NTU RGB+D 60, NTU RGB+D 120, and Kinetics Skeleton 400.
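
As a concrete illustration of the disentangled multi-scale aggregation mentioned above, the sketch below builds k-hop adjacency masks that keep only joint pairs at exactly distance k, instead of stacking adjacency powers that over-weight close neighbors. It is illustrative only, not the reference MS-G3D code.

import torch


def k_adjacency(A: torch.Tensor, k: int) -> torch.Tensor:
    """Binary adjacency keeping only node pairs whose shortest path is exactly k."""
    V = A.size(0)
    I = torch.eye(V, dtype=A.dtype)
    if k == 0:
        return I
    reach_k = (torch.matrix_power(A + I, k) > 0).float()        # reachable within k hops
    reach_k_1 = (torch.matrix_power(A + I, k - 1) > 0).float()  # reachable within k-1 hops
    return (reach_k - reach_k_1).clamp(min=0)


# Toy 5-joint chain graph: 0-1-2-3-4
A = torch.zeros(5, 5)
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1

scales = [k_adjacency(A, k) for k in range(3)]               # exact 0-, 1-, 2-hop neighborhoods
x = torch.randn(5, 8)                                        # per-joint features (V, C)
multi_scale = torch.cat([Ak @ x for Ak in scales], dim=-1)   # aggregate each scale separately
print(multi_scale.shape)                                     # torch.Size([5, 24])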

Usage

Setup Environment

Please refer to Installation to install MMAction2.

Assume that you are located at $MMACTION2/projects/msg3d.

Add the current folder to PYTHONPATH, so that Python can find your code. Run the following command in the current directory to add it.

Please run it every time you open a new shell.

export PYTHONPATH=`pwd`:$PYTHONPATH

Data Preparation

Prepare the NTU60 dataset according to the instruction.

Create a symbolic link from $MMACTION2/data to ./data in the current directory, so that Python can locate your data. Run the following command in the current directory to create the symbolic link.

ln -s ../../data ./data

Training commands

To train with single GPU:

mim train mmaction configs/msg3d_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d.py

To train with multiple GPUs:

mim train mmaction configs/msg3d_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d.py --launcher pytorch --gpus 8

To train with multiple GPUs by slurm:

mim train mmaction configs/msg3d_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d.py --launcher slurm \
    --gpus 8 --gpus-per-node 8 --partition $PARTITION

Testing commands

To test with single GPU:

mim test mmaction configs/msg3d_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d.py --checkpoint $CHECKPOINT

To test with multiple GPUs:

mim test mmaction configs/msg3d_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d.py --checkpoint $CHECKPOINT --launcher pytorch --gpus 8

To test with multiple GPUs by slurm:

mim test mmaction configs/msg3d_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d.py --checkpoint $CHECKPOINT --launcher slurm \
    --gpus 8 --gpus-per-node 8 --partition $PARTITION

Results

NTU60_XSub_2D

| frame sampling strategy | modality | gpus | backbone | top1 acc | testing protocol | config | ckpt | log |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| uniform 100 | joint | 8 | MSG3D | 92.3 | 10 clips | config | ckpt | log |

NTU60_XSub_3D

| frame sampling strategy | modality | gpus | backbone | top1 acc | testing protocol | config | ckpt | log |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| uniform 100 | joint | 8 | MSG3D | 89.6 | 10 clips | config | ckpt | log |

Citation

@inproceedings{liu2020disentangling,
  title={Disentangling and unifying graph convolutions for skeleton-based action recognition},
  author={Liu, Ziyu and Zhang, Hongwen and Chen, Zhenghao and Wang, Zhiyong and Ouyang, Wanli},
  booktitle={CVPR},
  pages={143--152},
  year={2020}
}