Run MViT on AVA
Hi, thanks for providing this wonderful repository!
I’m trying to reproduce the results on the AVA dataset with the MViT model, but I have only achieved ~20 mAP so far.
I built the config file from the implementation details reported in the paper, changed the head of the MViT model to head_helper.ResNetRoIHead (roughly as in the sketch below), and loaded the weights from the provided Kinetics checkpoint.
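Every numeric argument to the RoI head in the sketch is my own guess for MViT-B 16x4 (final embedding dim 96 * 2^3 = 768 after the three DIM_MUL stages, temporal length 16 / 2 = 8 after the patch stride, spatial stride 32) rather than something I found in the repo, and MViT's token output still has to be reshaped back to an N x C x T x H x W grid before this head can pool it:

# Hedged sketch of the head swap; the numeric values are my assumptions
# for MViT-B 16x4, not values taken from the repo.
from slowfast.models import head_helper

def attach_roi_head(model, num_classes=80):
    # EMBED_DIM 96 doubles at each of the three DIM_MUL stages -> 96 * 8 = 768.
    model.head = head_helper.ResNetRoIHead(
        dim_in=[768],           # single pathway, final channel dim (assumed)
        num_classes=num_classes,
        pool_size=[[8, 1, 1]],  # average away the temporal dim (16 frames / patch stride 2)
        resolution=[[7, 7]],    # RoIAlign output size (assumed)
        scale_factor=[32],      # 224 input / 7x7 feature map
        dropout_rate=0.5,       # matches MODEL.DROPOUT_RATE
        act_func="sigmoid",     # AVA is multi-label, pairs with LOSS_FUNC: bce
        aligned=True,           # matches DETECTION.ALIGNED
    )
    return model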
Should I be able to reproduce the results from the paper that way?
Thanks, Elad.
The config file:
TRAIN:
  ENABLE: True
  DATASET: ava
  BATCH_SIZE: 64
  EVAL_PERIOD: 5
  CHECKPOINT_PERIOD: 1
  AUTO_RESUME: True
  CHECKPOINT_FILE_PATH: CPS/Kinetics400/K400_MVIT_B_16x4_CONV.pyth
  CHECKPOINT_TYPE: pytorch
  CHECKPOINT_EPOCH_RESET: True
DATA:
  NUM_FRAMES: 16
  SAMPLING_RATE: 4
  TRAIN_JITTER_SCALES: [256, 320]
  TRAIN_CROP_SIZE: 224
  TEST_CROP_SIZE: 224
  INPUT_CHANNEL_NUM: [3]
  TRAIN_JITTER_SCALES_RELATIVE: [0.08, 1.0]
  TRAIN_JITTER_ASPECT_RELATIVE: [0.75, 1.3333]
DETECTION:
  ENABLE: True
  ALIGNED: True
AVA:
  DETECTION_SCORE_THRESH: 0.8
  TRAIN_PREDICT_BOX_LISTS: [
    "ava_train_v2.2.csv",
    "person_box_67091280_iou90/ava_detection_train_boxes_and_labels_include_negative_v2.2.csv",
  ]
  TEST_PREDICT_BOX_LISTS: ["person_box_67091280_iou90/ava_detection_val_boxes_and_labels.csv"]
  BGR: False
MVIT:
  ZERO_DECAY_POS_CLS: False
  SEP_POS_EMBED: True
  DEPTH: 16
  NUM_HEADS: 1
  EMBED_DIM: 96
  PATCH_KERNEL: (3, 7, 7)
  PATCH_STRIDE: (2, 4, 4)
  PATCH_PADDING: (1, 3, 3)
  MLP_RATIO: 4.0
  QKV_BIAS: True
  DROPPATH_RATE: 0.4
  NORM: "layernorm"
  MODE: "conv"
  CLS_EMBED_ON: False
  DIM_MUL: [[1, 2.0], [3, 2.0], [14, 2.0]]
  HEAD_MUL: [[1, 2.0], [3, 2.0], [14, 2.0]]
  POOL_KVQ_KERNEL: [3, 3, 3]
  POOL_KV_STRIDE_ADAPTIVE: [1, 8, 8]
  POOL_Q_STRIDE: [[1, 1, 2, 2], [3, 1, 2, 2], [14, 1, 2, 2]]
  DROPOUT_RATE: 0.0
AUG:
  NUM_SAMPLE: 2
  ENABLE: True
  COLOR_JITTER: 0.4
  AA_TYPE: rand-m7-n4-mstd0.5-inc1
  INTERPOLATION: bicubic
  RE_PROB: 0.25
  RE_MODE: pixel
  RE_COUNT: 1
  RE_SPLIT: False
MIXUP:
  ENABLE: False
  ALPHA: 0.8
  CUTMIX_ALPHA: 1.0
  PROB: 1.0
  SWITCH_PROB: 0.5
  LABEL_SMOOTH_VALUE: 0.1
BN:
  USE_PRECISE_STATS: False
  NUM_BATCHES_PRECISE: 200
SOLVER:
  ZERO_WD_1D_PARAM: True
  CLIP_GRAD_L2NORM: 1.0
  BASE_LR_SCALE_NUM_SHARDS: True
  BASE_LR: 0.6
  COSINE_END_LR: 1e-6
  WARMUP_START_LR: 1e-6
  WARMUP_EPOCHS: 5.0
  LR_POLICY: cosine
  MAX_EPOCH: 30
  MOMENTUM: 0.9
  WEIGHT_DECAY: 1e-8
  OPTIMIZING_METHOD: sgd
  COSINE_AFTER_WARMUP: True
MODEL:
  NUM_CLASSES: 80
  ARCH: mvit
  MODEL_NAME: MViT
  LOSS_FUNC: bce
  DROPOUT_RATE: 0.5
TEST:
  ENABLE: True
  DATASET: ava
  BATCH_SIZE: 8
  NUM_SPATIAL_CROPS: 1
DATA_LOADER:
  NUM_WORKERS: 8
  PIN_MEMORY: True
NUM_GPUS: 8
NUM_SHARDS: 1
RNG_SEED: 0
OUTPUT_DIR: .
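For completeness, this is the quick sanity check I run to confirm that the Kinetics weights actually load; the "model_state" key is where PySlowFast checkpoints keep the weights as far as I can tell, and the head.* keys are expected not to transfer after the head swap:

# My own sketch, not a repo utility: open the Kinetics checkpoint and
# list the head weights that will not transfer to the new RoI head.
# Everything else should match the MViT backbone one-to-one.
import torch

ckpt = torch.load(
    "CPS/Kinetics400/K400_MVIT_B_16x4_CONV.pyth", map_location="cpu"
)
state = ckpt["model_state"]  # assumed key, following PySlowFast's checkpoint format
print(f"{len(state)} tensors in the checkpoint")
for name, tensor in state.items():
    if name.startswith("head."):
        print("will not transfer:", name, tuple(tensor.shape))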
This will be released very soon.
Hello, may I ask whether you were able to reproduce the paper-reported results on AVA @eladb3? I still cannot find the corresponding training config or checkpoint in the repo @feichtenhofer.