
Hi, thanks for providing this wonderful repository! I’m trying to reproduce the results on the AVA dataset with the MViT model, but I have only achieved ~20 mAP so far. I built the config file from the implementation details reported in the paper, changed the head of the MViT model to head_helper.ResNetRoIHead, and loaded the weights from the provided Kinetics checkpoint.

Should I be able to reproduce the results from the paper that way?

Thanks, Elad.
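
For reference, the head swap described above implies that the Kinetics classification-head weights cannot be reused, so a common step when loading the checkpoint is to drop them and initialize the new RoI head from scratch. The sketch below illustrates that filtering in plain Python; the key names (a "head." prefix) are an assumption for illustration — the real code would operate on a torch state dict, and the actual prefix should be confirmed by inspecting the checkpoint.

```python
# Minimal sketch: drop the Kinetics classification head before fine-tuning
# on AVA, since the new head (e.g. a RoI head with 80 classes) has a
# different shape. The "head." prefix is illustrative, not confirmed
# against the actual PySlowFast checkpoint layout.

def strip_head(state_dict, head_prefix="head."):
    """Return a copy of the state dict without classification-head weights."""
    return {k: v for k, v in state_dict.items() if not k.startswith(head_prefix)}

# Toy stand-in for a checkpoint's state dict (values would be tensors).
ckpt = {
    "patch_embed.proj.weight": "...",
    "blocks.0.attn.qkv.weight": "...",
    "head.projection.weight": "...",   # Kinetics-400 head: 400 classes
    "head.projection.bias": "...",
}

backbone_only = strip_head(ckpt)
print(sorted(backbone_only))  # only backbone keys remain
```

With the head weights removed, the remaining backbone weights can be loaded non-strictly, leaving the new detection head randomly initialized.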

The config file:

TRAIN:
  ENABLE: True
  DATASET: ava
  BATCH_SIZE: 64
  EVAL_PERIOD: 5
  CHECKPOINT_PERIOD: 1
  AUTO_RESUME: True
  CHECKPOINT_FILE_PATH: CPS/Kinetics400/K400_MVIT_B_16x4_CONV.pyth
  CHECKPOINT_TYPE: pytorch
  CHECKPOINT_EPOCH_RESET: True
DATA:
  NUM_FRAMES: 16
  SAMPLING_RATE: 4
  TRAIN_JITTER_SCALES: [256, 320]
  TRAIN_CROP_SIZE: 224
  TEST_CROP_SIZE: 224
  INPUT_CHANNEL_NUM: [3]
  TRAIN_JITTER_SCALES_RELATIVE: [0.08, 1.0]
  TRAIN_JITTER_ASPECT_RELATIVE: [0.75, 1.3333]
DETECTION:
  ENABLE: True
  ALIGNED: True
AVA:
  DETECTION_SCORE_THRESH: 0.8
  TRAIN_PREDICT_BOX_LISTS: [
    "ava_train_v2.2.csv",
    "person_box_67091280_iou90/ava_detection_train_boxes_and_labels_include_negative_v2.2.csv",
  ]
  TEST_PREDICT_BOX_LISTS: ["person_box_67091280_iou90/ava_detection_val_boxes_and_labels.csv"]
  BGR: False
MVIT:
  ZERO_DECAY_POS_CLS: False
  SEP_POS_EMBED: True
  DEPTH: 16
  NUM_HEADS: 1
  EMBED_DIM: 96
  PATCH_KERNEL: (3, 7, 7)
  PATCH_STRIDE: (2, 4, 4)
  PATCH_PADDING: (1, 3, 3)
  MLP_RATIO: 4.0
  QKV_BIAS: True
  DROPPATH_RATE: 0.4
  NORM: "layernorm"
  MODE: "conv"
  CLS_EMBED_ON: False
  DIM_MUL: [[1, 2.0], [3, 2.0], [14, 2.0]]
  HEAD_MUL: [[1, 2.0], [3, 2.0], [14, 2.0]]
  POOL_KVQ_KERNEL: [3, 3, 3]
  POOL_KV_STRIDE_ADAPTIVE: [1, 8, 8]
  POOL_Q_STRIDE: [[1, 1, 2, 2], [3, 1, 2, 2], [14, 1, 2, 2]]
  DROPOUT_RATE: 0.0
AUG:
  NUM_SAMPLE: 2
  ENABLE: True
  COLOR_JITTER: 0.4
  AA_TYPE: rand-m7-n4-mstd0.5-inc1
  INTERPOLATION: bicubic
  RE_PROB: 0.25
  RE_MODE: pixel
  RE_COUNT: 1
  RE_SPLIT: False
MIXUP:
  ENABLE: False
  ALPHA: 0.8
  CUTMIX_ALPHA: 1.0
  PROB: 1.0
  SWITCH_PROB: 0.5
  LABEL_SMOOTH_VALUE: 0.1
BN:
  USE_PRECISE_STATS: False
  NUM_BATCHES_PRECISE: 200
SOLVER:
  ZERO_WD_1D_PARAM: True
  CLIP_GRAD_L2NORM: 1.0
  BASE_LR_SCALE_NUM_SHARDS: True
  BASE_LR: 0.6
  COSINE_END_LR: 1e-6
  WARMUP_START_LR: 1e-6
  WARMUP_EPOCHS: 5.0
  LR_POLICY: cosine
  MAX_EPOCH: 30
  MOMENTUM: 0.9
  WEIGHT_DECAY: 1e-8
  OPTIMIZING_METHOD: sgd
  COSINE_AFTER_WARMUP: True
MODEL:
  NUM_CLASSES: 80
  ARCH: mvit
  MODEL_NAME: MViT
  LOSS_FUNC: bce
  DROPOUT_RATE: 0.5
TEST:
  ENABLE: True
  DATASET: ava
  BATCH_SIZE: 8
  NUM_SPATIAL_CROPS: 1
DATA_LOADER:
  NUM_WORKERS: 8
  PIN_MEMORY: True
NUM_GPUS: 8
NUM_SHARDS: 1
RNG_SEED: 0
OUTPUT_DIR: .
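
As a sanity check on the SOLVER section above, here is a small sketch of the schedule those fields describe: linear warmup from WARMUP_START_LR over 5 epochs, then cosine decay from BASE_LR down to COSINE_END_LR by epoch 30 (COSINE_AFTER_WARMUP). This is my reading of the config, not PySlowFast's exact implementation; the library's per-iteration interpolation may differ slightly.

```python
import math

# Values taken from the SOLVER section of the config above.
BASE_LR = 0.6
COSINE_END_LR = 1e-6
WARMUP_START_LR = 1e-6
WARMUP_EPOCHS = 5.0
MAX_EPOCH = 30

def lr_at(epoch):
    """Learning rate at a (possibly fractional) epoch under warmup + cosine."""
    if epoch < WARMUP_EPOCHS:
        # Linear warmup from WARMUP_START_LR toward BASE_LR.
        alpha = epoch / WARMUP_EPOCHS
        return WARMUP_START_LR + alpha * (BASE_LR - WARMUP_START_LR)
    # Cosine decay over the remaining epochs (COSINE_AFTER_WARMUP: True).
    t = (epoch - WARMUP_EPOCHS) / (MAX_EPOCH - WARMUP_EPOCHS)
    return COSINE_END_LR + 0.5 * (BASE_LR - COSINE_END_LR) * (1 + math.cos(math.pi * t))

print(f"epoch  0: {lr_at(0):.2e}")   # ~WARMUP_START_LR
print(f"epoch  5: {lr_at(5):.2e}")   # ~BASE_LR (warmup done)
print(f"epoch 30: {lr_at(30):.2e}")  # ~COSINE_END_LR
```

Note also that with BASE_LR_SCALE_NUM_SHARDS: True, the base LR of 0.6 would be scaled further when training across multiple shards; with NUM_SHARDS: 1 as above it is unchanged.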

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

4 reactions
feichtenhofer commented, Aug 5, 2021

This will be released very soon.

0 reactions
yuanliangzhe commented, Dec 16, 2022

Hello, may I ask whether you were able to reproduce the paper’s reported results on AVA @eladb3? I still cannot find the corresponding training config or checkpoint in the repo @feichtenhofer.
