Unable to reproduce the results on Charades by using 4 GPUs

Hi,

I am trying to replicate the ResNet-50 baseline experiment on the Charades dataset, using the following config:

DATASET: charades
DATADIR: /ssd_scratch/cvit/avijit/datasets/charades/Charades_v1_rgb

NUM_GPUS: 4
LOG_PERIOD: 10

MODEL:
  NUM_CLASSES: 157
  MODEL_NAME: resnet_video
  BN_MOMENTUM: 0.9
  BN_EPSILON: 1.0000001e-5
  ALLOW_INPLACE_SUM: True
  ALLOW_INPLACE_RELU: True
  ALLOW_INPLACE_RESHAPE: True
  MEMONGER: True

  BN_INIT_GAMMA: 0.0
  DEPTH: 50
  VIDEO_ARC_CHOICE: 2

  MULTI_LABEL: True
  USE_AFFINE: True

RESNETS:
  NUM_GROUPS: 1  # ResNet: 1x; RESNETS: 32x
  WIDTH_PER_GROUP: 64  # ResNet: 64d; RESNETS: 4d
  TRANS_FUNC: bottleneck_transformation_3d # bottleneck_transformation, basic_transformation

TRAIN:
  DATA_TYPE: train
  BATCH_SIZE:  8 #16
  EVAL_PERIOD: 4000
  JITTER_SCALES: [256, 320]

  COMPUTE_PRECISE_BN: False
  CROP_SIZE: 224

  VIDEO_LENGTH: 32
  SAMPLE_RATE: 4
  DROPOUT_RATE: 0.3
  PARAMS_FILE: pretrained_weights/r50_k400_pretrained.pkl
  DATASET_SIZE: 7811
  RESET_START_ITER: True

TEST:
  DATA_TYPE: val
  BATCH_SIZE: 4 #16
  CROP_SIZE: 256
  SCALE: 256

  VIDEO_LENGTH: 32
  SAMPLE_RATE: 4

  DATASET_SIZE: 1814

SOLVER:
  LR_POLICY: 'steps_with_relative_lrs' # 'step', 'steps_with_lrs', 'steps_with_relative_lrs', 'steps_with_decay'
  BASE_LR: 0.01
  #STEP_SIZES: [20000, 4000]
  STEP_SIZES: [20000, 4000, 20000, 4000]
  LRS: [1, 0.1, 0.1, 0.1]
  MAX_ITER: 48000

  WEIGHT_DECAY: 0.0000125
  WEIGHT_DECAY_BN: 0.0
  MOMENTUM: 0.9
  NESTEROV: True
  SCALE_MOMENTUM: True

CHECKPOINT:
  DIR: '.'
  CHECKPOINT_PERIOD: 4000
  CONVERT_MODEL: True

NONLOCAL:
  USE_ZERO_INIT_CONV: True
  USE_BN: False
  USE_AFFINE: True
  CONV3_NONLOCAL: True
  CONV4_NONLOCAL: True
  USE_SCALE: True
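
For reference, here is a minimal sketch of how the SOLVER block above would unfold over training. It assumes that STEP_SIZES lists the length of each stage in iterations (the values sum to MAX_ITER here) and that each entry of LRS is a multiplier applied to BASE_LR. The helper name lr_at is hypothetical and is not taken from the codebase.

```python
from bisect import bisect_right
from itertools import accumulate

# Values copied from the SOLVER block of the config above.
BASE_LR = 0.01
STEP_SIZES = [20000, 4000, 20000, 4000]
LRS = [1, 0.1, 0.1, 0.1]

# Assumption: each STEP_SIZES entry is a stage length, so the cumulative
# sums give the iteration at which each stage ends.
BOUNDARIES = list(accumulate(STEP_SIZES))


def lr_at(cur_iter):
    """Return the learning rate at a given iteration (hypothetical helper)."""
    # Index of the stage that contains cur_iter; clamp so that iterations
    # at or past the final boundary reuse the last stage.
    stage = min(bisect_right(BOUNDARIES, cur_iter), len(LRS) - 1)
    # Assumption: LRS entries are relative to BASE_LR, not to the previous stage.
    return BASE_LR * LRS[stage]


if __name__ == "__main__":
    for it in (0, 19999, 20000, 24000, 47999):
        print(it, lr_at(it))
```

Under these assumptions, the learning rate stays at BASE_LR for the first 20000 iterations and at one tenth of BASE_LR for the remaining 28000.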

As you can see, I am using 4 GPUs, so I have halved both the batch size and the learning rate. However, the highest mAP I get is ~36.0, whereas testing with your pre-trained model gives ~38 mAP. Could you please check my config file and suggest any necessary changes?
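
The adjustment described above matches the common linear scaling heuristic, where the batch size and the learning rate are scaled by the same factor as the number of GPUs. A minimal sketch follows, assuming the reference setup used 8 GPUs with a total batch size of 16 and a base learning rate of 0.02. Those reference values are inferred from the "#16" comment in the config and from the halving mentioned above; they are not confirmed by the repository. scale_for_gpus is a hypothetical helper.

```python
def scale_for_gpus(ref_lr, ref_batch_size, ref_gpus, new_gpus):
    """Scale learning rate and batch size linearly with the GPU count.

    Hypothetical helper illustrating the linear scaling heuristic;
    it is not part of the training codebase.
    """
    factor = new_gpus / ref_gpus
    new_batch_size = int(round(ref_batch_size * factor))
    new_lr = ref_lr * factor
    return new_lr, new_batch_size


if __name__ == "__main__":
    # Assumed reference setup: 8 GPUs, batch size 16, learning rate 0.02.
    lr, batch_size = scale_for_gpus(0.02, 16, ref_gpus=8, new_gpus=4)
    print(lr, batch_size)
```

With the assumed reference values this yields the BASE_LR of 0.01 and TRAIN.BATCH_SIZE of 8 used in the config. The heuristic is only an approximation, so a gap in mAP can remain even when both values are scaled consistently.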

Top GitHub Comments

avijit9 commented, Aug 19, 2019

Thanks a lot 😃

avijit9 commented, Aug 19, 2019

It worked like a charm! Thanks again for your help.
