Unable to reproduce the results on Charades with 4 GPUs
Hi,
I am trying to replicate the ResNet-50 baseline experiment on the Charades dataset. I'm using the following config:
```yaml
DATASET: charades
DATADIR: /ssd_scratch/cvit/avijit/datasets/charades/Charades_v1_rgb
NUM_GPUS: 4
LOG_PERIOD: 10
MODEL:
  NUM_CLASSES: 157
  MODEL_NAME: resnet_video
  BN_MOMENTUM: 0.9
  BN_EPSILON: 1.0000001e-5
  ALLOW_INPLACE_SUM: True
  ALLOW_INPLACE_RELU: True
  ALLOW_INPLACE_RESHAPE: True
  MEMONGER: True
  BN_INIT_GAMMA: 0.0
  DEPTH: 50
  VIDEO_ARC_CHOICE: 2
  MULTI_LABEL: True
  USE_AFFINE: True
RESNETS:
  NUM_GROUPS: 1        # ResNet: 1x; ResNeXt: 32x
  WIDTH_PER_GROUP: 64  # ResNet: 64d; ResNeXt: 4d
  TRANS_FUNC: bottleneck_transformation_3d  # bottleneck_transformation, basic_transformation
TRAIN:
  DATA_TYPE: train
  BATCH_SIZE: 8  # 16
  EVAL_PERIOD: 4000
  JITTER_SCALES: [256, 320]
  COMPUTE_PRECISE_BN: False
  CROP_SIZE: 224
  VIDEO_LENGTH: 32
  SAMPLE_RATE: 4
  DROPOUT_RATE: 0.3
  PARAMS_FILE: pretrained_weights/r50_k400_pretrained.pkl
  DATASET_SIZE: 7811
  RESET_START_ITER: True
TEST:
  DATA_TYPE: val
  BATCH_SIZE: 4  # 16
  CROP_SIZE: 256
  SCALE: 256
  VIDEO_LENGTH: 32
  SAMPLE_RATE: 4
  DATASET_SIZE: 1814
SOLVER:
  LR_POLICY: 'steps_with_relative_lrs'  # 'step', 'steps_with_lrs', 'steps_with_relative_lrs', 'steps_with_decay'
  BASE_LR: 0.01
  # STEP_SIZES: [20000, 4000]
  STEP_SIZES: [20000, 4000, 20000, 4000]
  LRS: [1, 0.1, 0.1, 0.1]
  MAX_ITER: 48000
  WEIGHT_DECAY: 0.0000125
  WEIGHT_DECAY_BN: 0.0
  MOMENTUM: 0.9
  NESTEROV: True
  SCALE_MOMENTUM: True
CHECKPOINT:
  DIR: '.'
  CHECKPOINT_PERIOD: 4000
  CONVERT_MODEL: True
NONLOCAL:
  USE_ZERO_INIT_CONV: True
  USE_BN: False
  USE_AFFINE: True
  CONV3_NONLOCAL: True
  CONV4_NONLOCAL: True
  USE_SCALE: True
```
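For reference, here is how I understand the `steps_with_relative_lrs` policy to apply the values above. This is a minimal Python sketch of my reading of the schedule, not the repository's actual implementation: each entry of `LRS` is assumed to be a relative multiplier on `BASE_LR` that stays in effect for the corresponding span in `STEP_SIZES`.

```python
# Minimal sketch of my reading of 'steps_with_relative_lrs' (an
# assumption, not the actual training code): LRS[i] multiplies BASE_LR
# while the current iteration falls inside the i-th span of STEP_SIZES.

BASE_LR = 0.01
STEP_SIZES = [20000, 4000, 20000, 4000]
LRS = [1, 0.1, 0.1, 0.1]

def lr_at_iter(cur_iter):
    """Return the learning rate in effect at a given iteration."""
    boundary = 0
    for span, rel_lr in zip(STEP_SIZES, LRS):
        boundary += span
        if cur_iter < boundary:
            return BASE_LR * rel_lr
    return BASE_LR * LRS[-1]  # past the last span, keep the final LR

for it in (0, 20000, 24000, 44000):
    print(it, "->", lr_at_iter(it))
# 0 -> 0.01, 20000 -> 0.001, 24000 -> 0.001, 44000 -> 0.001
```

Under this reading, the LR drops from 0.01 to 0.001 at iteration 20000 and then stays flat, since the remaining `LRS` entries are all 0.1.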
As you can see, I am using 4 GPUs, so I have halved both the batch size and the learning rate. However, the highest mAP I get is ~36.0, whereas testing with your pre-trained model gives ~38 mAP. Could you please check my config file and suggest any necessary changes?
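For context, the halving follows the linear scaling rule (Goyal et al., "Accurate, Large Minibatch SGD", 2017): the base learning rate is scaled in proportion to the total mini-batch size. Below is a minimal sketch, assuming the reference 8-GPU recipe used `BATCH_SIZE: 16` and `BASE_LR: 0.02` (inferred from the commented-out `# 16` values in the config above, not confirmed against the original recipe):

```python
# Sketch of the linear scaling rule used to adapt the recipe from 8 GPUs
# to 4. Assumption: the reference recipe trained with a total BATCH_SIZE
# of 16 and BASE_LR of 0.02 (my inference from the "# 16" comments above).

def linearly_scaled_lr(ref_lr, ref_batch, new_batch):
    """Scale the base learning rate proportionally to the mini-batch size."""
    return ref_lr * new_batch / ref_batch

REF_LR, REF_BATCH = 0.02, 16  # assumed 8-GPU reference recipe
NEW_BATCH = 8                 # the 4-GPU setting used here

print(linearly_scaled_lr(REF_LR, REF_BATCH, NEW_BATCH))  # 0.01 = BASE_LR above
```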
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Thanks a lot 😃
It worked like a charm! Thanks again for your help.