The problem occurred because PyTorch moved some reducer code around in #44798, which broke MMDistributedDataParallel.
I have reported the bug to PyTorch and received some feedback:
@uniartisan Looks like this might be an issue with how mmcv is using DDP. MMDistributedDataParallel actually uses the reducer directly and mimics some of the DDP logic. This is really not supported and frameworks shouldn’t be using the reducer directly. The problem occurred as we moved some code around for the reducer in #44798, which broke MMDistributedDataParallel.
If you add this change to mmcv, the issue is resolved (although mmcv shouldn’t be doing this in the first place):
diff --git a/mmcv/parallel/distributed.py b/mmcv/parallel/distributed.py
index 07771a3..60e2f31 100644
--- a/mmcv/parallel/distributed.py
+++ b/mmcv/parallel/distributed.py
@@ -28,6 +28,9 @@ class MMDistributedDataParallel(DistributedDataParallel):
         ``self.module.forward()`` with ``self.module.train_step()``.
         It is compatible with PyTorch 1.1 - 1.5.
         """
+        if self.reducer._rebuild_buckets():
+            print("Reducer buckets have been rebuilt in this iteration.")
+
         if getattr(self, 'require_forward_param_sync', True):
             self._sync_params()
         if self.device_ids:
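Putting the pieces together, here is a minimal sketch of what the patched train_step() looks like with the fix applied. Only the lines shown in the diff context above are taken from mmcv; the dispatch logic below is an approximation and may differ from the actual mmcv source.

from torch.nn.parallel import DistributedDataParallel


class PatchedMMDistributedDataParallel(DistributedDataParallel):

    def train_step(self, *inputs, **kwargs):
        # The fix: let the reducer rebuild its buckets before the
        # iteration, mirroring what the stock
        # DistributedDataParallel.forward() does in PyTorch 1.7.
        # Note that _rebuild_buckets() is a private PyTorch API.
        if self.reducer._rebuild_buckets():
            print('Reducer buckets have been rebuilt in this iteration.')

        # The rest reproduces the diff context; the device dispatch is
        # simplified here to the single-GPU-per-process case.
        if getattr(self, 'require_forward_param_sync', True):
            self._sync_params()
        if self.device_ids:
            inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
            return self.module.train_step(*inputs[0], **kwargs[0])
        return self.module.train_step(*inputs, **kwargs)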
The following is the error report I get when running mmdetection:
1. conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
2. Install the latest mmcv and mmdetection.
3. ./tools/dist_train.sh configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py 2
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
2020-10-29 21:24:04,114 - mmdet - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.8.5 (default, Sep 4 2020, 07:30:14) [GCC 7.3.0]
CUDA available: True
GPU 0: P106-100
GPU 1: GeForce GTX 1060 6GB
CUDA_HOME: /usr
NVCC: Build cuda_11.0_bu.TC445_37.28845127_0
GCC: gcc (Ubuntu 10.2.0-13ubuntu1) 10.2.0
PyTorch: 1.7.0
PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 10.2
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
- CuDNN 7.6.5
- Magma 2.5.2
- Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
TorchVision: 0.8.1
OpenCV: 4.4.0
MMCV: 1.1.6
MMCV Compiler: GCC 10.2
MMCV CUDA Compiler: 11.0
MMDetection: 2.5.0+ac20ffe
------------------------------------------------------------
2020-10-29 21:24:04,456 - mmdet - INFO - Distributed training: True
2020-10-29 21:24:04,794 - mmdet - INFO - Config:
model = dict(
    type='FasterRCNN',
    pretrained='torchvision://resnet50',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
    roi_head=dict(
        type='StandardRoIHead',
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=dict(
            type='Shared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=2,
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0.0, 0.0, 0.0, 0.0],
                target_stds=[0.1, 0.1, 0.2, 0.2]),
            reg_class_agnostic=False,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='L1Loss', loss_weight=1.0))))
train_cfg = dict(
    rpn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.7,
            neg_iou_thr=0.3,
            min_pos_iou=0.3,
            match_low_quality=True,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=256,
            pos_fraction=0.5,
            neg_pos_ub=-1,
            add_gt_as_proposals=False),
        allowed_border=-1,
        pos_weight=-1,
        debug=False),
    rpn_proposal=dict(
        nms_across_levels=False,
        nms_pre=2000,
        nms_post=1000,
        max_num=1000,
        nms_thr=0.7,
        min_bbox_size=0),
    rcnn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.5,
            neg_iou_thr=0.5,
            min_pos_iou=0.5,
            match_low_quality=False,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=512,
            pos_fraction=0.25,
            neg_pos_ub=-1,
            add_gt_as_proposals=True),
        pos_weight=-1,
        debug=False))
test_cfg = dict(
    rpn=dict(
        nms_across_levels=False,
        nms_pre=1000,
        nms_post=1000,
        max_num=1000,
        nms_thr=0.7,
        min_bbox_size=0),
    rcnn=dict(
        score_thr=0.05,
        nms=dict(type='nms', iou_threshold=0.5),
        max_per_img=100))
dataset_type = 'CocoDataset'
data_root = 'data/coco/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type='CocoDataset',
        ann_file='data/coco/annotations/instances_train2017.json',
        img_prefix='data/coco/train2017/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True),
            dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
        ]),
    val=dict(
        type='CocoDataset',
        ann_file='data/coco/annotations/instances_val2017.json',
        img_prefix='data/coco/val2017/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]),
    test=dict(
        type='CocoDataset',
        ann_file='data/coco/annotations/instances_val2017.json',
        img_prefix='data/coco/val2017/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]))
evaluation = dict(interval=1, metric='bbox')
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[8, 11])
total_epochs = 12
checkpoint_config = dict(interval=1)
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
work_dir = './work_dirs/faster_rcnn_r50_fpn_1x_coco'
gpu_ids = range(0, 1)
2020-10-29 21:24:05,218 - mmdet - INFO - load model from: torchvision://resnet50
2020-10-29 21:24:05,519 - mmdet - WARNING - The model and loaded state dict do not match exactly
unexpected key in source state_dict: fc.weight, fc.bias
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
loading annotations into memory...
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Done (t=0.00s)
creating index...
index created!
2020-10-29 21:24:06,086 - mmdet - INFO - Start running, host: lizhiyuan@lizhiyuan-Server, work_dir: /mnt/Data/workspace/mmdetection/work_dirs/faster_rcnn_r50_fpn_1x_coco
2020-10-29 21:24:06,086 - mmdet - INFO - workflow: [('train', 1)], max: 12 epochs
2020-10-29 21:24:49,980 - mmdet - INFO - Epoch [1][50/146] lr: 1.978e-03, eta: 0:24:49, time: 0.875, data_time: 0.125, memory: 3324, loss_rpn_cls: 0.3634, loss_rpn_bbox: 0.0203, loss_cls: 0.2062, acc: 95.9912, loss_bbox: 0.0038, loss: 0.5936
2020-10-29 21:25:28,628 - mmdet - INFO - Epoch [1][100/146] lr: 3.976e-03, eta: 0:22:41, time: 0.773, data_time: 0.008, memory: 3325, loss_rpn_cls: 0.1206, loss_rpn_bbox: 0.0171, loss_cls: 0.0500, acc: 99.2227, loss_bbox: 0.0144, loss: 0.2021
2020-10-29 21:26:04,092 - mmdet - INFO - Saving checkpoint at 1 epochs
[ ] 0/80, elapsed: 0s, ETA:
Traceback (most recent call last):
  File "./tools/train.py", line 178, in <module>
    main()
  File "./tools/train.py", line 167, in main
    train_detector(
  File "/mnt/Data/workspace/mmdetection/mmdet/apis/train.py", line 150, in train_detector
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/lizhiyuan/anaconda3/envs/mmcv/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 125, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/lizhiyuan/anaconda3/envs/mmcv/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 54, in train
    self.call_hook('after_train_epoch')
  File "/home/lizhiyuan/anaconda3/envs/mmcv/lib/python3.8/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook
    getattr(hook, fn_name)(self)
  File "/mnt/Data/workspace/mmdetection/mmdet/core/evaluation/eval_hooks.py", line 125, in after_train_epoch
    results = multi_gpu_test(
  File "/mnt/Data/workspace/mmdetection/mmdet/apis/test.py", line 97, in multi_gpu_test
    result = model(return_loss=False, rescale=True, **data)
  File "/home/lizhiyuan/anaconda3/envs/mmcv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/lizhiyuan/anaconda3/envs/mmcv/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 606, in forward
    if self.reducer._rebuild_buckets():
RuntimeError: replicas_[0].size() == rebuilt_param_indices_.size() INTERNAL ASSERT FAILED at "/opt/conda/conda-bld/pytorch_1603729062494/work/torch/csrc/distributed/c10d/reducer.cpp":1326, please report a bug to PyTorch. rebuilt parameter indices size is not same as original model parameters size.156 versus 22776
Traceback (most recent call last):
  File "./tools/train.py", line 178, in <module>
    main()
  File "./tools/train.py", line 167, in main
    train_detector(
  File "/mnt/Data/workspace/mmdetection/mmdet/apis/train.py", line 150, in train_detector
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/lizhiyuan/anaconda3/envs/mmcv/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 125, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/lizhiyuan/anaconda3/envs/mmcv/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 54, in train
    self.call_hook('after_train_epoch')
  File "/home/lizhiyuan/anaconda3/envs/mmcv/lib/python3.8/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook
    getattr(hook, fn_name)(self)
  File "/mnt/Data/workspace/mmdetection/mmdet/core/evaluation/eval_hooks.py", line 125, in after_train_epoch
    results = multi_gpu_test(
  File "/mnt/Data/workspace/mmdetection/mmdet/apis/test.py", line 97, in multi_gpu_test
    result = model(return_loss=False, rescale=True, **data)
  File "/home/lizhiyuan/anaconda3/envs/mmcv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/lizhiyuan/anaconda3/envs/mmcv/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 606, in forward
    if self.reducer._rebuild_buckets():
RuntimeError: replicas_[0].size() == rebuilt_param_indices_.size() INTERNAL ASSERT FAILED at "/opt/conda/conda-bld/pytorch_1603729062494/work/torch/csrc/distributed/c10d/reducer.cpp":1326, please report a bug to PyTorch. rebuilt parameter indices size is not same as original model parameters size.156 versus 22776
Traceback (most recent call last):
  File "/home/lizhiyuan/anaconda3/envs/mmcv/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/lizhiyuan/anaconda3/envs/mmcv/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/lizhiyuan/anaconda3/envs/mmcv/lib/python3.8/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/home/lizhiyuan/anaconda3/envs/mmcv/lib/python3.8/site-packages/torch/distributed/launch.py", line 255, in main
    raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['/home/lizhiyuan/anaconda3/envs/mmcv/bin/python', '-u', './tools/train.py', '--local_rank=1', 'configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py', '--launcher', 'pytorch']' returned non-zero exit status 1.
Hi @uniartisan, thanks for your bug report. We will fix MMDDP to catch up with PyTorch 1.7. As a workaround, we suggest using PyTorch 1.6 for now.
I’m wondering if this could be resolved by wrapping the module at training time in a wrapper module whose forward() calls module.train_step() instead; see the sketch below. Using the internal APIs is fragile, and a future refactor on our end might break this again.
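For illustration, a rough sketch of that idea (hypothetical code, not part of mmcv or PyTorch): a thin wrapper whose forward() delegates to train_step(), so a stock DistributedDataParallel drives training through its normal forward path and the reducer bookkeeping stays on the supported code path.

import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel


class TrainStepWrapper(nn.Module):
    """Hypothetical wrapper: makes ``forward()`` call ``train_step()``."""

    def __init__(self, module):
        super().__init__()
        self.module = module

    def forward(self, *args, **kwargs):
        # DDP only hooks forward(), so routing train_step() through it
        # keeps bucket rebuilding and gradient sync on the public path.
        return self.module.train_step(*args, **kwargs)


# Usage sketch (assuming one GPU per process and a model exposing
# train_step(), as mmdet detectors do):
#   model = TrainStepWrapper(model)
#   ddp = DistributedDataParallel(model, device_ids=[local_rank])
#   losses = ddp(data_batch)  # internally calls model.train_step()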