
Creating custom data for training

See original GitHub issue

Hi, I defined a custom dataset with 6 classes and trained it with DeepLabV3+. The config is shown below.

The custom data structure is as follows:

├─ann_dir (8)
│  ├─train
│  └─val
└─img_dir (24)
    ├─train
    └─val
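
Note that the generic CustomDataset does not know the class names on its own, so the per-class evaluation below depends on them being registered somewhere. A minimal sketch of such a registration, assuming the mmsegmentation v0.x registry API (the class names are taken from the evaluation table further down; the file suffixes and palette colors are assumptions):

# Hypothetical dataset definition; img_suffix, seg_map_suffix and PALETTE
# are assumptions, the class names come from the evaluation table below.
from mmseg.datasets import CustomDataset
from mmseg.datasets.builder import DATASETS

@DATASETS.register_module()
class SIRLabDataset(CustomDataset):
    CLASSES = ('bedrock', 'stone', 'gravel', 'sand', 'soil', 'others')
    PALETTE = [[120, 120, 120], [180, 120, 120], [6, 230, 230],
               [80, 50, 50], [4, 200, 3], [120, 120, 80]]

    def __init__(self, **kwargs):
        super().__init__(img_suffix='.png', seg_map_suffix='.png', **kwargs)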

deeplabv3plus_r50-d8_512x1024_80k_cityscapes_SIR.py is created as follows:

norm_cfg = dict(type='SyncBN', requires_grad=True)
model = dict(
    type='EncoderDecoder',
    pretrained='open-mmlab://resnet50_v1c',
    backbone=dict(
        type='ResNetV1c',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        dilations=(1, 1, 2, 4),
        strides=(1, 2, 1, 1),
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        norm_eval=False,
        style='pytorch',
        contract_dilation=True),
    decode_head=dict(
        type='DepthwiseSeparableASPPHead',
        in_channels=2048,
        in_index=3,
        channels=512,
        dilations=(1, 12, 24, 36),
        c1_in_channels=256,
        c1_channels=48,
        dropout_ratio=0.1,
        num_classes=6,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
    auxiliary_head=dict(
        type='FCNHead',
        in_channels=1024,
        in_index=2,
        channels=256,
        num_convs=1,
        concat_input=False,
        dropout_ratio=0.1,
        num_classes=6,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)))
train_cfg = dict()
test_cfg = dict(mode='whole')
dataset_type = 'CustomDataset'
data_root = 'data/SIRLab_mars/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
crop_size = (512, 512)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(type='Resize', img_scale=(2048, 512), ratio_range=(0.5, 2.0)),
    dict(type='RandomCrop', crop_size=(512, 512), cat_max_ratio=0.75),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size=(512, 512), pad_val=0, seg_pad_val=255),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(2048, 512),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=4,
    workers_per_gpu=4,
    train=dict(
        type='CustomDataset',
        data_root='data/SIRLab_mars/',
        img_dir='img_dir/train',
        ann_dir='ann_dir/train',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations'),
            dict(type='Resize', img_scale=(2048, 512), ratio_range=(0.5, 2.0)),
            dict(type='RandomCrop', crop_size=(512, 512), cat_max_ratio=0.75),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(type='PhotoMetricDistortion'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size=(512, 512), pad_val=0, seg_pad_val=255),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_semantic_seg'])
        ]),
    val=dict(
        type='CustomDataset',
        data_root='data/SIRLab_mars/',
        img_dir='img_dir/val',
        ann_dir='ann_dir/val',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(2048, 512),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]),
    test=dict(
        type='CustomDataset',
        data_root='data/SIRLab_mars/',
        img_dir='img_dir/val',
        ann_dir='ann_dir/val',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(2048, 512),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]))
log_config = dict(
    interval=50, hooks=[dict(type='TextLoggerHook', by_epoch=False)])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
cudnn_benchmark = True
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005)
optimizer_config = dict()
lr_config = dict(policy='poly', power=0.9, min_lr=0.0001, by_epoch=False)
total_iters = 80000
checkpoint_config = dict(by_epoch=False, interval=8000)
evaluation = dict(interval=8000, metric='mIoU')
work_dir = './work_dirs/deeplabv3plus_r50-d8_512x1024_80k_cityscapes_SIR'
gpu_ids = range(0, 1)
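
With this config saved, training would typically be launched through mmsegmentation's standard entry point (the exact config path is an assumption about where the file was placed):

# Single-GPU training; adjust the path to wherever the config lives.
python tools/train.py configs/deeplabv3plus/deeplabv3plus_r50-d8_512x1024_80k_cityscapes_SIR.py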

However, I ran into a problem: 5 of the 6 class results are NaN after 80000 iters. The training log is pasted below:

2020-10-09 11:11:05,837 - mmseg - INFO - Loaded 1090 images
2020-10-09 11:11:06,461 - mmseg - INFO - Loaded 123 images
2020-10-09 11:11:06,462 - mmseg - INFO - Start running, work_dir: /mmsegmentation/work_dirs/deeplabv3plus_r50-d8_512x1024_80k_cityscapes_SIR
2020-10-09 11:11:06,462 - mmseg - INFO - workflow: [('train', 1)], max: 80000 iters
2020-10-09 11:11:48,158 - mmseg - INFO - Iter [50/80000]        lr: 9.995e-03, eta: 14:02:00, time: 0.632, data_time: 0.005, memory: 20292, decode.loss_seg: 0.0674, decode.acc_seg: 89.3073, aux.loss_seg: 0.0679, aux.acc_seg: 87.8143, loss: 0.1354
2020-10-09 11:12:11,565 - mmseg - INFO - Iter [100/80000]       lr: 9.989e-03, eta: 12:12:26, time: 0.468, data_time: 0.005, memory: 20292, decode.loss_seg: 0.0000, decode.acc_seg: 92.3593, aux.loss_seg: 0.0004, aux.acc_seg: 92.3593, loss: 0.0004
2020-10-09 11:12:47,120 - mmseg - INFO - Iter [150/80000]       lr: 9.983e-03, eta: 13:23:25, time: 0.711, data_time: 0.005, memory: 20292, decode.loss_seg: 0.0000, decode.acc_seg: 90.7240, aux.loss_seg: 0.0003, aux.acc_seg: 90.7240, loss: 0.0003
...
2020-10-09 23:47:14,564 - mmseg - INFO - Iter [79700/80000]     lr: 1.651e-04, eta: 0:02:49, time: 0.737, data_time: 0.006, memory: 20292, decode.loss_seg: 0.0005, decode.acc_seg: 89.8289, aux.loss_seg: 0.0005, aux.acc_seg: 89.8289, loss: 0.0010
2020-10-09 23:47:38,209 - mmseg - INFO - Iter [79750/80000]     lr: 1.553e-04, eta: 0:02:21, time: 0.473, data_time: 0.006, memory: 20292, decode.loss_seg: 0.0006, decode.acc_seg: 91.7836, aux.loss_seg: 0.0005, aux.acc_seg: 91.7836, loss: 0.0011
2020-10-09 23:48:01,839 - mmseg - INFO - Iter [79800/80000]     lr: 1.453e-04, eta: 0:01:53, time: 0.473, data_time: 0.006, memory: 20292, decode.loss_seg: 0.0006, decode.acc_seg: 91.9399, aux.loss_seg: 0.0005, aux.acc_seg: 91.9399, loss: 0.0012
2020-10-09 23:48:37,906 - mmseg - INFO - Iter [79850/80000]     lr: 1.350e-04, eta: 0:01:24, time: 0.721, data_time: 0.006, memory: 20292, decode.loss_seg: 0.0006, decode.acc_seg: 92.1883, aux.loss_seg: 0.0005, aux.acc_seg: 92.1883, loss: 0.0012
2020-10-09 23:49:01,616 - mmseg - INFO - Iter [79900/80000]     lr: 1.244e-04, eta: 0:00:56, time: 0.474, data_time: 0.006, memory: 20292, decode.loss_seg: 0.0006, decode.acc_seg: 91.7220, aux.loss_seg: 0.0005, aux.acc_seg: 91.7220, loss: 0.0011
2020-10-09 23:49:25,386 - mmseg - INFO - Iter [79950/80000]     lr: 1.132e-04, eta: 0:00:28, time: 0.475, data_time: 0.006, memory: 20292, decode.loss_seg: 0.0006, decode.acc_seg: 92.6604, aux.loss_seg: 0.0006, aux.acc_seg: 92.6604, loss: 0.0012
2020-10-09 23:50:05,197 - mmseg - INFO - Saving checkpoint at 80000 iterations
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 124/123, 16.2 task/s, elapsed: 8s, ETA:     0s

2020-10-09 23:50:39,039 - mmseg - INFO - per class results:
Class                  IoU        Acc
bedrock             100.00     100.00
stone                  nan        nan
gravel                 nan        nan
sand                   nan        nan
soil                   nan        nan
others                 nan        nan
Summary:
Scope                 mIoU       mAcc       aAcc
global              100.00     100.00     100.00

2020-10-09 23:50:39,095 - mmseg - INFO - Exp name: deeplabv3plus_r50-d8_512x1024_80k_cityscapes_SIR.py
2020-10-09 23:50:39,095 - mmseg - INFO - Iter(val) [80000]      mIoU: 1.0000, mAcc: 1.0000, aAcc: 1.0000
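
A table like this, where a single class scores 100.00 and every other class is nan, usually means that only one label value ever appears in the ground-truth masks, so the remaining classes are absent from both the predictions and the annotations. One way to verify is to inspect the unique pixel values of a few masks; a quick sketch, assuming the annotations are single-channel PNGs:

# For a 6-class setup with CrossEntropyLoss, the values printed here
# should be exactly 0..5, plus 255 for any ignored pixels.
import numpy as np
from PIL import Image
from pathlib import Path

for path in sorted(Path('data/SIRLab_mars/ann_dir/val').glob('*.png'))[:5]:
    mask = np.array(Image.open(path))
    print(path.name, np.unique(mask))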

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 10

Top GitHub Comments

2 reactions
yehengchen commented, Nov 6, 2020

@yehengchen Could you tell me how you fixed it?

I changed the grayscale value of each category to 0, 1, 2, …
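
That fix lines up with the diagnosis above: mmsegmentation expects each mask pixel to store the class index itself (0..num_classes-1, with 255 reserved as the ignore index), not an arbitrary grayscale level. A hedged sketch of such a remapping, assuming single-channel PNG masks (ORIG_VALUES is a placeholder for the grayscale levels actually used in the annotations):

import numpy as np
from PIL import Image
from pathlib import Path

ORIG_VALUES = [0, 50, 100, 150, 200, 250]  # assumed original grayscale levels

for path in Path('data/SIRLab_mars/ann_dir').rglob('*.png'):
    mask = np.array(Image.open(path))
    remapped = np.full_like(mask, 255)       # 255 = ignore index in mmseg
    for new_idx, old_val in enumerate(ORIG_VALUES):
        remapped[mask == old_val] = new_idx  # map each level to 0..5
    Image.fromarray(remapped).save(path)

After remapping, the per-class table should report numbers for all six classes instead of nan.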

0 reactions
Peter-weng commented, May 26, 2022

@ke-dev Why did I get an email about this question? I don't understand.
