[Dev plan] Re-design the dataset API

The goal of this refactoring is to make it easier to define new datasets and pre-processing pipelines. After the refactoring, users should be able to add new datasets and pipelines without modifying existing code, and to reuse more components. There will be breaking changes, and any discussion is welcome.

There will be at least 2 PRs to implement it.

Use registry to manage datasets. (#924)
Use a list of transforms to define the data pre-processing pipeline. (#935)

The dataset definition will look like this.

from .coco import CocoDataset
from .registry import DATASETS


@DATASETS.register_module
class MyDataset(CocoDataset):

    CLASSES = ('class1', 'class2', 'class3')

    def __init__(self,
                 ann_file,
                 pipeline,
                 img_prefix=None,
                 seg_prefix=None,
                 proposal_file=None,
                 test_mode=False):
        pass
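
As a rough illustration of the registry idea, here is a minimal sketch. It is not mmdetection's actual implementation: the `Registry` class, the `build_from_cfg` helper, and `ToyDataset` are hypothetical names written for this example, and the only assumption taken from the issue is that `register_module` is used as a bare decorator and that a config dict selects the class through its `type` key.

```python
class Registry:
    """Hypothetical minimal registry: maps a class name to the class itself."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self, cls):
        # Used as a bare decorator, like @DATASETS.register_module above.
        if cls.__name__ in self._modules:
            raise KeyError(f"{cls.__name__} is already registered in {self.name}")
        self._modules[cls.__name__] = cls
        return cls

    def get(self, key):
        return self._modules[key]


def build_from_cfg(cfg, registry):
    """Instantiate the class named by cfg['type'], passing the other keys as kwargs."""
    args = dict(cfg)  # copy so the caller's config dict is not mutated
    cls = registry.get(args.pop("type"))
    return cls(**args)


DATASETS = Registry("dataset")


@DATASETS.register_module
class ToyDataset:
    # Stand-in for a real dataset class; it only stores its arguments.
    def __init__(self, ann_file, pipeline, img_prefix=None, test_mode=False):
        self.ann_file = ann_file
        self.pipeline = pipeline
        self.img_prefix = img_prefix
        self.test_mode = test_mode


dataset = build_from_cfg(
    dict(type="ToyDataset", ann_file="annotations/train.json", pipeline=[]),
    DATASETS,
)
```

With this shape, adding a dataset means writing one decorated class in a new file; no existing builder code has to change.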

The config file will look like this.

train_pipeline = [
    dict(type='LoadImage'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=False),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='ToTensor', keys=['gt_bboxes', 'gt_labels']),
    dict(
        type='ToDataContainer',
        fields=[
            dict(key='img', stack=True),
            dict(key='gt_bboxes'),
            dict(key='gt_labels')
        ]),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
train_set = dict(
    type='CocoDataset',
    ann_file=data_root + 'annotations/instances_train2017.json',
    img_prefix=data_root + 'train2017/',
    pipeline=train_pipeline)
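
To show how a list of dicts like `train_pipeline` could become a callable pipeline, here is a small sketch under stated assumptions. The `TRANSFORMS` table, the `register` decorator, the `Compose` class, and the toy `Pad` and `Collect` transforms are all hypothetical stand-ins written for this example; they work on plain dicts and tuples rather than real images and are not the transforms proposed in the PRs.

```python
TRANSFORMS = {}  # hypothetical lookup table: transform name -> class


def register(cls):
    TRANSFORMS[cls.__name__] = cls
    return cls


@register
class Pad:
    """Toy stand-in: rounds the recorded image shape up to a multiple of size_divisor."""

    def __init__(self, size_divisor):
        self.size_divisor = size_divisor

    def __call__(self, results):
        h, w = results["img_shape"]
        d = self.size_divisor
        # -(-x // d) is ceiling division for positive integers.
        results["pad_shape"] = (-(-h // d) * d, -(-w // d) * d)
        return results


@register
class Collect:
    """Toy stand-in: keeps only the requested keys."""

    def __init__(self, keys):
        self.keys = keys

    def __call__(self, results):
        return {k: results[k] for k in self.keys}


class Compose:
    """Builds each transform from its config dict and applies them in order."""

    def __init__(self, cfgs):
        self.transforms = []
        for cfg in cfgs:
            args = dict(cfg)  # copy so the config list can be reused
            cls = TRANSFORMS[args.pop("type")]
            self.transforms.append(cls(**args))

    def __call__(self, results):
        for transform in self.transforms:
            results = transform(results)
        return results


pipeline = Compose([
    dict(type="Pad", size_divisor=32),
    dict(type="Collect", keys=["pad_shape"]),
])
out = pipeline({"img_shape": (800, 1333), "filename": "a.jpg"})
print(out)  # {'pad_shape': (800, 1344)}
```

Each transform takes and returns the same results dict, so steps can be reordered, dropped, or added in the config without touching the dataset class.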

Issue Analytics

State:
Created 4 years ago
Comments:9 (5 by maintainers)

Top GitHub Comments

2 reactions

rwightman commented, Aug 5, 2019

@hellock Is there any possibility of making data_root overridable via a command-line argument as part of the changes to the dataset API/config?

In order to use absolute paths for dataset locations and not have data within PWD one must modify/duplicate the config files. For numerous reasons I do not feel that linking datasets into the current dir is a good solution. It works for some people, but has drawbacks.

It’d be nice to be able to run python tools/train.py ${CONFIG_FILE} --data_root /abs/path/to/data/root and override the default ./data/dataset prefix. This is very similar to how the work_dir arg works, and I think it’s a valid option for the same reasons you want to specify work_dir.

A change to support this with minimal impact would require:

moving the data_root + ann_file/img_prefix concatenation out of the config py files and into CustomDataset.__init__, with data_root added as an arg so that data_root can be overridden

    def __init__(self,
                 data_root,
                 ann_file,
                 img_prefix,
                 img_scale,
                 ...):
        # prefix of images path
        self.img_prefix = data_root + img_prefix
        ann_file = data_root + ann_file

override cfg.data_root from arg in tool scripts as with work_dir
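
The two steps above could be sketched roughly as follows. This is an illustration, not the actual tools/train.py: the config is a plain dict rather than a loaded config file, and `parse_args`, `apply_overrides`, and `resolve` are hypothetical helpers. It also joins paths with `os.path.join` instead of the string concatenation shown above, which is a swapped-in choice so that a data_root without a trailing slash still works.

```python
import argparse
import os.path as osp


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Train a detector (sketch)")
    parser.add_argument("config", help="path to the config file")
    parser.add_argument("--work_dir", help="directory for logs and checkpoints")
    parser.add_argument("--data_root", help="override the dataset root directory")
    return parser.parse_args(argv)


def apply_overrides(cfg, args):
    # Command-line values win over the config only when they are given.
    if args.work_dir is not None:
        cfg["work_dir"] = args.work_dir
    if args.data_root is not None:
        cfg["data_root"] = args.data_root
    return cfg


def resolve(data_root, path):
    # Done once, where the dataset is constructed, instead of in every config.
    return osp.join(data_root, path)


cfg = {"data_root": "data/coco/", "work_dir": "./work_dirs"}
args = parse_args(["config.py", "--data_root", "/abs/path/to/data/root"])
cfg = apply_overrides(cfg, args)
ann_file = resolve(cfg["data_root"], "annotations/instances_train2017.json")
```

Because work_dir was not passed on the command line, the value from the config is kept, while data_root is replaced by the absolute path.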

1 reaction

dsuess commented, Aug 6, 2019

Additionally, we could refactor the tools directory into entrypoints for the python module. Then you could run e.g.

python -m mmdet train

from anywhere. This would also fix the current issue that the training/evaluation scripts are not installed when we pip-install mmdet directly from GitHub without cloning it.
