[Dev plan] Re-design the dataset API

The goal of this refactoring is to make it easier to define new datasets and pre-processing pipelines. After the refactoring, users should be able to add new datasets and pipelines without modifying existing code, and to reuse more components. There will be breaking changes, and any discussion is welcome.

There will be at least 2 PRs to implement it.

  • Use registry to manage datasets. (#924)
  • Use a list of transforms to define the data pre-processing pipeline. (#935)

The dataset definition will look like this.

from .coco import CocoDataset
from .registry import DATASETS


@DATASETS.register_module
class MyDataset(CocoDataset):

    CLASSES = ('class1', 'class2', 'class3')

    def __init__(self,
                 ann_file,
                 pipeline,
                 img_prefix=None,
                 seg_prefix=None,
                 proposal_file=None,
                 test_mode=False):
        pass
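
To make the registry idea concrete, here is a minimal sketch of how a decorator-based registry could map the type string in a config dict to a class. The Registry, build_from_cfg, and ToyDataset names below are illustrative assumptions, not the code from the PRs referenced above, and the actual implementation may differ.

```python
class Registry:
    """Minimal name -> class registry (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self._module_dict = {}

    def register_module(self, cls):
        """Class decorator that records cls under its class name."""
        key = cls.__name__
        if key in self._module_dict:
            raise KeyError(f'{key} is already registered in {self.name}')
        self._module_dict[key] = cls
        return cls

    def get(self, key):
        """Return the registered class, or None if the name is unknown."""
        return self._module_dict.get(key)


def build_from_cfg(cfg, registry):
    """Instantiate the class named by cfg['type'], passing the other keys as kwargs."""
    args = dict(cfg)  # copy so the caller's config dict is not mutated
    obj_type = args.pop('type')
    cls = registry.get(obj_type)
    if cls is None:
        raise KeyError(f'{obj_type} is not registered in {registry.name}')
    return cls(**args)


DATASETS = Registry('dataset')


@DATASETS.register_module
class ToyDataset:
    """Hypothetical dataset used only to exercise the registry."""

    def __init__(self, ann_file, pipeline, img_prefix=None, test_mode=False):
        self.ann_file = ann_file
        self.pipeline = pipeline
        self.img_prefix = img_prefix
        self.test_mode = test_mode


# A config dict selects the class by name; no existing code has to change.
dataset = build_from_cfg(
    dict(type='ToyDataset', ann_file='train.json', pipeline=[]), DATASETS)
```

With this arrangement, adding a new dataset only requires defining a decorated class in a module that gets imported; the builder does not need to know about it in advance.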

The config file will look like this.

# img_norm_cfg and data_root are assumed to be defined earlier in the config file
train_pipeline = [
    dict(type='LoadImage'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=False),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='ToTensor', keys=['gt_bboxes', 'gt_labels']),
    dict(
        type='ToDataContainer',
        fields=[
            dict(key='img', stack=True),
            dict(key='gt_bboxes'),
            dict(key='gt_labels')
        ]),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
train_set = dict(
    type='CocoDataset',
    ann_file=data_root + 'annotations/instances_train2017.json',
    img_prefix=data_root + 'train2017/',
    pipeline=train_pipeline)
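
As a rough sketch of how a list of transform configs could be turned into a callable pipeline, the example below builds each transform from its dict and applies them in order to a shared results dict. The Compose class, the TRANSFORMS lookup table, and the toy RandomFlip and Collect transforms are assumptions made for illustration; they operate on plain Python lists and are not the real implementations.

```python
import random


class RandomFlip:
    """Toy transform: horizontally flips a 2-D list image with probability flip_ratio."""

    def __init__(self, flip_ratio=0.5):
        self.flip_ratio = flip_ratio

    def __call__(self, results):
        results['flip'] = random.random() < self.flip_ratio
        if results['flip']:
            results['img'] = [row[::-1] for row in results['img']]
        return results


class Collect:
    """Toy transform: keeps only the requested keys."""

    def __init__(self, keys):
        self.keys = keys

    def __call__(self, results):
        return {key: results[key] for key in self.keys}


# Hypothetical lookup table standing in for a transform registry.
TRANSFORMS = {'RandomFlip': RandomFlip, 'Collect': Collect}


class Compose:
    """Build transforms from config dicts and apply them sequentially."""

    def __init__(self, pipeline_cfg):
        self.transforms = []
        for cfg in pipeline_cfg:
            args = dict(cfg)  # copy so the config list can be reused
            transform_cls = TRANSFORMS[args.pop('type')]
            self.transforms.append(transform_cls(**args))

    def __call__(self, results):
        for transform in self.transforms:
            results = transform(results)
        return results


pipeline = Compose([
    dict(type='RandomFlip', flip_ratio=1.0),  # 1.0 so the flip always happens
    dict(type='Collect', keys=['img']),
])
out = pipeline({'img': [[1, 2, 3]], 'gt_labels': [0]})
```

Because every step shares the same dict-in, dict-out interface, steps can be reordered, removed, or replaced purely from the config file.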

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 9 (5 by maintainers)

Top GitHub Comments

2 reactions
rwightman commented, Aug 5, 2019

@hellock Is there any possibility of making data_root overridable via a command-line argument as part of any changes to the dataset API/config?

In order to use absolute paths for dataset locations and not have data within PWD one must modify/duplicate the config files. For numerous reasons I do not feel that linking datasets into the current dir is a good solution. It works for some people, but has drawbacks.

It’d be nice to be able to run python tools/train.py ${CONFIG_FILE} --data_root /abs/path/to/data/root and override the default ./data/dataset prefix. This is very similar to how the work_dir arg works, and I think it’s a valid option for the same reasons you want to specify work_dir.

A change to support this with minimal impact would require:

  1. moving the data_root + ann_file/img_prefix concatenation out of the config .py files and into CustomDataset.__init__, with data_root added as an argument so that it can be overridden
    def __init__(self,
                 data_root,
                 ann_file,
                 img_prefix,
                 img_scale,
                 ...):
        # prefix of images path
        self.img_prefix = data_root + img_prefix
        ann_file = data_root + ann_file
  2. overriding cfg.data_root from the command-line argument in the tool scripts, as is done for work_dir
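
The two steps above could be sketched as follows. This is only an illustration under stated assumptions: ToyDataset, parse_args, and the flat cfg dict are hypothetical stand-ins, the --data_root flag is the one proposed in this comment rather than an existing option, and os.path.join is used in place of plain string concatenation.

```python
import argparse
import os.path as osp


class ToyDataset:
    """Hypothetical dataset that joins data_root onto its relative paths."""

    def __init__(self, ann_file, img_prefix, data_root=None):
        if data_root is not None:
            # Paths in the config stay relative; the root is applied here.
            ann_file = osp.join(data_root, ann_file)
            img_prefix = osp.join(data_root, img_prefix)
        self.ann_file = ann_file
        self.img_prefix = img_prefix


def parse_args(argv=None):
    """Parse the training script arguments (flag names are assumptions)."""
    parser = argparse.ArgumentParser(description='Train a detector')
    parser.add_argument('config', help='path to the config file')
    parser.add_argument('--work_dir', help='directory for logs and checkpoints')
    parser.add_argument('--data_root', help='override the dataset root directory')
    return parser.parse_args(argv)


# Stand-in for the values a config file would provide.
cfg = dict(
    data_root='data/coco/',
    ann_file='annotations/instances_train2017.json',
    img_prefix='train2017/')

args = parse_args(['my_config.py', '--data_root', '/abs/path/to/data/root'])
if args.data_root is not None:
    cfg['data_root'] = args.data_root  # command line wins over the config default

dataset = ToyDataset(**cfg)
```

Keeping the join inside the dataset means the config files no longer hard-code the prefix, so one config can be reused across machines with different data locations.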
1 reaction
dsuess commented, Aug 6, 2019

Additionally, we could refactor the tools directory into entrypoints for the python module. Then you could run e.g.

python -m mmdet train 

from anywhere. This would also fix the current issue that the training/evaluation scripts are not installed when we pip-install mmdet from github directly without cloning it.
