[Dev plan] Re-design the dataset API
The goal of this refactoring is to make it easier to define new datasets and pre-processing pipelines. After the refactoring, users should be able to add new datasets and pipelines without modifying existing code, and to reuse more components. There will be breaking changes, and any discussion is welcome.
There will be at least 2 PRs to implement it.
- Use registry to manage datasets. (#924)
- Use a list of transforms to define the data pre-processing pipeline. (#935)
The dataset definition will look like this.
```python
from .coco import CocoDataset
from .registry import DATASETS


@DATASETS.register_module
class MyDataset(CocoDataset):

    CLASSES = ('class1', 'class2', 'class3')

    def __init__(self,
                 ann_file,
                 pipeline,
                 img_prefix=None,
                 seg_prefix=None,
                 proposal_file=None,
                 test_mode=False):
        pass
```
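For context, the registry referenced above works roughly like the sketch below: a `Registry` maps class names to classes, and a small builder instantiates a registered class from a config dict. This is a simplified illustration; the actual `Registry`/`build_from_cfg` in mmdet/mmcv differ in details.

```python
# Minimal sketch of the registry mechanism behind @DATASETS.register_module.
# The real mmdet/mmcv implementation adds validation and more features.
class Registry:

    def __init__(self, name):
        self.name = name
        self.module_dict = {}

    def register_module(self, cls):
        # Used as a class decorator; stores the class under its own name.
        self.module_dict[cls.__name__] = cls
        return cls

    def get(self, key):
        return self.module_dict[key]


def build_from_cfg(cfg, registry, default_args=None):
    # cfg is a dict whose 'type' key names a registered class; the
    # remaining items become keyword arguments of its constructor.
    args = dict(cfg)
    obj_cls = registry.get(args.pop('type'))
    if default_args is not None:
        args.update(default_args)
    return obj_cls(**args)


DATASETS = Registry('dataset')
```

With this in place, `build_from_cfg(dict(type='MyDataset', ...), DATASETS)` returns a `MyDataset` instance, which is how a config like the `train_set` below gets turned into a dataset object.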
The config file will look like this.
```python
train_pipeline = [
    dict(type='LoadImage'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=False),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='ToTensor', keys=['gt_bboxes', 'gt_labels']),
    dict(
        type='ToDataContainer',
        fields=[
            dict(key='img', stack=True),
            dict(key='gt_bboxes'),
            dict(key='gt_labels')
        ]),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
train_set = dict(
    type='CocoDataset',
    ann_file=data_root + 'annotations/instances_train2017.json',
    img_prefix=data_root + 'train2017/',
    pipeline=train_pipeline)
```
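Each dict in `train_pipeline` is built into a transform object through a registry, and the transforms are applied in order, each taking and returning a dict of results. Below is a minimal sketch of that composition, reusing the `Registry`/`build_from_cfg` from the sketch above; the `PIPELINES` name and the `RandomFlip` internals here are illustrative, not the actual mmdet code.

```python
import random

PIPELINES = Registry('pipeline')


@PIPELINES.register_module
class RandomFlip:

    def __init__(self, flip_ratio=0.5):
        self.flip_ratio = flip_ratio

    def __call__(self, results):
        # Record the flip decision so that image and bbox transforms
        # later in the pipeline stay consistent with each other.
        results['flip'] = random.random() < self.flip_ratio
        return results


class Compose:

    def __init__(self, transforms):
        # transforms is a list of config dicts such as train_pipeline.
        self.transforms = [build_from_cfg(t, PIPELINES) for t in transforms]

    def __call__(self, results):
        for t in self.transforms:
            results = t(results)
            if results is None:  # a transform may drop the sample
                return None
        return results
```

The dataset then only needs to call `self.pipeline(results)` in `__getitem__`, so a new pre-processing step can be added by registering a new transform class instead of editing the dataset code.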
Top GitHub Comments
@hellock Is there any possibility of making `data_root` overridable via a command-line argument as part of the changes to the dataset API/config? To use absolute paths for dataset locations instead of keeping data within the working directory, one currently has to modify or duplicate the config files. For numerous reasons I do not feel that linking datasets into the current directory is a good solution. It works for some people, but it has drawbacks.

It'd be nice to be able to run

```shell
python tools/train.py ${CONFIG_FILE} --data_root /abs/path/to/data/root
```

and override the default `./data/dataset` prefix, very similarly to how the `work_dir` argument works; I think it's a valid option for the same reasons you want to be able to specify `work_dir`.

A change to support this with minimal impact would require adding `data_root` as an argument to `CustomDataset.__init__` so that it can be overridden.

Additionally, we could refactor the `tools` directory into entry points for the Python module, so the scripts could be run from anywhere. This would also fix the current issue that the training/evaluation scripts are not installed when mmdet is pip-installed directly from GitHub without cloning it.
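A rough sketch of what such an override could look like in `tools/train.py` is shown below. The `--data_root` flag and the exact fields being rewritten are hypothetical, not an existing mmdet option; the sketch only assumes that configs are loaded with `mmcv.Config.fromfile` and define a `data_root` variable plus `data.train/val/test` entries.

```python
# Hypothetical sketch: --data_root is NOT an existing mmdet flag.
import argparse

from mmcv import Config

parser = argparse.ArgumentParser()
parser.add_argument('config', help='train config file path')
parser.add_argument(
    '--data_root', default=None,
    help='override the data root used by the config (absolute path)')
args = parser.parse_args()

cfg = Config.fromfile(args.config)
if args.data_root is not None:
    # Rewrite every path that was built from the old data_root prefix.
    for split in ('train', 'val', 'test'):
        ds = cfg.data[split]
        for key in ('ann_file', 'img_prefix', 'seg_prefix', 'proposal_file'):
            if isinstance(ds.get(key), str):
                ds[key] = ds[key].replace(cfg.data_root, args.data_root)
    cfg.data_root = args.data_root
```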