Import labeled dataset
Feature Request: import labeled datasets in BIO format, for example:
SOCCER O
- O
JAPAN B-LOC
GET O
LUCKY O
WIN O
, O
CHINA B-PER
IN O
SURPRISE O
DEFEAT O
. O
Nadim B-PER
Ladki I-PER
AL-AIN B-LOC
, O
United B-LOC
Arab I-LOC
Emirates I-LOC
1996-12-06 O
Btw, I love your tool, thanks for making it open source.
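For reference, a minimal sketch of how a file in the BIO format shown above might be read. This is not the tool's actual importer; the function names `parse_bio` and `bio_to_spans` are hypothetical, and the sketch assumes one whitespace-separated "TOKEN TAG" pair per line, with a blank line ending a sentence.

```python
from typing import Iterable, List, Tuple

# (label, start_token, end_token_exclusive); hypothetical span representation
Span = Tuple[str, int, int]


def parse_bio(lines: Iterable[str]) -> List[Tuple[List[str], List[str]]]:
    """Split 'TOKEN TAG' lines into (tokens, tags) sentences.

    Assumption: a blank line separates sentences.
    """
    sentences, tokens, tags = [], [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        # The tag is the last whitespace-separated field on the line.
        token, tag = line.rsplit(None, 1)
        tokens.append(token)
        tags.append(tag)
    if tokens:
        sentences.append((tokens, tags))
    return sentences


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Collapse B-/I- tags into labeled token spans."""
    spans: List[Span] = []
    label, start = None, 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes an open span
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            spans.append((label, start, i))
            label = None
        # A stray I- tag with no open span is leniently treated as a start.
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            label, start = tag[2:], i
    return spans


sample = """Nadim B-PER
Ladki I-PER
AL-AIN B-LOC
, O
United B-LOC
Arab I-LOC
Emirates I-LOC
1996-12-06 O
"""

tokens, tags = parse_bio(sample.splitlines())[0]
for label, start, end in bio_to_spans(tags):
    print(label, " ".join(tokens[start:end]))
```

On the sample lines this prints `PER Nadim Ladki`, `LOC AL-AIN` and `LOC United Arab Emirates`. Token-index spans would still need to be converted to character offsets if the target project stores annotations that way.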
Issue Analytics
- State:
- Created 5 years ago
- Reactions:4
- Comments:30 (12 by maintainers)
Top GitHub Comments
I would like to be able to import labelled datasets in order to review them, correct wrongly labeled data, continue labelling a partially labeled dataset, or add labelled data to an existing project (mostly use cases 1 and 2).
I think storing documents together with labels might simplify things.
If it is decided to store annotations together with the document, the document model could be something like
Django has several third-party packages like https://github.com/dmkoch/django-jsonfield which can be used to provide somewhat more flexible data structures. And if you use PostgreSQL, Django has native built-in fields for JSON, arrays and more, see https://docs.djangoproject.com/en/dev/ref/contrib/postgres/fields/ .
Assuming the basic functionality does not involve updating existing documents, the import will not need to account for an external_id, although users might be allowed to upload one as metadata for their own future reference.
Allowing users to update existing documents through bulk upload could be limited to the admin interface or a command line interface, as advanced functionality for users who are sure of what they are doing. There, users can be allowed to provide the real id field (the object's primary key): if an object with the provided id already exists in the database it will be updated, otherwise a new object will be created.
If documents and annotations are stored together, it may also be easier to use existing tools like django-import-export, especially for imports via the admin interface.
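To make the proposal above concrete, here is a rough in-memory sketch of the two ideas: annotations stored together with the document, and an import that only honours a supplied id when updates are explicitly allowed. The `Document` and `DocumentStore` classes and the `allow_update` flag are hypothetical and stand in for a real model and database; in Django itself, `QuerySet.update_or_create` offers comparable update-or-create semantics.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Document:
    text: str
    # Annotations live on the document itself (JSON-like dicts).
    annotations: List[dict] = field(default_factory=list)
    # Free-form metadata, e.g. a user-supplied external_id.
    metadata: Dict[str, str] = field(default_factory=dict)
    id: Optional[int] = None


class DocumentStore:
    """Hypothetical stand-in for a database table keyed by primary key."""

    def __init__(self) -> None:
        self._docs: Dict[int, Document] = {}
        self._next_id = 1

    def get(self, doc_id: int) -> Document:
        return self._docs[doc_id]

    def import_rows(self, rows: Iterable[dict], allow_update: bool = False) -> Tuple[int, int]:
        """Import rows and return (created, updated) counts.

        With allow_update=False any "id" in a row is ignored and a new
        document is always created. With allow_update=True an existing id
        is overwritten, and an unknown id creates a document with that id.
        """
        created = updated = 0
        for row in rows:
            doc_id = row.get("id") if allow_update else None
            if doc_id is None:
                doc_id = self._next_id
            if doc_id in self._docs:
                updated += 1
            else:
                created += 1
            self._docs[doc_id] = Document(
                text=row["text"],
                annotations=list(row.get("annotations", [])),
                metadata=dict(row.get("metadata", {})),
                id=doc_id,
            )
            self._next_id = max(self._next_id, doc_id + 1)
        return created, updated


store = DocumentStore()
store.import_rows([
    {"text": "Nadim Ladki",
     "annotations": [{"label": "PER", "start": 0, "end": 11}],
     "metadata": {"external_id": "doc-001"}},
])
# Advanced path: correct the stored document by its primary key.
store.import_rows([{"id": 1, "text": "Nadim Ladki", "annotations": []}],
                  allow_update=True)
```

Keeping the update path behind an explicit flag mirrors the suggestion to reserve it for the admin or command line interface.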
I thoroughly redesigned the APIs and models and added support for labeled dataset import.
Task x format is as follows:
You can check the detailed format on the upload page:
This feature is not perfect yet; it is only a first step. There are some bugs and performance problems, so your opinions and feedback are welcome.
Thank you for your feedback and contribution.