Question about what is Full Labeled Training and Datasets
The required structure of the images is as follows:
# YOUR_DATA should be a directory that contains the COCO dataset.
# For example:
# YOUR_DATA/
#   coco/
#     train2017/
#     val2017/
#     unlabeled2017/
#     annotations/
ln -s ${YOUR_DATA} data
bash tools/dataset/prepare_coco_data.sh conduct
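Before running the preparation script, it can help to confirm that the directory layout matches the one listed above. The following is a minimal sketch, not part of the repository: the function name `missing_coco_dirs` is hypothetical, and it assumes the `data` symlink has already been created in the current directory.

```python
from pathlib import Path

# Sub-directories named in the layout above.
EXPECTED_SUBDIRS = ("train2017", "val2017", "unlabeled2017", "annotations")


def missing_coco_dirs(data_root):
    """Return the expected sub-directories that are absent under <data_root>/coco."""
    coco_root = Path(data_root) / "coco"
    return [name for name in EXPECTED_SUBDIRS if not (coco_root / name).is_dir()]


if __name__ == "__main__":
    # "data" is the symlink created by `ln -s ${YOUR_DATA} data`.
    missing = missing_coco_dirs("data")
    if missing:
        print("Missing under data/coco:", ", ".join(missing))
    else:
        print("Layout looks complete.")
```

An empty list means all four sub-directories were found. The check only looks at directory names, not at the images or annotation files inside them.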
My questions are:
- If my understanding is correct, unlabeled2017 contains all the unlabeled images, right?
- When you say X% labeled data (e.g. 5%, 10%, etc.), does that take X% of the train2017/ training data? What happens to the remaining (100-X)% of the training data? Does it get added to the unlabeled pool for training?
- When you say full-labeled training, does it mean it trains on all the data in train2017/ (supervised) and then uses the unlabeled2017/ data for the unsupervised part of semi-supervised learning? Or is it just supervised training on the whole training dataset?
- When using a custom dataset in COCO format, do I just follow the same instructions, or do I need to change something more?
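To make the second question concrete, here is a minimal sketch of the split it describes: X% of the training image ids are kept as labeled, and the remaining (100-X)% are treated as unlabeled. This is an illustration only, not the repository's preparation script. The function name `split_labeled_unlabeled` and the `seed` parameter are hypothetical.

```python
import random


def split_labeled_unlabeled(image_ids, percent, seed=0):
    """Split image ids into (labeled, unlabeled), keeping `percent`% as labeled."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    ids = sorted(image_ids)
    # A seeded generator makes the split reproducible.
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_labeled = round(len(ids) * percent / 100)
    # The first n_labeled ids keep their annotations; the rest join the unlabeled pool.
    return sorted(ids[:n_labeled]), sorted(ids[n_labeled:])


labeled, unlabeled = split_labeled_unlabeled(range(1000), percent=10)
print(len(labeled), len(unlabeled))
```

With 1000 ids and `percent=10`, the labeled set has 100 ids and the unlabeled set has 900, and the two sets do not overlap.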
Issue Analytics
- State:
- Created 2 years ago
- Comments:7
Top GitHub Comments
Q1: Yes.
Q2: Yes. Yes.
Q3: Yes, the supervised baseline is trained on all labeled data (train2017), and the semi-supervised method is trained on all labeled data plus the unlabeled data (train2017 and unlabeled2017).
Q4: There are a few things you can check before training:
1) Did you modify the annotation file path and the image file prefix in the config file to match your dataset?
2) Does your dataset share the same categories as COCO? If not, add the following snippet to the config file.
Yes. Just to add something like