Create preprocessed training files: metadata.json is missing the ids listed in train.txt, test.txt and val.txt
When I run the following:
python specter/data_utils/create_training_files.py \
--data-dir data/training \
--metadata data/training/metadata.json \
--outdir data/preprocessed/
I get the output "done getting triplets, success rate:0.00%"
and my data-metrics.json looks like this:
{
"train": 0,
"val": 0,
"test": 0
}
I debugged the code and found that a key error is raised where self.metadata is looked up. It looks like the ids in train.txt, val.txt and test.txt are not in the metadata.json file.
Please help and share the correct metadata.json file.
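To confirm the diagnosis above, a check along the following lines may help. It is only a sketch and assumes things the issue does not state: that each split file holds one id per line and that metadata.json is a JSON object keyed by paper id. The helper names (read_ids, missing_ids, report) are hypothetical and not part of the specter code base.

```python
import json
from pathlib import Path


def read_ids(path):
    """Return the set of non-empty, stripped lines in a text file.

    Assumption: the split file lists one id per line.
    """
    text = Path(path).read_text()
    return {line.strip() for line in text.splitlines() if line.strip()}


def missing_ids(split_ids, metadata):
    """Return, sorted, the ids from a split that have no key in metadata."""
    return sorted(set(split_ids) - set(metadata))


def report(data_dir, metadata_path):
    """Map each split name to the ids that are absent from metadata.json.

    Assumption: metadata.json is a JSON object keyed by paper id.
    """
    metadata = json.loads(Path(metadata_path).read_text())
    result = {}
    for split in ("train", "val", "test"):
        ids = read_ids(Path(data_dir) / f"{split}.txt")
        result[split] = missing_ids(ids, metadata)
    return result
```

Calling report("data/training", "data/training/metadata.json") with the paths from the command above would list, per split, the ids that trigger the key error. If every id is reported as missing, the two files most likely come from different datasets.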
Issue Analytics
- State:
- Created 3 years ago
- Reactions:1
- Comments:6
Top GitHub Comments
The data.json contains many ids that don’t exist in metadata.json. I made up a new data.json that works: data.txt
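The attached file is not reproduced here, so the following is not the commenter's method, only one possible way to build a consistent data.json. It assumes data.json maps each query id to an object of candidate ids, and that metadata.json is an object keyed by paper id; both layouts are assumptions, and filter_data is a hypothetical helper.

```python
def filter_data(data, metadata):
    """Keep only queries and candidates whose ids are keys of metadata.

    Assumption: data maps query_id -> {candidate_id: anything}.
    Queries left with no candidates are dropped.
    """
    filtered = {}
    for query_id, candidates in data.items():
        if query_id not in metadata:
            continue  # the query paper itself has no metadata entry
        kept = {cid: value for cid, value in candidates.items() if cid in metadata}
        if kept:
            filtered[query_id] = kept
    return filtered
```

The result can be written back with json.dump and used in place of the original data.json. Note that filtering shrinks the dataset, so the success rate improves only if enough ids overlap.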
I got the same problem. It seems that metadata.json requires ‘paper_id’ in addition to ‘title’ and ‘abstract’.
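If the missing field is the cause, it could be filled in with something like the sketch below. It assumes metadata.json is an object keyed by paper id, so that each key can be copied into its entry as 'paper_id'; that layout and the helper name add_paper_ids are assumptions, not something the repository confirms here.

```python
def add_paper_ids(metadata):
    """Return a copy of metadata where every entry has a 'paper_id' field.

    Assumption: metadata maps paper id -> {"title": ..., "abstract": ...}.
    An existing 'paper_id' value is left unchanged.
    """
    return {
        paper_id: {**entry, "paper_id": entry.get("paper_id", paper_id)}
        for paper_id, entry in metadata.items()
    }
```

The input is not modified, so the original file contents can be kept for comparison before the updated mapping is saved.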