Create preprocessed training files: metadata.json is missing ids in the train.txt, test.txt and val.txt

When I run the following:

python specter/data_utils/create_training_files.py \
--data-dir data/training \
--metadata data/training/metadata.json \
--outdir data/preprocessed/

I get the message "done getting triplets, success rate:0.00%"

and my data-metrics.json looks like this:

{
  "train": 0,
  "val": 0,
  "test": 0
}

I debugged the code and found that a KeyError is raised at the line where self.metadata is accessed. It looks like the ids in train.txt, val.txt and test.txt are not in the metadata.json file.
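One way to confirm this diagnosis is to compare the ids in each split file against the keys of metadata.json. The sketch below is not part of the SPECTER code base; it assumes that metadata.json is a JSON object keyed by paper id and that each split file lists one id per line. The function name find_missing_ids is hypothetical.

```python
import json
from pathlib import Path


def find_missing_ids(metadata_path, split_paths):
    """Return, per split file, the ids that have no entry in the metadata.

    Assumptions (not confirmed by the issue): metadata.json is a JSON object
    keyed by paper id, and each split file holds one id per line.
    """
    with open(metadata_path, encoding="utf-8") as f:
        metadata = json.load(f)

    missing = {}
    for split_path in split_paths:
        lines = Path(split_path).read_text(encoding="utf-8").splitlines()
        ids = [line.strip() for line in lines if line.strip()]  # skip blank lines
        missing[Path(split_path).name] = [i for i in ids if i not in metadata]
    return missing
```

Calling find_missing_ids("data/training/metadata.json", ["data/training/train.txt", "data/training/val.txt", "data/training/test.txt"]) would then show how many ids per split are unmatched. If nearly all of them are, that would be consistent with the 0.00% success rate.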

Please help and share the correct metadata.json file.

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 6

Top GitHub Comments

4 reactions
yrrah commented, Apr 25, 2021

The data.json contains many ids that don’t exist in metadata.json. I put together a new data.json that works: data.txt
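For readers who cannot use the attached file, a similar cleanup can be sketched in a few lines. This is not necessarily how the commenter produced their file. It assumes data.json maps each query paper id to an object keyed by candidate paper id, and that metadata.json is keyed by paper id. The function name filter_to_known_ids is hypothetical.

```python
def filter_to_known_ids(data, metadata):
    """Drop every query or candidate id that has no metadata entry.

    Assumed shapes (not confirmed by the issue):
      data:     {query_id: {candidate_id: anything, ...}, ...}
      metadata: {paper_id: {...}, ...}
    """
    filtered = {}
    for query_id, candidates in data.items():
        if query_id not in metadata:
            continue  # the query paper itself is unknown
        kept = {cid: value for cid, value in candidates.items() if cid in metadata}
        if kept:  # keep the query only if some candidates survive
            filtered[query_id] = kept
    return filtered
```

The result can be written back with json.dump and used in place of the original data.json. Queries that lose all of their candidates are dropped, since they could not produce triplets anyway.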

2 reactions
chashimo commented, Sep 8, 2020

I got the same problem. It seems that metadata.json requires ‘paper_id’ in addition to ‘title’ and ‘abstract’.
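If you build your own metadata.json, a quick check for the three fields mentioned above can catch this early. The sketch assumes metadata.json is an object keyed by paper id whose values are objects. The field list comes from this comment, not from the SPECTER documentation, and the function name entries_missing_fields is hypothetical.

```python
# Field names taken from the comment above; treat them as an assumption.
REQUIRED_FIELDS = ("paper_id", "title", "abstract")


def entries_missing_fields(metadata, required=REQUIRED_FIELDS):
    """Map each metadata key to the list of required fields it lacks."""
    problems = {}
    for key, entry in metadata.items():
        absent = [field for field in required if field not in entry]
        if absent:
            problems[key] = absent
    return problems
```

An empty result means every entry carries all three fields. Otherwise the returned mapping lists which entries need fixing.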
