Create preprocessed training files: metadata.json is missing ids from train.txt, test.txt and val.txt

When I run the following:

python specter/data_utils/create_training_files.py \
--data-dir data/training \
--metadata data/training/metadata.json \
--outdir data/preprocessed/

I get the message "done getting triplets, success rate:0.00%"

and my data-metrics.json looks like this:

{
  "train": 0,
  "val": 0,
  "test": 0
}

I debugged the code and found that at line there is a KeyError when an id is looked up in self.metadata. It looks like the ids in train.txt, val.txt and test.txt are not in the metadata.json file.
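To confirm this before rerunning the script, a quick check like the one below counts how many ids in each split file have no entry in metadata.json. It is a minimal sketch, assuming that each split file lists one paper id per line and that metadata.json is a JSON object keyed by paper id. The helper names (read_ids, missing_ids, report) are hypothetical and are not part of the specter codebase.

```python
import json
from pathlib import Path


def read_ids(path):
    """Read one paper id per line (assumed format), skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def missing_ids(split_ids, metadata):
    """Return the ids from a split that have no key in the metadata dict."""
    return [pid for pid in split_ids if pid not in metadata]


def report(data_dir, metadata_path):
    """Print and return, per split, (number missing, total ids)."""
    with open(metadata_path, encoding="utf-8") as f:
        metadata = json.load(f)  # assumed: {paper_id: {...}, ...}
    counts = {}
    for split in ("train", "val", "test"):
        ids = read_ids(Path(data_dir) / f"{split}.txt")
        missing = missing_ids(ids, metadata)
        counts[split] = (len(missing), len(ids))
        print(f"{split}: {len(missing)} of {len(ids)} ids missing from metadata.json")
    return counts
```

With the paths from the command above, this would be called as report("data/training", "data/training/metadata.json"). If every id in a split comes back missing, that would be consistent with the zero counts in data-metrics.json.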

Please help, and share the correct metadata.json file.

Top GitHub Comments

yrrah commented, Apr 25, 2021

The data.json contains many ids that don’t exist in metadata.json. I made a new data.json that works: data.txt
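The comment does not say how the new data.json was built, so what follows is only one possible approach, not necessarily how the attached file was produced. It drops every query id and candidate id that has no metadata entry. It assumes data.json maps each query id to a dict of candidate ids (for example {"q1": {"c1": {"count": 1}}}) and that metadata.json is keyed by paper id; filter_data and rewrite_data_json are hypothetical helper names.

```python
import json


def filter_data(data, metadata):
    """Keep only query and candidate ids that have a metadata entry.

    Queries left with no candidates after filtering are dropped entirely.
    """
    filtered = {}
    for query_id, candidates in data.items():
        if query_id not in metadata:
            continue  # no title/abstract for the query itself
        kept = {cid: info for cid, info in candidates.items() if cid in metadata}
        if kept:
            filtered[query_id] = kept
    return filtered


def rewrite_data_json(data_path, metadata_path, out_path):
    """Write a filtered copy of data.json; return (queries before, queries after)."""
    with open(data_path, encoding="utf-8") as f:
        data = json.load(f)
    with open(metadata_path, encoding="utf-8") as f:
        metadata = json.load(f)
    filtered = filter_data(data, metadata)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(filtered, f)
    return len(data), len(filtered)
```

Since queries can be removed here, train.txt, val.txt and test.txt may also need the same treatment, keeping only ids that remain as keys in the filtered data.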

chashimo commented, Sep 8, 2020

I had the same problem. It seems that each entry in metadata.json requires a ‘paper_id’ field in addition to ‘title’ and ‘abstract’.
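If that is the cause, the missing field could be backfilled from the top-level keys. This is a minimal sketch, assuming metadata.json is a JSON object whose keys are paper ids and whose values are dicts holding "title" and "abstract". add_paper_ids and rewrite_metadata are hypothetical helpers. The comment only mentions paper_id, so other fields are left untouched.

```python
import json


def add_paper_ids(metadata):
    """Return a copy where each entry has 'paper_id' set to its key if absent."""
    fixed = {}
    for pid, entry in metadata.items():
        new_entry = dict(entry)  # copy so the input is not mutated
        new_entry.setdefault("paper_id", pid)  # keep an existing paper_id as-is
        fixed[pid] = new_entry
    return fixed


def rewrite_metadata(in_path, out_path):
    """Read metadata.json, backfill paper_id, and write the result to out_path."""
    with open(in_path, encoding="utf-8") as f:
        metadata = json.load(f)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(add_paper_ids(metadata), f)
```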