Train your own data

Hi,

I want to use ADAPET on my own data, so I tried following your recommendation with a toy dataset. I took a small subset of data/BoolQ/train.jsonl and removed the “idx” key so that it looks like my own data.

Each line of the file looks like this:

{"question": "is ghost in the shell based on the anime", "passage": "Ghost in the Shell -- Animation studio Production I.G has produced ....", "label": false}
...
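For reference, a toy file like this could be produced with a short script. This is a minimal sketch and not part of ADAPET: the function name `make_toy_subset`, the default subset size, and the output path in the usage note are assumptions for illustration only.

```python
import json


def make_toy_subset(src, dst, n_examples=20, drop_keys=("idx",)):
    """Copy the first n_examples JSONL records from src to dst without drop_keys.

    Hypothetical helper: the name and defaults are illustrative, not ADAPET API.
    Returns the number of records written.
    """
    written = 0
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            if written >= n_examples:
                break
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            for key in drop_keys:
                record.pop(key, None)  # remove the key if present, ignore otherwise
            fout.write(json.dumps(record, ensure_ascii=False) + "\n")
            written += 1
    return written
```

It would be called as `make_toy_subset("data/BoolQ/train.jsonl", "toy/train.jsonl")`, where `toy/train.jsonl` is a placeholder output path in a directory that already exists.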

I used the command you provided in the README.md as follows:

python cli.py --data_dir $data_dir --pattern '"[TEXT1]" has an answer in "[TEXT2]"? "[LBL]"' --dict_verbalizer '{"true": "yes", "false": "no"}'

That command throws this error:

Traceback (most recent call last):
  File "cli.py", line 52, in <module>
    train(config)
  File "~/ADAPET/src/train.py", line 59, in train
    batcher = Batcher(config, tokenizer, config.dataset)
  File "~/ADAPET/src/data/Batcher.py", line 21, in __init__
    self.dataset_reader = DatasetReader(config, tokenizer, dataset)
  File "~/ADAPET/src/data/DatasetReader.py", line 44, in __init__
    self.dataset_reader = GenericReader(self.config, tokenizer)
  File "~/ADAPET/src/data/GenericReader.py", line 24, in __init__
    self.check_pattern(self.config.pattern)
  File "~/ADAPET/src/data/GenericReader.py", line 45, in check_pattern
    raise ValueError("Need at least one text ")
ValueError: Need at least one text

I would highly appreciate any guidance on what I am doing wrong.

Thank you!