Train your own data
Hi,
I want to use ADAPET on my own data, so I tried following your recommendation with a toy dataset. I took a small subset of data/BoolQ/train.jsonl and removed the "idx" key so that it looks like my own data.
Each line of the file looks like this:
{"question": "is ghost in the shell based on the anime", "passage": "Ghost in the Shell -- Animation studio Production I.G has produced ....", "label": false} . . . .
I used the command you provided in the README.md as follows:
python cli.py --data_dir $data_dir --pattern '"[TEXT1]" has an answer in "[TEXT2]"? "[LBL]"' --dict_verbalizer '{"true": "yes", "false": "no"}'
and that command throws this error:
Traceback (most recent call last):
  File "cli.py", line 52, in <module>
    train(config)
  File "~/ADAPET/src/train.py", line 59, in train
    batcher = Batcher(config, tokenizer, config.dataset)
  File "~/ADAPET/src/data/Batcher.py", line 21, in __init__
    self.dataset_reader = DatasetReader(config, tokenizer, dataset)
  File "~/ADAPET/src/data/DatasetReader.py", line 44, in __init__
    self.dataset_reader = GenericReader(self.config, tokenizer)
  File "~/ADAPET/src/data/GenericReader.py", line 24, in __init__
    self.check_pattern(self.config.pattern)
  File "~/ADAPET/src/data/GenericReader.py", line 45, in check_pattern
    raise ValueError("Need at least one text ")
ValueError: Need at least one text
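One way to narrow an error like this down is to inspect the pattern string on its own. The sketch below is a hypothetical stand-alone helper, not ADAPET's `check_pattern`; it only assumes that placeholders are written as `[TEXT1]`, `[TEXT2]`, ... and `[LBL]`, as in the command above.

```python
import re


def inspect_pattern(pattern):
    """Report which placeholders a pattern string contains.

    Hypothetical debugging helper; it does not reproduce ADAPET's
    own validation logic.
    """
    # Collect the distinct numbers used in [TEXTn] placeholders.
    text_ids = sorted({int(n) for n in re.findall(r"\[TEXT(\d+)\]", pattern)})
    return {"text_placeholders": text_ids, "has_label": "[LBL]" in pattern}


pattern = '"[TEXT1]" has an answer in "[TEXT2]"? "[LBL]"'
print(inspect_pattern(pattern))
# {'text_placeholders': [1, 2], 'has_label': True}
```

The pattern typed in the command contains both text placeholders and the label, so it may be worth comparing it with the value of `config.pattern` that actually reaches `GenericReader` at runtime.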
I would greatly appreciate any guidance on what I am doing wrong.
Thank you!
Top GitHub Comments
Whoops, my mistake. Thought it was fixed. I’ll reopen it and attend to it soon.
Hi @Afnan-Sultan, I thought I had replied to the previous issue with the same fix you mentioned 😅. Glad to know you were able to figure it out!