'BucketingDataLoader' object has no attribute 'db'
I am running on Ubuntu 18.04 with CUDA 10. I have followed "Setup & Installation (TL;DR) - Train model with Conda Environment".
```
python3.6 demo.py
Found existing ./models folder, skip creating a new one!
11/20/2019 19:07:20 - INFO - __main__ - Downloading models...
11/20/2019 19:07:20 - INFO - demo_utils - ./models/small/config.json exists, return!
11/20/2019 19:07:20 - INFO - demo_utils - ./models/small/vocab.json exists, return!
11/20/2019 19:07:20 - INFO - demo_utils - ./models/small/merges.txt exists, return!
11/20/2019 19:07:20 - INFO - demo_utils - ./models/small/pytorch_model.bin exists, return!
11/20/2019 19:07:20 - INFO - demo_utils - ./models/small/small_ft.pkl exists, return!
11/20/2019 19:07:20 - INFO - __main__ - Done!

11/20/2019 19:07:20 - INFO - __main__ - Downloading and Extracting Data...
11/20/2019 19:07:20 - INFO - __main__ - Preparing Data...
prepro.py --corpus ./data/train.tsv --max_seq_len 128
11/20/2019 19:07:22 - INFO - __main__ - Done!

11/20/2019 19:07:22 - INFO - __main__ - Generating training CMD!
11/20/2019 19:07:22 - INFO - __main__ - If there is any problem, please copy (modify) and run command below
11/20/2019 19:07:22 - INFO - __main__ - #########################################################################
python LSP_train.py --model_name_or_path ./models/small --init_checkpoint ./models/small/pytorch_model.bin --train_input_file ./data/train.128len.db --eval_input_file ./data/dummy_data.tsv --output_dir ./models/output_model --seed 42 --max_seq_length 128 --train_batch_size 512 --gradient_accumulation_steps 8 --eval_batch_size 64 --learning_rate 1e-5 --num_optim_steps 10000 --valid_step 5000 --warmup_steps 4000 --normalize_data true --fp16 true --lr_schedule noam --loss_scale 0.0 --no_token_id true --pbar true
11/20/2019 19:07:22 - INFO - __main__ - #########################################################################
11/20/2019 19:07:23 - INFO - __main__ - train batch size = 512, new train batch size (after gradient accumulation) = 64
11/20/2019 19:07:23 - INFO - __main__ - CUDA available? True
11/20/2019 19:07:23 - INFO - __main__ - Input Argument Information
11/20/2019 19:07:23 - INFO - __main__ - model_name_or_path ./models/small
11/20/2019 19:07:23 - INFO - __main__ - seed 42
11/20/2019 19:07:23 - INFO - __main__ - max_seq_length 128
11/20/2019 19:07:23 - INFO - __main__ - skip_eval False
11/20/2019 19:07:23 - INFO - __main__ - init_checkpoint ./models/small/pytorch_model.bin
11/20/2019 19:07:23 - INFO - __main__ - train_input_file ./data/train.128len.db
11/20/2019 19:07:23 - INFO - __main__ - eval_input_file ./data/dummy_data.tsv
11/20/2019 19:07:23 - INFO - __main__ - continue_from 0
11/20/2019 19:07:23 - INFO - __main__ - train_batch_size 64
11/20/2019 19:07:23 - INFO - __main__ - gradient_accumulation_steps 8
11/20/2019 19:07:23 - INFO - __main__ - eval_batch_size 64
11/20/2019 19:07:23 - INFO - __main__ - learning_rate 1e-05
11/20/2019 19:07:23 - INFO - __main__ - num_optim_steps 10000
11/20/2019 19:07:23 - INFO - __main__ - valid_step 5000
11/20/2019 19:07:23 - INFO - __main__ - warmup_proportion 0.1
11/20/2019 19:07:23 - INFO - __main__ - warmup_steps 4000
11/20/2019 19:07:23 - INFO - __main__ - normalize_data True
11/20/2019 19:07:23 - INFO - __main__ - fp16 True
11/20/2019 19:07:23 - INFO - __main__ - lr_schedule noam
11/20/2019 19:07:23 - INFO - __main__ - loss_scale 0.0
11/20/2019 19:07:23 - INFO - __main__ - no_token_id True
11/20/2019 19:07:23 - INFO - __main__ - output_dir ./models/output_model
11/20/2019 19:07:23 - INFO - __main__ - log_dir None
11/20/2019 19:07:23 - INFO - __main__ - pbar True
11/20/2019 19:07:23 - INFO - __main__ - local_rank -1
11/20/2019 19:07:23 - INFO - __main__ - config None
11/20/2019 19:07:23 - INFO - __main__ - device cuda
11/20/2019 19:07:23 - INFO - __main__ - n_gpu 8
11/20/2019 19:07:23 - INFO - pytorch_pretrained_bert.tokenization_gpt2 - loading vocabulary file ./models/small/vocab.json
11/20/2019 19:07:23 - INFO - pytorch_pretrained_bert.tokenization_gpt2 - loading merges file ./models/small/merges.txt
Traceback (most recent call last):
  File "LSP_train.py", line 176, in <module>
    args.max_seq_length)
  File "/mnt/sdb/Tools/DialoGPT/data_loader.py", line 114, in __init__
    self.db = shelve.open(f'{db_name}/db', 'r')
  File "/mnt/sdb/miniconda3/envs/LSP/lib/python3.6/shelve.py", line 243, in open
    return DbfilenameShelf(filename, flag, protocol, writeback)
  File "/mnt/sdb/miniconda3/envs/LSP/lib/python3.6/shelve.py", line 227, in __init__
    Shelf.__init__(self, dbm.open(filename, flag), protocol, writeback)
  File "/mnt/sdb/miniconda3/envs/LSP/lib/python3.6/dbm/__init__.py", line 91, in open
    "available".format(result))
dbm.error: db type is dbm.gnu, but the module is not available
Exception ignored in: <bound method BucketingDataLoader.__del__ of <data_loader.BucketingDataLoader object at 0x7f082fdc4cc0>>
Traceback (most recent call last):
  File "/mnt/sdb/Tools/DialoGPT/data_loader.py", line 151, in __del__
    self.db.close()
AttributeError: 'BucketingDataLoader' object has no attribute 'db'
11/20/2019 19:07:23 - INFO - __main__ - Done!
```
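The root cause is the `dbm.error` near the end of the log: the `train.128len.db` shelve file was written with the GNU dbm backend (`dbm.gnu`), but the Python in the `LSP` conda environment was built without the `_gdbm` extension, so `shelve.open` cannot read it. The `AttributeError` in the title is only a secondary symptom: `__init__` raised before `self.db` was assigned, so `__del__` had nothing to close. A minimal diagnostic sketch (the path is taken from the log above; `conda install gdbm` is one possible fix, not necessarily the only one):

```python
import dbm

# Which backend wrote the shelve file? Expected to print 'dbm.gnu' here.
print(dbm.whichdb('./data/train.128len.db/db'))

# Does this interpreter actually ship the gdbm backend?
try:
    import dbm.gnu  # thin wrapper around the _gdbm C extension
    print('dbm.gnu is available')
except ImportError:
    print('dbm.gnu is missing; try e.g. `conda install gdbm`')
```

Independently of the environment fix, `data_loader.py` could guard its destructor so a failed `__init__` does not bury the real error under the `AttributeError` (a hypothetical patch, not the repo's current code):

```python
def __del__(self):
    # self.db only exists if __init__ completed, so close it conditionally
    if hasattr(self, 'db'):
        self.db.close()
```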
Top GitHub Comments
Can you try to run the script with `data` set to `dummy` and see if it works? Or simply run `python demo.py` without any other arguments.
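For reference, those two suggestions might look like this on the command line (the `--data` flag name is an assumption; check `demo.py`'s argument parser for the exact option):

```bash
# run the demo with its defaults, as suggested above
python demo.py

# hypothetical: select the dummy dataset explicitly, if demo.py exposes a --data flag
python demo.py --data dummy
```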
I tried the 'small' data and indeed the data file is missing. We will try to fix it.
I have tried the new version and got the following error: `b'gzip: ./train.tsv.gz: No such file or directory\n'`. It seems the data file is missing; how can I get it? Thanks for the help.
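That gzip message means the archive was never downloaded (or was saved elsewhere) before extraction ran. A quick sanity check before re-running the script (hypothetical paths, based on the error above; `gzip -t` just tests archive integrity):

```bash
# confirm the archive exists where the script expects it
ls -lh ./train.tsv.gz

# if it exists, verify it is a valid gzip file
gzip -t ./train.tsv.gz && echo "archive OK"
```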