'BucketingDataLoader' object has no attribute 'db'
I am running on Ubuntu 18.04 with CUDA 10. I have followed "Setup & Installation (TL;DR) - Train model with Conda Environment".
```
python3.6 demo.py
Found existing ./models folder, skip creating a new one!
11/20/2019 19:07:20 - INFO - __main__ - Downloading models...
11/20/2019 19:07:20 - INFO - demo_utils - ./models/small/config.json exists, return!
11/20/2019 19:07:20 - INFO - demo_utils - ./models/small/vocab.json exists, return!
11/20/2019 19:07:20 - INFO - demo_utils - ./models/small/merges.txt exists, return!
11/20/2019 19:07:20 - INFO - demo_utils - ./models/small/pytorch_model.bin exists, return!
11/20/2019 19:07:20 - INFO - demo_utils - ./models/small/small_ft.pkl exists, return!
11/20/2019 19:07:20 - INFO - __main__ - Done!

11/20/2019 19:07:20 - INFO - __main__ - Downloading and Extracting Data...
11/20/2019 19:07:20 - INFO - __main__ - Preparing Data...
prepro.py --corpus ./data/train.tsv --max_seq_len 128
11/20/2019 19:07:22 - INFO - __main__ - Done!

11/20/2019 19:07:22 - INFO - __main__ - Generating training CMD!
11/20/2019 19:07:22 - INFO - __main__ - If there is any problem, please copy (modify) and run command below
11/20/2019 19:07:22 - INFO - __main__ - #########################################################################
python LSP_train.py --model_name_or_path ./models/small --init_checkpoint ./models/small/pytorch_model.bin --train_input_file ./data/train.128len.db --eval_input_file ./data/dummy_data.tsv --output_dir ./models/output_model --seed 42 --max_seq_length 128 --train_batch_size 512 --gradient_accumulation_steps 8 --eval_batch_size 64 --learning_rate 1e-5 --num_optim_steps 10000 --valid_step 5000 --warmup_steps 4000 --normalize_data true --fp16 true --lr_schedule noam --loss_scale 0.0 --no_token_id true --pbar true
11/20/2019 19:07:22 - INFO - __main__ - #########################################################################
11/20/2019 19:07:23 - INFO - __main__ - train batch size = 512, new train batch size (after gradient accumulation) = 64
11/20/2019 19:07:23 - INFO - __main__ - CUDA available? True
11/20/2019 19:07:23 - INFO - __main__ - Input Argument Information
11/20/2019 19:07:23 - INFO - __main__ - model_name_or_path ./models/small
11/20/2019 19:07:23 - INFO - __main__ - seed 42
11/20/2019 19:07:23 - INFO - __main__ - max_seq_length 128
11/20/2019 19:07:23 - INFO - __main__ - skip_eval False
11/20/2019 19:07:23 - INFO - __main__ - init_checkpoint ./models/small/pytorch_model.bin
11/20/2019 19:07:23 - INFO - __main__ - train_input_file ./data/train.128len.db
11/20/2019 19:07:23 - INFO - __main__ - eval_input_file ./data/dummy_data.tsv
11/20/2019 19:07:23 - INFO - __main__ - continue_from 0
11/20/2019 19:07:23 - INFO - __main__ - train_batch_size 64
11/20/2019 19:07:23 - INFO - __main__ - gradient_accumulation_steps 8
11/20/2019 19:07:23 - INFO - __main__ - eval_batch_size 64
11/20/2019 19:07:23 - INFO - __main__ - learning_rate 1e-05
11/20/2019 19:07:23 - INFO - __main__ - num_optim_steps 10000
11/20/2019 19:07:23 - INFO - __main__ - valid_step 5000
11/20/2019 19:07:23 - INFO - __main__ - warmup_proportion 0.1
11/20/2019 19:07:23 - INFO - __main__ - warmup_steps 4000
11/20/2019 19:07:23 - INFO - __main__ - normalize_data True
11/20/2019 19:07:23 - INFO - __main__ - fp16 True
11/20/2019 19:07:23 - INFO - __main__ - lr_schedule noam
11/20/2019 19:07:23 - INFO - __main__ - loss_scale 0.0
11/20/2019 19:07:23 - INFO - __main__ - no_token_id True
11/20/2019 19:07:23 - INFO - __main__ - output_dir ./models/output_model
11/20/2019 19:07:23 - INFO - __main__ - log_dir None
11/20/2019 19:07:23 - INFO - __main__ - pbar True
11/20/2019 19:07:23 - INFO - __main__ - local_rank -1
11/20/2019 19:07:23 - INFO - __main__ - config None
11/20/2019 19:07:23 - INFO - __main__ - device cuda
11/20/2019 19:07:23 - INFO - __main__ - n_gpu 8
11/20/2019 19:07:23 - INFO - pytorch_pretrained_bert.tokenization_gpt2 - loading vocabulary file ./models/small/vocab.json
11/20/2019 19:07:23 - INFO - pytorch_pretrained_bert.tokenization_gpt2 - loading merges file ./models/small/merges.txt
Traceback (most recent call last):
  File "LSP_train.py", line 176, in <module>
    args.max_seq_length)
  File "/mnt/sdb/Tools/DialoGPT/data_loader.py", line 114, in __init__
    self.db = shelve.open(f'{db_name}/db', 'r')
  File "/mnt/sdb/miniconda3/envs/LSP/lib/python3.6/shelve.py", line 243, in open
    return DbfilenameShelf(filename, flag, protocol, writeback)
  File "/mnt/sdb/miniconda3/envs/LSP/lib/python3.6/shelve.py", line 227, in __init__
    Shelf.__init__(self, dbm.open(filename, flag), protocol, writeback)
  File "/mnt/sdb/miniconda3/envs/LSP/lib/python3.6/dbm/__init__.py", line 91, in open
    "available".format(result))
dbm.error: db type is dbm.gnu, but the module is not available
Exception ignored in: <bound method BucketingDataLoader.__del__ of <data_loader.BucketingDataLoader object at 0x7f082fdc4cc0>>
Traceback (most recent call last):
  File "/mnt/sdb/Tools/DialoGPT/data_loader.py", line 151, in __del__
    self.db.close()
AttributeError: 'BucketingDataLoader' object has no attribute 'db'
11/20/2019 19:07:23 - INFO - __main__ - Done!
```
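The root cause is the `dbm.error` near the end of the log: the `train.128len.db` shelve file was written with the GNU dbm backend (`dbm.gnu`), but the Python in the `LSP` conda environment was built without the `_gdbm` extension, so `shelve.open` cannot read it. The `AttributeError` in the title is only a secondary symptom: `__init__` raised before `self.db` was assigned, so `__del__` had nothing to close. A minimal diagnostic sketch (the path is taken from the log above; `conda install gdbm` is one possible fix, not necessarily the only one):

```python
import dbm

# Which backend wrote the shelve file? Expected to print 'dbm.gnu' here.
print(dbm.whichdb('./data/train.128len.db/db'))

# Does this interpreter actually ship the gdbm backend?
try:
    import dbm.gnu  # thin wrapper around the _gdbm C extension
    print('dbm.gnu is available')
except ImportError:
    print('dbm.gnu is missing; try e.g. `conda install gdbm`')
```

Independently of the environment fix, `data_loader.py` could guard its destructor so a failed `__init__` does not bury the real error under the `AttributeError` (a hypothetical patch, not the repo's current code):

```python
def __del__(self):
    # self.db only exists if __init__ completed, so close it conditionally
    if hasattr(self, 'db'):
        self.db.close()
```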
Top GitHub Comments
Can you try to run the script with `data` set to `dummy` and see if it works? Or simply run `python demo.py` without any other arguments.
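For reference, those two suggestions might look like this on the command line (the `--data` flag name is an assumption; check `demo.py`'s argument parser for the exact option):

```bash
# run the demo with its defaults, as suggested above
python demo.py

# hypothetical: select the dummy dataset explicitly, if demo.py exposes a --data flag
python demo.py --data dummy
```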
I tried the 'small' data and indeed the data file is missing. We will try to fix it.
I have tried the new version and got the following error: `b'gzip: ./train.tsv.gz: No such file or directory\n'`. It seems the data file is missing; how can I get it? Thanks for the help.
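That gzip message means the archive was never downloaded (or was saved elsewhere) before extraction ran. A quick sanity check before re-running the script (hypothetical paths, based on the error above; `gzip -t` just tests archive integrity):

```bash
# confirm the archive exists where the script expects it
ls -lh ./train.tsv.gz

# if it exists, verify it is a valid gzip file
gzip -t ./train.tsv.gz && echo "archive OK"
```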