config reading error when using s2s-ft example


Hi there,

I am trying to follow the s2s-ft example with XSum, but I hit an error when the config file is read. Is it related to the cache_dir?

Traceback (most recent call last):
  File "run_seq2seq.py", line 416, in <module>
    main()
  File "run_seq2seq.py", line 399, in main
    model, tokenizer = get_model_and_tokenizer(args)
  File "run_seq2seq.py", line 372, in get_model_and_tokenizer
    cache_dir=args.cache_dir if args.cache_dir else None)
  File "/home/bill/sugar/supervised/transformers/src/transformers/configuration_utils.py", line 176, in from_pretrained
    config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/bill/sugar/supervised/transformers/src/transformers/configuration_utils.py", line 226, in get_config_dict
    config_dict = cls._dict_from_json_file(resolved_config_file)
  File "/home/bill/sugar/supervised/transformers/src/transformers/configuration_utils.py", line 315, in _dict_from_json_file
    text = reader.read()
  File "/home/bill/anaconda3/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 11 (11 by maintainers)

Top GitHub Comments

3 reactions
addf400 commented, Mar 23, 2020

By the way, I hit an error when I followed the example in the README using --model_name_or_path unilm1.2-base-uncased (see below), so I changed it to --model_name_or_path unilm1.2-base-uncased.bin. Is that a hint for debugging?

OSError: Model name '../tmp/unilm1.2-base-uncased' was not found in model name list.

Hi, thanks for the prompt reply.

Here is what I ran:

mkdir tmp 
cd tmp 
wget https://unilm.blob.core.windows.net/ckpt/unilm1.2-base-uncased.bin
cd ../
git clone https://github.com/microsoft/unilm.git
rm -rf .git/
cd unilm/s2s-ft 
pip install --editable .
pip install --user methodtools py-rouge pyrouge nltk
python -c "import nltk; nltk.download('punkt')"
# git clone https://github.com/NVIDIA/apex.git && cd apex && python setup.py install --cuda_ext --cpp_ext
# Training 
 
TRAIN_FILE=../../xsum.validation.json
OUTPUT_DIR=../../tmp/save_checkpoints 
CACHE_DIR=../../tmp/cache_dir 
MODEL_PATH=../../tmp/unilm1.2-base-uncased.bin

export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m torch.distributed.launch --nproc_per_node=4 run_seq2seq.py \
  --train_file ${TRAIN_FILE} --output_dir ${OUTPUT_DIR} \
  --model_type unilm --model_name_or_path ${MODEL_PATH} --do_lower_case \
  --max_source_seq_length 464 --max_target_seq_length 48 \
  --fp16 --fp16_opt_level O2 \
  --per_gpu_train_batch_size 16 --gradient_accumulation_steps 1 \
  --learning_rate 5e-5 --num_warmup_steps 500 --num_training_steps 32000 --cache_dir ${CACHE_DIR}

This is on Ubuntu with Python 3.7 and the latest versions of transformers and PyTorch.

Remove “MODEL_PATH=../../tmp/unilm1.2-base-uncased.bin” and set “--model_name_or_path unilm1.2-base-uncased”. It will download the checkpoint automatically.
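
The traceback shows the config loader opening the given path as UTF-8 text and failing on byte 0x80 at position 0. A likely explanation, not confirmed in the thread, is that the .bin weights file was being read as if it were config.json: pickle data written with protocol 2 or later begins with byte 0x80, which is not a valid UTF-8 start byte. The sketch below reproduces that situation with the standard library only. The helper name is_json_config and the file names are hypothetical and are not part of s2s-ft or transformers.

```python
import json
import os
import pickle
import tempfile


def is_json_config(path):
    """Return True if the file at `path` decodes as UTF-8 and parses as a JSON object.

    Hypothetical helper for illustration; it is not part of any library.
    """
    try:
        with open(path, "r", encoding="utf-8") as reader:
            return isinstance(json.loads(reader.read()), dict)
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False


workdir = tempfile.mkdtemp()

# A small stand-in for a real config.json.
config_path = os.path.join(workdir, "config.json")
with open(config_path, "w", encoding="utf-8") as f:
    json.dump({"hidden_size": 768}, f)

# A stand-in for a binary checkpoint: pickle protocol 2 writes 0x80 first.
weights_path = os.path.join(workdir, "weights.bin")
with open(weights_path, "wb") as f:
    pickle.dump({"w": [0.0]}, f, protocol=2)

with open(weights_path, "rb") as f:
    first_byte = f.read(1)

print(first_byte)                    # the leading byte of the pickle stream
print(is_json_config(config_path))   # text JSON file
print(is_json_config(weights_path))  # binary file, fails UTF-8 decoding
```

Running a check like this on whatever path is passed to --model_name_or_path can show quickly whether a weights file has been supplied where a config file or model name was expected.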

0 reactions
addf400 commented, Mar 26, 2020

Hi @addf400 ,

It seems that when we save a model, run_seq2seq.py only saves the model file (i.e., pytorch_model.bin) and the config file config.json. However, the vocab file is not saved. Where can we find one to use instead? Thanks!

@yuchenlin For decoding, we use “--model_type unilm --tokenizer_name unilm1-base-cased” to set up the tokenizer. If you use a local vocab file, just pass the vocab file path to “--tokenizer_name”.
