ValueError: a must be greater than 0 unless no samples are taken while pretraining using the CLI

Hi, I am using spaCy version 2.3.2 to pretrain a spaCy tok2vec layer. I prepared raw text data in the format spaCy asks for. mydata.jsonl looks like this:

{"text": "ホッケーにはデンジャラスプレーの反則があるので、膝より上にボールを浮かすことは基本的に反則になるが、その例外の一つがこのスクープである。"}
{"text": "また行きたい、そんな気持ちにさせてくれるお店です。"}
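Since `spacy pretrain` reads one JSON object per line with a `"text"` key, as in the sample above, a quick sanity check of the input file can help rule out data problems before looking at the model. The helper below is a hypothetical sketch, not part of spaCy; it only assumes the `{"text": ...}` layout shown above and reports the first line that is not valid JSON or has no usable `"text"` string.

```python
import json
from typing import Iterable, Iterator


def iter_texts(lines: Iterable[str]) -> Iterator[str]:
    """Yield the "text" value of each non-blank JSONL line.

    Hypothetical helper (not part of spaCy): raises ValueError naming the
    offending line if it is not valid JSON or lacks a non-empty "text" string.
    """
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"line {lineno}: invalid JSON ({err})") from err
        text = record.get("text") if isinstance(record, dict) else None
        if not isinstance(text, str) or not text.strip():
            raise ValueError(f"line {lineno}: missing or empty 'text' field")
        yield text
```

An open file object can be passed directly, e.g. `list(iter_texts(open("mydata.jsonl", encoding="utf-8")))`. In this issue the data itself looks well formed, so a clean pass here points the investigation back at the model.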

My pretraining CLI command is:

python -m spacy pretrain mydata.jsonl ja_core_news_lg outpath

Running this command gives the error below. I tried a different version of the Japanese model and still have the same problem. Training with English data works fine; the problem only occurs with Japanese text.

   ℹ Using GPU
   ⚠ Output directory is not empty
   It is better to use an empty directory or refer to a new output path, then the
   new directory will be created for you.
   ✔ Saved settings to config.json
   ✔ Loaded input texts
   ✔ Loaded model 'ja_core_news_lg'

   ============== Pre-training tok2vec layer - starting at epoch 0 ==============
   # Words   Total Loss   Loss   w/s

   Traceback (most recent call last):
     File "/home/nsl7/anaconda3/envs/spacygpu2.3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
       "__main__", mod_spec)
     File "/home/nsl7/anaconda3/envs/spacygpu2.3/lib/python3.6/runpy.py", line 85, in _run_code
       exec(code, run_globals)
     File "/home/nsl7/anaconda3/envs/spacygpu2.3/lib/python3.6/site-packages/spacy/__main__.py", line 33, in <module>
       plac.call(commands[command], sys.argv[1:])
     File "/home/nsl7/anaconda3/envs/spacygpu2.3/lib/python3.6/site-packages/plac_core.py", line 367, in call
       cmd, result = parser.consume(arglist)
     File "/home/nsl7/anaconda3/envs/spacygpu2.3/lib/python3.6/site-packages/plac_core.py", line 232, in consume
       return cmd, self.func(*(args + varargs + extraopts), **kwargs)
     File "/home/nsl7/anaconda3/envs/spacygpu2.3/lib/python3.6/site-packages/spacy/cli/pretrain.py", line 237, in pretrain
       model, docs, optimizer, objective=loss_func, drop=dropout
     File "/home/nsl7/anaconda3/envs/spacygpu2.3/lib/python3.6/site-packages/spacy/cli/pretrain.py", line 264, in make_update
       predictions, backprop = model.begin_update(docs, drop=drop)
     File "/home/nsl7/anaconda3/envs/spacygpu2.3/lib/python3.6/site-packages/spacy/_ml.py", line 837, in mlm_forward
       mask, docs = _apply_mask(docs, random_words, mask_prob=mask_prob)
     File "/home/nsl7/anaconda3/envs/spacygpu2.3/lib/python3.6/site-packages/spacy/_ml.py", line 884, in _apply_mask
       word = _replace_word(token.text, random_words)
     File "/home/nsl7/anaconda3/envs/spacygpu2.3/lib/python3.6/site-packages/spacy/_ml.py", line 904, in _replace_word
       return random_words.next()
     File "/home/nsl7/anaconda3/envs/spacygpu2.3/lib/python3.6/site-packages/spacy/_ml.py", line 865, in next
       numpy.random.choice(len(self.words), 10000, p=self.probs)
     File "mtrand.pyx", line 902, in numpy.random.mtrand.RandomState.choice
   ValueError: a must be greater than 0 unless no samples are taken
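The last spaCy frame calls `numpy.random.choice(len(self.words), 10000, p=self.probs)`, and NumPy raises this ValueError when the population size is 0 but samples are requested. So the list of words that the masking step (`_apply_mask` → `_replace_word`) draws replacement tokens from appears to have been empty for this Japanese model. The plain-NumPy sketch below reproduces the error and wraps the same kind of weighted draw in a hypothetical `sample_indices` helper that checks for an empty list first. It is not spaCy's code, and why the list ends up empty for `ja_core_news_lg` is not established in this thread; the "check the vocab" hint in the message is an assumption.

```python
import numpy as np


def sample_indices(weights, n_samples, seed=None):
    """Draw n_samples indices into a word list, weighted by `weights`.

    Hypothetical helper: mirrors the failing call
    numpy.random.choice(len(words), 10000, p=probs) from the traceback,
    but checks for an empty word list first and fails with a clearer message.
    """
    weights = np.asarray(weights, dtype="f8")
    if weights.size == 0:
        raise ValueError(
            "no candidate words to sample from: the word list is empty "
            "(assumption: check that the model's vocab contains usable lexemes)"
        )
    total = weights.sum()
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    probs = weights / total  # normalise so the probabilities sum to 1
    rng = np.random.RandomState(seed)
    return rng.choice(weights.size, n_samples, p=probs)


# Reproducing the original error: a population of 0 with a non-zero
# sample count raises the same ValueError as in the traceback
# (the exact message wording varies slightly between NumPy versions).
try:
    np.random.choice(0, 10000, p=[])
except ValueError as err:
    print("NumPy:", err)
```

Checking for emptiness up front turns a low-level NumPy message into one that names the actual precondition (there must be at least one word to sample), which is easier to act on than the original traceback.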

Issue Analytics

Created 3 years ago
Comments: 8 (4 by maintainers)

Top GitHub Comments

svlandeg commented on Dec 22, 2020 (1 reaction):

Thanks for reporting back @sagor71. As this feature was experimental in v2, and should work much better in v3, it’s not really a priority for us to spend much more time on this for v2. I’m happy to hear you found a working solution though. I’ll close this in the meantime, but let us know if you still run into issues!

sagornsl commented on Dec 3, 2020 (1 reaction):

Hi @svlandeg, thanks for the suggestion. I have tried the spaCy v3 nightly and it's really amazing, but at present I am working with spaCy 2. Any possible way of pretraining tok2vec for Japanese in v2 would help me a lot. I am keeping this issue open for new suggestions. Regards