
Decomposable Attention: maximum dimensions exceeded when loading embeddings


I’m trying to run the Decomposable Attention example, and I get ValueError: Maximum allowed dimension exceeded when the example tries to load the embeddings.

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "spaCy/examples/keras_parikh_entailment/__main__.py", line 207, in <module>
    plac.call(main)
  File "/work/jzaragoza/decomposable-attention/venv/lib/python3.7/site-packages/plac_core.py", line 367, in call
    cmd, result = parser.consume(arglist)
  File "/work/jzaragoza/decomposable-attention/venv/lib/python3.7/site-packages/plac_core.py", line 232, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "spaCy/examples/keras_parikh_entailment/__main__.py", line 195, in main
    train(train_loc, dev_loc, shape, settings)
  File "spaCy/examples/keras_parikh_entailment/__main__.py", line 52, in train
    model = build_model(get_embeddings(nlp.vocab), shape, settings)
  File "spaCy/examples/keras_parikh_entailment/spacy_hook.py", line 58, in get_embeddings
    vectors = np.zeros((num_vectors + nr_unk, vocab.vectors_length), dtype="float32")
ValueError: Maximum allowed dimension exceeded

The tests passed:

$ py.test spaCy/examples/keras_parikh_entailment/keras_decomposable_attention.py 
=================================================================================== test session starts ===================================================================================
platform linux -- Python 3.7.5, pytest-5.4.3, py-1.9.0, pluggy-0.13.1
rootdir: /work/jzaragoza/decomposable-attention/spaCy, inifile: setup.cfg
collected 2 items                                                                                                                                                                         

spaCy/examples/keras_parikh_entailment/keras_decomposable_attention.py ..                                                                                                           [100%]

==================================================================================== warnings summary =====================================================================================
venv/lib/python3.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py:15
  /work/jzaragoza/decomposable-attention/venv/lib/python3.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py:15: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    import imp

venv/lib/python3.7/site-packages/pkg_resources/_vendor/pyparsing.py:943
  /work/jzaragoza/decomposable-attention/venv/lib/python3.7/site-packages/pkg_resources/_vendor/pyparsing.py:943: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working
    collections.MutableMapping.register(ParseResults)

venv/lib/python3.7/site-packages/pkg_resources/_vendor/pyparsing.py:3226
  /work/jzaragoza/decomposable-attention/venv/lib/python3.7/site-packages/pkg_resources/_vendor/pyparsing.py:3226: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working
    elif isinstance( exprs, collections.Iterable ):

examples/keras_parikh_entailment/keras_decomposable_attention.py: 249 tests with warnings
  /work/jzaragoza/decomposable-attention/venv/lib/python3.7/site-packages/tensorflow/python/framework/tensor_util.py:523: DeprecationWarning: tostring() is deprecated. Use tobytes() instead.
    tensor_proto.tensor_content = nparray.tostring()

examples/keras_parikh_entailment/keras_decomposable_attention.py::test_fit_model
  /work/jzaragoza/decomposable-attention/venv/lib/python3.7/site-packages/tensorflow/python/framework/indexed_slices.py:349: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working
    if not isinstance(values, collections.Sequence):

examples/keras_parikh_entailment/keras_decomposable_attention.py::test_fit_model
  /work/jzaragoza/decomposable-attention/venv/lib/python3.7/site-packages/tensorflow/python/training/tracking/data_structures.py:718: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working
    if not isinstance(wrapped_dict, collections.Mapping):

-- Docs: https://docs.pytest.org/en/latest/warnings.html
============================================================================= 2 passed, 254 warnings in 2.89s =============================================================================

How to reproduce the behaviour

pip install keras
pip install spacy
pip install tensorflow
python -m spacy download en_vectors_web_lg
python spaCy/examples/keras_parikh_entailment/ train -t snli_1.0/snli_1.0_train.jsonl -s snli_1.0/snli_1.0_dev.jsonl

I tried to debug this myself a bit and noticed that some Lexeme ranks have overflowed:

In [1]: import spacy                                                                                                                                                                       

In [3]: nlp = spacy.load('en_vectors_web_lg')                                                                                                                                              

In [5]: nlp.vocab[0].rank                                                                                                                                                                  
Out[5]: 18446744073709551615

I don’t know if that’s normal, but I tried re-downloading the vectors and the same thing happens.
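
For reference, 18446744073709551615 is 2**64 - 1, the maximum value of an unsigned 64-bit integer: in spaCy 2.3 that is the sentinel rank assigned to lexemes that have no vector, so the vocab-wide maximum rank is astronomically large rather than "overflowed" per se. A quick check of this, as a sketch assuming en_vectors_web_lg is installed (has_vector and rank are standard Lexeme attributes):

import numpy as np
import spacy

nlp = spacy.load("en_vectors_web_lg")

# Lexemes with no vector carry the uint64 sentinel rank, so the
# vocab-wide maximum rank is 2**64 - 1, not the vector-table size.
oov_sentinel = np.iinfo(np.uint64).max  # 18446744073709551615
print(nlp.vocab[0].rank == oov_sentinel)  # True on the setup above

# Ranks of lexemes that actually have a vector stay small:
print(max(lex.rank for lex in nlp.vocab if lex.has_vector))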

Your Environment

  • spaCy version: 2.3.2
  • Platform: Linux-4.15.0-109-generic-x86_64-with-Ubuntu-18.04-bionic
  • Python version: 3.7.5
  • Environment Information:
    • Keras version: 2.4.3
    • Tensorflow version: 2.2.0
    • en_vectors_web_lg version: 2.3.0

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
ZJaume commented, Jul 21, 2020

Changing https://github.com/explosion/spaCy/blob/a8978ca285fa7ebf0867f54723a6ba5569b1c156/examples/keras_parikh_entailment/spacy_hook.py#L51 to

num_vectors = max(lex.rank for lex in vocab if lex.rank < 2000000) + 2

solves the problem.
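
The cutoff works because en_vectors_web_lg ships on the order of a million vectors, so every rank that indexes a real row of the vector table sits well below 2,000,000, while vectorless lexemes carry the 2**64 - 1 sentinel that the filter discards. A minimal sketch of the effect, reusing the patched line together with the allocation from line 58 of the traceback (nr_unk = 100 is an assumed default; the exact value in spacy_hook.py may differ):

import numpy as np
import spacy

nlp = spacy.load("en_vectors_web_lg")
vocab = nlp.vocab
nr_unk = 100  # assumed default from the example

# Patched line 51: only count ranks that index real vector rows,
# skipping the 2**64 - 1 OOV sentinel ranks.
num_vectors = max(lex.rank for lex in vocab if lex.rank < 2000000) + 2
print(num_vectors)  # roughly the vector-table size, not 2**64

# The allocation from line 58 of the traceback now succeeds:
vectors = np.zeros((num_vectors + nr_unk, vocab.vectors_length), dtype="float32")
print(vectors.shape)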

0 reactions
github-actions[bot] commented, Oct 22, 2021

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
