
Decomposable Attention: maximum dimensions exceeded when loading embeddings


I’m trying to run the Decomposable Attention example, and I get ValueError: Maximum allowed dimension exceeded when the example tries to load the embeddings.

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "spaCy/examples/keras_parikh_entailment/__main__.py", line 207, in <module>
    plac.call(main)
  File "/work/jzaragoza/decomposable-attention/venv/lib/python3.7/site-packages/plac_core.py", line 367, in call
    cmd, result = parser.consume(arglist)
  File "/work/jzaragoza/decomposable-attention/venv/lib/python3.7/site-packages/plac_core.py", line 232, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "spaCy/examples/keras_parikh_entailment/__main__.py", line 195, in main
    train(train_loc, dev_loc, shape, settings)
  File "spaCy/examples/keras_parikh_entailment/__main__.py", line 52, in train
    model = build_model(get_embeddings(nlp.vocab), shape, settings)
  File "spaCy/examples/keras_parikh_entailment/spacy_hook.py", line 58, in get_embeddings
    vectors = np.zeros((num_vectors + nr_unk, vocab.vectors_length), dtype="float32")
ValueError: Maximum allowed dimension exceeded

The tests passed:

$ py.test spaCy/examples/keras_parikh_entailment/keras_decomposable_attention.py 
=================================================================================== test session starts ===================================================================================
platform linux -- Python 3.7.5, pytest-5.4.3, py-1.9.0, pluggy-0.13.1
rootdir: /work/jzaragoza/decomposable-attention/spaCy, inifile: setup.cfg
collected 2 items                                                                                                                                                                         

spaCy/examples/keras_parikh_entailment/keras_decomposable_attention.py ..                                                                                                           [100%]

==================================================================================== warnings summary =====================================================================================
venv/lib/python3.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py:15
  /work/jzaragoza/decomposable-attention/venv/lib/python3.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py:15: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    import imp

venv/lib/python3.7/site-packages/pkg_resources/_vendor/pyparsing.py:943
  /work/jzaragoza/decomposable-attention/venv/lib/python3.7/site-packages/pkg_resources/_vendor/pyparsing.py:943: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working
    collections.MutableMapping.register(ParseResults)

venv/lib/python3.7/site-packages/pkg_resources/_vendor/pyparsing.py:3226
  /work/jzaragoza/decomposable-attention/venv/lib/python3.7/site-packages/pkg_resources/_vendor/pyparsing.py:3226: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working
    elif isinstance( exprs, collections.Iterable ):

examples/keras_parikh_entailment/keras_decomposable_attention.py: 249 tests with warnings
  /work/jzaragoza/decomposable-attention/venv/lib/python3.7/site-packages/tensorflow/python/framework/tensor_util.py:523: DeprecationWarning: tostring() is deprecated. Use tobytes() instead.
    tensor_proto.tensor_content = nparray.tostring()

examples/keras_parikh_entailment/keras_decomposable_attention.py::test_fit_model
  /work/jzaragoza/decomposable-attention/venv/lib/python3.7/site-packages/tensorflow/python/framework/indexed_slices.py:349: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working
    if not isinstance(values, collections.Sequence):

examples/keras_parikh_entailment/keras_decomposable_attention.py::test_fit_model
  /work/jzaragoza/decomposable-attention/venv/lib/python3.7/site-packages/tensorflow/python/training/tracking/data_structures.py:718: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working
    if not isinstance(wrapped_dict, collections.Mapping):

-- Docs: https://docs.pytest.org/en/latest/warnings.html
============================================================================= 2 passed, 254 warnings in 2.89s =============================================================================

How to reproduce the behaviour

pip install keras
pip install spacy
pip install tensorflow
python -m spacy download en_vectors_web_lg
python spaCy/examples/keras_parikh_entailment/ train -t snli_1.0/snli_1.0_train.jsonl -s snli_1.0/snli_1.0_dev.jsonl

I tried to debug this myself a bit and noticed that some Lexeme ranks have overflowed:

In [1]: import spacy                                                                                                                                                                       

In [3]: nlp = spacy.load('en_vectors_web_lg')                                                                                                                                              

In [5]: nlp.vocab[0].rank                                                                                                                                                                  
Out[5]: 18446744073709551615

I don’t know if that’s normal, but I tried re-downloading the vectors and the same thing happens.
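
For reference, 18446744073709551615 is 2**64 - 1, the maximum value of an unsigned 64-bit integer: in spaCy 2.3 that is the sentinel rank assigned to lexemes that have no vector, so the vocab-wide maximum rank is astronomically large rather than "overflowed" per se. A quick check of this, as a sketch assuming en_vectors_web_lg is installed (has_vector and rank are standard Lexeme attributes):

import numpy as np
import spacy

nlp = spacy.load("en_vectors_web_lg")

# Lexemes with no vector carry the uint64 sentinel rank, so the
# vocab-wide maximum rank is 2**64 - 1, not the vector-table size.
oov_sentinel = np.iinfo(np.uint64).max  # 18446744073709551615
print(nlp.vocab[0].rank == oov_sentinel)  # True on the setup above

# Ranks of lexemes that actually have a vector stay small:
print(max(lex.rank for lex in nlp.vocab if lex.has_vector))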

Your Environment

  • spaCy version: 2.3.2
  • Platform: Linux-4.15.0-109-generic-x86_64-with-Ubuntu-18.04-bionic
  • Python version: 3.7.5
  • Environment Information:
    • Keras version: 2.4.3
    • Tensorflow version: 2.2.0
    • en_vectors_web_lg version: 2.3.0

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
ZJaume commented, Jul 21, 2020

Changing https://github.com/explosion/spaCy/blob/a8978ca285fa7ebf0867f54723a6ba5569b1c156/examples/keras_parikh_entailment/spacy_hook.py#L51 to

num_vectors = max(lex.rank for lex in vocab if lex.rank < 2000000) + 2

solves the problem.
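
The cutoff works because en_vectors_web_lg ships on the order of a million vectors, so every rank that indexes a real row of the vector table sits well below 2,000,000, while vectorless lexemes carry the 2**64 - 1 sentinel that the filter discards. A minimal sketch of the effect, reusing the patched line together with the allocation from line 58 of the traceback (nr_unk = 100 is an assumed default; the exact value in spacy_hook.py may differ):

import numpy as np
import spacy

nlp = spacy.load("en_vectors_web_lg")
vocab = nlp.vocab
nr_unk = 100  # assumed default from the example

# Patched line 51: only count ranks that index real vector rows,
# skipping the 2**64 - 1 OOV sentinel ranks.
num_vectors = max(lex.rank for lex in vocab if lex.rank < 2000000) + 2
print(num_vectors)  # roughly the vector-table size, not 2**64

# The allocation from line 58 of the traceback now succeeds:
vectors = np.zeros((num_vectors + nr_unk, vocab.vectors_length), dtype="float32")
print(vectors.shape)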

0 reactions
github-actions[bot] commented, Oct 22, 2021

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
