Decomposable Attention: maximum dimensions exceeded when loading embeddings
I’m trying to run the Decomposable Attention example and I’m getting ValueError: Maximum allowed dimension exceeded when the example tries to load the embeddings.
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "spaCy/examples/keras_parikh_entailment/__main__.py", line 207, in <module>
plac.call(main)
File "/work/jzaragoza/decomposable-attention/venv/lib/python3.7/site-packages/plac_core.py", line 367, in call
cmd, result = parser.consume(arglist)
File "/work/jzaragoza/decomposable-attention/venv/lib/python3.7/site-packages/plac_core.py", line 232, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "spaCy/examples/keras_parikh_entailment/__main__.py", line 195, in main
train(train_loc, dev_loc, shape, settings)
File "spaCy/examples/keras_parikh_entailment/__main__.py", line 52, in train
model = build_model(get_embeddings(nlp.vocab), shape, settings)
File "spaCy/examples/keras_parikh_entailment/spacy_hook.py", line 58, in get_embeddings
vectors = np.zeros((num_vectors + nr_unk, vocab.vectors_length), dtype="float32")
ValueError: Maximum allowed dimension exceeded
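For context, the failing allocation is easy to reproduce without spaCy: NumPy refuses a first dimension this large outright. A minimal sketch of my understanding of the failure (nr_unk = 100 is an assumed placeholder for the example's constant in spacy_hook.py, and 300 stands in for vocab.vectors_length):

```python
import numpy as np

# Hypothetical reconstruction of the failing allocation: if num_vectors is
# derived from a Lexeme rank that is really the uint64 "no vector" sentinel
# (2**64 - 1), the requested first dimension exceeds what NumPy can index.
num_vectors = 2**64 - 1   # the suspicious rank value seen in the vocab
nr_unk = 100              # assumed placeholder for the example's constant

try:
    np.zeros((num_vectors + nr_unk, 300), dtype="float32")
except (ValueError, OverflowError) as err:
    # Depending on the NumPy version this surfaces as ValueError
    # ("Maximum allowed dimension exceeded") or OverflowError.
    print(type(err).__name__, err)
```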
The tests passed:
$ py.test spaCy/examples/keras_parikh_entailment/keras_decomposable_attention.py
=================================================================================== test session starts ===================================================================================
platform linux -- Python 3.7.5, pytest-5.4.3, py-1.9.0, pluggy-0.13.1
rootdir: /work/jzaragoza/decomposable-attention/spaCy, inifile: setup.cfg
collected 2 items
spaCy/examples/keras_parikh_entailment/keras_decomposable_attention.py .. [100%]
==================================================================================== warnings summary =====================================================================================
venv/lib/python3.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py:15
/work/jzaragoza/decomposable-attention/venv/lib/python3.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py:15: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
venv/lib/python3.7/site-packages/pkg_resources/_vendor/pyparsing.py:943
/work/jzaragoza/decomposable-attention/venv/lib/python3.7/site-packages/pkg_resources/_vendor/pyparsing.py:943: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working
collections.MutableMapping.register(ParseResults)
venv/lib/python3.7/site-packages/pkg_resources/_vendor/pyparsing.py:3226
/work/jzaragoza/decomposable-attention/venv/lib/python3.7/site-packages/pkg_resources/_vendor/pyparsing.py:3226: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working
elif isinstance( exprs, collections.Iterable ):
examples/keras_parikh_entailment/keras_decomposable_attention.py: 249 tests with warnings
/work/jzaragoza/decomposable-attention/venv/lib/python3.7/site-packages/tensorflow/python/framework/tensor_util.py:523: DeprecationWarning: tostring() is deprecated. Use tobytes() instead.
tensor_proto.tensor_content = nparray.tostring()
examples/keras_parikh_entailment/keras_decomposable_attention.py::test_fit_model
/work/jzaragoza/decomposable-attention/venv/lib/python3.7/site-packages/tensorflow/python/framework/indexed_slices.py:349: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working
if not isinstance(values, collections.Sequence):
examples/keras_parikh_entailment/keras_decomposable_attention.py::test_fit_model
/work/jzaragoza/decomposable-attention/venv/lib/python3.7/site-packages/tensorflow/python/training/tracking/data_structures.py:718: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working
if not isinstance(wrapped_dict, collections.Mapping):
-- Docs: https://docs.pytest.org/en/latest/warnings.html
============================================================================= 2 passed, 254 warnings in 2.89s =============================================================================
How to reproduce the behaviour
pip install keras
pip install spacy
pip install tensorflow
python -m spacy download en_vectors_web_lg
python spaCy/examples/keras_parikh_entailment/ train -t snli_1.0/snli_1.0_train.jsonl -s snli_1.0/snli_1.0_dev.jsonl
I tried to debug this myself a bit and noticed that some Lexeme ranks look overflowed:
In [1]: import spacy
In [3]: nlp = spacy.load('en_vectors_web_lg')
In [5]: nlp.vocab[0].rank
Out[5]: 18446744073709551615
I don’t know if that’s normal, but I tried re-downloading the vectors and the same thing happens.
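That value does not look like random corruption: it is exactly the maximum unsigned 64-bit integer, which, as far as I can tell, spaCy uses as an out-of-vocabulary sentinel rank ("this lexeme has no row in the vectors table") rather than a real row index. A quick check:

```python
import numpy as np

rank = 18446744073709551615  # the rank printed in the session above

# It is precisely the uint64 maximum, 2**64 - 1: a sentinel for
# "no row in the vectors table", not an arithmetic overflow.
print(rank == np.iinfo(np.uint64).max)  # True
print(rank == 2**64 - 1)                # True
```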
Your Environment
- spaCy version: 2.3.2
- Platform: Linux-4.15.0-109-generic-x86_64-with-Ubuntu-18.04-bionic
- Python version: 3.7.5
- Keras version: 2.4.3
- TensorFlow version: 2.2.0
- en_vectors_web_lg version: 2.3.0
Issue Analytics
- State: closed
- Created: 3 years ago
- Comments: 6 (3 by maintainers)
Top GitHub Comments
Changing https://github.com/explosion/spaCy/blob/a8978ca285fa7ebf0867f54723a6ba5569b1c156/examples/keras_parikh_entailment/spacy_hook.py#L51 to … solves the problem.
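The exact replacement snippet is not preserved here, but the gist is to stop deriving the table size from Lexeme ranks. A hedged sketch of how get_embeddings could be patched (the row layout, the nr_unk default, and the use of vocab.vectors.shape are my assumptions, not the maintainers' exact fix): size the array from the vectors table itself and skip any lexeme whose rank is the uint64 sentinel.

```python
import numpy as np

OOV_RANK = np.iinfo(np.uint64).max  # sentinel rank for "no vector row"

def get_embeddings(vocab, nr_unk=100):
    # Row layout (assumed): row 0 = padding, rows 1..nr_unk = random OOV
    # rows, rows nr_unk+1.. = the pretrained vectors, indexed by rank.
    num_vectors = vocab.vectors.shape[0]  # rows actually in the table
    oov = np.random.normal(size=(nr_unk, vocab.vectors_length)).astype("float32")
    oov /= oov.sum(axis=1, keepdims=True)
    vectors = np.zeros((num_vectors + nr_unk + 1, vocab.vectors_length),
                       dtype="float32")
    vectors[1 : nr_unk + 1] = oov
    for lex in vocab:
        # Skip lexemes whose rank is the sentinel: they have no row, and
        # treating the sentinel as an index is what blew up the allocation.
        if lex.has_vector and lex.vector_norm > 0 and lex.rank != OOV_RANK:
            vectors[nr_unk + 1 + lex.rank] = lex.vector / lex.vector_norm
    return vectors
```

Counting rows with vocab.vectors.shape[0] keeps the allocation bounded by the actual table size, so a sentinel rank can never inflate the first dimension.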
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.