
Dimension mismatch of embedding matrix when model_load_path is used because of vocabulary mismatch


@w4nderlust

Describe the bug

I trained a model using pretrained embeddings (GloVe.6b.50d). I have since built a larger training dataset and want to retrain the model, initialising its weights from the earlier trained model, so I am using the --model_load_path option.

However, I run into the following error: “InvalidArgumentError: indices[111,11] = 3441 is not in [0, 3172)”

To Reproduce

The following command is run:

ludwig experiment --data_csv datasets/train_set.csv --model_definition_file test_model_definition.yaml --model_load_path results/experiment_run_2/model

YAML definition:

input_features:
    -
        name: text
        type: text
        level: word
        encoder: stacked_parallel_cnn
        embedding_size: 50
        pretrained_embeddings: home/abc/mystuff/datasets/glove.6b.50d.txt

output_features:
    -
        name: class
        type: category

training:
    early_stop: 10

Environment:

OS: Windows 10
Python version : Python 3
Ludwig version : ludwig v0.1.2

Full error message output:

InvalidArgumentError (see above for traceback): indices[111,11] = 3441 is not in [0, 3172) [[node text/embeddings_lookup (defined at /home/abc/mystuff/myenv/lib/python3.6/site-packages/ludwig/models/modules/embedding_modules.py:134) ]]
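
The message can be read as follows: the token id at batch row 111, position 11 is 3441, but the embedding table being looked up has only 3172 rows. The sketch below reproduces that kind of bounds check in plain Python. The function name `find_out_of_range` and the toy batch are hypothetical and are not part of Ludwig or TensorFlow; it only illustrates what the error is reporting.

```python
from typing import List, Tuple


def find_out_of_range(batch: List[List[int]], vocab_size: int) -> List[Tuple[int, int, int]]:
    """Return (row, col, token_id) for every id outside [0, vocab_size)."""
    bad = []
    for row, sequence in enumerate(batch):
        for col, token_id in enumerate(sequence):
            if not 0 <= token_id < vocab_size:
                bad.append((row, col, token_id))
    return bad


# Toy batch (hypothetical): the second sequence holds an id the table cannot index.
toy_batch = [[5, 17, 42], [8, 3441, 0]]
old_vocab_size = 3172  # rows in the embedding matrix, taken from the error message

for row, col, token_id in find_out_of_range(toy_batch, old_vocab_size):
    print(f"indices[{row},{col}] = {token_id} is not in [0, {old_vocab_size})")
```

Running a check like this over the encoded training data before the lookup would show which token ids exceed the size of the loaded embedding matrix.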


Top GitHub Comments

w4nderlust commented, Aug 2, 2020

To test it, remember to uninstall ludwig first and then install from master with: pip install git+http://github.com/uber/ludwig.git

abc1110 commented, Oct 11, 2019

@w4nderlust Thanks for the YAML code. I will try it on my dataset and report back on whether it works.


Top Results From Across the Web

Dimension mismatch of embedding matrix when ... - GitHub

When you try to train a new dataset, the vocabulary size is different so the size of the embedding matrix within the model...
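
The snippet above attributes the error to the vocabulary changing size between datasets. A minimal sketch of that effect, assuming a simple frequency-ordered word vocabulary (the helpers `build_vocab` and `encode` and the `<UNK>` symbol are hypothetical and do not reflect Ludwig's actual preprocessing):

```python
from collections import Counter
from typing import Dict, Iterable, List

UNK = "<UNK>"  # hypothetical placeholder symbol for words missing from a vocabulary


def build_vocab(texts: Iterable[str]) -> Dict[str, int]:
    """Map each word to an integer id, most frequent first; id 0 is reserved for UNK."""
    counts = Counter(word for text in texts for word in text.lower().split())
    vocab = {UNK: 0}
    for word, _ in counts.most_common():
        vocab[word] = len(vocab)
    return vocab


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Encode text with a fixed vocabulary, sending unseen words to UNK."""
    return [vocab.get(word, vocab[UNK]) for word in text.lower().split()]


old_texts = ["the cat sat", "the dog ran"]
new_texts = old_texts + ["a bird flew over the cat"]

old_vocab = build_vocab(old_texts)  # a saved embedding matrix would have len(old_vocab) rows
new_vocab = build_vocab(new_texts)  # the larger dataset yields more ids

sentence = "a bird flew"
ids_new = encode(sentence, new_vocab)
ids_old = encode(sentence, old_vocab)

print(len(old_vocab), len(new_vocab))
print(max(ids_new) >= len(old_vocab))            # ids from the new vocabulary overflow the old table
print(all(i < len(old_vocab) for i in ids_old))  # encoding with the old vocabulary stays in range
```

In this toy setup, ids produced from the larger dataset's vocabulary point past the end of a table sized for the smaller one, which is the same shape of failure as the reported `indices[...] is not in [0, 3172)` error.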



Unable to install ludwig

gmpy prerequisite is broken on windows systems