Dimension mismatch of embedding matrix when model_load_path is used because of vocabulary mismatch

@w4nderlust

Describe the bug

I trained a model using pretrained embeddings (GloVe.6b.50d). I have since built a larger training dataset and want to retrain the model, initialising its weights from the earlier trained model, so I am using the --model_load_path option.

But I am running into an InvalidArgumentError: indices[111,11] = 3441 is not in [0, 3172)

To Reproduce

The following command is run:

ludwig experiment --data_csv datasets/train_set.csv --model_definition_file test_model_definition.yaml --model_load_path results/experiment_run_2/model

YAML definition:

input_features:
    -
        name: text
        type: text
        level: word
        encoder: stacked_parallel_cnn
        embedding_size: 50
        pretrained_embeddings: home/abc/mystuff/datasets/glove.6b.50d.txt

output_features:
    -
        name: class
        type: category

training:
    -
        early_stop: 10

Environment:

  • OS: Windows 10
  • Python version : Python 3
  • Ludwig version : ludwig v0.1.2

Full error message output:

InvalidArgumentError (see above for traceback): indices[111,11] = 3441 is not in [0, 3172) [[node text/embeddings_lookup (defined at /home/abc/mystuff/myenv/lib/python3.6/site-packages/ludwig/models/modules/embedding_modules.py:134) ]]
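The error message can be read as an out-of-range row lookup: the embedding matrix has 3172 rows, and the new data contains a token id of 3441. The following is a minimal NumPy sketch of that situation, not Ludwig's or TensorFlow's actual code. The names `OLD_VOCAB_SIZE`, `embeddings`, and `lookup` are hypothetical, and the sketch assumes that 3172 is the vocabulary size stored with the earlier model.

```python
import numpy as np

OLD_VOCAB_SIZE = 3172  # assumed: rows in the embedding matrix saved with the old model
EMBEDDING_SIZE = 50    # matches embedding_size in the YAML definition

# Stand-in for the embedding matrix restored from the saved model.
embeddings = np.zeros((OLD_VOCAB_SIZE, EMBEDDING_SIZE), dtype=np.float32)


def lookup(token_ids):
    """Return the embedding rows for token_ids, checking the valid range first."""
    token_ids = np.asarray(token_ids)
    out_of_range = token_ids[(token_ids < 0) | (token_ids >= embeddings.shape[0])]
    if out_of_range.size:
        raise IndexError(
            f"index {int(out_of_range[0])} is not in [0, {embeddings.shape[0]})"
        )
    return embeddings[token_ids]


# Ids produced with the old vocabulary stay inside the matrix.
print(lookup([5, 3171]).shape)

# An id produced with the larger, new vocabulary falls outside it.
try:
    lookup([5, 3441])
except IndexError as err:
    print(err)
```

Any id at or above the old vocabulary size fails in the same way, which is why the error only appears once the larger dataset introduces new words.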

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 9

Top GitHub Comments

1 reaction
w4nderlust commented, Aug 2, 2020

To test it out, remember to uninstall ludwig first and then install from master with pip install git+http://github.com/uber/ludwig.git

0 reactions
abc1110 commented, Oct 11, 2019

@w4nderlust Thanks for the YAML code, I will try and implement it on my dataset and give an update if it works or not.

Top Results From Across the Web

Dimension mismatch of embedding matrix when ... - GitHub
When you try to train a new dataset, the vocabulary size is different so the size of the embedding matrix within the model...
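One general way to cope with a changed vocabulary is to build a new embedding matrix sized for the new vocabulary and copy over the rows of tokens that both vocabularies share. The sketch below does this in plain NumPy. It is not Ludwig's API and not necessarily what the fix on master does; `remap_embeddings` and the toy vocabularies are hypothetical names and data chosen for illustration.

```python
import numpy as np


def remap_embeddings(old_matrix, old_vocab, new_vocab, seed=0):
    """Build a matrix for new_vocab, reusing old rows for tokens in both vocabularies."""
    rng = np.random.default_rng(seed)
    old_index = {token: i for i, token in enumerate(old_vocab)}
    dim = old_matrix.shape[1]

    # Tokens that are new get small random vectors (an arbitrary choice here).
    new_matrix = rng.normal(0.0, 0.1, size=(len(new_vocab), dim)).astype(old_matrix.dtype)

    # Tokens already known keep their trained vectors, even if their id changed.
    for new_i, token in enumerate(new_vocab):
        old_i = old_index.get(token)
        if old_i is not None:
            new_matrix[new_i] = old_matrix[old_i]
    return new_matrix


# Toy data: the new vocabulary is larger and orders tokens differently.
old_vocab = ["<PAD>", "<UNK>", "good", "bad"]
new_vocab = ["<PAD>", "<UNK>", "bad", "great", "good"]
old_matrix = np.arange(8, dtype=np.float32).reshape(4, 2)

new_matrix = remap_embeddings(old_matrix, old_vocab, new_vocab)
print(new_matrix.shape)
```

Matching rows by token rather than by id matters because the same word can receive a different index when the vocabulary is rebuilt from a different dataset.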
