Dimension mismatch of embedding matrix when model_load_path is used because of vocabulary mismatch
See original GitHub issue
Describe the bug
I have trained a model using pretrained embeddings (GloVe.6B.50d). I have now built a larger training dataset and want to retrain the model, initialising its weights from the earlier trained model, so I am using the --model_load_path option.
However, I am running into an "InvalidArgumentError: indices[111,11] = 3441 is not in [0, 3172)".
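The failure mode can be illustrated outside Ludwig with a toy lookup. This is only a minimal sketch using the numbers from the error message; plain NumPy indexing stands in for TensorFlow's embedding_lookup, and the token ids are made up for illustration.

import numpy as np

# The checkpoint holds an embedding matrix sized for the OLD vocabulary
# (3172 rows), but the new, larger dataset produces token indices up to
# 3441, so the lookup goes out of range.
old_vocab_size, embedding_dim = 3172, 50
embedding_matrix = np.random.rand(old_vocab_size, embedding_dim)

new_token_ids = np.array([[5, 17, 3441]])  # 3441 exists only in the new vocabulary
try:
    vectors = embedding_matrix[new_token_ids]  # same class of error as tf.nn.embedding_lookup
except IndexError as e:
    print("out-of-range lookup:", e)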
To Reproduce
The following command is run:
ludwig experiment --data_csv datasets/train_set.csv --model_definition_file test_model_definition.yaml --model_load_path results/experiment_run_2/model
YAML definition:
input_features:
    -
        name: text
        type: text
        level: word
        encoder: stacked_parallel_cnn
        embedding_size: 50
        pretrained_embeddings: home/abc/mystuff/datasets/glove.6b.50d.txt

output_features:
    -
        name: class
        type: category

training:
    early_stop: 10
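For context, the size of the embedding matrix built from pretrained_embeddings depends on the training set's vocabulary, so a larger dataset yields a larger matrix that no longer matches the checkpointed one. The sketch below is not Ludwig's actual code; build_embedding_matrix is a hypothetical helper shown only to make the dependence on vocabulary size explicit.

import numpy as np

def build_embedding_matrix(vocab, glove_path, dim=50):
    # Load GloVe vectors: one "token v1 v2 ... vdim" entry per line.
    glove = {}
    with open(glove_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            glove[parts[0]] = np.asarray(parts[1:], dtype=np.float32)

    # One row per token in the training-set vocabulary, initialised randomly
    # and overwritten with the GloVe vector where one exists.
    matrix = np.random.uniform(-0.05, 0.05, (len(vocab), dim)).astype(np.float32)
    for i, token in enumerate(vocab):
        if token in glove:
            matrix[i] = glove[token]
    return matrix  # shape: (len(vocab), dim) -- grows with the vocabulary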
Environment:
- OS: Windows 10
- Python version: Python 3
- Ludwig version: v0.1.2
Full error message output:
InvalidArgumentError (see above for traceback): indices[111,11] = 3441 is not in [0, 3172)
[[node text/embeddings_lookup (defined at /home/abc/mystuff/myenv/lib/python3.6/site-packages/ludwig/models/modules/embedding_modules.py:134) ]]
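One way to confirm the vocabulary mismatch is to compare the vocabulary saved with the old model against the tokens in the new CSV. This is a hedged sketch: the metadata file name (train_set_metadata.json) and the "idx2str" key are assumptions about the Ludwig 0.1.x on-disk format, and the whitespace tokenisation is only a rough approximation of Ludwig's word-level preprocessing.

import csv
import json
from collections import Counter

# Vocabulary stored alongside the previously trained model (path from the command above).
with open("results/experiment_run_2/model/train_set_metadata.json") as f:
    old_meta = json.load(f)
old_vocab_size = len(old_meta["text"]["idx2str"])

# Rough unique-token count from the new, larger training CSV.
tokens = Counter()
with open("datasets/train_set.csv", newline="") as f:
    for row in csv.DictReader(f):
        tokens.update(row["text"].lower().split())

print("old vocabulary size:", old_vocab_size)      # e.g. 3172
print("new unique tokens (approx.):", len(tokens))  # anything above 3172 explains the error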
Top GitHub Comments
To test it out, remember to uninstall Ludwig first and then install from master with pip install git+http://github.com/uber/ludwig.git
@w4nderlust Thanks for the YAML code. I will try it on my dataset and give an update on whether it works.