Converting marian tatoeba models
Environment info
- transformers version: 4.4.2
- Platform: Linux-4.19.112+-x86_64-with-Ubuntu-18.04-bionic
- Python version: 3.7.10
- PyTorch version (GPU?): 1.8.0+cu101 (False)
- Tensorflow version (GPU?): 2.4.1 (False)
- Using GPU in script?: False
- Using distributed or parallel set-up in script?: False
Who can help
- marian: @patrickvonplaten, @patil-suraj
Information
Model I am using (Bert, XLNet…): marian
The problem arises when using:
- the official example scripts: Tatoeba models converting script
- my own modified scripts: (give details below)
The task I am working on is:
- an official GLUE/SQUaD task: machine translation
- my own task or dataset: (give details below)
To reproduce
Steps to reproduce the behavior:
All steps are the same as in the official script for converting Marian Tatoeba models to PyTorch.
Error log:
Traceback (most recent call last):
  File "src/transformers/models/marian/convert_marian_tatoeba_to_pytorch.py", line 1267, in <module>
    resolver = TatoebaConverter(save_dir=args.save_dir)
  File "src/transformers/models/marian/convert_marian_tatoeba_to_pytorch.py", line 80, in __init__
    released.columns = released_cols
  File "/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py", line 5154, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas/_libs/properties.pyx", line 66, in pandas._libs.properties.AxisProperty.__set__
  File "/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py", line 564, in _set_axis
    self._mgr.set_axis(axis, labels)
  File "/usr/local/lib/python3.7/dist-packages/pandas/core/internals/managers.py", line 227, in set_axis
    f"Length mismatch: Expected axis has {old_len} elements, new "
ValueError: Length mismatch: Expected axis has 7 elements, new values have 9 elements
Expected behavior
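IMO, the main problem is that the fields in the file Tatoeba-Challenge/models/released-models.txt have changed: per the traceback, the loaded table has 7 columns while the script assigns 9 column names. I expect a clean conversion of the model for the chosen language pair.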
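A minimal sketch of the check I have in mind, using only the standard library. The column names and sample rows below are made up for illustration; only the counts (9 names expected, 7 fields found) come from the traceback above, and the tab-separated layout of released-models.txt is an assumption.

```python
import csv
import io

# Hypothetical placeholder names; the real list is `released_cols`
# in convert_marian_tatoeba_to_pytorch.py (9 entries per the traceback).
EXPECTED_COLS = [f"col_{i}" for i in range(9)]

# Made-up stand-in for released-models.txt, assumed tab-separated,
# with 7 fields per row as the error message reports.
SAMPLE = "a\tb\tc\td\te\tf\tg\nh\ti\tj\tk\tl\tm\tn\n"


def check_column_count(tsv_text, expected_cols):
    """Return (ok, widths): widths is the set of distinct field counts per row."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    widths = {len(row) for row in reader if row}  # skip empty lines
    ok = widths == {len(expected_cols)}
    return ok, widths


ok, widths = check_column_count(SAMPLE, EXPECTED_COLS)
if not ok:
    print(
        f"Column count mismatch: rows have {sorted(widths)} fields, "
        f"script expects {len(EXPECTED_COLS)}"
    )
```

Running a check like this before assigning the column names would report the format change directly instead of failing inside pandas.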
Issue Analytics
- State:
- Created 2 years ago
- Comments: 10 (6 by maintainers)
Top GitHub Comments
@patil-suraj unstale?
I will take a look at it this week.