Converting Marian Tatoeba models

Environment info

transformers version: 4.4.2
Platform: Linux-4.19.112+-x86_64-with-Ubuntu-18.04-bionic
Python version: 3.7.10
PyTorch version (GPU?): 1.8.0+cu101 (False)
Tensorflow version (GPU?): 2.4.1 (False)
Using GPU in script?: False
Using distributed or parallel set-up in script?: False

Who can help

marian: @patrickvonplaten, @patil-suraj

Information

Model I am using (Bert, XLNet…): marian

The problem arises when using:

the official example scripts: Tatoeba models converting script
my own modified scripts: (give details below)

The task I am working on is:

an official GLUE/SQUaD task: machine translation
my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

All steps are the same as in the official script for converting Marian Tatoeba models to PyTorch.

Error log:

Traceback (most recent call last):
  File "src/transformers/models/marian/convert_marian_tatoeba_to_pytorch.py", line 1267, in <module>
    resolver = TatoebaConverter(save_dir=args.save_dir)
  File "src/transformers/models/marian/convert_marian_tatoeba_to_pytorch.py", line 80, in __init__
    released.columns = released_cols
  File "/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py", line 5154, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas/_libs/properties.pyx", line 66, in pandas._libs.properties.AxisProperty.__set__
  File "/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py", line 564, in _set_axis
    self._mgr.set_axis(axis, labels)
  File "/usr/local/lib/python3.7/dist-packages/pandas/core/internals/managers.py", line 227, in set_axis
    f"Length mismatch: Expected axis has {old_len} elements, new "
ValueError: Length mismatch: Expected axis has 7 elements, new values have 9 elements

Expected behavior

IMO, the main problem is that the fields in the file Tatoeba-Challenge/models/released-models.txt have changed. I expect a clean conversion of the model for the chosen language pair.
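The mismatch in the traceback (7 columns found, 9 names assigned) can be caught earlier with an explicit field-count check. The following is a minimal stdlib-only sketch, not the conversion script's own code: the column names, the tab delimiter, the helper `check_field_count`, and the sample row are all hypothetical, and only the counts 7 and 9 come from the error log.

```python
import csv
import io

# Hypothetical column names; the real list lives in the conversion script
# and is not reproduced here. Only the count (9) comes from the traceback.
EXPECTED_COLS = [f"col_{i}" for i in range(1, 10)]


def check_field_count(text, expected_cols, delimiter="\t"):
    """Parse delimited text and return its rows.

    Raises ValueError naming the first row whose number of fields
    differs from len(expected_cols). The tab delimiter is an assumption.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    for line_no, row in enumerate(rows, start=1):
        if len(row) != len(expected_cols):
            raise ValueError(
                f"line {line_no}: found {len(row)} fields, "
                f"expected {len(expected_cols)}; the file layout may have changed"
            )
    return rows


# Made-up sample row with 7 fields, mirroring the count in the error log.
sample = "\t".join(f"value_{i}" for i in range(7)) + "\n"

try:
    check_field_count(sample, EXPECTED_COLS)
except ValueError as err:
    message = str(err)
    print(message)
```

A check like this fails with a message that points at the input file rather than at pandas internals, which makes a changed file layout easier to recognise.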

Issue Analytics

Created 2 years ago
Comments: 10 (6 by maintainers)

Top GitHub Comments

2 reactions

Dmitry-Sn commented, Apr 28, 2021

@patil-suraj unstale?

1 reaction

patil-suraj commented, Sep 20, 2021

I will take a look at it this week.
