Converting Marian Tatoeba models

Environment info

  • transformers version: 4.4.2
  • Platform: Linux-4.19.112+-x86_64-with-Ubuntu-18.04-bionic
  • Python version: 3.7.10
  • PyTorch version (GPU?): 1.8.0+cu101 (False)
  • Tensorflow version (GPU?): 2.4.1 (False)
  • Using GPU in script?: False
  • Using distributed or parallel set-up in script?: False

Who can help

Information

Model I am using (Bert, XLNet…): marian

The problem arises when using:

  • the official example scripts: the Tatoeba model conversion script
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQUaD task: machine translation
  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

All steps are the same as in the official script for converting Marian Tatoeba models to PyTorch.

Error log:

Traceback (most recent call last):
  File "src/transformers/models/marian/convert_marian_tatoeba_to_pytorch.py", line 1267, in <module>
    resolver = TatoebaConverter(save_dir=args.save_dir)
  File "src/transformers/models/marian/convert_marian_tatoeba_to_pytorch.py", line 80, in __init__
    released.columns = released_cols
  File "/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py", line 5154, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas/_libs/properties.pyx", line 66, in pandas._libs.properties.AxisProperty.__set__
  File "/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py", line 564, in _set_axis
    self._mgr.set_axis(axis, labels)
  File "/usr/local/lib/python3.7/dist-packages/pandas/core/internals/managers.py", line 227, in set_axis
    f"Length mismatch: Expected axis has {old_len} elements, new "
ValueError: Length mismatch: Expected axis has 7 elements, new values have 9 elements

Expected behavior

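IMO, the main problem is that the fields in the file Tatoeba-Challenge/models/released-models.txt have changed: the traceback shows the script assigning 9 column names to a table that has only 7 columns. I expect a clean conversion of the model for the chosen language pair.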
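
One way to confirm a mismatch like this before running the converter is to count the fields per row in the file and compare that with the number of column names the script assigns. The following is a minimal sketch, not the converter's own code: `check_field_count`, `EXPECTED_COLS`, the tab delimiter, and the sample rows are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical stand-in for the nine names the script assigns; the real
# list lives in convert_marian_tatoeba_to_pytorch.py.
EXPECTED_COLS = [f"col_{i}" for i in range(9)]


def check_field_count(text, expected_cols, delimiter="\t"):
    """Return (ok, observed_widths) for delimiter-separated text.

    ok is True only if every non-empty row has len(expected_cols) fields.
    The tab delimiter is an assumption about the file layout.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    rows = [row for row in reader if row]
    widths = sorted({len(row) for row in rows})
    return widths == [len(expected_cols)], widths


# Made-up sample with 7 tab-separated fields per row, mirroring the
# "7 elements" reported in the traceback.
sample = "\n".join("\t".join(f"v{r}{c}" for c in range(7)) for r in range(3))

ok, widths = check_field_count(sample, EXPECTED_COLS)
print(ok, widths)  # False [7]
```

To try it on the real file, read Tatoeba-Challenge/models/released-models.txt into a string and pass it in place of `sample`. If the observed width differs from the expected one, the column list and the file layout have drifted apart.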

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 10 (6 by maintainers)

Top GitHub Comments

2 reactions
Dmitry-Sn commented, Apr 28, 2021

@patil-suraj unstale?

1 reaction
patil-suraj commented, Sep 20, 2021

I will take a look at it this week.
