[example scripts] disambiguate language specification API

Currently, in example scripts like run_seq2seq.py, we have:

  1. For T5:
--task translation_en_to_ro
--source_prefix "translate English to Romanian: "
  2. Also these two:
--target_lang ro_RO
--source_lang en_XX

are used only for MBart and are ignored for other models, which means people will unknowingly pass these two even when they aren’t needed.

The problem in both situations is that we provide an error-prone API: a user who wants to change the language can forget that the same languages are specified in more than one place and change only one of the sets, which leads to a broken outcome.

If such an error is made, the specification supplied by the user becomes ambiguous, because one can’t tell which of the multiple inputs takes precedence.

Proposal: There should be only one way to input a set of languages and not multiple ways.

Specifically:

  • in case 1, probably the easiest is to leave --task translation_en_to_ro and auto-generate --source_prefix "translate English to Romanian: "
  • in case 2, assert if --target_lang or --source_lang are passed and the model is not MBart.
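
The check proposed for case 2 could look roughly like the sketch below. The helper name check_lang_args and the idea of recognizing MBart from a "mbart" model type string are assumptions made for illustration, not what the example script actually does.

```python
from typing import Optional


def check_lang_args(
    model_type: str,
    source_lang: Optional[str],
    target_lang: Optional[str],
) -> None:
    """Hypothetical guard: reject --source_lang/--target_lang for non-MBart models."""
    if model_type == "mbart":  # assumed identifier for MBart models
        return
    # Collect the names of the flags the user actually passed.
    passed = [
        name
        for name, value in (("--source_lang", source_lang), ("--target_lang", target_lang))
        if value is not None
    ]
    if passed:
        raise ValueError(
            f"{', '.join(passed)} only apply to MBart, but the model type is {model_type!r}"
        )
```

With a guard like this, passing --source_lang en_XX to a T5 run would fail at startup instead of being silently ignored.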

Thinking more about it, case 1 is a must to solve: if a user omits --source_prefix or makes a typo in it, the train/eval won’t fail but will mysteriously produce really bad results. This is not user-friendly.
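
As a rough sketch of the case 1 proposal, the prefix could be derived from the task name so there is only one place to specify the languages. The function source_prefix_from_task and the LANG_NAMES map are hypothetical, and the map lists only the two codes that appear in this issue.

```python
import re

# Hypothetical, hand-written map; a real one would need to cover every supported language.
LANG_NAMES = {"en": "English", "ro": "Romanian"}


def source_prefix_from_task(task: str) -> str:
    """Build the T5 source prefix from a task name like 'translation_en_to_ro'."""
    match = re.fullmatch(r"translation_([a-z]+)_to_([a-z]+)", task)
    if match is None:
        raise ValueError(f"not a translation task: {task!r}")
    src, tgt = match.groups()
    # Fail loudly on unknown codes instead of training with a wrong prefix.
    for code in (src, tgt):
        if code not in LANG_NAMES:
            raise ValueError(f"no language name known for code {code!r}")
    return f"translate {LANG_NAMES[src]} to {LANG_NAMES[tgt]}: "
```

Under these assumptions, source_prefix_from_task("translation_en_to_ro") yields "translate English to Romanian: ", and a typo in the task name raises an error rather than quietly degrading results.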

@sgugger, @patrickvonplaten, @patil-suraj

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 12 (12 by maintainers)

Top GitHub Comments

1 reaction
stas00 commented, Mar 9, 2021

We already require running pip install -r examples/seq2seq/requirements.txt, so why not follow suit?

1 reaction
stas00 commented, Mar 9, 2021

This is for the pre-trained models, but if a user provides their own model it could be any language.

Plus you have https://github.com/google-research/multilingual-t5.

I wonder if there is a Python module that comes with such a map.
