[example scripts] disambiguate language specification API

Currently in example scripts like run_seq2seq.py we have:

1. For T5:

--task translation_en_to_ro
--source_prefix "translate English to Romanian: "

2. Also these two:

--target_lang ro_RO
--source_lang en_XX

which are used only for MBart and are ignored for other models. This means people will unknowingly pass them even when they aren’t needed.

The problem in both situations is that we provide an error-prone API: a user who wants to change the language may forget that it is specified in more than one place and change only one of the sets of languages but not the other, which leads to a broken outcome.

If such an error is made, the specification supplied by the user becomes ambiguous, because one can’t tell which of the multiple inputs takes precedence.

Proposal: There should be only one way to input a set of languages and not multiple ways.

Specifically:

in case 1, probably the easiest is to keep --task translation_en_to_ro and auto-generate --source_prefix "translate English to Romanian: " from it
in case 2, assert if --target_lang or --source_lang is passed and the model is not MBart.
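
The two proposals above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual run_seq2seq.py code: the helper names `source_prefix_from_task` and `check_lang_args`, the tiny `LANG_NAMES` map, and the use of the string "mbart" to identify the model are all hypothetical.

```python
import re

# Hypothetical, deliberately tiny map from language codes to English names.
# A real script would need a far more complete source for this mapping.
LANG_NAMES = {"en": "English", "ro": "Romanian", "de": "German", "fr": "French"}

# Matches task names of the form translation_<src>_to_<tgt>.
_TASK_RE = re.compile(r"^translation_([a-z]{2})_to_([a-z]{2})$")


def source_prefix_from_task(task: str) -> str:
    """Case 1: derive the source prefix from --task so it is given only once."""
    match = _TASK_RE.match(task)
    if match is None:
        raise ValueError(f"Not a translation task name: {task!r}")
    src, tgt = match.groups()
    for code in (src, tgt):
        if code not in LANG_NAMES:
            raise ValueError(f"No language name known for code {code!r}")
    return f"translate {LANG_NAMES[src]} to {LANG_NAMES[tgt]}: "


def check_lang_args(model_type: str, source_lang=None, target_lang=None) -> None:
    """Case 2: fail loudly if the language args are passed to a non-MBart model."""
    # Assumption: the model is identified by a plain string such as "mbart".
    if model_type != "mbart" and (source_lang is not None or target_lang is not None):
        raise ValueError(
            "--source_lang/--target_lang are only used by MBart, "
            f"but the model type is {model_type!r}"
        )


prefix = source_prefix_from_task("translation_en_to_ro")
check_lang_args("mbart", source_lang="en_XX", target_lang="ro_RO")
```

With this shape, a typo in the task name or an unknown language code raises an error immediately instead of silently training with a wrong prefix.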

Thinking more about it, case 1 must be solved: if a user omits --source_prefix or makes a typo in it, the train/eval won’t fail but will mysteriously produce really bad results. This is not user-friendly.

@sgugger, @patrickvonplaten, @patil-suraj

Issue Analytics

Created 3 years ago
Comments: 12 (12 by maintainers)

Top GitHub Comments

stas00 commented, Mar 9, 2021 (1 reaction)

we require running pip install -r examples/seq2seq/requirements.txt already, so why not follow suit.

stas00 commented, Mar 9, 2021 (1 reaction)

This is for the pre-trained models, but if a user provides their own model it could be any language.

Plus you have https://github.com/google-research/multilingual-t5.

I wonder if there is a Python module that comes with such a map.
