Decoding problem for char-based translation

Hi,

I modified wmt_ende_characters to translate Macedonian to English (the BLEU score after training was 0.526888).

The input sentence is:

Kosovskiot proces na privatizaciјa se ispituva

The t2t_trainer command then prints some strange output:

INFO:tensorflow:Restoring parameters from t2t_train/model.ckpt-250000
INFO:tensorflow:Inference results INPUT: Mquqxumkqv"rtqegu"pc"rtkxcvk|cekӚc"ug"kurkvwxc
INFO:tensorflow:Inference results OUTPUT: Mukwak.cwave.gurk.fe.ce.sce.gurkwe.ce.ce
INFO:tensorflow:Writing decodes into test.txt.transformer.transformer_base.beam4.alpha0.6.decodes

Tested with version 1.0.5 and 1.0.7. Is this a bug?
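
One detail in the log may help narrow this down: the INPUT line is not random. Shifting every UTF-8 byte of the source sentence up by 2 reproduces it exactly, which would be consistent with byte IDs being offset past two reserved IDs and not shifted back before printing. The offset of 2 and that interpretation are inferred from the log above, not confirmed behavior of t2t_trainer, and the helper names are hypothetical. A minimal sketch:

```python
# Assumption: the log shows UTF-8 bytes shifted by a constant offset.
# OFFSET = 2 is inferred from the log, not taken from the library.
OFFSET = 2


def shift_bytes(text, offset=OFFSET):
    """Shift every UTF-8 byte of `text` up by `offset` and decode the result."""
    shifted = bytes((b + offset) % 256 for b in text.encode("utf-8"))
    return shifted.decode("utf-8", errors="replace")


def unshift_bytes(text, offset=OFFSET):
    """Inverse of shift_bytes: shift every UTF-8 byte down by `offset`."""
    restored = bytes((b - offset) % 256 for b in text.encode("utf-8"))
    return restored.decode("utf-8", errors="replace")


# \u0458 is the Cyrillic letter "je" in the source sentence;
# \u04da is the character that appears in its place in the log.
source = "Kosovskiot proces na privatizaci\u0458a se ispituva"
logged_input = 'Mquqxumkqv"rtqegu"pc"rtkxcvk|cek\u04dac"ug"kurkvwxc'

print(shift_bytes(source) == logged_input)
print(unshift_bytes(logged_input) == source)
```

Both comparisons hold for the sentence and log line quoted above, so the INPUT line appears to be the original sentence with each byte moved up by 2. This says nothing about whether the OUTPUT line is a usable translation.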

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 10 (9 by maintainers)

Top GitHub Comments

lukaszkaiser commented, Jun 25, 2017 (1 reaction)

That would be wonderful, yes, we welcome a PR! And great thanks for all the python3 work too 😃.

vthorsteinsson commented, Jun 24, 2017 (1 reaction)

It is a bit strange that the character-based generators in wmt.py do not use text_encoder.ByteTextEncoder() for encoding the source and target strings to vectors, but simply do a raw conversion of character ordinal values to vectors. I am working on a PR that fixes this, and at least in preliminary testing the output looks much saner.
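
To illustrate the difference described in the comment above, here is a rough sketch that contrasts a raw ordinal conversion with a byte-level encoding. It is not the tensor2tensor code: the function names are hypothetical, and NUM_RESERVED_IDS = 2 is an assumption (two IDs set aside for special tokens such as padding and end-of-sequence).

```python
# Assumption: two IDs are reserved for special tokens, so byte IDs start at 2.
NUM_RESERVED_IDS = 2


def encode_ordinals(text):
    """Raw conversion: one ID per character, equal to its Unicode code point."""
    return [ord(ch) for ch in text]


def encode_bytes(text, num_reserved=NUM_RESERVED_IDS):
    """Byte-level encoding: one ID per UTF-8 byte, shifted past the reserved IDs."""
    return [b + num_reserved for b in text.encode("utf-8")]


def decode_bytes(ids, num_reserved=NUM_RESERVED_IDS):
    """Inverse of encode_bytes; reserved IDs are skipped."""
    raw = bytes(i - num_reserved for i in ids if i >= num_reserved)
    return raw.decode("utf-8", errors="replace")


word = "privatizaci\u0458a"  # \u0458 is the Cyrillic letter "je"

ordinal_ids = encode_ordinals(word)
byte_ids = encode_bytes(word)

print(max(ordinal_ids))                # code point of the Cyrillic letter
print(max(byte_ids))                   # always below 256 + NUM_RESERVED_IDS
print(decode_bytes(byte_ids) == word)  # round trip restores the text
```

With raw ordinals, a single non-ASCII character yields an ID far outside a byte-sized vocabulary and can collide with reserved IDs at the low end. The byte-level variant keeps every ID in a fixed range and round-trips cleanly, provided that decoding subtracts the same offset that encoding added.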
