
I trained with translate_enzh_wmt32k, and BLEU is only 1.25. What's the reason? I'd appreciate you telling me

See original GitHub issue

Description

I trained a model with translate_enzh_wmt32k, but the BLEU score is only 1.25. What's the reason? I'd appreciate it if you could tell me, thank you!

Environment information

OS: Linux

Steps to reproduce:

Just follow the commands from the walkthrough in the docs.

Something like this (including the data-generation step from the walkthrough, which uses $TMP_DIR):
$ PROBLEM=translate_enzh_wmt32k
$ MODEL=transformer
$ HPARAMS=transformer_base_single_gpu
$ DATA_DIR=$HOME/t2t_data
$ TMP_DIR=/tmp/t2t_datagen
$ TRAIN_DIR=$HOME/t2t_train/$PROBLEM/$MODEL-$HPARAMS
$ T2T_USR_DIR=$HOME/t2t_usr_dir

$ t2t-datagen \
  --data_dir=$DATA_DIR \
  --tmp_dir=$TMP_DIR \
  --problem=$PROBLEM

$ t2t-trainer \
  --data_dir=$DATA_DIR \
  --problem=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR
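One way to see where the score collapses is to inspect how the generated subword vocabulary splits Chinese text. A minimal sketch, assuming tensor2tensor's Python API; the vocabulary filename below is an assumption, so substitute whatever t2t-datagen actually wrote into $DATA_DIR:

# Sketch: check what subword pieces the generated vocab produces for Chinese.
# The vocab filename is an assumption; use the file t2t-datagen created.
import os
from tensor2tensor.data_generators import text_encoder

data_dir = os.path.expanduser("~/t2t_data")
vocab_path = os.path.join(data_dir, "vocab.translate_enzh_wmt32k.32768.subwords")

encoder = text_encoder.SubwordTextEncoder(vocab_path)
ids = encoder.encode("今天天气很好")
print(ids)
print(encoder.decode_list(ids))  # the actual subword pieces, one per id

If the pieces come out as single bytes or as long chunks that rarely recur, the subword vocabulary is a poor fit for the corpus, which would be consistent with the very low BLEU reported here.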

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 16 (3 by maintainers)

Top GitHub Comments

4 reactions
yynil commented, Nov 3, 2018

The solution is pretty simple and straightforward.

  1. Extract all the unique characters in the corpus, including symbols.
  2. Use those characters as the dictionary, and just use the text token encoder (a sketch of these two steps follows below).
  3. Train the model using the transformer_base configuration.
  4. It took us 1.5 million steps to get a BLEU score of 22~23.

We believe that for a character-based language like Chinese or Japanese, word segmentation is no longer necessary, because the deep neural network will learn to join characters into words better than any word-segmentation library. The translation model suggests this guess might be right.
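A minimal sketch of steps 1 and 2, assuming tensor2tensor's text_encoder module (the thread shares no code, and "corpus.zh" / "vocab.char.zh" are placeholder file names):

# Build a character-level vocabulary and use it with t2t's TokenTextEncoder.
# TokenTextEncoder treats each whitespace-separated token as one vocabulary
# entry, so sentences must be pre-split into characters before encoding.
from tensor2tensor.data_generators import text_encoder

# Step 1: collect every unique non-whitespace character, symbols included.
chars = set()
with open("corpus.zh", encoding="utf-8") as f:
    for line in f:
        chars.update(ch for ch in line if not ch.isspace())

# Step 2: write the characters out as the vocabulary file, one per line.
with open("vocab.char.zh", "w", encoding="utf-8") as f:
    for ch in sorted(chars):
        f.write(ch + "\n")

encoder = text_encoder.TokenTextEncoder("vocab.char.zh")
ids = encoder.encode(" ".join("今天天气很好"))  # pre-split into characters
print(ids)

Wiring this into training would additionally need a custom Problem subclass in $T2T_USR_DIR whose feature_encoders method returns this encoder; that part is omitted here.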

On Nov 3, 2018, at 20:34, ConnectDotz (notifications@github.com) wrote:

@yynil Thanks. Did you just use a simple one-hot encoding, then? Is there an example (or a parameter?) showing how to plug in our own tokenizer? Would you care to share more detail? Actually, if the current implementation cannot reliably produce a reasonable result(?), would you consider contributing your solution?


1 reaction
hpulfc commented, Sep 11, 2018

Finally, you should apply word segmentation to the Chinese text first; then it will be suitable for the SubTokenEncoder.
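A minimal sketch of that pre-segmentation step, using the third-party jieba segmenter (the comment names no library, so this is just one possible choice; file names are placeholders):

# Segment Chinese text into space-separated words before feeding it to the
# subword encoder. Requires: pip install jieba. File names are placeholders.
import jieba

with open("corpus.zh", encoding="utf-8") as fin, \
     open("corpus.seg.zh", "w", encoding="utf-8") as fout:
    for line in fin:
        fout.write(" ".join(jieba.cut(line.strip())) + "\n")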

Read more comments on GitHub >

