About programmatic usage
I’m trying to use the package programmatically. This is what I’m doing:
import codecs

from subword_nmt.apply_bpe import BPE, read_vocabulary

# read/write files as UTF-8
bpe_codes_fin = codecs.open(bpe_codes, encoding='utf-8')
bpe_vocab_fin = codecs.open(bpe_vocab, encoding='utf-8')
# build the set of allowed subword units from the vocabulary file
vocabulary = read_vocabulary(bpe_vocab_fin, vocabulary_threshold)
bpe = BPE(bpe_codes_fin, merges=-1, separator='@@', vocab=vocabulary, glossaries=None)
# segment one line of text into BPE subword units
codes = bpe.process_line(line)
Is that correct? Also, I’m not sure what to use for vocabulary_threshold, since I don’t see a default value. Is there one?
Thank you.
Issue Analytics
- State:
- Created 4 years ago
- Comments:9 (3 by maintainers)
Top GitHub Comments
try adding this as the first line to the BPE file:
the reason for this is explained in the README. It looks like fastBPE implements the new variant (v 0.2) as well.
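The line to add did not survive in the comment above. A minimal sketch, assuming it is the `#version: 0.2` header that subword-nmt writes at the top of the codes files it learns; the helper name `ensure_version_header` is hypothetical and not part of the package:

```python
def ensure_version_header(lines, header="#version: 0.2"):
    """Return the codes lines with a version header as the first line.

    Assumption: `header` is the line the comment refers to.
    """
    # Normalise line endings so the result can be joined with "\n" later.
    lines = [line.rstrip("\r\n") for line in lines]
    if lines and lines[0].startswith("#version:"):
        return lines  # already has a header, leave it untouched
    return [header] + lines


patched = ensure_version_header(["t h", "th e"])
print("\n".join(patched))
```

Writing `patched` back to a new file, rather than overwriting the original codes file, keeps the unmodified version around for comparison.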
As to your first question, have a look at your vocabulary file - whether you set the threshold to 5 or 500 won’t make a big difference for you, since most rare tokens are single (non-Latin) characters that won’t be affected by this.
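One way to check this on your own data is to count which entries a given threshold would exclude. The sketch below assumes the vocabulary file has one `token count` pair per line and that entries with a count below the threshold are the ones excluded; the helper name `split_by_threshold` and the sample entries are made up for illustration:

```python
def split_by_threshold(vocab_lines, threshold):
    """Split vocabulary entries into (kept, dropped) token lists.

    Assumption: each line is "<token> <count>", and a token is kept
    when its count is at least `threshold`.
    """
    kept, dropped = [], []
    for line in vocab_lines:
        line = line.strip("\r\n ")
        if not line:
            continue  # ignore blank lines
        token, freq = line.rsplit(" ", 1)
        if int(freq) >= threshold:
            kept.append(token)
        else:
            dropped.append(token)
    return kept, dropped


# Illustrative entries only, not taken from a real vocabulary file.
sample = ["the 1000", "low@@ 800", "ж 3"]
for threshold in (5, 500):
    kept, dropped = split_by_threshold(sample, threshold)
    print(threshold, len(kept), dropped)
```

If the dropped lists for two candidate thresholds are nearly the same on your file, the choice between them matters little.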
FAIR LASER uses a different BPE implementation ( https://github.com/glample/fastBPE ), which seems to store the BPE file in a different format. It might work if you simply remove the third item in each entry (the frequency), but I can’t guarantee there’s no other inconsistency, e.g. in how UTF-8 whitespace is handled.
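The suggested conversion can be sketched as follows, assuming each fastBPE entry is a space-separated line of the form `left right frequency`. The helper name `drop_frequency_column` is hypothetical, and as noted above this does not rule out other format differences:

```python
def drop_frequency_column(fastbpe_lines):
    """Keep only the two merge symbols of each entry, dropping the frequency.

    Assumption: fields are separated by a single ASCII space. Splitting on
    " " rather than on all whitespace avoids treating other Unicode
    whitespace characters inside a symbol as separators.
    """
    converted = []
    for line in fastbpe_lines:
        fields = line.rstrip("\r\n").split(" ")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        converted.append(" ".join(fields[:2]))
    return converted


print(drop_frequency_column(["t h 100\n", "th e 50\n"]))
```

Comparing the segmentation of a few sentences against the original fastBPE output is a reasonable sanity check before relying on the converted file.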