TrainingWrapper does not support line breaks
When training RETRO with the standard methods, TrainingWrapper does not keep the line breaks from the dataset, so the model never learns to produce them. This can hurt many NLP tasks where line structure matters.
Input *.txt:
First Citizen:
We are accounted poor citizens, the patricians good.
What authority surfeits on would relieve us: if they
would yield us but the superfluity, while it were
wholesome, we might guess they relieved us humanely;
but they think we are too dear: the leanness that
afflicts us, the object of our misery, is as an
inventory to particularise their abundance; our
sufferance is a gain to them Let us revenge this with
our pikes, ere we become rakes: for the gods know I
speak this in hunger for bread, not in thirst for revenge.
Second Citizen:
Would you proceed especially against Caius Marcius?
All:
Against him first: he's a very dog to the commonalty.
Model output after training:
some - - on my head, were even so salts to death strike That which may bet with tears I have found to life, which sweeter than now to dony : be known betwixcombed oaths ring yet in Corioli turnseth from him Dear life redeems doth thinkment for faith ; Or shall be slack than death within this face, PETRUCHIO : Now, wind and house or free thee better now. KATHARINA : Now, in mine honourable fellow : in your chat with me to be it, alive, I think, If to use than my wife, if this rebellious earth Have you will break out The strange s of yours cro
Issue Analytics
- State:
- Created a year ago
- Reactions:1
- Comments:5 (3 by maintainers)
Top GitHub Comments
@0x7o ohh interesting, this must be some issue with the default BERT tokenizer, i’ll take a look next week
the code will also have to be modularized to accept different models and their encoders, as a lot of the logic is specific to BERT-base
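If the cause is what the maintainer suspects, the effect can be reproduced without any model: BERT-style tokenizers split on whitespace first, and a newline counts as whitespace, so it is gone before any token IDs exist. The sketch below is not retro-pytorch code. It imitates only the whitespace-splitting step with `str.split()` and shows one possible workaround, mapping each newline to a placeholder token before encoding and back after decoding. The `[NL]` token and all function names are hypothetical. With a real BERT vocabulary the placeholder would also have to be registered as an extra token, which is not shown here.

```python
import re

# Hypothetical placeholder token; not part of retro-pytorch or the BERT vocabulary.
NEWLINE_TOKEN = "[NL]"


def whitespace_tokenize(text):
    """Imitate whitespace pre-tokenization: str.split() treats '\n' as plain whitespace."""
    return text.split()


def encode_preserving_newlines(text):
    """Swap each newline for an explicit placeholder so it survives splitting."""
    return whitespace_tokenize(text.replace("\n", f" {NEWLINE_TOKEN} "))


def decode_restoring_newlines(tokens):
    """Join tokens with spaces, then turn each placeholder back into a line break."""
    text = " ".join(tokens)
    return re.sub(rf" ?{re.escape(NEWLINE_TOKEN)} ?", "\n", text)


sample = "First Citizen:\nWe are accounted poor citizens, the patricians good."

# Plain whitespace tokenization: the line break is lost.
lossy = " ".join(whitespace_tokenize(sample))

# Placeholder round trip: the line break is kept.
round_trip = decode_restoring_newlines(encode_preserving_newlines(sample))

print(repr(lossy))
print(repr(round_trip))
```

The placeholder approach keeps the tokenizer itself unchanged, which is why it is used here. Swapping in a tokenizer that encodes newlines natively would be the alternative the second comment points toward.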