TrainingWrapper does not support line breaks


When training RETRO with the standard methods, TrainingWrapper does not carry line breaks over into the dataset. This can hurt many NLP tasks.

Input *.txt:

First Citizen:
We are accounted poor citizens, the patricians good.
What authority surfeits on would relieve us: if they
would yield us but the superfluity, while it were
wholesome, we might guess they relieved us humanely;
but they think we are too dear: the leanness that
afflicts us, the object of our misery, is as an
inventory to particularise their abundance; our
sufferance is a gain to them Let us revenge this with
our pikes, ere we become rakes: for the gods know I
speak this in hunger for bread, not in thirst for revenge.

Second Citizen:
Would you proceed especially against Caius Marcius?

All:
Against him first: he's a very dog to the commonalty.

Model output after training:

some - - on my head, were even so salts to death strike That which may bet with tears I have found to life, which sweeter than now to dony : be known betwixcombed oaths ring yet in Corioli turnseth from him Dear life redeems doth thinkment for faith ; Or shall be slack than death within this face, PETRUCHIO : Now, wind and house or free thee better now. KATHARINA : Now, in mine honourable fellow : in your chat with me to be it, alive, I think, If to use than my wife, if this rebellious earth Have you will break out The strange s of yours cro

Issue Analytics

  • State: open
  • Created: a year ago
  • Reactions: 1
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
lucidrains commented, May 15, 2022

@0x7o ohh interesting, this must be some issue with the default BERT tokenizer, i’ll take a look next week
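To illustrate how a tokenizer can lose line breaks, here is a minimal sketch in plain Python. It assumes the tokenizer starts by splitting on whitespace, which treats "\n" like any other separator. The `whitespace_tokenize` function is a stand-in and not the actual BERT tokenizer, and `NEWLINE_TOKEN` is a hypothetical sentinel. With a real tokenizer, such a sentinel would presumably have to be added to the vocabulary first. This is one possible workaround, not the fix adopted by the library.

```python
NEWLINE_TOKEN = "[NL]"  # hypothetical sentinel, not part of any real vocabulary


def whitespace_tokenize(text):
    """Stand-in for whitespace pre-tokenization: '\\n' is a separator and is dropped."""
    return text.split()


def detokenize(tokens):
    """Naive decoding: join tokens with single spaces."""
    return " ".join(tokens)


def tokenize_keep_newlines(text):
    """Replace each line break with a sentinel token so it survives splitting."""
    return whitespace_tokenize(text.replace("\n", f" {NEWLINE_TOKEN} "))


def detokenize_keep_newlines(tokens):
    """Rebuild the text, turning each sentinel back into a line break."""
    lines, current = [], []
    for tok in tokens:
        if tok == NEWLINE_TOKEN:
            lines.append(" ".join(current))
            current = []
        else:
            current.append(tok)
    lines.append(" ".join(current))
    return "\n".join(lines)


sample = (
    "Second Citizen:\n"
    "Would you proceed especially against Caius Marcius?\n"
    "\n"
    "All:\n"
    "Against him first"
)

# Plain whitespace splitting flattens the text onto one line.
lossy = detokenize(whitespace_tokenize(sample))

# Marking line breaks before tokenizing lets them be restored afterwards.
restored = detokenize_keep_newlines(tokenize_keep_newlines(sample))
```

In this sketch `lossy` contains no line breaks, while `restored` matches `sample` exactly. The round trip is exact only because the sample uses single spaces between words.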

0 reactions
lucidrains commented, May 18, 2022

the code will also have to be modularized to accept different models and their encoders, as a lot of the logic is specific to BERT-base
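As a rough sketch of what "accepting different models and their encoders" could look like, the chunking logic can depend on a small encoder interface instead of a specific tokenizer. Every name below (`Encoder`, `CharEncoder`, `text_to_chunks`, `chunks_to_text`) is hypothetical and chosen for illustration. None of it is taken from the library's code.

```python
from typing import List, Protocol


class Encoder(Protocol):
    """Hypothetical minimal interface that any tokenizer wrapper would satisfy."""

    def encode(self, text: str) -> List[int]: ...

    def decode(self, ids: List[int]) -> str: ...


class CharEncoder:
    """Toy encoder: one id per character, so newlines are kept as ordinary ids."""

    def encode(self, text: str) -> List[int]:
        return [ord(c) for c in text]

    def decode(self, ids: List[int]) -> str:
        return "".join(chr(i) for i in ids)


def text_to_chunks(text: str, encoder: Encoder, chunk_size: int) -> List[List[int]]:
    """Encode text with whichever encoder is passed in, then cut it into fixed-size chunks."""
    ids = encoder.encode(text)
    return [ids[i:i + chunk_size] for i in range(0, len(ids), chunk_size)]


def chunks_to_text(chunks: List[List[int]], encoder: Encoder) -> str:
    """Flatten the chunks and decode them with the same encoder."""
    return encoder.decode([i for chunk in chunks for i in chunk])


text = "All:\nAgainst him first: he's a very dog to the commonalty."
chunks = text_to_chunks(text, CharEncoder(), chunk_size=16)
round_trip = chunks_to_text(chunks, CharEncoder())
```

Because the chunking code only calls `encode` and `decode`, an encoder that keeps line breaks could be swapped in without touching the rest of the pipeline.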
