Bug in not-contraction handling code
Hi,
Thanks for the great library! I think I ran into a weird edge case in the not-contraction handling code. Take the following example:
from syntok.tokenizer import Tokenizer
tok = Tokenizer(replace_not_contraction=False)
tok.split("n't")
The output is [<Token '' : "n't" @ 1>]. Something is going wrong in the offset calculation there: that 1 should be a 0… The real example this came from is a sentence in the AIDA dataset: "Falling share prices in New York do n't hurt Mexico as long as it happens gradually , as earlier this week".
I see the same with “don’t”: [<Token '' : 'do' @ 0>, <Token '' : "n't" @ 3>]. That 3 should be a 2, no?
Would love to hear your thoughts, not sure how to fix this neatly yet.
Issue Analytics
- State:
- Created 2 years ago
- Comments:5 (5 by maintainers)
Top GitHub Comments
Fixed by PR #17
I will review in full over the weekend, release a new version, and close this bug thereafter.
Thanks, again, Koen!
Looks like a valid bug. The best way to solve this kind of issue is to implement the corresponding (failing) test, and then explore with a debugger what’s wrong to fix it.
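A failing test of the kind suggested here could start from a simple offset invariant: with replace_not_contraction=False the token text is left unchanged, so slicing the input at a token's offset should give back the token's value. The sketch below is only an illustration of that check. `misaligned_tokens` is a hypothetical helper, not part of syntok, and the (value, offset) pairs are copied by hand from the outputs reported in this issue rather than produced by the tokenizer.

```python
from typing import Iterable, List, Tuple


def misaligned_tokens(
    text: str, tokens: Iterable[Tuple[str, int]]
) -> List[Tuple[str, int]]:
    """Return the (value, offset) pairs whose offset does not point at value in text.

    Hypothetical helper for illustration; it is not part of syntok.
    """
    return [
        (value, offset)
        for value, offset in tokens
        if text[offset:offset + len(value)] != value
    ]


# Pairs copied from the outputs reported in the issue: both are flagged.
assert misaligned_tokens("n't", [("n't", 1)]) == [("n't", 1)]
assert misaligned_tokens("don't", [("do", 0), ("n't", 3)]) == [("n't", 3)]

# Offsets the report expects instead: nothing is flagged.
assert misaligned_tokens("n't", [("n't", 0)]) == []
assert misaligned_tokens("don't", [("do", 0), ("n't", 2)]) == []
```

In an actual regression test the pairs would come from the tokens returned by tok.split(...), assuming each token exposes its text and its offset, and the test would assert that the helper returns an empty list.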