Bug in not-contraction handling code

Hi,

Thanks for the great library! I think I ran into a weird edge case in the not-contraction handling code. Take the following example:

from syntok.tokenizer import Tokenizer

# Keep "n't" as its own token instead of rewriting it to "not"
tok = Tokenizer(replace_not_contraction=False)
print(tok.split("n't"))

The output is [<Token '' : "n't" @ 1>]. Something goes wrong in the offset calculation: that 1 should be a 0. The real example this came from is a sentence in the AIDA dataset: " Falling share prices in New York do n't hurt Mexico as long as it happens gradually , as earlier this week.

I see the same with “don’t”: [<Token '' : 'do' @ 0>, <Token '' : "n't" @ 3>]. That 3 should be a 2, no?

Would love to hear your thoughts, not sure how to fix this neatly yet.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

2 reactions
fnl commented, Dec 17, 2021

Fixed by PR #17

I will review in full over the weekend, release a new version, and close this bug thereafter.

Thanks, again, Koen!

1 reaction
fnl commented, Dec 16, 2021

Looks like a valid bug. The best way to solve this kind of issue is to implement the corresponding (failing) test, and then explore with a debugger what’s wrong to fix it.
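
As a rough illustration of what such a failing test could check, the sketch below encodes the invariant the reporter relies on: slicing the input text at a token's offset should give back the token's value. The helper name misaligned_tokens and the (value, offset) pair representation are assumptions made for this example and are not part of syntok; the numbers are the ones quoted in the issue, with a plain ASCII apostrophe assumed for "don't".

```python
from typing import Iterable, List, Tuple


def misaligned_tokens(text: str, tokens: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Return the (value, offset) pairs whose offset does not point at value in text.

    Hypothetical helper for illustration only; it is not a syntok function.
    """
    return [
        (value, offset)
        for value, offset in tokens
        if text[offset:offset + len(value)] != value
    ]


# (value, offset) pairs as reported in the issue
reported_single = [("n't", 1)]
reported_dont = [("do", 0), ("n't", 3)]

# Offsets the reporter expected instead
expected_single = [("n't", 0)]
expected_dont = [("do", 0), ("n't", 2)]

# The reported offsets violate the invariant, the expected ones satisfy it.
assert misaligned_tokens("n't", reported_single) == [("n't", 1)]
assert misaligned_tokens("don't", reported_dont) == [("n't", 3)]
assert misaligned_tokens("n't", expected_single) == []
assert misaligned_tokens("don't", expected_dont) == []
```

To turn this into a regression test against the library itself, the pairs would presumably be built from the tokens returned by tok.split(text), assuming each token exposes its value and offset; the test should fail before the fix and pass after it.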
