Segmenting sentences at colons
For example, the following snippet is extracted as a single sentence (ending at the last full stop), but it should perhaps be split at the colons.
Here they “warn” anyone who opposes his radical ideology:
Four police officers were sent to hospital:
Violence against police officers is not only acceptable with Bernie Sanders and Black Lives Matter terrorists, its necessary to create chaos and panic:
What kind of violent protest would be complete without Barack Obama’s good friend, domestic terrorist Bill Ayers:
It’s probably just a coincidence that on a day that Obama was too busy to attend Nancy Reagan’s funeral, he was able to address a crowd about his hate for Trump only hours before this organized chaos in Chicago:
And finally, we’re wondering how much our Organizer In Chief had to do with this Alinsky style chaos in Chicago:
Illegal aliens, paid Soros protesters, angry Black Lives Matter terrorists inspired by Obama’s race war and Bernie Sanders supporters who have absolutely no idea why they showed up, sent four innocent police officers to the hospital; prevented thousands of innocent Americans from exercising their First Amendment right.
Is this intentional? Is there a way to force splitting at colons? Besides this extreme example, I think I have come across many cases where syntok did not split at colons.
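One possible workaround, if the segmenter does not split at colons, is to post-process the sentences it returns. The sketch below is not part of syntok: the function name `split_at_separators` and its `separators` parameter are hypothetical. It assumes each sentence has already been turned into a plain string (for example, by concatenating each token's `spacing` and `value`), and it splits only where a separator is followed by whitespace, so times such as "10:30" stay intact.

```python
import re
from typing import Iterable, List


def split_at_separators(sentences: Iterable[str], separators: str = ":") -> List[str]:
    """Hypothetical helper (not part of syntok): re-split sentence strings
    wherever one of the given separator characters is followed by whitespace.

    The separator stays attached to the segment it ends, and a separator
    with no whitespace after it (e.g. the colon in "10:30") is ignored.
    """
    if not separators:
        raise ValueError("separators must contain at least one character")
    # Zero-width lookbehind for a separator, then consume the whitespace run.
    pattern = re.compile("(?<=[" + re.escape(separators) + r"])\s+")
    segments: List[str] = []
    for sentence in sentences:
        for part in pattern.split(sentence):
            part = part.strip()
            if part:  # drop empty pieces, e.g. from trailing whitespace
                segments.append(part)
    return segments


if __name__ == "__main__":
    text = "Here is the schedule:\nThe meeting starts at 10:30 today."
    for segment in split_at_separators([text]):
        print(segment)
    # Also treat semicolons as separators.
    print(split_at_separators(["First clause; second clause: third."], separators=":;"))
```

Running this prints "Here is the schedule:" and "The meeting starts at 10:30 today." on separate lines, followed by `['First clause;', 'second clause:', 'third.']`. Note that it also splits mid-sentence colons such as "Note: the room changed", which is exactly the kind of borderline case that makes colon handling debatable.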
Issue Analytics
- Created 4 years ago
- Comments: 6 (3 by maintainers)
Top GitHub Comments
Release 1.3.1 now supports semi-colon segmentation.
I will leave this ticket open, however, as this was specifically about segmenting colons.
Thank you, Felix, for bringing this up; it is a valid feature request. Colon (and semi-colon) handling is indeed a bit of a borderline affair, and technically they are sentence separators. It might make sense to support that, but I need to think about it a bit more. I’d also love to hear feedback and opinions from other users about this.
[Correcting the title of the issue and adding labels.]