EndPunctuation regex fails
The EndPunctuation regex `endPunct` in https://github.com/nlp-compromise/compromise/blob/df050d5be8994f4124274f3bfc84e4ff61814e83/src/term/methods/punctuation.js does not allow a space before the punctuation:

```javascript
/([a-z])([,:;\/.(\.\.\.)\!\?]+)$/i;
```

That means, for example, “Who is Boris Becker ?” has no EndPunctuation and is not detected as a question, because, looking at the questions subset, a question must have `?` as its EndPunctuation:
```javascript
let list = r.list.filter(ts => {
  return ts.last().endPunctuation() === '?';
});
```
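A minimal sketch in plain JavaScript (Node.js, no compromise dependency) that runs the quoted `endPunct` pattern directly on two strings. The helper names `trailingPunct` and `isQuestion` are hypothetical; `isQuestion` only mimics the `=== '?'` check from the questions filter, and the sketch does not reproduce how compromise splits text into terms before the pattern is applied.

```javascript
// The endPunct pattern quoted in the issue, copied verbatim.
const endPunct = /([a-z])([,:;\/.(\.\.\.)\!\?]+)$/i;

// Hypothetical helper: returns the trailing punctuation captured by endPunct,
// or null when the pattern does not match.
function trailingPunct(text) {
  const m = endPunct.exec(text);
  return m ? m[2] : null;
}

// Hypothetical helper mirroring the `=== '?'` test used by the questions subset.
const isQuestion = (text) => trailingPunct(text) === '?';

console.log(isQuestion('Who is Boris Becker?'));  // true
console.log(isQuestion('Who is Boris Becker ?')); // false: ([a-z]) must sit directly before the punctuation
```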
Issue analytics: created 6 years ago; 8 comments (8 by maintainers).
Top GitHub Comments
Found it: the regex `endPunct` should be

```javascript
const endPunct = /([^\/,:;.()!?]{0,1})([\/,:;.()!?]+)$/i;
```

to allow for spaces and combinations, as in “Is Trump a fool or the president of U.S. ?”
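To compare the two patterns, the sketch below (plain Node.js; the helper `endPunctuation`, which takes the pattern as an argument, is hypothetical) runs both on a few sample strings and prints what each captures as trailing punctuation. It tests whole strings, which is an assumption: the file path `src/term/methods/punctuation.js` suggests compromise applies the pattern per term.

```javascript
// Pattern currently in punctuation.js, as quoted in the issue.
const original = /([a-z])([,:;\/.(\.\.\.)\!\?]+)$/i;
// Replacement proposed in the comment: the character before the punctuation
// may be anything that is not itself punctuation, or nothing at all ({0,1}).
const proposed = /([^\/,:;.()!?]{0,1})([\/,:;.()!?]+)$/i;

// Hypothetical helper: returns the captured trailing punctuation, or null on no match.
function endPunctuation(pattern, text) {
  const m = pattern.exec(text);
  return m ? m[2] : null;
}

const samples = ['Becker?', 'Becker ?', 'U.S. ?', 'in 2020?', '?'];
for (const s of samples) {
  console.log(JSON.stringify(s), endPunctuation(original, s), endPunctuation(proposed, s));
}
// Expected output (sample, original, proposed):
// "Becker?" ? ?
// "Becker ?" null ?
// "U.S. ?" null ?
// "in 2020?" null ?
// "?" null ?
```

Both character classes contain the same nine characters (`/ , : ; . ( ) ! ?`); inside square brackets, `(\.\.\.)` is not a group but just the literal characters `(`, `.` and `)`. The only behavioral change is therefore the first group: the original requires a letter directly before the punctuation (so `in 2020?` fails too), while `{0,1}` lets the proposed group be empty, so it also matches a bare `?`. Neither pattern returns a lone `?` for `U.S.?`: both capture `.?`, which a strict `=== '?'` comparison would still reject.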
hey, just cleaning up tasks, gonna close this.
`nlp(' U.S. ? ')` is an edge case I’m willing to let us waffle on for now. cheers!
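Since the maintainer chose to leave this edge case for now, one possible user-side workaround (an assumption, not something proposed in the thread) is to remove whitespace between the last word and the trailing punctuation before passing text to `nlp`. The function name `attachEndPunctuation` is hypothetical; it uses only `String.prototype.trim` and `String.prototype.replace`.

```javascript
// Hypothetical pre-processing step (not part of compromise): trims the input,
// then removes whitespace between the last word and a run of trailing
// punctuation, so "Becker ?" becomes "Becker?".
function attachEndPunctuation(text) {
  return text.trim().replace(/\s+([\/,:;.()!?]+)$/, '$1');
}

console.log(attachEndPunctuation('Who is Boris Becker ?')); // "Who is Boris Becker?"
console.log(attachEndPunctuation(' U.S. ? '));             // "U.S.?"
```

For `' U.S. ? '` the result is `U.S.?`, whose trailing punctuation run is `.?` rather than a lone `?`, so this alone does not settle the maintainer's edge case.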