Capitalized words should not be stemmed
Ran into this one today while using the default pipeline, and thus the stemmer.
The stemmer chops off -er, which is often appropriate, but not in proper nouns. As a result, last names ending in -er are escaping indexing.
It would not be possible to maintain a complete list of proper nouns, but capitalization is a good signal. Thoughts?
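To make the suggestion concrete, here is a minimal sketch of the idea, assuming tokens still carry their original case when they reach the stemmer. `toyStemmer` and `skipCapitalized` are hypothetical names used only for illustration; `toyStemmer` is a stand-in that strips a trailing "er" and is not the library's stemmer.

```javascript
// Hypothetical stand-in for a real stemmer: it only strips a trailing "er"
// from tokens longer than four characters.
function toyStemmer(token) {
  return token.endsWith("er") && token.length > 4 ? token.slice(0, -2) : token;
}

// Wrap any stemmer so that tokens starting with an uppercase letter
// are returned unchanged instead of being stemmed.
function skipCapitalized(stem) {
  return function (token) {
    const first = token.charAt(0);
    // A character is treated as uppercase if lowercasing changes it.
    const isCapitalized = first !== first.toLowerCase();
    return isCapitalized ? token : stem(token);
  };
}

const stemUnlessCapitalized = skipCapitalized(toyStemmer);
const results = ["Miller", "faster", "Fischer"].map(stemUnlessCapitalized);
console.log(results);
```

One caveat with this approach: sentence-initial words are capitalized too, so they would also skip stemming. If the tokenizer lowercases text before the stemmer runs, the capitalization check would have to happen earlier in the pipeline.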
Issue Analytics
- State:
- Created 7 years ago
- Comments: 5 (4 by maintainers)
Top GitHub Comments
As a workaround, you can add a pipeline function that manually trims the apostrophe-s from tokens; see the example fiddle.
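The fiddle itself is not reproduced here, so the following is only a rough sketch of what such a pipeline function might look like. It assumes tokens are plain strings and models the pipeline as an ordered list of functions; `trimPossessive`, `runPipeline`, and `toLower` are hypothetical names, not part of any library's API.

```javascript
// Remove a trailing possessive ('s or ’s) from a token.
// Tokens without that suffix are returned unchanged.
function trimPossessive(token) {
  return token.replace(/['\u2019]s$/i, "");
}

// Hypothetical pipeline runner: applies each step to the token in order.
function runPipeline(steps, token) {
  return steps.reduce((current, step) => step(current), token);
}

const toLower = (token) => token.toLowerCase();

// Trim the possessive first, then normalize case.
const pipeline = [trimPossessive, toLower];
const tokens = ["Miller's", "dog\u2019s", "its", "cats"].map((t) =>
  runPipeline(pipeline, t)
);
console.log(tokens);
```

In a real setup the function would be added ahead of the stemmer, so that the stemmer never sees the apostrophe.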
This is actually the same issue as described in #192. I encourage you to take a look through that issue and some of the discussion. It looks like I also specifically put together a small plugin for working around more of these contractions.
I’m going to close this issue in favour of #192. @erichiller Please re-open if your issue wasn’t related to the use of apostrophes.