PorterStemmer seems to be stemming "this" -> "thi"
I'm not sure whether this is the expected output, but NLTK's PorterStemmer gives different output from the stemming package at https://pypi.python.org/pypi/stemming/1.0
From NLTK:
>>> from nltk.stem import PorterStemmer
>>> porter = PorterStemmer()
>>> porter.stem('this')
u'thi'
From stemming:
>>> from stemming.porter2 import stem
>>> stem('this')
'this'
Verified using Python 3.5 (had to pip install stemming first).
Just tested this with Martin’s reference C implementation from https://tartarus.org/martin/PorterStemmer/c.txt and it also stems “this” to “thi”, so I think nltk’s behaviour is correct.
Test code:
Output:
@alvations note that your example using stemming that gets a different result is using the porter2 stemmer, not the porter stemmer. If you'd instead imported the porter stemmer, you'd've seen this:

In conclusion: there's no bug here. NLTK might want to pull in and wrap somebody's porter2 implementation, but that's a separate issue.
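For anyone wondering why the two algorithms disagree on "this": the difference comes from step 1a, which handles plural-like suffixes. Below is a minimal sketch of that single step as described in the published Porter and Porter2 (Snowball English) definitions. The function names are made up for illustration, this is not code from NLTK or from the stemming package, and the Porter2 version ignores that algorithm's special handling of "y" and its exception word list.

```python
# Illustrative sketch of step 1a only; names are hypothetical, not library APIs.
VOWELS = set("aeiouy")  # simplification: Porter2 sometimes treats "y" as a consonant


def porter_step1a(word):
    """Step 1a of the original Porter algorithm."""
    if word.endswith("sses"):
        return word[:-2]  # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]  # ponies -> poni
    if word.endswith("ss"):
        return word       # caress -> caress
    if word.endswith("s"):
        return word[:-1]  # cats -> cat, and also this -> thi
    return word


def porter2_step1a(word):
    """Step 1a of Porter2, without its exception list."""
    if word.endswith("sses"):
        return word[:-2]
    if word.endswith(("ied", "ies")):
        stem = word[:-3]
        # "i" if preceded by more than one letter, otherwise "ie"
        return stem + ("i" if len(stem) > 1 else "ie")
    if word.endswith(("us", "ss")):
        return word
    if word.endswith("s"):
        # Drop the "s" only if the part before it contains a vowel
        # that is not the letter immediately before the "s".
        if any(ch in VOWELS for ch in word[:-2]):
            return word[:-1]
    return word


for w in ("this", "gas", "gaps", "cats"):
    print(w, porter_step1a(w), porter2_step1a(w))
```

In "this", the only vowel sits directly before the final "s", so the Porter2 rule keeps the word intact while the original rule strips the "s" unconditionally. Later steps of the original algorithm find nothing to change in "thi", which is consistent with the output reported above.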