stem_word and lower case stem output in v3.2.2

Hi,

Just wanted to confirm the following changes introduced by the recent upgrade to v3.2.2:

There is no longer a stem_word() method on PorterStemmer; I had to replace it with stem().
Unlike before, stem() returns the word in lower case, e.g. stem() of "Stemming" now gives "stem" (the demo at http://text-processing.com/demo/stem/ still behaves as before).

Cheers, Ehsan
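The first change is a one-line migration: porter.stem_word(w) becomes porter.stem(w). For the second, code that relied on case-preserving output can re-apply the input's capitalization itself. Below is a minimal sketch, assuming ASCII words; HasStem, restore_case and stem_preserving_case are hypothetical helpers written for illustration, not NLTK functions, and they only approximate the earlier behavior. More recent NLTK releases also accept a to_lowercase argument on PorterStemmer.stem(), so check the version you have installed.

```python
from typing import Protocol


class HasStem(Protocol):
    """Anything with a stem(word) method, e.g. nltk.stem.PorterStemmer."""

    def stem(self, word: str) -> str: ...


def restore_case(word: str, stem: str) -> str:
    """Re-apply the capitalization of `word` to a lowercase `stem`.

    Hypothetical helper, not part of NLTK. Assumes word.lower() keeps the
    word's length (true for ASCII). Each stem character that matches the
    lowercased word at the same position takes the original character, and
    so its case; any other character is kept as the stemmer produced it.
    """
    lower = word.lower()
    chars = []
    for i, ch in enumerate(stem):
        if i < len(lower) and lower[i] == ch:
            chars.append(word[i])
        else:
            chars.append(ch)
    return "".join(chars)


def stem_preserving_case(stemmer: HasStem, word: str) -> str:
    """Call stem() (used here in place of stem_word()) and restore case."""
    return restore_case(word, stemmer.stem(word))
```

With NLTK installed, stem_preserving_case(PorterStemmer(), "Stemming") should return "Stem", given that stem() now returns "stem" as reported above.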

Issue Analytics

Created 7 years ago
Comments: 8 (5 by maintainers)

Top GitHub Comments

1 reaction

alvations commented, Sep 14, 2018

Closing the issue; https://github.com/nltk/nltk_book/pull/216 has been merged =)

Thanks @iranianpep @jayvdb @RodgerKibble for raising the issue! Thanks @ExplodingCabbage for verifying the changes!

0 reactions

alvations commented, Aug 28, 2018

Validating the outputs from Chapter 3.6 of the book.

>>> from nltk import word_tokenize
>>> from nltk.stem import PorterStemmer
>>> porter = PorterStemmer()

>>> raw = """DENNIS: Listen, strange women lying in ponds distributing swords
... is no basis for a system of government.  Supreme executive power derives from
... a mandate from the masses, not from some farcical aquatic ceremony."""

>>> tokens = word_tokenize(raw)

>>> [porter.stem(t) for t in tokens]
['denni', ':', 'listen', ',', 'strang', 'women', 'lie', 'in', 'pond', 'distribut', 'sword', 'is', 'no', 'basi', 'for', 'a', 'system', 'of', 'govern', '.', 'suprem', 'execut', 'power', 'deriv', 'from', 'a', 'mandat', 'from', 'the', 'mass', ',', 'not', 'from', 'some', 'farcic', 'aquat', 'ceremoni', '.']

The other example, which uses IndexedText, doesn’t change, since it doesn’t print the direct output of porter.stem().

The suggested changes are in https://github.com/nltk/nltk_book/pull/216
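The point about IndexedText generalizes: code that normalizes stems before using them, for example as dictionary keys, is unaffected by stem() now returning lower case. Below is a minimal sketch of a stem-keyed position index, assuming stems are lowercased before indexing; build_stem_index is a hypothetical helper written for illustration, not the book's IndexedText, and it takes any stem function so it can be tried without NLTK.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def build_stem_index(
    tokens: Sequence[str], stem: Callable[[str], str]
) -> Dict[str, List[int]]:
    """Map each lowercased stem to the positions of the tokens that yield it.

    Hypothetical helper for illustration, not the book's IndexedText. Because
    every stem is lowercased before it becomes a key, the result is the same
    whether `stem` preserves case or already returns lower case.
    """
    index: Dict[str, List[int]] = defaultdict(list)
    for position, token in enumerate(tokens):
        index[stem(token).lower()].append(position)
    return dict(index)
```

With NLTK, build_stem_index(tokens, porter.stem) indexes the tokens from the session above.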
