stem_word and lower case stem output in v3.2.2

Hi,

Just wanted to confirm the following things affected by the recent upgrade to v3.2.2:

  • There is no stem_word function in PorterStemmer() any more. I had to replace it with stem().
  • Unlike before, stem() returns the word in lower case, e.g. the stem for Stemming (the demo at http://text-processing.com/demo/stem/ behaves the same as before).

Cheers, Ehsan
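
For code that has to run on both sides of this upgrade, one option is a small shim that prefers stem() and falls back to stem_word(). This is only a sketch: stem_compat and the two stand-in stemmer classes are hypothetical names made up for illustration, not part of NLTK, and the stand-ins only mimic the method names, not the Porter algorithm.

```python
def stem_compat(stemmer, word):
    """Stem `word` with whichever method this stemmer object provides.

    Hypothetical helper, not part of NLTK: prefers stem() and falls back
    to the older stem_word() name mentioned in the issue.
    """
    method = getattr(stemmer, "stem", None)
    if method is None:
        # Raises AttributeError if the object has neither method.
        method = getattr(stemmer, "stem_word")
    return method(word)


# Stand-in stemmers so the sketch runs without NLTK installed.
# They only strip a trailing "ing"; they are NOT the Porter algorithm.
class OldStyleStemmer:
    def stem_word(self, word):
        return word[:-3] if word.endswith("ing") else word


class NewStyleStemmer:
    def stem(self, word):
        lowered = word.lower()
        return lowered[:-3] if lowered.endswith("ing") else lowered


print(stem_compat(OldStyleStemmer(), "Stemming"))
print(stem_compat(NewStyleStemmer(), "Stemming"))
```

With NLTK installed, the same call would presumably be stem_compat(PorterStemmer(), word).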

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

1 reaction
alvations commented, Sep 14, 2018

Closing issue, https://github.com/nltk/nltk_book/pull/216 is merged =)

Thanks @iranianpep @jayvdb @RodgerKibble for raising the issue! Thanks @ExplodingCabbage for verifying the changes!

0 reactions
alvations commented, Aug 28, 2018

Validating the outputs from Chapter 3.6 of the book.

>>> from nltk import word_tokenize
>>> from nltk.stem import PorterStemmer
>>> porter = PorterStemmer()

>>> raw = """DENNIS: Listen, strange women lying in ponds distributing swords
... is no basis for a system of government.  Supreme executive power derives from
... a mandate from the masses, not from some farcical aquatic ceremony."""

>>> tokens = word_tokenize(raw)

>>> [porter.stem(t) for t in tokens]
['denni', ':', 'listen', ',', 'strang', 'women', 'lie', 'in', 'pond', 'distribut', 'sword', 'is', 'no', 'basi', 'for', 'a', 'system', 'of', 'govern', '.', 'suprem', 'execut', 'power', 'deriv', 'from', 'a', 'mandat', 'from', 'the', 'mass', ',', 'not', 'from', 'some', 'farcic', 'aquat', 'ceremoni', '.']

The other example, which uses IndexedText, doesn’t change, since it doesn’t print the direct output of porter.stem().

Changes suggested in https://github.com/nltk/nltk_book/pull/216
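
If downstream code relied on the earlier mixed-case output, the original capitalisation can be reapplied after stemming. A minimal sketch, assuming a hypothetical helper restore_case (not an NLTK function) and a simple position-by-position rule that may not match the old behaviour in every case:

```python
def restore_case(original, stem):
    """Reapply the casing of `original` to a lower-cased `stem`.

    Hypothetical helper: copies characters from `original` for as long as
    they agree with `stem` when lower-cased, then keeps the remainder of
    the stem unchanged.
    """
    prefix = []
    for orig_char, stem_char in zip(original, stem):
        if orig_char.lower() != stem_char:
            break  # the stemmer rewrote the word from here on
        prefix.append(orig_char)
    return "".join(prefix) + stem[len(prefix):]


# Pairs of (token, stem) taken from the output listed above.
for token, stem in [("DENNIS", "denni"), ("Supreme", "suprem"), ("lying", "lie")]:
    print(token, "->", restore_case(token, stem))
```

With the session above, this could be applied as [restore_case(t, porter.stem(t)) for t in tokens].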
