Lemmatization of contraction "we're/she's" does not yield pronoun + "be"


I'm currently parsing a lot of speech transcripts and keep encountering pronoun contractions of the form subject + "be". While "I'm" is lemmatized into "I" and "be", this does not happen for contractions like "you're" or "he's": the lemmas for the latter are actually "he" and "'".

Is there a reason for treating contractions this way?
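
Until the lemma data is corrected, one possible workaround is to patch the lemmas after tagging. The sketch below is only an illustration: it assumes the tokenizer output is available as a list of (text, lemma) pairs, and the function and constant names are hypothetical, not part of spaCy's API. Note that "'s" after a pronoun can stand for "is" or "has", so mapping it to "be" is an assumption that can be switched off.

```python
# Hypothetical post-processing pass; none of these names come from spaCy.
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they"}
# Clitics that are unambiguously forms of "be" after a pronoun.
BE_CLITICS = {"'m", "'re"}
# "'s" can mean "is" or "has"; treating it as "be" is an assumption.
AMBIGUOUS_CLITICS = {"'s"}


def normalize_contraction_lemmas(pairs, resolve_ambiguous=True):
    """Return a copy of (text, lemma) pairs with pronoun clitics lemmatized as 'be'."""
    fixed = []
    for i, (text, lemma) in enumerate(pairs):
        # Normalize case and typographic apostrophes before the lookup.
        clitic = text.lower().replace("\u2019", "'")
        prev = pairs[i - 1][0].lower() if i > 0 else ""
        is_be = clitic in BE_CLITICS or (
            resolve_ambiguous and clitic in AMBIGUOUS_CLITICS
        )
        if prev in PRONOUNS and is_be:
            lemma = "be"
        fixed.append((text, lemma))
    return fixed


# Example input mirroring the behavior described above for "he's".
pairs = [("he", "he"), ("'s", "'"), ("here", "here")]
print(normalize_contraction_lemmas(pairs))
```

Clitics that follow anything other than a pronoun (for example a possessive "'s" after a name) are left untouched, since the lookup only fires when the previous token is in the pronoun set.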

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 12 (6 by maintainers)

Top GitHub Comments

1 reaction
honnibal commented, Apr 7, 2016

Okay, thanks. I should’ve looked more closely. There were indeed several remaining inconsistencies in the script that generates the specials.json file. I’m surprised this didn’t get spotted sooner!

I still need to add some tests for this, but the issue should be fixed in the next data release.
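
For readers wondering what a consistency check on generated special cases might look like, here is a minimal sketch. It does not reproduce the actual generator script or the real specials.json schema, neither of which is shown in this thread; the "form"/"lemma" keys, the pronoun table, and both function names are assumptions made for illustration. It also assumes "'s" after a pronoun always means "is", which is not true for sentences like "he's gone".

```python
import json

# Hypothetical table: which "be" clitic attaches to which pronoun.
BE_CLITIC = {
    "I": "'m",
    "you": "'re",
    "he": "'s",
    "she": "'s",
    "it": "'s",
    "we": "'re",
    "they": "'re",
}


def build_specials():
    """Generate one special-case entry per contraction, in a made-up schema."""
    specials = {}
    for pron, clitic in BE_CLITIC.items():
        # Cover both the plain and the sentence-initial capitalized form.
        for surface in {pron, pron.capitalize()}:
            specials[surface + clitic] = [
                {"form": surface, "lemma": pron},
                {"form": clitic, "lemma": "be"},
            ]
    return specials


def check_consistency(specials):
    """Return the contractions whose second token is not lemmatized as 'be'."""
    return sorted(
        key for key, tokens in specials.items() if tokens[1]["lemma"] != "be"
    )


specials = build_specials()
print(json.dumps(specials["he's"], indent=2))
print(check_consistency(specials))  # an empty list means every entry is consistent
```

Generating every entry from a single table, rather than listing contractions by hand, is one way to avoid the kind of inconsistency described above, and the check doubles as a simple regression test.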



