PhraseMatcher: OverflowError: Python int too large to convert to C long


How to reproduce the behaviour

Greetings,

I am trying to use spaCy's PhraseMatcher with the token attribute 'IS_SENT_START'. I assumed that, like other attributes such as 'IS_PUNCT', it should be possible to use IS_SENT_START, yet I got the following error:

```
matcher = PhraseMatcher(nlp.vocab, attr = "IS_SENT_START")
Traceback (most recent call last):

  File "<ipython-input-74-c8e7d50d6aec>", line 1, in <module>
    matcher = PhraseMatcher(nlp.vocab, attr = "IS_SENT_START")

  File "phrasematcher.pyx", line 63, in spacy.matcher.phrasematcher.PhraseMatcher.__init__

OverflowError: Python int too large to convert to C long
```

Here is the sample code:

```python
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.load('en_core_web_sm')
sub_list = ['however', 'although', 'on the other hand', 'nonetheless']
patterns = [nlp.make_doc(text) for text in sub_list]
# This line raises the OverflowError shown above
matcher = PhraseMatcher(nlp.vocab, attr='IS_SENT_START')
```

Why does this happen? I am trying to check whether the phrases in the list have sent_start = True or False. There is no problem with Matcher, but it does not work with PhraseMatcher. Can you help me with a solution or a workaround?

Your Environment

  • Operating System: macOS Catalina
  • Python Version Used: 3.7.9
  • spaCy Version Used: 2.0
  • Environment Information: Anaconda-Spyder
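
The thread does not spell out why the constructor fails with this particular error. One plausible reading, offered here only as an assumption: in spaCy 2.x the PhraseMatcher did not translate "IS_SENT_START" into a small attribute ID the way Matcher does, so the name ended up as a 64-bit string hash, and a value that large cannot be stored in a C long. The error class itself is easy to reproduce in plain Python with the standard `array` module, whose `'l'` typecode stores signed C longs. The value `fake_hash` below is a hypothetical stand-in, not spaCy's actual hash of the string.

```python
import array

# Hypothetical stand-in for a 64-bit string hash (not spaCy's real value).
# 2**64 - 1 exceeds the maximum signed C long on common platforms.
fake_hash = 2**64 - 1

message = None
try:
    # The 'l' typecode stores signed C longs, so this conversion fails.
    array.array("l", [fake_hash])
except OverflowError as err:
    message = str(err)
    print(message)
```

If the assumption holds, the fix is not a smaller number but the right attribute name, which is what the maintainer's answer addresses.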

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 10 (6 by maintainers)

Top GitHub Comments

1 reaction
svlandeg commented, Jan 20, 2021

Hi!

The error the matcher throws should definitely be more user-friendly, and the documentation should be updated too.

But you can make this work with attr='SENT_START'. Note that this will make the PhraseMatcher look at sentence segmentation, so you’ll have to compile your patterns with nlp() instead of nlp.make_doc(text) to ensure that the parser runs (in en_core_web_sm, the parser is what does the sentence segmentation).

This should work:

```python
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.load('en_core_web_sm')
sub_list = ['This is nice. Extraordinary.']
# Run the full pipeline (not nlp.make_doc) so the parser sets sentence boundaries
patterns = [nlp(text) for text in sub_list]
```

The pattern will now be [True, None, None, None, True, None].

```python
matcher = PhraseMatcher(nlp.vocab, attr='SENT_START')
# spaCy v2 signature; in v3 this is matcher.add("TerminologyList", patterns)
matcher.add("TerminologyList", None, *patterns)
text = "It is awesome. Great. Just great. What I think? Awesome."
doc = nlp(text)
matches = matcher(doc)
for match_id, start, end in matches:
    span = doc[start:end]
    print(start, end, "-->", span.text)
```

This prints:

```
0 6 --> It is awesome. Great.
9 15 --> What I think? Awesome.
```
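
A rough way to see what attr='SENT_START' does is to model it in plain Python. This sketch is not spaCy's implementation (the real PhraseMatcher uses its own hash-based lookup). It assumes a toy sentence splitter that marks a token as a sentence start when it is the first token or follows ".", "?" or "!", and it uses None for every other token, as in the pattern values quoted above. The helpers `sent_start_flags` and `match_on_attr` are hypothetical names. Matching compares only the per-token values, never the text, so the pattern "This is nice. Extraordinary." matches any six-token stretch with the same sentence-start shape, e.g. a four-token sentence followed by a two-token one.

```python
from typing import List, Optional, Tuple

# Toy stand-in for sentence segmentation (an assumption, not spaCy's parser).
SENT_END = {".", "?", "!"}


def sent_start_flags(tokens: List[str], segmented: bool = True) -> List[Optional[bool]]:
    """Return one SENT_START-like value per token.

    True marks a sentence start; None means "not a start / unknown", mirroring
    the pattern values quoted in the thread. With segmented=False nothing is
    known about sentence boundaries, loosely like an unparsed make_doc() pattern.
    """
    if not segmented:
        return [None] * len(tokens)
    flags: List[Optional[bool]] = []
    for i in range(len(tokens)):
        starts = i == 0 or tokens[i - 1] in SENT_END
        flags.append(True if starts else None)
    return flags


def match_on_attr(doc_vals: List[Optional[bool]],
                  pattern_vals: List[Optional[bool]]) -> List[Tuple[int, int]]:
    """Return (start, end) for every token window whose values equal the pattern."""
    n = len(pattern_vals)
    return [
        (start, start + n)
        for start in range(len(doc_vals) - n + 1)
        if doc_vals[start:start + n] == pattern_vals
    ]


pattern_tokens = ["This", "is", "nice", ".", "Extraordinary", "."]
doc_tokens = ["It", "is", "awesome", ".", "Great", ".", "Just", "great", ".",
              "What", "I", "think", "?", "Awesome", "."]

pattern = sent_start_flags(pattern_tokens)
print(pattern)  # [True, None, None, None, True, None]
print(match_on_attr(sent_start_flags(doc_tokens), pattern))  # [(0, 6), (9, 15)]

# An unsegmented pattern carries no sentence-start information at all
unsegmented = sent_start_flags(pattern_tokens, segmented=False)
print(match_on_attr(sent_start_flags(doc_tokens), unsegmented))  # []
```

The last call hints at why the patterns have to go through nlp() rather than nlp.make_doc(): in this toy model, an unsegmented pattern no longer describes the sentence shape and finds nothing in the segmented text.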

0 reactions
github-actions[bot] commented, Oct 27, 2021

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.


Top Results From Across the Web

Python int too large to convert to C long" on windows but not ...
OverflowError : Python int too large to convert to C long. Now try with float conversion: df['temp'] = df['temp'].astype(float).

OverflowError: Python int too large to convert to C long
Hello, As the label names it, I am having issues w/ running some i2c-2 source on an am335x based, SiP board. I deal...

OverflowError: Python int too large to convert to C long
Hi, I was able to setup my raspberry pi using Method 1 mentioned here. Below is my simple code. Just trying to create...

Python int too large to convert to C long in python 3.4
I am getting this error when I am trying to run word2vec from gensim library of python. I am using python 3.4 and...

[Python] OverflowError: Python int too large to convert to C long
[Python] OverflowError: Python int too large to convert to C long. Hello guys! I'm trying to run a program I built but being...
