When attempting to retrieve sentences, I get tokens

I followed the tutorial example from here:

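from spacy.en import English  # spaCy 1.x import path

en_nlp = English()
en_doc = en_nlp(u'Hello, world. Here are two sentences.')
next(en_doc.sents)  # expected: the first sentence, "Hello, world."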

But it returns only Hello, a single token, rather than the first sentence.
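The thread does not record the maintainers' diagnosis, but this output is consistent with every token being treated as its own sentence root. In spaCy 1.x, sentence boundaries come from the dependency parse, so if no usable parse is applied (for example because the parser's model data is missing; this is an assumption, not something the thread confirms), each token heads itself and becomes a one-token "sentence". The toy model below illustrates that effect in plain Python. `find_root`, `sentence_spans`, the token list and the hand-written head indices are hypothetical and are not spaCy's implementation; the sketch also assumes each sentence is contiguous and has exactly one root.

```python
from typing import List, Tuple


def find_root(heads: List[int], i: int) -> int:
    """Follow head pointers from token i until reaching a token that is its own head."""
    seen = set()
    while heads[i] != i:
        if i in seen:  # guard against malformed, cyclic head pointers
            raise ValueError("cycle in head pointers")
        seen.add(i)
        i = heads[i]
    return i


def sentence_spans(heads: List[int]) -> List[Tuple[int, int]]:
    """Split tokens into (start, end) spans, starting a new sentence whenever
    a token's root differs from the previous token's root."""
    spans = []
    start = 0
    for i in range(1, len(heads)):
        if find_root(heads, i) != find_root(heads, i - 1):
            spans.append((start, i))
            start = i
    if heads:
        spans.append((start, len(heads)))
    return spans


# Assumed tokenization of u'Hello, world. Here are two sentences.'
tokens = ["Hello", ",", "world", ".", "Here", "are", "two", "sentences", "."]

# No parse: every token is its own head, so every token is a sentence root.
unparsed = list(range(len(tokens)))
start, end = sentence_spans(unparsed)[0]
print(tokens[start:end])  # ['Hello']

# Hand-written, illustrative parse with roots at "Hello" (0) and "are" (5).
parsed = [0, 0, 0, 0, 5, 5, 7, 5, 5]
start, end = sentence_spans(parsed)[0]
print(tokens[start:end])  # ['Hello', ',', 'world', '.']
```

The same effect would account for `assert 1 == 5` in `test_sent_spans` below: without a parse, the first sentence of "This is a sentence. ..." ends after one token instead of five.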

So I ran the tests found here, and it appears that two of the four tests fail for the same reason:

python -m pytest /Users/mcapizzi/miniconda3/envs/nextiva-pipeline/lib/python3.5/site-packages/spacy-1.0.4-py3.5-macosx-10.6-x86_64.egg/spacy/tests/spans/test_span.py --models
============================= test session starts ==============================
platform darwin -- Python 3.5.2, pytest-3.0.3, py-1.4.31, pluggy-0.4.0
rootdir: /Users/mcapizzi, inifile: 
collected 4 items 

../../miniconda3/envs/nextiva-pipeline/lib/python3.5/site-packages/spacy-1.0.4-py3.5-macosx-10.6-x86_64.egg/spacy/tests/spans/test_span.py FF..

=================================== FAILURES ===================================
_______________________________ test_sent_spans ________________________________

doc = This is a sentence. This is another sentence. And a third.

    @pytest.mark.models
    def test_sent_spans(doc):
        sents = list(doc.sents)
        assert sents[0].start == 0
>       assert sents[0].end == 5
E       assert 1 == 5
E        +  where 1 = This.end

../../miniconda3/envs/nextiva-pipeline/lib/python3.5/site-packages/spacy-1.0.4-py3.5-macosx-10.6-x86_64.egg/spacy/tests/spans/test_span.py:18: AssertionError
__________________________________ test_root ___________________________________

doc = This is a sentence. This is another sentence. And a third.

    @pytest.mark.models
    def test_root(doc):
        np = doc[2:4]
        assert len(np) == 2
        assert np.orth_ == 'a sentence'
>       assert np.root.orth_ == 'sentence'
E       assert 'a' == 'sentence'
E         - a
E         + sentence

../../miniconda3/envs/nextiva-pipeline/lib/python3.5/site-packages/spacy-1.0.4-py3.5-macosx-10.6-x86_64.egg/spacy/tests/spans/test_span.py:28: AssertionError
====================== 2 failed, 2 passed in 0.10 seconds ======================
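The `test_root` failure is also consistent with every token being its own head. spaCy describes `Span.root` as the token highest in the parse tree, with ties going to the earliest token. If each token is its own root, both tokens of `doc[2:4]` tie at the top and `a` wins, which matches `assert 'a' == 'sentence'`. The plain-Python sketch below models that rule; `depth`, `span_root`, the token list and the hand-written heads are hypothetical, and spaCy's own method may reach the same result differently.

```python
from typing import List


def depth(heads: List[int], i: int) -> int:
    """Count head hops from token i up to a root (a token that is its own head).
    Assumes the head pointers form a tree, i.e. contain no cycles."""
    d = 0
    while heads[i] != i:
        i = heads[i]
        d += 1
    return d


def span_root(heads: List[int], start: int, end: int) -> int:
    """Return the index of the highest token in [start, end).
    min() keeps the first of several equal keys, so ties go to the earliest token."""
    return min(range(start, end), key=lambda i: depth(heads, i))


# Assumed tokenization of the test document.
tokens = ["This", "is", "a", "sentence", ".",
          "This", "is", "another", "sentence", ".",
          "And", "a", "third", "."]

# No parse: all depths are 0, so the earliest token in doc[2:4] wins.
unparsed = list(range(len(tokens)))
print(tokens[span_root(unparsed, 2, 4)])  # a

# Hand-written, illustrative parse: "a" -> "sentence" -> "is" (root of sentence 1).
parsed = [1, 1, 3, 1, 1, 6, 6, 8, 6, 6, 12, 12, 12, 12]
print(tokens[span_root(parsed, 2, 4)])  # sentence
```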

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 9 (3 by maintainers)

Top GitHub Comments

1 reaction
michaelcapizzi commented, Oct 20, 2016

Got it! Thanks for your quick response, @honnibal.

I love the new additions to spaCy, and the small company I work for is making it the foundation of our NLP pipeline. 😃

0 reactions
lock[bot] commented, May 9, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
