Danish noun_chunk is always empty

How to reproduce the behavior

As the following minimal example shows, doc.noun_chunks is always empty for Danish. This looks like a bug. The dataset used here contains more than 4,000 samples.

import datasets  # Hugging Face's datasets library
import spacy

# load a sample Danish dataset
ds = datasets.load_dataset("dane", split="train")

nlp = spacy.load("da_core_news_lg")
docs = nlp.pipe(ds["text"])

# largest number of noun chunks found in any single document
max_length_noun_chunks = max(len(list(doc.noun_chunks)) for doc in docs)
print(max_length_noun_chunks)  # prints 0

Your Environment

  • spaCy version: 2.3.2
  • Platform: macOS-10.15.7-x86_64-i386-64bit
  • Python version: 3.8.5
  • Models: en

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

1 reaction
svlandeg commented, Jan 11, 2021

I agree that a NotImplementedError would make more sense!

1 reaction
svlandeg commented, Jan 11, 2021

Hi! The noun chunker for Danish has been implemented very recently, and will be available in the next release. Currently, doc.noun_chunks returns an empty list when it’s not (yet) implemented, which is kind of confusing.
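Until the fix is released, one way to avoid mistaking "not implemented" for "no noun chunks in this text" is to wrap the call. The following is only a minimal sketch: collect_noun_chunks is a hypothetical helper, not part of spaCy, and the NotImplementedError branch assumes a release that raises as suggested in this thread. It only relies on each doc exposing a noun_chunks attribute.

```python
import warnings


def collect_noun_chunks(docs):
    """Return one list of noun chunks per doc and flag a likely missing chunker.

    Hypothetical helper: it only assumes each doc has a `noun_chunks` attribute.
    """
    chunks_per_doc = []
    for doc in docs:
        try:
            chunks_per_doc.append(list(doc.noun_chunks))
        except NotImplementedError:
            # Assumed behavior of a release that raises instead of returning nothing.
            raise RuntimeError(
                "noun chunks are not implemented for this language"
            ) from None
    if chunks_per_doc and not any(chunks_per_doc):
        # Every doc came back empty, which points at a missing chunker
        # rather than at the data itself.
        warnings.warn(
            "no noun chunks found in any document; "
            "the language may lack a noun chunker"
        )
    return chunks_per_doc
```

With the example from the report, this could be called as collect_noun_chunks(nlp.pipe(ds["text"])). On an affected version it would be expected to emit the warning rather than silently return empty lists.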
