Danish noun_chunk is always empty
How to reproduce the behavior
As the following minimal example shows, doc.noun_chunks is always empty for Danish. This looks like a potential bug. The dataset used here contains more than 4,000 samples.
import datasets # huggingface's datasets
import spacy
# load sample danish dataset
ds = datasets.load_dataset("dane", split="train")
nlp = spacy.load('da_core_news_lg')
docs = nlp.pipe(ds["text"])
max_length_noun_chunks = max([len(list(doc.noun_chunks)) for doc in docs])
print(max_length_noun_chunks)
0
Your Environment
- spaCy version: 2.3.2
- Platform: macOS-10.15.7-x86_64-i386-64bit
- Python version: 3.8.5
- Models: en
Top GitHub Comments
I agree that a NotImplementedError would make more sense!

Hi! The noun chunker for Danish has been implemented very recently, and will be available in the next release. Currently, doc.noun_chunks returns an empty list when it’s not (yet) implemented, which is kind of confusing.
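
To illustrate why an exception is easier to work with than a silent empty result, here is a minimal sketch. It assumes a version in which doc.noun_chunks raises NotImplementedError for languages without a noun chunker, as suggested in the comments above. The helper noun_chunks_or_none and the two stand-in classes are hypothetical and are not part of spaCy; the stand-ins only mimic the noun_chunks attribute so the example runs without a model.

```python
from typing import Any, List, Optional


def noun_chunks_or_none(doc: Any) -> Optional[List[Any]]:
    """Return the noun chunks of `doc` as a list, or None if unsupported.

    Hypothetical helper: assumes `doc.noun_chunks` raises NotImplementedError
    when the language has no noun chunker (the behavior proposed above).
    """
    try:
        # list() is inside the try block so that an error raised lazily,
        # during iteration, is caught as well.
        return list(doc.noun_chunks)
    except NotImplementedError:
        return None


class UnsupportedDoc:
    """Stand-in for a Doc whose language has no noun chunker (illustration only)."""

    @property
    def noun_chunks(self):
        raise NotImplementedError("noun_chunks is not implemented for this language")


class SupportedDoc:
    """Stand-in for a Doc whose text happens to contain no noun chunks."""

    @property
    def noun_chunks(self):
        return iter(())


# The two cases are now distinguishable: None means "not implemented",
# an empty list means "implemented, but nothing found".
print(noun_chunks_or_none(UnsupportedDoc()))
print(noun_chunks_or_none(SupportedDoc()))
```

With the behavior reported in this issue (an empty list in both cases), a caller has no way to tell these two situations apart, which is the source of the confusion.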