A special case that requires filter over TokenList id
For the following example,
# sent_id = reviews-002288-0001
# newpar id = reviews-002288-p0001
# text = It's well cool. :)
1-2 It's _ _ _ _ _ _ _ _
1 It it PRON PRP Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs 4 nsubj 4:nsubj _
2 's be AUX VBZ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 4 cop 4:cop _
3 well well ADV RB Degree=Pos 4 advmod 4:advmod _
4 cool cool ADJ JJ Degree=Pos 0 root 0:root SpaceAfter=No
5 . . PUNCT . _ 4 punct 4:punct _
6 :) :) SYM NFP _ 4 discourse 4:discourse _
The parsed result will be
TokenList<It's, It, 's, well, cool, ., :)>
[{'id': (1, '-', 2),
'form': "It's",
'lemma': '_',
'upos': '_',
'xpos': None,
'feats': None,
'head': None,
'deprel': '_',
'deps': None,
'misc': None},
{'id': 1,
'form': 'It',
'lemma': 'it',
'upos': 'PRON',
'xpos': 'PRP',
'feats': {'Case': 'Nom',
'Gender': 'Neut',
'Number': 'Sing',
'Person': '3',
'PronType': 'Prs'},
'head': 4,
'deprel': 'nsubj',
'deps': [('nsubj', 4)],
'misc': None},
]
Parsing works fine. However, in some cases one may want only the tokens whose id is an integer, skipping multiword-token ranges such as (1, '-', 2).
How about making TokenList support filtering for tokens that have an integer id only?
Something like this:
sentence.filter(id=lambda x: type(x) is int)
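To make the intended behaviour concrete, here is a minimal sketch in plain Python. It does not use conllu itself: the token dicts are a trimmed stand-in for the parsed result above, and `filter_tokens` is a hypothetical helper that only illustrates how a callable condition could be applied to a field.

```python
# Trimmed stand-in for the parsed tokens shown above (only 'id' and 'form').
# This is illustrative data, not the output of the conllu library.
tokens = [
    {"id": (1, "-", 2), "form": "It's"},  # multiword-token range
    {"id": 1, "form": "It"},
    {"id": 2, "form": "'s"},
    {"id": 3, "form": "well"},
    {"id": 4, "form": "cool"},
    {"id": 5, "form": "."},
    {"id": 6, "form": ":)"},
]


def filter_tokens(tokens, **conditions):
    """Hypothetical helper: keep tokens whose fields satisfy every condition.

    A condition is either a plain value (compared with ==) or a callable
    that receives the field value and returns True or False.
    """
    def matches(token):
        for field, expected in conditions.items():
            if field not in token:
                return False
            value = token[field]
            if callable(expected):
                if not expected(value):
                    return False
            elif value != expected:
                return False
        return True

    return [token for token in tokens if matches(token)]


# Keep only tokens with an integer id, dropping the (1, '-', 2) range.
word_tokens = filter_tokens(tokens, id=lambda x: type(x) is int)
forms = [token["form"] for token in word_tokens]
print(forms)
```

Accepting either a plain value or a callable means exact-match filters keep working, while type checks like the one above become possible.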
Issue Analytics
- State:
- Created 3 years ago
- Comments:5 (5 by maintainers)
Top GitHub Comments
I grouped them into lambda_basic and lambda_deep; the test cases are also simplified.

I added a brief section about filtering by lambda in the README, and just released conllu==4.3 that supports this. Thanks for your contribution!