len(start_token) != 1 in parse_ace_event.py


When preprocessing the ACE 2005 dataset with parse_ace_event.py, I get the following debugger stack:

d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(780)<module>()
-> main()
d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(776)main()
-> include_pronouns=args.include_pronouns)
d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(750)one_fold()
-> js = document.to_json()
d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(726)to_json()
-> js = doc.to_json()
d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(229)to_json()
-> self.remove_whitespace()
d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(216)remove_whitespace()
-> entry.remove_whitespace()
d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(160)remove_whitespace()
-> self.align()
d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(152)align()
-> entity.align(self.sent)
d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(43)align()
-> self.span_sentence = get_token_indices(self, sent.as_doc())
d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(248)get_token_indices()
-> debug_if(len(start_token) != 1)

As you can see, len(start_token) != 1.

The bug arises here:

def get_token_indices(entity, sent):
    start_token = [tok for tok in sent if tok.idx == entity.start_char]

The reason is that when sent.as_doc() is passed to get_token_indices(), token.idx is counted from the beginning of that sentence, while entity.start_char is counted from the start of the whole document. No token matches, so start_token = [].
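
The mismatch described above can be reproduced in a minimal sketch that uses plain Python instead of spaCy. Token, tokenize and find_start_token are hypothetical stand-ins (Token.idx plays the role of spaCy's token.idx), and the sample text is made up. Subtracting the sentence's start offset is shown only as one possible way to compare the two offsets, not as the repository's fix.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    text: str
    idx: int  # character offset of the token's first character


def tokenize(text: str) -> List[Token]:
    """Whitespace tokenizer (hypothetical) that records offsets relative to `text`."""
    tokens, pos = [], 0
    for word in text.split():
        pos = text.index(word, pos)
        tokens.append(Token(word, pos))
        pos += len(word)
    return tokens


def find_start_token(tokens: List[Token], start_char: int) -> List[Token]:
    # Same comprehension shape as in get_token_indices().
    return [tok for tok in tokens if tok.idx == start_char]


document = "Troops entered the city. Reporters met the general there."
sentence_start = document.index("Reporters")   # offset of the sentence in the document
sentence = document[sentence_start:]
entity_start_char = document.index("general")  # document-relative offset

# Tokenizing the sentence on its own restarts offsets at 0.
sentence_tokens = tokenize(sentence)

# Document-relative offset against sentence-relative tokens: no match.
mismatched = find_start_token(sentence_tokens, entity_start_char)

# Shifting the entity offset into sentence coordinates: exactly one match.
aligned = find_start_token(sentence_tokens, entity_start_char - sentence_start)

print(mismatched)
print(aligned)
```

Whether the real script hits exactly this path depends on how it sets entity.start_char before alignment, so treat the sketch as an illustration of the symptom only.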

Could you please fix this bug? Or is there anything I did wrong?

Best regards.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5

Top GitHub Comments

1 reaction
DevoAllen commented, Aug 31, 2021

You should use spacy==2.0.18.

0 reactions
linmou commented, Aug 19, 2022

It seems that this bug comes from sentences being split incorrectly. The recommended fix is to install the same spacy version that this repo uses. The alternative is to correct the training articles by deleting special tokens such as ‘–’, which cause the wrong sentence splits and give rise to the bug. All in all, it is a tortuous process.
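
Since both comments point at the installed spacy version, a quick check before rerunning the preprocessing may save time. The sketch below assumes Python 3.8+ (for importlib.metadata) and takes the 2.0.18 pin from the comment above. check_pinned is a hypothetical helper, not part of the repository.

```python
from importlib import metadata

EXPECTED_SPACY = "2.0.18"  # version suggested in the comment above


def check_pinned(package: str, expected: str) -> str:
    """Report whether `package` is installed at exactly the `expected` version."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package} is not installed; expected {expected}"
    if installed != expected:
        return f"{package} {installed} is installed; expected {expected}"
    return f"{package} {installed} matches the expected version"


print(check_pinned("spacy", EXPECTED_SPACY))
```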
