header training data - mismatching features columns?

I’ve been testing the header model training data using features #76 and I’ve run into one small problem: the number of columns is not consistent.

For example, ‘anaesthesia’ on line 938 and ‘elsevier’ on line 813 have a different number of columns (30 vs 31):

ELSEVIER elsevier E EL ELS ELSE R ER IER VIER BLOCKSTART LINESTART NEWFONT HIGHERFONT 0 0 0 ALLCAP NODIGIT 0 0 0 0 0 0 0 0 0 0 NOPUNCT 0 0 I-<note>

Anaesthesia anaesthesia A An Ana Anae a ia sia esia BLOCKSTART LINESTART LINEINDENT NEWFONT HIGHERFONT 0 0 0 INITCAP NODIGIT 0 0 1 0 0 0 0 0 NOPUNCT 0 0 <reference>

I’m not sure this is a bug (at least not in the current version, which ignores this information), and I’m also not sure this is the right place to report it, but training the header model will fail when automatic feature discovery is enabled.
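A quick way to locate rows like these is to count the whitespace-separated columns on every line and report the lines that deviate from the most common count. The following is only a minimal sketch, assuming the training file is plain text with one token per line, features separated by whitespace, and blank lines between sequences. The function names `column_counts` and `inconsistent_lines` and the toy rows in `sample` are made up for illustration and are not part of the project.

```python
from collections import defaultdict


def column_counts(lines):
    """Map each column count to the 1-based line numbers that have it."""
    by_count = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        tokens = line.split()
        if not tokens:  # assumed: blank lines only separate sequences
            continue
        by_count[len(tokens)].append(number)
    return dict(by_count)


def inconsistent_lines(lines):
    """Return line numbers whose column count differs from the most common one."""
    by_count = column_counts(lines)
    if len(by_count) <= 1:
        return []
    # Treat the most frequent column count as the expected one.
    expected = max(by_count, key=lambda n: len(by_count[n]))
    return sorted(
        number
        for count, numbers in by_count.items()
        if count != expected
        for number in numbers
    )


# Hypothetical toy rows, much shorter than real feature lines.
sample = [
    "Title title T Ti BLOCKSTART 0 <title>",
    "Page page P Pa BLOCKIN 0 <title>",
    "",
    "Extra extra E Ex BLOCKSTART LINEINDENT 0 <note>",
]

print(column_counts(sample))
print(inconsistent_lines(sample))
```

To check a real file, the same functions can be fed `open(path, encoding="utf-8")` instead of the `sample` list, since a file object iterates over its lines.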

Issue Analytics

Created 4 years ago
Comments: 6 (3 by maintainers)

Top GitHub Comments

lfoppiano commented, Mar 8, 2020

@kermitt2 thanks! I will check today.

kermitt2 commented, Jan 8, 2020

Yes, the reference-segmenter model (not segmentation). No clue how some of these features arrived there! The features are combined with the header labels at a late stage, so the labels are not a good hint.