header training data - mismatching features columns?


I’ve been testing the header model training data using features #76 and ran into one small problem: the number of columns does not seem to be consistent.

For example, ‘anaesthesia’ (line 938) and ‘elsevier’ (line 813) have a different number of columns (30 vs 31):

ELSEVIER elsevier E EL ELS ELSE R ER IER VIER BLOCKSTART LINESTART NEWFONT HIGHERFONT 0 0 0 ALLCAP NODIGIT 0 0 0 0 0 0 0 0 0 0 NOPUNCT 0 0 I-<note>

Anaesthesia anaesthesia A An Ana Anae a ia sia esia BLOCKSTART LINESTART LINEINDENT NEWFONT HIGHERFONT 0 0 0 INITCAP NODIGIT 0 0 1 0 0 0 0 0 NOPUNCT 0 0 <reference>

I’m not sure this is a bug (at least not in the current version, which ignores this information), and I’m also not sure this is the right place to report it, but training the header model fails when automatic feature discovery is enabled.
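A quick way to spot this kind of mismatch is to group the rows of a training file by their number of whitespace-separated columns. The sketch below is only an illustration, not part of the project: the helper names `column_counts` and `is_consistent` are hypothetical, and it assumes that rows are space-separated, that the label is the last token, and that blank lines only separate sequences. It runs on the two rows quoted above; the counts it prints include the label and reflect the rows as excerpted here, so they need not match the figures quoted for the full file.

```python
from collections import defaultdict

# The two sample rows quoted in the issue (the last token is the label).
SAMPLE = """\
ELSEVIER elsevier E EL ELS ELSE R ER IER VIER BLOCKSTART LINESTART NEWFONT HIGHERFONT 0 0 0 ALLCAP NODIGIT 0 0 0 0 0 0 0 0 0 0 NOPUNCT 0 0 I-<note>

Anaesthesia anaesthesia A An Ana Anae a ia sia esia BLOCKSTART LINESTART LINEINDENT NEWFONT HIGHERFONT 0 0 0 INITCAP NODIGIT 0 0 1 0 0 0 0 0 NOPUNCT 0 0 <reference>
"""


def column_counts(lines):
    """Map each column count to the 1-based line numbers that have it."""
    counts = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        tokens = line.split()
        if not tokens:
            # Assumption: blank lines are sequence separators, not data rows.
            continue
        counts[len(tokens)].append(lineno)
    return dict(counts)


def is_consistent(lines):
    """True when every non-blank row has the same number of columns."""
    return len(column_counts(lines)) <= 1


if __name__ == "__main__":
    rows = SAMPLE.splitlines()
    for width, linenos in sorted(column_counts(rows).items()):
        print(f"{width} columns: lines {linenos}")
    print("consistent:", is_consistent(rows))
```

To check a real training file, the same functions can be applied to its lines, for example `column_counts(open(path, encoding="utf-8"))`, where `path` is a placeholder for the file's location.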

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
lfoppiano commented, Mar 8, 2020

@kermitt2 thanks! I will check today.

1 reaction
kermitt2 commented, Jan 8, 2020

Yes, the reference-segmenter model (not segmentation). No clue how some of these features arrived there! The features are combined with the header labels at a late stage, so the labels are not a good hint.
