Multiline cells not well detected with guess mode

Hello,

I’m trying to use your fantastic piece of software to analyse the table on page 1 of this PDF: input.pdf

I’m using the latest version of tabula-java. I compiled it with git and mvn (as described in README.md).

I run this command: java -jar target/tabula-1.0.2-SNAPSHOT-jar-with-dependencies.jar -g input.pdf > output.csv

I get the following output: output.txt

But I expect this: output_expected.txt

Do you think that is possible?
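If the guess-mode output splits each multiline cell over several CSV rows, one possible workaround (not a tabula-java feature) is to merge those rows back together afterwards. A minimal sketch, assuming the rows are already parsed into lists of strings and that a continuation row can be recognised by a blank first cell; `MergeContinuationRows` and `merge` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

public class MergeContinuationRows {

    // Hypothetical heuristic: a row whose first cell is blank continues the
    // multiline cells of the row above it.
    static List<List<String>> merge(List<List<String>> rows) {
        List<List<String>> merged = new ArrayList<>();
        for (List<String> row : rows) {
            boolean continuation = !merged.isEmpty()
                    && (row.isEmpty() || row.get(0).isBlank());
            if (!continuation) {
                merged.add(new ArrayList<>(row)); // start a new logical row
                continue;
            }
            List<String> previous = merged.get(merged.size() - 1);
            for (int i = 0; i < row.size(); i++) {
                String cell = row.get(i).trim();
                if (cell.isEmpty()) {
                    continue; // nothing to append for this column
                }
                while (previous.size() <= i) {
                    previous.add(""); // pad if the continuation row is wider
                }
                String above = previous.get(i);
                // Join the fragments with a space, as the wrapped text would read.
                previous.set(i, above.isBlank() ? cell : above + " " + cell);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<String>> rows = List.of(
                List.of("1", "First line of", "A"),
                List.of("", "a wrapped cell", ""),
                List.of("2", "Single line", "B"));
        for (List<String> row : merge(rows)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

The heuristic breaks down when a genuine row has an empty first column, so it would need checking against output_expected.txt before relying on it.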

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

jazzido commented, Mar 9, 2018 (1 reaction)

Tabula itself! Get it here: http://tabula.technology. The script export format contains the area of interest:
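The area from the exported script can be passed straight to tabula-java instead of relying on `-g`. A minimal sketch, assuming the exported script passes the area as `-a top,left,bottom,right` (the form tabula-java's `-a`/`--area` option accepts); `AreaFromScript`, `parseArea`, and `command` are hypothetical names, and the coordinates in `main` are placeholders, not values measured from input.pdf:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AreaFromScript {

    // Matches the value of a standalone "-a" or "--area" option,
    // e.g. "-a 269.875,12.75,790.5,561" (top,left,bottom,right).
    private static final Pattern AREA = Pattern.compile(
            "(?<!\\S)(?:-a|--area)\\s+(\\d+(?:\\.\\d+)?(?:,\\d+(?:\\.\\d+)?){3})");

    // Returns {top, left, bottom, right}, or throws if no valid area is found.
    static double[] parseArea(String scriptLine) {
        Matcher m = AREA.matcher(scriptLine);
        if (!m.find()) {
            throw new IllegalArgumentException("no -a/--area option in: " + scriptLine);
        }
        String[] parts = m.group(1).split(",");
        double[] area = new double[4];
        for (int i = 0; i < 4; i++) {
            area[i] = Double.parseDouble(parts[i]);
        }
        if (area[0] >= area[2] || area[1] >= area[3]) {
            throw new IllegalArgumentException("expected top < bottom and left < right");
        }
        return area;
    }

    // Builds a tabula-java command that uses the explicit area instead of -g (guess).
    static List<String> command(String jar, String pdf, String page, double[] area) {
        List<String> cmd = new ArrayList<>(List.of("java", "-jar", jar));
        cmd.add("-p");
        cmd.add(page);
        cmd.add("-a");
        cmd.add(area[0] + "," + area[1] + "," + area[2] + "," + area[3]);
        cmd.add(pdf);
        return cmd;
    }

    public static void main(String[] args) {
        // Hypothetical exported line; the numbers are placeholders.
        String line = "java -jar tabula-java.jar -a 269.875,12.75,790.5,561 -p 1 \"$1\"";
        double[] area = parseArea(line);
        System.out.println(String.join(" ", command(
                "target/tabula-1.0.2-SNAPSHOT-jar-with-dependencies.jar", "input.pdf", "1", area)));
    }
}
```

The printed command can then be run in a shell with `> output.csv`, the same way as the original `-g` command.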

[Image: screenshot of Tabula’s script export showing the area of interest]

My3VM commented, Apr 11, 2018 (0 reactions)

Hello there, I’ve come across a similar scenario: detecting tables that contain multiline cells. I’ve tried the idea above, and although it works very well in the Tabula app (on Mac OS X), I’m unable to reproduce the exact results using tabula.jar.

I tried both a jar freshly built from source and the one bundled with Tabula.app (Tabula.app/Contents/Java). In both cases the behavior is consistent: the last row is missed, while the rest of the results come through fine.

I also tried to reproduce the same workflow by tweaking the code to handle the -g and -l options together, so that it first guesses the table rectangles and then passes them to the SpreadsheetDetection algorithm, but the result is still the same: the last rows get skipped.
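One possible reason for the last rows going missing in lattice mode (an assumption, not confirmed in this thread) is that lattice extraction builds cells from ruling lines, so a row with no ruling line below it, or whose bottom line falls outside the selected area, has no cell to land in. The toy model below is a deliberate simplification, not tabula-java’s actual algorithm: rows are the bands between consecutive horizontal rulings, and `LatticeRowsModel`, `Text`, and `rows` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

public class LatticeRowsModel {

    // A text fragment with the vertical position of its centre
    // (top-based coordinates: y grows downward).
    static final class Text {
        final String value;
        final double y;

        Text(String value, double y) {
            this.value = value;
            this.y = y;
        }
    }

    // Simplified lattice model: each row is the band between two consecutive
    // horizontal ruling lines; text outside every band is silently dropped.
    static List<List<String>> rows(List<Double> horizontalRulings, List<Text> texts) {
        List<Double> ys = new ArrayList<>(horizontalRulings);
        ys.sort(null); // natural order, top to bottom
        List<List<String>> rows = new ArrayList<>();
        for (int i = 0; i + 1 < ys.size(); i++) {
            double top = ys.get(i), bottom = ys.get(i + 1);
            List<String> row = new ArrayList<>();
            for (Text t : texts) {
                if (t.y > top && t.y < bottom) {
                    row.add(t.value);
                }
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Text> texts = List.of(
                new Text("header", 15), new Text("row 1", 35), new Text("row 2", 55));
        // All four rulings present: three rows are produced.
        System.out.println(rows(List.of(10.0, 30.0, 50.0, 70.0), texts));
        // Bottom ruling missing (or cut off by the area): "row 2" disappears.
        System.out.println(rows(List.of(10.0, 30.0, 50.0), texts));
    }
}
```

If this is what is happening, extending the bottom edge of the area so that it includes the table’s final ruling line might be worth trying first.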

Could you suggest how I can combine the guess option with lattice-mode extraction?
