question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Columns getting merged

See original GitHub issue

Summary of your issue

I have a PDF with a table extending to multiple pages. For some rows, the value in last two (or second last two) columns is getting merged into a single one. Please note that I have removed some of the sensitive information. image

Environment

  • python --version: Python 2.7.6
  • java -version: java version “1.8.0_144”
  • OS and it’s version: Ubuntu 14.04
  • Your PDF URL: I have attached a screenshot of the table

What did you do when you faced the problem?

Tried lattice=True option but it is not even reading the table

Example code:

all_values = tabula.read_pdf("sample.pdf", pages='all', pandas_options={'header': None, 'error_bad_lines': False, 'warn_bad_lines': False})
all_values = all_values.values.tolist()

for val in all_values:
	print val

Output:

[u'22/06/17', u'IMPS-7-RAHUL-HDFC-XXXXXXXX', u'00007', u'22/06/17', nan, u'1,000.00 14,904.08', nan]
[nan, u'8-', nan, nan, nan, nan, nan]
[u'23/06/17', u'NEFT CHGS INCL ST & CESS 170617', u'000000000000000', u'23/06/17', u'2.88', u'14,901.20', nan]
[u'23/06/17', u'NEFT CR-YESB0000001-ANI TECHNOLOGIES PRI', u'N1', u'23/06/17', nan, u'579.00 15,480.20', nan]
[nan, u'VATE LIMITED-RAHUL YADAV-N1', nan, nan, nan, nan, nan]
[nan, u'9', nan, nan, nan, nan, nan]
[u'23/06/17', u'ACH D- AUFINANCIERS-9', u'0000002', u'23/06/17', u'14,400.00', u'1,080.20', nan]
[u'24/06/17', u'IMPS-7-ANI TECHNOLOGIES PRI-H', u'00007', u'24/06/17', nan, u'2,378.00 3,458.20', nan]
[nan, u'DFC-XXXXXXXXXXX-2017-06', nan, nan, nan, nan, nan]
[nan, u'-24-PAYMENT', nan, nan, nan, nan, nan]
[u'27/06/17', u'NEFT CR-YESB0000001-ANI TECHNOLOGIES PRI', u'N1', u'27/06/17', nan, u'445.00 3,903.20', nan]
[nan, u'VATE LIMITED-RAHUL YADAV-N1', nan, nan, nan, nan, nan]
[nan, u'8', nan, nan, nan, nan, nan]
[u'28/06/17', u'NEFT CR-YESB0000001-ANI TECHNOLOGIES PRI', u'N1', u'28/06/17', nan, u'2,279.00 6,182.20', nan]
[nan, u'VATE LIMITED-RAHUL YADAV-N1', nan, nan, nan, nan, nan]
[nan, u'0', nan, nan, nan, nan, nan]
[u'29/06/17', u'NEFT CR-YESB0000001-ANI TECHNOLOGIES PRI', u'N1', u'29/06/17', nan, u'1,165.00 7,347.20', nan]
[nan, u'VATE LIMITED-RAHUL YADAV-N1', nan, nan, nan, nan, nan]
[nan, u'5', nan, nan, nan, nan, nan]

What did you intend to be?

[u'22/06/17', u'IMPS-7-RAHUL-HDFC-XXXXXXXX', u'00007', u'22/06/17', nan, u'1,000.00', u'14,904.08']
[nan, u'8-', nan, nan, nan, nan, nan]
[u'23/06/17', u'NEFT CHGS INCL ST & CESS 170617', u'000000000000000', u'23/06/17', u'2.88', nan, u'14,901.20']
[u'23/06/17', u'NEFT CR-YESB0000001-ANI TECHNOLOGIES PRI', u'N1', u'23/06/17', nan, u'579.00', u'15,480.20']
[nan, u'VATE LIMITED-RAHUL YADAV-N1', nan, nan, nan, nan, nan]
[nan, u'9', nan, nan, nan, nan, nan]
[u'23/06/17', u'ACH D- AUFINANCIERS-9', u'0000002', u'23/06/17', u'14,400.00', nan, u'1,080.20']
[u'24/06/17', u'IMPS-7-ANI TECHNOLOGIES PRI-H', u'00007', u'24/06/17', nan, u'2,378.00', u'3,458.20']
[nan, u'DFC-XXXXXXXXXXX-2017-06', nan, nan, nan, nan, nan]
[nan, u'-24-PAYMENT', nan, nan, nan, nan, nan]
[u'27/06/17', u'NEFT CR-YESB0000001-ANI TECHNOLOGIES PRI', u'N1', u'27/06/17', nan, u'445.00', u'3,903.20']
[nan, u'VATE LIMITED-RAHUL YADAV-N1', nan, nan, nan, nan, nan]
[nan, u'8', nan, nan, nan, nan, nan]
[u'28/06/17', u'NEFT CR-YESB0000001-ANI TECHNOLOGIES PRI', u'N1', u'28/06/17', nan, u'2,279.00', u'6,182.20']
[nan, u'VATE LIMITED-RAHUL YADAV-N1', nan, nan, nan, nan, nan]
[nan, u'0', nan, nan, nan, nan, nan]
[u'29/06/17', u'NEFT CR-YESB0000001-ANI TECHNOLOGIES PRI', u'N1', u'29/06/17', nan, u'1,165.00', u'7,347.20']
[nan, u'VATE LIMITED-RAHUL YADAV-N1', nan, nan, nan, nan, nan]
[nan, u'5', nan, nan, nan, nan, nan]

Issue Analytics

  • State:closed
  • Created 6 years ago
  • Comments:15 (3 by maintainers)

github_iconTop GitHub Comments

2reactions
peppapoppycommented, Jul 1, 2019

Columns containing only NaN get removed automatically in the read_pdf module, while I want them to be there. Is there any way to prevent this? sample I want 3 columns to be in order as they are, while I am getting this sample2

1reaction
samkit-jaincommented, Sep 23, 2017

Ok. I’ll raise an issue at tabula-java.

Received same output from stream=True

Read more comments on GitHub >

github_iconTop Results From Across the Web

Merge and unmerge cells - Microsoft Support
Select the cells to merge. Select Merge & Center. Important: When you merge multiple cells, the contents of only one cell (the upper-left...
Read more >
How to merge two columns in Excel without losing data
Select all cells from 2 or more columns that you want to merge, go to the Ablebits.com Data tab > Merge group, and...
Read more >
Adding Table Columns to Columns with Merged Cells
Another method is to separate the merged row from the rest of the table by splitting the table. You can then insert the...
Read more >
How To Merge Columns in Excel (With Step-by-Step ... - Indeed
How to use the "Merge" icon to merge columns · 1. Highlight the columns · 2. Open the home tab · 3. Select...
Read more >
Merge and Combine Columns without Losing Data in Excel
If you merge multiple columns of data in Excel (no matter which Excel version you are using), only the left column of data...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found