Columns getting merged
Summary of your issue
I have a PDF with a table that extends over multiple pages. For some rows, the values in the last two columns (or the second-to-last two) are merged into a single column. Please note that I have removed some of the sensitive information.
Environment
- `python --version`: Python 2.7.6
- `java -version`: java version "1.8.0_144"
- OS and its version: Ubuntu 14.04
- Your PDF URL: I have attached a screenshot of the table
What did you do when you faced the problem?
Tried the `lattice=True` option, but with it the table is not read at all.
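Example code:
import tabula
# Read every page; no header row, and skip malformed lines silently.
all_values = tabula.read_pdf("sample.pdf", pages='all', pandas_options={'header': None, 'error_bad_lines': False, 'warn_bad_lines': False})
# Convert the DataFrame to a plain list of rows and print each one.
all_values = all_values.values.tolist()
for val in all_values:
    print(val)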
Output:
[u'22/06/17', u'IMPS-7-RAHUL-HDFC-XXXXXXXX', u'00007', u'22/06/17', nan, u'1,000.00 14,904.08', nan]
[nan, u'8-', nan, nan, nan, nan, nan]
[u'23/06/17', u'NEFT CHGS INCL ST & CESS 170617', u'000000000000000', u'23/06/17', u'2.88', u'14,901.20', nan]
[u'23/06/17', u'NEFT CR-YESB0000001-ANI TECHNOLOGIES PRI', u'N1', u'23/06/17', nan, u'579.00 15,480.20', nan]
[nan, u'VATE LIMITED-RAHUL YADAV-N1', nan, nan, nan, nan, nan]
[nan, u'9', nan, nan, nan, nan, nan]
[u'23/06/17', u'ACH D- AUFINANCIERS-9', u'0000002', u'23/06/17', u'14,400.00', u'1,080.20', nan]
[u'24/06/17', u'IMPS-7-ANI TECHNOLOGIES PRI-H', u'00007', u'24/06/17', nan, u'2,378.00 3,458.20', nan]
[nan, u'DFC-XXXXXXXXXXX-2017-06', nan, nan, nan, nan, nan]
[nan, u'-24-PAYMENT', nan, nan, nan, nan, nan]
[u'27/06/17', u'NEFT CR-YESB0000001-ANI TECHNOLOGIES PRI', u'N1', u'27/06/17', nan, u'445.00 3,903.20', nan]
[nan, u'VATE LIMITED-RAHUL YADAV-N1', nan, nan, nan, nan, nan]
[nan, u'8', nan, nan, nan, nan, nan]
[u'28/06/17', u'NEFT CR-YESB0000001-ANI TECHNOLOGIES PRI', u'N1', u'28/06/17', nan, u'2,279.00 6,182.20', nan]
[nan, u'VATE LIMITED-RAHUL YADAV-N1', nan, nan, nan, nan, nan]
[nan, u'0', nan, nan, nan, nan, nan]
[u'29/06/17', u'NEFT CR-YESB0000001-ANI TECHNOLOGIES PRI', u'N1', u'29/06/17', nan, u'1,165.00 7,347.20', nan]
[nan, u'VATE LIMITED-RAHUL YADAV-N1', nan, nan, nan, nan, nan]
[nan, u'5', nan, nan, nan, nan, nan]
What did you intend to be?
[u'22/06/17', u'IMPS-7-RAHUL-HDFC-XXXXXXXX', u'00007', u'22/06/17', nan, u'1,000.00', u'14,904.08']
[nan, u'8-', nan, nan, nan, nan, nan]
[u'23/06/17', u'NEFT CHGS INCL ST & CESS 170617', u'000000000000000', u'23/06/17', u'2.88', nan, u'14,901.20']
[u'23/06/17', u'NEFT CR-YESB0000001-ANI TECHNOLOGIES PRI', u'N1', u'23/06/17', nan, u'579.00', u'15,480.20']
[nan, u'VATE LIMITED-RAHUL YADAV-N1', nan, nan, nan, nan, nan]
[nan, u'9', nan, nan, nan, nan, nan]
[u'23/06/17', u'ACH D- AUFINANCIERS-9', u'0000002', u'23/06/17', u'14,400.00', nan, u'1,080.20']
[u'24/06/17', u'IMPS-7-ANI TECHNOLOGIES PRI-H', u'00007', u'24/06/17', nan, u'2,378.00', u'3,458.20']
[nan, u'DFC-XXXXXXXXXXX-2017-06', nan, nan, nan, nan, nan]
[nan, u'-24-PAYMENT', nan, nan, nan, nan, nan]
[u'27/06/17', u'NEFT CR-YESB0000001-ANI TECHNOLOGIES PRI', u'N1', u'27/06/17', nan, u'445.00', u'3,903.20']
[nan, u'VATE LIMITED-RAHUL YADAV-N1', nan, nan, nan, nan, nan]
[nan, u'8', nan, nan, nan, nan, nan]
[u'28/06/17', u'NEFT CR-YESB0000001-ANI TECHNOLOGIES PRI', u'N1', u'28/06/17', nan, u'2,279.00', u'6,182.20']
[nan, u'VATE LIMITED-RAHUL YADAV-N1', nan, nan, nan, nan, nan]
[nan, u'0', nan, nan, nan, nan, nan]
[u'29/06/17', u'NEFT CR-YESB0000001-ANI TECHNOLOGIES PRI', u'N1', u'29/06/17', nan, u'1,165.00', u'7,347.20']
[nan, u'VATE LIMITED-RAHUL YADAV-N1', nan, nan, nan, nan, nan]
[nan, u'5', nan, nan, nan, nan, nan]
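Until the extraction itself is fixed, the rows can be repaired after the fact. The following is only a minimal sketch of such a workaround, not something tabula-py provides: `repair_row`, `is_nan` and `AMOUNT` are hypothetical names, and the column positions (4, 5 and 6) are assumptions read off the output above. It handles the two patterns visible in the output: two amounts fused in column 5, and a single amount left in column 5 when column 4 is already filled.

```python
import math
import re

# Matches one amount such as "1,000.00" or "2.88" (assumed format).
AMOUNT = re.compile(r"^\d[\d,]*\.\d{2}$")


def is_nan(value):
    """Return True for the float NaN that pandas uses for empty cells."""
    return isinstance(value, float) and math.isnan(value)


def repair_row(row):
    """Return a copy of row with the last two amount columns separated.

    Assumes a 7-column row where index 5 may hold "amount balance"
    fused together and index 6 is empty, as in the output above.
    """
    fixed = list(row)
    cell = fixed[5]
    # Nothing to do for continuation rows or rows that are already correct.
    if is_nan(cell) or not is_nan(fixed[6]):
        return fixed
    parts = cell.split()
    if not all(AMOUNT.match(part) for part in parts):
        return fixed
    if len(parts) == 2:
        # "1,000.00 14,904.08" -> one value per column.
        fixed[5], fixed[6] = parts
    elif len(parts) == 1 and not is_nan(fixed[4]):
        # Column 4 is filled, so the lone value is taken to belong in column 6.
        fixed[5], fixed[6] = float("nan"), parts[0]
    return fixed


nan = float("nan")
sample = ["22/06/17", "IMPS-7-RAHUL-HDFC-XXXXXXXX", "00007", "22/06/17",
          nan, "1,000.00 14,904.08", nan]
print(repair_row(sample))
```

Applied to the list from the example code, this would be `[repair_row(val) for val in all_values]`. The second rule is a heuristic and would misplace a value if a row legitimately had entries in columns 4 and 5 but none in column 6.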
Issue Analytics
- State:
- Created 6 years ago
- Comments:15 (3 by maintainers)
Columns containing only NaN get removed automatically in the read_pdf module, while I want them to be there. Is there any way to prevent this?
I want the 3 columns to stay in their original order, but this is what I am getting:
![sample2](https://user-images.githubusercontent.com/48446853/60423770-bdee5b00-9ba3-11e9-8db9-57660ded9edf.PNG)
Ok. I’ll raise an issue at tabula-java.
Received the same output from `stream=True`.
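One option that does not appear in this thread is to pass explicit column boundaries in stream mode instead of letting the columns be guessed. The sketch below assumes that `tabula.read_pdf` accepts a `columns` list (X coordinates of the column boundaries) together with `guess=False`, which may depend on the installed tabula-py version. The helper names are hypothetical, and the coordinates in the usage note are placeholders that would have to be measured from the actual PDF.

```python
def build_stream_options(column_edges):
    """Build keyword arguments for tabula.read_pdf with fixed column edges.

    column_edges: X coordinates of the column boundaries (hypothetical
    values; measure them from the real PDF).
    """
    return {
        "pages": "all",
        "stream": True,
        "guess": False,  # assumption: stops auto-detection overriding the edges
        "columns": sorted(column_edges),
        "pandas_options": {"header": None},
    }


def read_with_fixed_columns(path, column_edges):
    """Read the PDF using the options above."""
    # Imported here so build_stream_options can be used without tabula-py.
    import tabula
    return tabula.read_pdf(path, **build_stream_options(column_edges))
```

A call would look like `read_with_fixed_columns("sample.pdf", [70, 280, 360, 410, 480, 550])`, where every number is a made-up boundary. Depending on the tabula-py version, the result may be a single DataFrame or a list of DataFrames.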