Two single-row tables in two separate PDFs don't get read by Camelot as tables
Describe the bug
2 of 55 PDFs containing K-12 education table data have one-row tables that Camelot doesn't detect as tables:
https://www.dpi.nc.gov/media/8350/open
https://www.dpi.nc.gov/media/8325/open
No table is found in either file, likely because each table has only a single data row.
Steps to reproduce the bug
Ran tables = camelot.read_pdf(weburl, pages='all') in a loop, with weburl set to each of the two URLs above.
Expected behavior
Each of these two one-page PDFs should produce a one-row table.
Code
import camelot
# The two URLs that fail to produce tables
weburls = ["https://www.dpi.nc.gov/media/8350/open", "https://www.dpi.nc.gov/media/8325/open"]
for weburl in weburls:
    tables = camelot.read_pdf(weburl, pages='all')
Screenshots
Environment
- OS: Windows-10-10.0.19043-SP0
- Python version: 3.9.6 (tags/v3.9.6:db3ff76, Jun 28 2021, 15:26:21) [MSC v.1929 64 bit (AMD64)]
- Numpy version: 1.21.2
- OpenCV version: 4.5.3
- Ghostscript version:
- Camelot version: 0.10.1
Additional context
There are 55 educator prep URLs being cycled through in a loop. These two failed to produce tables, and I bet it's related to having only one row entry after the headers, or something similar.
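For anyone triaging a similar batch, here is a minimal sketch of a loop that records which URLs come back with no tables and retries them with Camelot's other parsing flavor. The helper names (read_with_camelot, first_flavor_with_tables) are hypothetical, and the fallback to flavor="stream" is only an assumption about what might help. It is not the fix suggested in this thread, and it has not been confirmed against these two PDFs.

```python
from typing import Callable, Dict, Iterable, Optional, Sequence


def read_with_camelot(url: str, flavor: str):
    """Thin wrapper around camelot.read_pdf.

    Camelot is imported lazily so the helper below can be exercised
    with a stand-in reader when Camelot is not installed.
    """
    import camelot

    return camelot.read_pdf(url, pages="all", flavor=flavor)


def first_flavor_with_tables(
    urls: Iterable[str],
    reader: Callable = read_with_camelot,
    flavors: Sequence[str] = ("lattice", "stream"),
) -> Dict[str, Optional[str]]:
    """Map each URL to the first flavor that yields at least one table.

    A value of None means no flavor found a table for that URL.
    """
    results: Dict[str, Optional[str]] = {}
    for url in urls:
        results[url] = None
        for flavor in flavors:
            tables = reader(url, flavor)
            if len(tables) > 0:  # TableList supports len()
                results[url] = flavor
                break
    return results


# Example usage (requires Camelot and network access):
# urls = [
#     "https://www.dpi.nc.gov/media/8350/open",
#     "https://www.dpi.nc.gov/media/8325/open",
# ]
# for url, flavor in first_flavor_with_tables(urls).items():
#     print(url, "->", flavor or "no table found")
```

Passing the reader in as a parameter keeps the loop logic separate from the PDF parsing, so the 53 working URLs and the 2 failing ones can be compared in a single run.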
Issue Analytics
- State:
- Created 2 years ago
- Comments: 5 (3 by maintainers)
Top GitHub Comments
Yeah, it wasn't returning a table on those two examples, and I thought it may have been due to the one-line tables. It was odd that out of 55 PDFs with similar table formatting, the only two that failed to return tables were the ones with a single data row.
I’ll try the suggestions you provided and see if that works when I can. Thank you.
On Fri, Oct 1, 2021 at 7:29 PM Tiago Samaha Cordeiro wrote:
– Doug Taggart
Sorry, I haven't had time to try the fix on this project yet. Had to put it on the back burner for a bit.
On Tue, Nov 9, 2021 at 10:53 AM Tiago Samaha Cordeiro wrote:
– Doug Taggart