tabula.convert_into only converts 1st page of pdf

Summary of your issue

Using tabula.convert_into on a multipage PDF converts only the first page of the PDF.

Environment

Write and check your environment. Please paste outputs of specific commands if required.

  • Paste the output of python --version command on your terminal:

Python 2.7.10

  • Paste the output of java -version command on your terminal:

java version "1.8.0_71"
Java(TM) SE Runtime Environment (build 1.8.0_71-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.71-b15, mixed mode)

  • Does java -h command work well? Ensure your java command is included in PATH.

yes

  • Write your OS and its version: macOS Sierra 10.12.3
  • (Optional, but really helpful) Your PDF URL:

http://alabcboard.gov/sites/default/files/inline-files/Store Phone List.pdf

Example code:

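import urllib2  # Python 2 only, matching the reported Python 2.7.10

import tabula


def main():
    download_file("http://alabcboard.gov/sites/default/files/inline-files/Store%20Phone%20List.pdf")
    # No pages argument is passed, so tabula-py falls back to its default (page 1).
    tabula.convert_into("document.pdf", "output.csv", output_format="csv")


def download_file(download_url):
    response = urllib2.urlopen(download_url)
    # Write in binary mode so the PDF bytes are saved unmodified.
    with open("document.pdf", "wb") as pdf_file:
        pdf_file.write(response.read())


if __name__ == "__main__":
    main()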

## Output:
`cat output.csv` shows that only the first page is included.

## What did you intend to be?
All pages of the original pdf should be included.

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

1 reaction
kumarapush commented, Mar 21, 2019

pages='all' works fine.
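A minimal sketch of that fix, assuming the same file names as in the issue. tabula-py's `pages` option defaults to the first page, so passing `pages="all"` asks it to process the whole document. `build_convert_options` and `convert_all_pages` are illustrative helper names, not part of tabula-py.

```python
def build_convert_options(pages="all"):
    """Keyword arguments for tabula.convert_into (illustrative helper)."""
    # tabula-py defaults to pages=1; "all" requests every page.
    return {"output_format": "csv", "pages": pages}


def convert_all_pages(pdf_path, csv_path):
    """Write the tables found on every page of pdf_path into csv_path."""
    # Imported lazily so the option helper works without tabula-py and Java installed.
    import tabula

    tabula.convert_into(pdf_path, csv_path, **build_convert_options())


# Usage (requires tabula-py and a Java runtime):
# convert_all_pages("document.pdf", "output.csv")
```

The `pages` option also accepts a single page number, a list of page numbers, or a range string such as `"1-2,3"`.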

0 reactions
mertselimb commented, Nov 18, 2019

I'm having the same problem, and if I use pages='2' I get a warning saying "The output file is empty." The PDF has 3 pages.
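One way to narrow down a report like this is to read each page on its own and note which pages yield no tables. The sketch below assumes tabula-py 2.x, where `tabula.read_pdf` returns a list of DataFrames. `pages_without_tables` and its `read_page` parameter are hypothetical names introduced here for illustration. It only locates the empty pages; it does not explain why the detection failed.

```python
def pages_without_tables(pdf_path, page_numbers, read_page=None):
    """Return the page numbers for which no tables were extracted."""
    if read_page is None:
        # Imported lazily; needs tabula-py and a Java runtime.
        import tabula

        def read_page(path, page):
            return tabula.read_pdf(path, pages=page)

    empty = []
    for page in page_numbers:
        tables = read_page(pdf_path, page)
        # Treat None and an empty result alike.
        if tables is None or len(tables) == 0:
            empty.append(page)
    return empty


# Usage for a 3-page PDF (requires tabula-py and Java):
# print(pages_without_tables("document.pdf", [1, 2, 3]))
```

If a page comes back empty even though it visibly contains a table, trying the `lattice=True` or `stream=True` option of `read_pdf` may change what tabula detects.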

