tabula.convert_into only converts 1st page of pdf
Summary of your issue
Using tabula.convert_into on a multipage pdf only converts the 1st page of the pdf
Environment
Write and check your environment. Please paste outputs of specific commands if required.
- Paste the output of `python --version` command on your terminal: Python 2.7.10
- Paste the output of `java -version` command on your terminal:
  java version "1.8.0_71"
  Java™ SE Runtime Environment (build 1.8.0_71-b15)
  Java HotSpot™ 64-Bit Server VM (build 25.71-b15, mixed mode)
- Does `java -h` command work well? Ensure your java command is included in PATH: yes
- Write your OS and its version: macOS Sierra 10.12.3
- (Optional, but really helpful) Your PDF URL:
http://alabcboard.gov/sites/default/files/inline-files/Store%20Phone%20List.pdf
Example code:
import urllib2

import tabula

def main():
    download_file("http://alabcboard.gov/sites/default/files/inline-files/Store%20Phone%20List.pdf")
    # No pages argument is passed here.
    tabula.convert_into("document.pdf", "output.csv", output_format="csv")

def download_file(download_url):
    # Save the PDF in binary mode so its bytes are written unchanged.
    response = urllib2.urlopen(download_url)
    with open("document.pdf", "wb") as f:
        f.write(response.read())

if __name__ == "__main__":
    main()
## Output:
`cat output.csv` shows that only the first page is included.
## What did you intend to be?
All pages of the original pdf should be included.
Issue Analytics
- State:
- Created 6 years ago
- Comments:5 (1 by maintainers)
Passing pages='all' works fine.
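For reference, the suggestion above could look roughly like the sketch below. It assumes tabula-py and Java are installed. `convert_all_pages` and its `convert` parameter are hypothetical names introduced only for illustration. The one call it relies on is `tabula.convert_into` with the `pages` keyword.

```python
def convert_all_pages(pdf_path, csv_path, convert=None):
    """Convert every page of a PDF into a single CSV file.

    `convert_all_pages` and `convert` are hypothetical names for this sketch;
    when `convert` is not given, tabula.convert_into is used.
    """
    if convert is None:
        # Imported lazily so the helper can be loaded without tabula-py/Java.
        import tabula
        convert = tabula.convert_into
    # pages="all" requests every page instead of relying on the default.
    convert(pdf_path, csv_path, output_format="csv", pages="all")
    return csv_path

# Example call, assuming document.pdf exists in the working directory:
# convert_all_pages("document.pdf", "output.csv")
```

Keeping the converter injectable is only a convenience for trying the helper without a Java runtime. Calling `tabula.convert_into(..., pages="all")` directly should behave the same way.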
I am having the same problem. If I use pages='2', I get a warning saying "The output file is empty." The PDF has 3 pages.