HTTP Error 400 Bad Request when reading from AWS S3 PreSigned URL
Summary of your issue
I am trying to read a PDF from an AWS S3 PreSigned URL and I am getting the following error:
Traceback (most recent call last):
File "main.py", line 3, in <module>
df = tabula.read_pdf("https://my-bucket-name.s3.amazonaws.com/v1/user/62e970a844d091d90069ab7d/file/62ed00ce74b8a716407975d6?SIGNED_HASH")[0]
File "/home/alunix/code/ddc-reader-python/ddc-reader/lib/python3.8/site-packages/tabula/io.py", line 311, in read_pdf
path, temporary = localize_file(input_path, user_agent)
File "/home/alunix/code/ddc-reader-python/ddc-reader/lib/python3.8/site-packages/tabula/file_util.py", line 48, in localize_file
req = urlopen(path_or_buffer)
File "/usr/lib/python3.8/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/usr/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/usr/lib/python3.8/urllib/request.py", line 569, in error
return self._call_chain(*args)
File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/usr/lib/python3.8/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request
Check list before submit
- Did you read FAQ?
- (Optional, but really helpful) Your PDF URL: it's an AWS S3 PreSigned URL, so sorry, I can't share it, but the PreSigned URL is working because I am able to access the PDF through the browser
- Paste the output of import tabula; tabula.environment_info() on Python REPL:
Python version:
3.8.10 (default, Jun 22 2022, 20:18:18)
[GCC 9.4.0]
Java version:
openjdk version "1.8.0_342"
OpenJDK Runtime Environment (build 1.8.0_342-8u342-b07-0ubuntu1~20.04-b07)
OpenJDK 64-Bit Server VM (build 25.342-b07, mixed mode)
tabula-py version: 2.4.0
platform: Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-glibc2.29
uname:
uname_result(system='Linux', node='DESKTOP-KVLP8PC', release='5.10.16.3-microsoft-standard-WSL2', version='#1 SMP Fri Apr 2 22:23:49 UTC 2021', machine='x86_64', processor='x86_64')
linux_distribution: ('Ubuntu', '20.04', 'focal')
mac_ver: ('', ('', '', ''), '')
None
- Paste the output of python --version command on your terminal: Python 3.8.10
- Paste the output of java -version command on your terminal:
openjdk version "1.8.0_342"
OpenJDK Runtime Environment (build 1.8.0_342-8u342-b07-0ubuntu1~20.04-b07)
OpenJDK 64-Bit Server VM (build 25.342-b07, mixed mode)
- Does java -h command work well?; Ensure your java command is included in PATH
- Write your OS and its version: Windows 10 with WSL
What did you do when you faced the problem?
Tried to search on both Google and GitHub issues to see if anyone else is facing the same issue and found nothing.
Code:
import tabula

df = tabula.read_pdf("https://my-bucket-name.s3.amazonaws.com/v1/user/62e970a844d091d90069ab7d/file/62ed00ce74b8a716407975d6?SIGNED_HASH")[0]
df.to_csv('./test.csv', encoding='utf-8')
print(df)
Expected behavior:
CSV with PDF data.
Actual behavior:
The same traceback as in the summary above, ending in:
urllib.error.HTTPError: HTTP Error 400: Bad Request
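Since the traceback shows the failure inside tabula-py's own download step (localize_file calling urlopen), one possible workaround is to download the PDF yourself and pass a local path to tabula.read_pdf. The following is a minimal sketch under that assumption. The helper names download_to_tempfile and read_first_table are hypothetical, and this thread does not confirm that fetching the URL directly avoids the 400 in every case.

```python
import os
import shutil
import tempfile
from urllib.request import urlopen


def download_to_tempfile(url, suffix=".pdf"):
    """Fetch `url` exactly as given and return the path of a local temporary copy."""
    with urlopen(url) as response, tempfile.NamedTemporaryFile(
        suffix=suffix, delete=False
    ) as tmp:
        # Stream the response body to disk without loading it all into memory.
        shutil.copyfileobj(response, tmp)
        return tmp.name


def read_first_table(url):
    """Hypothetical wrapper: download the PDF first, then let tabula read the local file."""
    import tabula  # imported lazily so the download helper has no tabula dependency

    local_path = download_to_tempfile(url)
    try:
        # read_pdf returns a list of DataFrames; [0] is the first table, as in the issue.
        return tabula.read_pdf(local_path)[0]
    finally:
        # Remove the temporary copy once tabula has finished with it.
        os.remove(local_path)
```

With this sketch, the call from the issue would become df = read_first_table(presigned_url), where presigned_url holds the full PreSigned URL including its query string.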
Related Issues:
Issue Analytics
- State:
- Created a year ago
- Comments:6 (3 by maintainers)
Top GitHub Comments
Now it is working perfectly, thanks a lot for your help!!
@jgcmarins Just confirming, did you reinstall tabula-py with the latest master branch? I haven't released it to PyPI yet.
I doubt you are using the same version of tabula-py, since the stack trace shows the error at line 48 in localize_file, but with the latest master branch it should be line 59: https://github.com/chezou/tabula-py/blob/5dac2087db18022f56af081840bf10b413971708/tabula/file_util.py#L59
File "/home/alunix/code/ddc-reader-python/ddc-reader/lib/python3.8/site-packages/tabula/file_util.py", line 48, in localize_file
Suffix should not be the problem since tabula-py automatically adds it.
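The maintainer's check above compares the line number in the stack trace with the line number on master. A small sketch of how to see which file and line range an installed function actually comes from, using only the standard library, is below. The helper name source_span is hypothetical; tabula.file_util.localize_file is the function named in the traceback.

```python
import inspect


def source_span(func):
    """Return (source file, first line, last line) for a Python-level function."""
    lines, start = inspect.getsourcelines(func)
    return inspect.getsourcefile(func), start, start + len(lines) - 1


# Usage with tabula-py installed (not executed here):
#   from tabula.file_util import localize_file
#   print(source_span(localize_file))
# If the traceback's line number falls in a different place than on master,
# the installed copy is probably not the master version.
```

The printed path also shows which site-packages directory the module was loaded from, which helps when several virtual environments are in play. Reinstalling from the repository would typically be done with pip install git+https://github.com/chezou/tabula-py.git, assuming the fix is still only on master.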