HTTP Error 400 Bad Request when reading from AWS S3 PreSigned URL

Summary of your issue

I am trying to read a PDF from an AWS S3 PreSigned URL and I am experiencing the following error:

Traceback (most recent call last):
  File "main.py", line 3, in <module>
    df = tabula.read_pdf("https://my-bucket-name.s3.amazonaws.com/v1/user/62e970a844d091d90069ab7d/file/62ed00ce74b8a716407975d6?SIGNED_HASH")[0]
  File "/home/alunix/code/ddc-reader-python/ddc-reader/lib/python3.8/site-packages/tabula/io.py", line 311, in read_pdf
    path, temporary = localize_file(input_path, user_agent)
  File "/home/alunix/code/ddc-reader-python/ddc-reader/lib/python3.8/site-packages/tabula/file_util.py", line 48, in localize_file
    req = urlopen(path_or_buffer)
  File "/usr/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/usr/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request
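
The traceback only shows the status line. A 400 response usually carries a body as well, and S3 typically puts an XML document with an error code and message there. The following is a minimal sketch for surfacing that body with the standard library; `describe_http_error` and `fetch_or_explain` are hypothetical helper names, not part of tabula-py.

```python
import urllib.error
import urllib.request


def describe_http_error(err: urllib.error.HTTPError) -> str:
    """Return the status line plus the response body of a failed request."""
    # HTTPError doubles as a response object, so the body can be read from it.
    body = err.read().decode("utf-8", errors="replace")
    return f"HTTP {err.code} {err.reason}\n{body}"


def fetch_or_explain(url: str) -> bytes:
    """Download url; if the server rejects it, print its error body and re-raise."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        print(describe_http_error(err))
        raise
```

Calling `fetch_or_explain` with the exact URL string passed to `tabula.read_pdf` should show whether a plain `urlopen` call is rejected as well, and if so, what reason the server gives.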

Check list before submit

  • Did you read FAQ?

  • (Optional, but really helpful) Your PDF URL: it’s an AWS S3 PreSigned URL, so sorry, I can’t share it. The PreSigned URL itself is working, because I am able to access the PDF through the browser

  • Paste the output of import tabula; tabula.environment_info() on Python REPL:

Python version:
    3.8.10 (default, Jun 22 2022, 20:18:18)
[GCC 9.4.0]
Java version:
    openjdk version "1.8.0_342"
OpenJDK Runtime Environment (build 1.8.0_342-8u342-b07-0ubuntu1~20.04-b07)
OpenJDK 64-Bit Server VM (build 25.342-b07, mixed mode)
tabula-py version: 2.4.0
platform: Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-glibc2.29
uname:
    uname_result(system='Linux', node='DESKTOP-KVLP8PC', release='5.10.16.3-microsoft-standard-WSL2', version='#1 SMP Fri Apr 2 22:23:49 UTC 2021', machine='x86_64', processor='x86_64')
linux_distribution: ('Ubuntu', '20.04', 'focal')
mac_ver: ('', ('', '', ''), '')
None
  • Paste the output of python --version command on your terminal: Python 3.8.10
  • Paste the output of java -version command on your terminal:
openjdk version "1.8.0_342"
OpenJDK Runtime Environment (build 1.8.0_342-8u342-b07-0ubuntu1~20.04-b07)
OpenJDK 64-Bit Server VM (build 25.342-b07, mixed mode)
  • Does java -h command work well?; Ensure your java command is included in PATH
  • Write your OS and its version: Windows 10 with WSL

What did you do when you faced the problem?

Tried to search on both Google and GitHub issues to see if anyone else is facing the same issue and found nothing.

Code:

import tabula

df = tabula.read_pdf("https://my-bucket-name.s3.amazonaws.com/v1/user/62e970a844d091d90069ab7d/file/62ed00ce74b8a716407975d6?SIGNED_HASH")[0]
df.to_csv('./test.csv', encoding='utf-8')
print(df)
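
If the URL works in the browser but fails through `tabula.read_pdf`, one possible workaround is to download the PDF yourself and hand tabula-py a local path, which takes its URL handling out of the picture. This is only a sketch under that assumption; `download_pdf` and `read_first_table` are hypothetical helpers, and the second one needs tabula-py and Java installed.

```python
import shutil
import tempfile
import urllib.request


def download_pdf(url: str) -> str:
    """Save the document at url to a temporary .pdf file and return its path."""
    with urllib.request.urlopen(url) as response:
        # delete=False keeps the file on disk after the handle is closed.
        with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
            shutil.copyfileobj(response, tmp)
            return tmp.name


def read_first_table(url: str):
    """Download the PDF ourselves, then let tabula-py parse the local copy."""
    import tabula  # imported lazily: requires tabula-py and a Java runtime

    path = download_pdf(url)
    return tabula.read_pdf(path)[0]
```

The temporary file is not removed automatically, so delete it with `os.remove` once the DataFrame has been created.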

Expected behavior:

CSV with PDF data.

Actual behavior:

The same traceback as shown in the summary above, ending in:

urllib.error.HTTPError: HTTP Error 400: Bad Request

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

2 reactions
Amorim33 commented, Aug 12, 2022

Now it is working perfectly, thanks a lot for your help!!

2 reactions
chezou commented, Aug 12, 2022

@jgcmarins Just confirming, did you reinstall tabula-py with the latest master branch? I haven’t released it to PyPI yet.

I doubt you are using that version of tabula-py: the stack trace reports the error at line 48 in localize_file, but with the latest master branch it should be line 59: https://github.com/chezou/tabula-py/blob/5dac2087db18022f56af081840bf10b413971708/tabula/file_util.py#L59

File "/home/alunix/code/ddc-reader-python/ddc-reader/lib/python3.8/site-packages/tabula/file_util.py", line 48, in localize_file

Suffix should not be the problem since tabula-py automatically adds it.
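
To check which copy of tabula-py the interpreter actually picks up, something like the following sketch may help. It uses only the standard library; the helper names are hypothetical, and an install from the master branch may still report the same version string as the PyPI release, so the file location and the line number of the `urlopen(` call are the more telling outputs.

```python
import importlib.metadata
import importlib.util
from typing import List, Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return None


def module_location(module_name: str) -> Optional[str]:
    """Return the file a module would be loaded from, or None if not found."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        return None
    return spec.origin if spec else None


def lines_containing(path: str, needle: str) -> List[int]:
    """Return the 1-based numbers of the lines in path that contain needle."""
    with open(path, encoding="utf-8") as source:
        return [n for n, line in enumerate(source, start=1) if needle in line]


if __name__ == "__main__":
    print("tabula-py version:", installed_version("tabula-py"))
    location = module_location("tabula.file_util")
    print("file_util location:", location)
    if location:
        # Compare with the line number reported in the stack trace.
        print("urlopen( on lines:", lines_containing(location, "urlopen("))
```

If the reported line still matches the one in the stack trace, the environment is most likely still importing the previously installed release.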
