GET Request fails when using Alluxio S3 API, same request succeeds when AWS S3 API is used directly

Alluxio Version: enterprise-2.8.0-2.0

Describe the bug

An S3 GET request fails with a 404 response when sent to the Alluxio S3 API endpoint. The same request succeeds when the AWS S3 API is used directly.

OK Response from S3 API

- http-outgoing-1 >>  GET /REPLACE_BUCKET_NAME/?list-type=2&delimiter=%2F&max-keys=2&prefix=REPLACE_WITH_PREFIX%2F&fetch-owner=false HTTP/1.1
- http-outgoing-1 >>  Host: s3.us-west-2.amazonaws.com
- http-outgoing-1 >>  amz-sdk-invocation-id: ...
- http-outgoing-1 >>  amz-sdk-request: ...
- http-outgoing-1 >>  amz-sdk-retry: 0/0/500
- http-outgoing-1 >>  Authorization: ...
- http-outgoing-1 >>  Content-Type: application/octet-stream
- http-outgoing-1 >>  Content-Length: 0
- http-outgoing-1 >>  Connection: Keep-Alive
- http-outgoing-1 >>  [\r][\n] 
- http-outgoing-1 <<  HTTP/1.1 200 OK
- http-outgoing-1 <<  x-amz-id-2: ...
- http-outgoing-1 <<  x-amz-request-id: ..
- http-outgoing-1 <<  Date: Thu, 07 Jul 2022 05:46:22 GMT
- http-outgoing-1 <<  x-amz-bucket-region: ...
- http-outgoing-1 <<  Content-Type: application/xml
- http-outgoing-1 <<  Transfer-Encoding: chunked
- http-outgoing-1 <<  Server: AmazonS3
- http-outgoing-1 <<  [\r][\n] 
- http-outgoing-1 << *HTTP/1.1 200 OK*

404 Response from Alluxio S3 API For Same Request

- http-outgoing-0 >>  GET /api/v1/s3/REPLACE_BUCKET_NAME/?list-type=2&delimiter=%2F&max-keys=2&prefix=REPLACE_WITH_PREFIX%2F&fetch-owner=false HTTP/1.1[\r][\n] 
- http-outgoing-0 >>  Host: api.g.....com:39999[\r][\n] 
- http-outgoing-0 >>  amz-sdk-invocation-id: ...
- http-outgoing-0 >>  amz-sdk-request: ...
- http-outgoing-0 >>  amz-sdk-retry: 0/0/500 
- http-outgoing-0 >>  Authorization: ... 
- http-outgoing-0 >>  Content-Type: application/octet-stream
- http-outgoing-0 >>  Content-Length: 0
- http-outgoing-0 >>  Connection: Keep-Alive
- http-outgoing-0 >>  [\r][\n] 
- http-outgoing-0 <<  HTTP/1.1 404 Not Found
- http-outgoing-0 <<  Date: Thu, 07 Jul 2022 05:48:00 GMT[\r][\n] 
- http-outgoing-0 <<  Content-Type: application/xml[\r][\n] 
- http-outgoing-0 <<  Content-Length: 196[\r][\n] 
- http-outgoing-0 <<  Server: Jetty(9.4.43.v20210629)[\r][\n] 
- http-outgoing-0 <<  [\r][\n] 
- http-outgoing-0 <<  <Error><RequestId></RequestId><Code>NoSuchBucket</Code><Message>Path  /REPLACE_BUCKET_NAME/REPLACE_WITH_PREFIX  does not exist.</Message><Resource>REPLACE_BUCKET_NAME</Resource></Error> 
- http-outgoing-0 << *HTTP/1.1 404 Not Found*
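For anyone comparing their own responses against the one above, here is a minimal sketch that pulls the fields out of the error body. It uses only the Python standard library; the helper name `parse_s3_error` is made up for illustration, and the XML string is copied from the log above.

```python
import xml.etree.ElementTree as ET

# Error body copied from the Alluxio 404 response in the log above.
ERROR_BODY = (
    "<Error><RequestId></RequestId><Code>NoSuchBucket</Code>"
    "<Message>Path  /REPLACE_BUCKET_NAME/REPLACE_WITH_PREFIX  does not exist.</Message>"
    "<Resource>REPLACE_BUCKET_NAME</Resource></Error>"
)


def parse_s3_error(body: str) -> dict:
    """Return the Code, Message and Resource fields of an S3-style <Error> document.

    Hypothetical helper for illustration; not part of Alluxio or the AWS SDK.
    """
    root = ET.fromstring(body)
    return {
        "code": root.findtext("Code"),
        # Collapse the doubled spaces that appear around the path in the message.
        "message": " ".join((root.findtext("Message") or "").split()),
        "resource": root.findtext("Resource"),
    }


error = parse_s3_error(ERROR_BODY)
print(error["code"])
print(error["message"])
```

Note that the message reports the bucket and the prefix joined into one path, while the error code is `NoSuchBucket`.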

To Reproduce

  1. Create a 1:1 mapped S3 bucket mount, such that s3://some_bucket on Alluxio === s3://some_bucket on S3.

Expected behavior Expect a similar response: 200 OK, as returned by the S3 API.

Urgency Critical, blocks creation of files from Spark

Are you planning to fix it TBD

Additional context Trying to use Alluxio to read/write data from Spark. Writing some data from Spark works fine when the s3a endpoint is not used, but fails when the Alluxio S3 API endpoint is used, due to the 404 response above.
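To make the comparison concrete, the two requests in the logs differ only in host and in the `/api/v1/s3` path prefix. Below is a rough sketch that rebuilds both request URLs with the Python standard library. The function name `list_objects_v2_url`, the host `alluxio.example.com`, and the `http`/`https` schemes are assumptions for illustration (the real Alluxio host is elided in the log); the port, path prefix, and query parameters are taken from the logs.

```python
from urllib.parse import quote, urlencode, urlsplit


def list_objects_v2_url(endpoint: str, bucket: str, prefix: str, max_keys: int = 2) -> str:
    """Build a path-style list request URL with the same query string as in the logs.

    Hypothetical helper for illustration; it does not sign or send the request.
    """
    query = urlencode(
        {
            "list-type": "2",
            "delimiter": "/",
            "max-keys": str(max_keys),
            "prefix": prefix,
            "fetch-owner": "false",
        },
        quote_via=quote,  # percent-encode instead of using '+' for spaces
        safe="",          # also encode '/', giving %2F as seen in the logs
    )
    return f"{endpoint.rstrip('/')}/{bucket}/?{query}"


# Direct S3 endpoint (host from the log; https is an assumption).
s3_url = list_objects_v2_url(
    "https://s3.us-west-2.amazonaws.com", "REPLACE_BUCKET_NAME", "REPLACE_WITH_PREFIX/"
)
# Alluxio S3 API endpoint (placeholder host; port and /api/v1/s3 prefix from the log).
alluxio_url = list_objects_v2_url(
    "http://alluxio.example.com:39999/api/v1/s3", "REPLACE_BUCKET_NAME", "REPLACE_WITH_PREFIX/"
)

print(urlsplit(s3_url).path)
print(urlsplit(alluxio_url).path)
```

Since the query strings are identical, the difference in behavior has to come from how each server resolves the bucket and prefix, not from the request parameters.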

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
HelloHorizon commented, Jul 7, 2022

@abmo-x thanks for reporting. @ZhuTopher can you take a look?

0 reactions
ZhuTopher commented, Aug 19, 2022

@abmo-x I believe this issue has been resolved by this PR #15965.

I tried to replicate your setup with EMR as the Hadoop & Spark cluster. You can see my reproduction steps here.

  • I used an S3 root UFS mount rather than a sub-directory mount.

This hasn’t been resolved by the PR above; instead, here is a PR which should fix it: https://github.com/Alluxio/alluxio/pull/16074

  • For some reason, the bundled EMR Spark behaves differently than the open-source Apache Spark, so I couldn’t repro this using my EMR environment.
