
Clean up an already-existing tmp directory when initiating a multipart upload via the S3 REST API

See original GitHub issue

According to the AWS S3 documentation, the S3 service only stores the latest data (that is, it directly overwrites existing data). Checking whether the data already exists is the client's responsibility.

Amazon S3 is a distributed system. If it receives multiple write requests for the same object simultaneously, it overwrites all but the last object written. Amazon S3 does not provide object locking; if you need this, make sure to build it into your application layer or use versioning instead. (from the Amazon S3 PutObject documentation)
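These last-writer-wins semantics can be illustrated with a minimal in-memory sketch (a hypothetical model, not real S3 client code): a put never checks for an existing object, it simply replaces whatever is stored under the key.

```python
class ObjectStoreSketch:
    """Hypothetical model of an S3-like bucket: no locking, no existence check."""

    def __init__(self):
        self._objects = {}

    def put_object(self, key, data):
        # No "does it already exist?" check: the new write overwrites
        # any previous object under the same key unconditionally.
        self._objects[key] = data

    def get_object(self, key):
        return self._objects[key]


store = ObjectStoreSketch()
store.put_object("report.csv", b"v1")
store.put_object("report.csv", b"v2")  # silently replaces v1
print(store.get_object("report.csv"))  # → b'v2'
```

This is why any "fail if the object exists" behavior has to live in the application layer (or rely on versioning), not in S3 itself.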

Therefore, many third-party clients (such as Spark and mingo) do not perform a check-then-overwrite sequence.

In PR #14203, @ZacBlanco already implemented overwrite support in the createObjectOrUploadPart function.

So, should we also perform some cleanup when handling an InitiateMultipartUpload request? For example, delete the tmp directory and the multipart upload file.
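The proposed cleanup can be sketched as follows. This is a hypothetical helper, not Alluxio's actual implementation: if the tmp directory for the key already exists (leftover from an earlier, abandoned attempt), it is removed before a fresh one is created, mirroring S3's overwrite-on-write semantics.

```python
import shutil
import tempfile
from pathlib import Path


def initiate_multipart_upload(base_dir: Path, key: str) -> Path:
    """Create a fresh tmp directory for a multipart upload, removing any
    leftover directory from a previous, abandoned attempt (hypothetical)."""
    tmp_dir = base_dir / f"{key}.tmp"
    if tmp_dir.exists():
        # Stale state from a failed earlier upload: wipe it so the new
        # upload starts clean instead of mixing old and new part files.
        shutil.rmtree(tmp_dir)
    tmp_dir.mkdir(parents=True)
    return tmp_dir


base = Path(tempfile.mkdtemp())
first = initiate_multipart_upload(base, "data.bin")
(first / "part-1").write_bytes(b"stale")
second = initiate_multipart_upload(base, "data.bin")  # wipes the stale dir
print(sorted(p.name for p in second.iterdir()))  # no leftover parts
```

The key design question raised in this issue is exactly whether this wipe should happen eagerly at initiation time or be left to a background sweep.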

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments:13 (13 by maintainers)

Top GitHub Comments

1 reaction
jffree commented, Nov 16, 2021

@ZhuTopher PR #14487 (not merged) cleans up the tmp directory when the user fails to upload the file and then retries the upload. PR #14328 (merged) cleans up the tmp directory when the user fails to upload the file and does not retry the operation.

0 reactions
ZhuTopher commented, Nov 15, 2021

@jffree I recall you had a PR with these changes already, or did I misremember that? Either way I’m in agreement with:

There are two scenarios here. One is that the user fails to upload the file and does not retry the operation; in that case, the tmp directory should be cleaned up by lazy scanning (#14328). The other is that the user fails to upload the file and then retries the upload; in that case, the tmp directory should be cleaned up immediately, without affecting the retry operation.
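The two cleanup paths can be sketched side by side. All names here are illustrative, not Alluxio's real API; the TTL is an assumed one-hour grace period.

```python
import shutil
import tempfile
import time
from pathlib import Path

TMP_TTL_SECONDS = 3600  # assumption: grace period before lazy cleanup


def lazy_sweep(base_dir: Path, now: float) -> list:
    """Scenario 1 (#14328): the user never retries, so a background scan
    eventually removes abandoned tmp directories older than the TTL."""
    removed = []
    for tmp in sorted(base_dir.glob("*.tmp")):
        if now - tmp.stat().st_mtime > TMP_TTL_SECONDS:
            shutil.rmtree(tmp)
            removed.append(tmp.name)
    return removed


def cleanup_on_retry(tmp_dir: Path) -> None:
    """Scenario 2 (#14487): the user retries immediately, so the stale tmp
    directory is removed right away rather than waiting for the sweep."""
    if tmp_dir.exists():
        shutil.rmtree(tmp_dir)
```

The trade-off: the lazy sweep needs no per-request work but leaves stale data around for up to a TTL; the on-retry cleanup is immediate but only fires when a retry actually arrives, so both mechanisms are complementary.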


