Clean up the existing tmp directory when initiating a multipart upload through the S3 REST API
According to the AWS S3 documentation, the S3 service stores only the latest data; that is, it directly overwrites existing data. Checking whether the data already exists is the client's responsibility.
"Amazon S3 is a distributed system. If it receives multiple write requests for the same object simultaneously, it overwrites all but the last object written. Amazon S3 does not provide object locking; if you need this, make sure to build it into your application layer or use versioning instead." (PutObject)
Therefore, many third-party clients (such as Spark and mingo) do not check whether an object exists before overwriting it.
In PR #14203, @ZacBlanco already implemented overwriting in the createObjectOrUploadPart function.
So, do we also need to perform some cleanup when handling the initiate multipart upload request? For example, deleting the existing tmp directory and the previously uploaded multipart file.
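The cleanup proposed above can be sketched roughly as follows. This is a minimal illustration on a local filesystem, not the project's actual implementation: the function name `initiate_multipart_upload`, the `<key>_tmp_uploads` directory naming, and the `upload_id` marker file are all hypothetical and chosen only for this example.

```python
import shutil
import tempfile
import uuid
from pathlib import Path


def initiate_multipart_upload(bucket_root: Path, key: str) -> str:
    """Start a multipart upload, discarding leftovers from earlier attempts.

    Hypothetical sketch: the tmp directory layout is an assumption made
    for illustration and does not mirror any real server's layout.
    """
    tmp_dir = bucket_root / f"{key}_tmp_uploads"  # assumed naming scheme
    target = bucket_root / key

    # Remove parts left behind by a previous, unfinished upload of this key.
    if tmp_dir.exists():
        shutil.rmtree(tmp_dir)

    # Remove the previously uploaded file, as the issue proposes, so the
    # new upload overwrites it instead of failing on an existence check.
    if target.exists():
        target.unlink()

    # Create a fresh tmp directory and record the new upload id in it.
    tmp_dir.mkdir(parents=True)
    upload_id = uuid.uuid4().hex
    (tmp_dir / "upload_id").write_text(upload_id)
    return upload_id


# Demo: simulate a stale upload, then initiate a new one for the same key.
root = Path(tempfile.mkdtemp())
stale_dir = root / "data.bin_tmp_uploads"
stale_dir.mkdir()
(stale_dir / "1").write_bytes(b"old part")
(root / "data.bin").write_bytes(b"old object")

new_upload_id = initiate_multipart_upload(root, "data.bin")
```

After the call, the stale part and the old object are gone and the tmp directory contains only the new upload id. Whether the existing object should be deleted at initiation time or only when the upload completes is a design choice the issue leaves open.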
Issue Analytics
- Created 2 years ago
- Comments: 13 (13 by maintainers)
Top GitHub Comments
@ZhuTopher PR #14487 (not merged) cleans up the tmp directory when a user fails to upload the file and then retries the upload. PR #14328 (merged) cleans up the tmp directory when a user fails to upload the file and does not retry the operation.
@jffree I recall you had a PR with these changes already, or did I misremember that? Either way I’m in agreement with: