Unable to import large files from S3 buckets
I'm trying to use 1000 Genomes data that is already hosted in an AWS bucket. The file is a raw reads file of roughly 13 GiB. When calling toil.importFile on
s3://1000genomes/phase3/data/HG01977/sequence_read/SRR360135_1.filt.fastq.gz
I get the following error:
2016-04-26 14:13:37,623 ERROR:root: Got exception 'S3ResponseError: 400 Bad Request
<Error><Code>InvalidRequest</Code><Message>The specified copy source is larger than the maximum allowable size for a copy source: 5368709120</Message><RequestId>CCB50A66B881B563</RequestId><HostId>1dAVKee/hXEaLCvisKyxUHlWtLbZwqcedEXOsKVSzAoE5ISrsuykzg9teW9vzidQaIqImJxbNgI=</HostId></Error>' while writing 's3://1000genomes/phase3/data/HG01977/sequence_read/SRR360135_1.filt.fastq.gz'
Traceback (most recent call last):
  File "/usr/local/bin/cwltoil", line 9, in <module>
    load_entry_point('toil==3.2.0a2', 'console_scripts', 'cwltoil')()
  File "/usr/local/lib/python2.7/dist-packages/toil/cwl/cwltoil.py", line 580, in main
    adjustFiles(builder.job, functools.partial(writeFile, toil.importFile, {}))
  File "/usr/local/lib/python2.7/dist-packages/cwltool/process.py", line 128, in adjustFiles
    adjustFiles(rec[d], op)
  File "/usr/local/lib/python2.7/dist-packages/cwltool/process.py", line 131, in adjustFiles
    adjustFiles(d, op)
  File "/usr/local/lib/python2.7/dist-packages/cwltool/process.py", line 131, in adjustFiles
    adjustFiles(d, op)
  File "/usr/local/lib/python2.7/dist-packages/cwltool/process.py", line 126, in adjustFiles
    rec["path"] = op(rec["path"])
  File "/usr/local/lib/python2.7/dist-packages/toil/cwl/cwltoil.py", line 162, in writeFile
    index[x] = (writeFunc(rp), os.path.basename(x))
  File "/usr/local/lib/python2.7/dist-packages/toil/common.py", line 570, in importFile
    return self.jobStore.importFile(srcUrl)
  File "/usr/local/lib/python2.7/dist-packages/toil/jobStores/abstractJobStore.py", line 260, in importFile
    return self._importFile(findJobStoreForUrl(url), url)
  File "/usr/local/lib/python2.7/dist-packages/toil/jobStores/aws/jobStore.py", line 279, in _importFile
    info.copyFrom(srcKey)
  File "/usr/local/lib/python2.7/dist-packages/toil/jobStores/aws/jobStore.py", line 826, in copyFrom
    headers=self._s3EncryptionHeaders()).version_id
  File "/usr/local/lib/python2.7/dist-packages/toil/jobStores/aws/jobStore.py", line 865, in _copyKey
    headers=headers)
  File "/usr/local/lib/python2.7/dist-packages/boto/s3/bucket.py", line 888, in copy_key
    response.reason, body)
boto.exception.S3ResponseError: S3ResponseError: 400 Bad Request
<Error><Code>InvalidRequest</Code><Message>The specified copy source is larger than the maximum allowable size for a copy source: 5368709120</Message><RequestId>CCB50A66B881B563</RequestId><HostId>1dAVKee/hXEaLCvisKyxUHlWtLbZwqcedEXOsKVSzAoE5ISrsuykzg9teW9vzidQaIqImJxbNgI=</HostId></Error>
Is it really necessary to copy the file into the jobStore bucket?
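For reference, 5368709120 bytes is 5 GiB, the cap on a single S3 CopyObject request, so any server-side copy of a larger object has to be done as a multipart copy. Below is a minimal sketch of that workaround using boto3's managed copy (the traceback above uses boto 2's copy_key instead); the destination bucket and key are placeholders, and this is not the code path Toil itself uses.

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Source object from the public 1000 Genomes bucket (~13 GiB).
copy_source = {
    "Bucket": "1000genomes",
    "Key": "phase3/data/HG01977/sequence_read/SRR360135_1.filt.fastq.gz",
}

# boto3's managed copy switches to a multipart copy (UploadPartCopy) for
# objects above multipart_threshold, so the 5 GiB limit on a single
# CopyObject request never applies.
config = TransferConfig(multipart_chunksize=256 * 1024 ** 2)  # 256 MiB parts

# "my-jobstore-bucket" and the destination key are placeholders for
# wherever the imported file should end up.
s3.copy(copy_source, "my-jobstore-bucket",
        "imported/SRR360135_1.filt.fastq.gz", Config=config)
```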
Issue Analytics
- Created 7 years ago
- Comments: 6 (4 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
It would be a useful additional feature to be able to use public buckets directly without having to make a copy.
I agree, #819
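As a rough illustration of what "using public buckets directly" could look like (this is not an existing Toil feature; the bucket and key come from the issue above and the region is assumed), boto3 can stream a public object anonymously without making any copy:

```python
import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Anonymous (unsigned) client: public buckets like 1000genomes can be
# read without credentials. Region is assumed to be us-east-1.
s3 = boto3.client("s3", region_name="us-east-1",
                  config=Config(signature_version=UNSIGNED))

obj = s3.get_object(
    Bucket="1000genomes",
    Key="phase3/data/HG01977/sequence_read/SRR360135_1.filt.fastq.gz",
)

# Stream the ~13 GiB object in chunks rather than copying it into
# another bucket first.
with open("SRR360135_1.filt.fastq.gz", "wb") as out:
    for chunk in obj["Body"].iter_chunks(chunk_size=8 * 1024 * 1024):
        out.write(chunk)
```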