Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Unable to import large files from S3 buckets

See original GitHub issue

I’m trying to use 1000 Genomes data that is already hosted in an AWS S3 bucket. This is a raw-reads file of roughly 13 GiB. When calling toil.importFile on s3://1000genomes/phase3/data/HG01977/sequence_read/SRR360135_1.filt.fastq.gz, I get the following error:

2016-04-26 14:13:37,623 ERROR:root: Got exception 'S3ResponseError: 400 Bad Request
<Error><Code>InvalidRequest</Code><Message>The specified copy source is larger than the maximum allowable size for a copy source: 5368709120</Message><RequestId>CCB50A66B881B563</RequestId><HostId>1dAVKee/hXEaLCvisKyxUHlWtLbZwqcedEXOsKVSzAoE5ISrsuykzg9teW9vzidQaIqImJxbNgI=</HostId></Error>' while writing 's3://1000genomes/phase3/data/HG01977/sequence_read/SRR360135_1.filt.fastq.gz'
Traceback (most recent call last):
  File "/usr/local/bin/cwltoil", line 9, in <module>
    load_entry_point('toil==3.2.0a2', 'console_scripts', 'cwltoil')()
  File "/usr/local/lib/python2.7/dist-packages/toil/cwl/cwltoil.py", line 580, in main
    adjustFiles(builder.job, functools.partial(writeFile, toil.importFile, {}))
  File "/usr/local/lib/python2.7/dist-packages/cwltool/process.py", line 128, in adjustFiles
    adjustFiles(rec[d], op)
  File "/usr/local/lib/python2.7/dist-packages/cwltool/process.py", line 131, in adjustFiles
    adjustFiles(d, op)
  File "/usr/local/lib/python2.7/dist-packages/cwltool/process.py", line 131, in adjustFiles
    adjustFiles(d, op)
  File "/usr/local/lib/python2.7/dist-packages/cwltool/process.py", line 126, in adjustFiles
    rec["path"] = op(rec["path"])
  File "/usr/local/lib/python2.7/dist-packages/toil/cwl/cwltoil.py", line 162, in writeFile
    index[x] = (writeFunc(rp), os.path.basename(x))
  File "/usr/local/lib/python2.7/dist-packages/toil/common.py", line 570, in importFile
    return self.jobStore.importFile(srcUrl)
  File "/usr/local/lib/python2.7/dist-packages/toil/jobStores/abstractJobStore.py", line 260, in importFile
    return self._importFile(findJobStoreForUrl(url), url)
  File "/usr/local/lib/python2.7/dist-packages/toil/jobStores/aws/jobStore.py", line 279, in _importFile
    info.copyFrom(srcKey)
  File "/usr/local/lib/python2.7/dist-packages/toil/jobStores/aws/jobStore.py", line 826, in copyFrom
    headers=self._s3EncryptionHeaders()).version_id
  File "/usr/local/lib/python2.7/dist-packages/toil/jobStores/aws/jobStore.py", line 865, in _copyKey
    headers=headers)
  File "/usr/local/lib/python2.7/dist-packages/boto/s3/bucket.py", line 888, in copy_key
    response.reason, body)
boto.exception.S3ResponseError: S3ResponseError: 400 Bad Request
<Error><Code>InvalidRequest</Code><Message>The specified copy source is larger than the maximum allowable size for a copy source: 5368709120</Message><RequestId>CCB50A66B881B563</RequestId><HostId>1dAVKee/hXEaLCvisKyxUHlWtLbZwqcedEXOsKVSzAoE5ISrsuykzg9teW9vzidQaIqImJxbNgI=</HostId></Error>
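
The number in the message, 5368709120 bytes, is exactly 5 GiB: S3’s single-request CopyObject call rejects sources above that size, so a copy of a larger object has to go through multipart UploadPartCopy instead. As a hedged sketch of the usual workaround — using modern boto3 rather than the legacy boto shown in the traceback, with a hypothetical destination bucket name — boto3’s managed copy() handles the multipart switch automatically:

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Managed copy: boto3 falls back to multipart UploadPartCopy once the
# source passes the transfer threshold, avoiding the 5 GiB CopyObject limit.
s3.copy(
    CopySource={
        "Bucket": "1000genomes",
        "Key": "phase3/data/HG01977/sequence_read/SRR360135_1.filt.fastq.gz",
    },
    Bucket="my-toil-jobstore",  # hypothetical destination jobStore bucket
    Key="SRR360135_1.filt.fastq.gz",
    # Default thresholds already trigger multipart well below 5 GiB; the
    # chunk size here is tuned only to keep the part count reasonable.
    Config=TransferConfig(multipart_chunksize=256 * 1024 * 1024),
)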

Is it really necessary to copy the file into the jobStore bucket?

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction
tetron commented, Apr 26, 2016

It would be a useful additional feature to be able to use public buckets directly without having to make a copy.
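
As a rough illustration of the idea — this is only a sketch using boto3’s anonymous access, not how Toil’s importFile actually works — a public bucket like 1000genomes can be streamed without any copy landing in the jobStore:

import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Anonymous client: public buckets can be read without credentials.
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
obj = s3.get_object(
    Bucket="1000genomes",
    Key="phase3/data/HG01977/sequence_read/SRR360135_1.filt.fastq.gz",
)
for chunk in obj["Body"].iter_chunks(chunk_size=8 * 1024 * 1024):
    pass  # consume the stream directly; nothing is copied into the jobStore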

0 reactions
nsjake commented, Apr 26, 2016

I agree, #819

Read more comments on GitHub >

Top Results From Across the Web

Resolve issues with uploading large files in Amazon S3
I'm trying to upload a large file (1 GB or larger) to Amazon Simple Storage Service (Amazon S3) using the console. However, the...
Read more >
unable to read large csv file from s3 bucket to python
Here are the few things that you can do: Make sure the region of the S3 bucket is the same as your AWS...
Read more >
Resolve errors uploading data to or downloading data ... - AWS
To run the SELECT INTO OUTFILE S3 or LOAD DATA FROM S3 commands using Amazon Aurora, follow the steps below: 1. Create an...
Read more >
Question: S3 Transfer Fails with large Files. - Boomi Community
Hi All, While trying to transfer large files (400 MB+) using the S3 connector, we are getting the following error: ...
Read more >
Uploading large files to aws s3 almost always fails. #1947
I am using the AWS S3 SDK to upload files from android. The files range from 128kb to 99 MB. Files that are...
Read more >
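
The second result above (reading a large CSV from S3 in Python) hinges on matching the client’s region to the bucket’s. A minimal sketch of that advice, with hypothetical bucket and key names:

import boto3

# Look up the bucket's region first; get_bucket_location reports None
# for us-east-1, hence the fallback.
loc = boto3.client("s3").get_bucket_location(Bucket="my-bucket")
region = loc["LocationConstraint"] or "us-east-1"

s3 = boto3.client("s3", region_name=region)
body = s3.get_object(Bucket="my-bucket", Key="big.csv")["Body"]
for chunk in body.iter_chunks(chunk_size=1024 * 1024):
    pass  # stream the file rather than loading it all at once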
