Download and save a large file as an artifact
One of the steps of my workflow is simply downloading a large data file:
    @step
    def download_file(self):
        req = requests.get(self.input['url'], allow_redirects=True)
        self.large_file = req.content
Now, this fails with a MemoryError because req.content tries to read the whole file into memory. However, even though requests has a streaming API, via iter_content(), I don’t think it’s possible to use this because metaflow doesn’t expose a file object to write into. If I try to store a generator object as an artifact it doesn’t work either:
    def download_file(self):
        req = requests.get(self.input['url'], allow_redirects=True)
        self.large_file = req.iter_content(chunk_size=1024)

    TypeError: can't pickle generator objects
Finally, I can’t use req.raw:
    @step
    def download_file(self):
        req = requests.get(self.input['url'], allow_redirects=True)
        self.large_file = req.raw

    TypeError: cannot serialize '_io.BufferedReader' object
If you somehow exposed the file object we were writing to, I could stream each chunk of the file separately and pickle them:
    # stream=True defers the body download so iter_content() reads it incrementally
    req = requests.get(self.input['url'], allow_redirects=True, stream=True)
    for chunk in req.iter_content(chunk_size=1024):
        pickle.dump(chunk, fp)
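To illustrate the idea, here is a minimal sketch of how chunk-by-chunk pickling could work if a binary file object were available. The helpers dump_chunks and load_chunks are hypothetical names, and io.BytesIO stands in for the file object that metaflow does not currently expose. Reading back relies on pickle.load raising EOFError once the stream is exhausted.

```python
import io
import pickle
from typing import BinaryIO, Iterable, Iterator


def dump_chunks(chunks: Iterable[bytes], fp: BinaryIO) -> None:
    """Pickle each chunk as its own record, one after another, into fp."""
    for chunk in chunks:
        pickle.dump(chunk, fp)


def load_chunks(fp: BinaryIO) -> Iterator[bytes]:
    """Yield the chunks back one at a time until the stream runs out."""
    while True:
        try:
            yield pickle.load(fp)
        except EOFError:
            # pickle.load raises EOFError when there is nothing left to read
            return


if __name__ == "__main__":
    # BytesIO is only a stand-in for a real file object (assumption)
    buf = io.BytesIO()
    dump_chunks([b"first chunk", b"second chunk"], buf)
    buf.seek(0)
    for chunk in load_chunks(buf):
        print(len(chunk))
```

Only one chunk is held in memory at a time on either side, which is the property the single req.content assignment lacks.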
Or ideally not use pickle at all:
    # stream=True defers the body download so iter_content() reads it incrementally
    req = requests.get(self.input['url'], allow_redirects=True, stream=True)
    for chunk in req.iter_content(chunk_size=1024):
        fp.write(chunk)
Is exposing the file object, or allowing non-pickle files currently possible? If not, is it on the radar?
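For what it's worth, a stopgap that does not depend on metaflow exposing a file object is to stream the response to a local temporary file and keep only the path around. The sketch below assumes requests is installed; write_chunks and download_to_tempfile are hypothetical helper names, not part of metaflow or requests. A local path is only meaningful to later steps that run on the same machine, so this does not replace a real file-backed artifact.

```python
import os
import tempfile
from typing import Iterable


def write_chunks(chunks: Iterable[bytes], path: str) -> int:
    """Write an iterable of byte chunks to path and return the bytes written."""
    written = 0
    with open(path, "wb") as fp:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                fp.write(chunk)
                written += len(chunk)
    return written


def download_to_tempfile(url: str, chunk_size: int = 1024 * 1024) -> str:
    """Stream url into a temporary file and return the file's path."""
    # Imported here so write_chunks stays usable without requests installed
    import requests

    fd, path = tempfile.mkstemp(suffix=".download")
    os.close(fd)
    # stream=True keeps the body out of memory until iter_content() reads it
    with requests.get(url, stream=True, allow_redirects=True) as resp:
        resp.raise_for_status()
        write_chunks(resp.iter_content(chunk_size=chunk_size), path)
    return path
```

Inside a step this might be used as self.large_file_path = download_to_tempfile(self.input['url']), so that the pickled artifact is a short string instead of the file contents.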
Issue Analytics
- Created 4 years ago
- Reactions: 4
- Comments: 5
Discussion on gitter from @tuulos:
Great! I guess that isn’t yet stable though? Are there usage examples that involve file storage anywhere?