upload directories
Sometimes I'm working on a Python package that is just a directory whose contents are constantly changing. I want to send that directory to my dask workers without building a zip file, egg, or anything similar. Currently I use these functions to get the job done:
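```python
import io
import os
import tarfile


def fn_to_targz_string(fn):
    # Pack a single file or a whole directory into an in-memory .tar.gz.
    # normpath drops a trailing slash, so basename("pkg/") is "pkg", not "".
    with io.BytesIO() as bt:
        with tarfile.open(fileobj=bt, mode='w:gz') as tf:
            tf.add(fn, arcname=os.path.basename(os.path.normpath(fn)))
        # Read back only after the tarfile is closed and the gzip trailer is written.
        bt.seek(0)
        s = bt.read()
    return s


def extract_targz_string(s, *args, **kwargs):
    # Local imports keep this function self-contained, e.g. when it runs on a worker.
    import io, tarfile
    with io.BytesIO() as bt:
        bt.write(s)
        bt.seek(0)
        with tarfile.open(fileobj=bt, mode='r:gz') as tf:
            # Extra arguments (such as the target path) go straight to extractall.
            tf.extractall(*args, **kwargs)
```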
If we used `tarfile` this way in `Client.upload_file`, it wouldn't matter whether the user wanted to upload a file or a directory; it would just work.
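A minimal round-trip sketch of that claim: the same pair of helpers packs and unpacks both a single file and a directory. It reuses the two functions from the issue with small changes that are assumptions of this sketch, not part of the proposal: `os.path.normpath` so a trailing slash doesn't yield an empty archive name, `getvalue()` instead of seek/read, and the `'data'` extraction filter when the running Python provides it. The names `helper.py` and `mypkg` are made up for illustration.

```python
import io
import os
import tarfile
import tempfile


def fn_to_targz_string(fn):
    """Pack a file *or* a directory into an in-memory .tar.gz and return the bytes."""
    # normpath strips a trailing slash so basename('pkg/') gives 'pkg', not ''
    arcname = os.path.basename(os.path.normpath(fn))
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        tf.add(fn, arcname=arcname)  # directories are added recursively
    return buf.getvalue()


def extract_targz_string(s, path="."):
    """Unpack bytes produced by fn_to_targz_string into `path`."""
    with tarfile.open(fileobj=io.BytesIO(s), mode="r:gz") as tf:
        # Use the safer 'data' extraction filter where this Python supports it
        extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tf.extractall(path, **extra)


# Round trip: one plain file and one directory, through the same two helpers.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    single = os.path.join(src, "helper.py")
    with open(single, "w") as f:
        f.write("X = 1\n")

    pkg = os.path.join(src, "mypkg")
    os.makedirs(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write("Y = 2\n")

    for target in (single, pkg + "/"):  # trailing slash on purpose
        extract_targz_string(fn_to_targz_string(target), dst)

    extracted = sorted(os.listdir(dst))

print(extracted)  # ['helper.py', 'mypkg']
```

The caller never has to say whether it is sending a file or a directory; `tarfile.add` handles both, which is the point of the proposal.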
I believe the only necessary changes would be to replace the beginning of `Client._upload_file` with something like `fn_to_targz_string` (shown above) and the beginning of `Worker.upload_file` with something like `extract_targz_string` (shown above). To be complete, you could also add something to the body of `Worker.upload_file` so that the uploaded module is properly added to `names_to_import`.
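To illustrate the worker half of that suggestion, here is a hypothetical sketch of a handler that unpacks the archive, puts the extraction directory on `sys.path`, works out which top-level entries are importable, and imports (or reloads) them. `receive_upload` and `issue_demo_pkg` are made-up names, and this is not how distributed's `Worker.upload_file` is actually implemented; the `names_to_import` list only mirrors the name used in the issue.

```python
import importlib
import io
import os
import sys
import tarfile
import tempfile


def receive_upload(payload, local_dir):
    """Hypothetical worker-side handler (not distributed's Worker.upload_file).

    Unpacks a .tar.gz built from a file or a directory, puts local_dir on
    sys.path and imports the top-level modules/packages it contained.
    """
    with tarfile.open(fileobj=io.BytesIO(payload), mode="r:gz") as tf:
        # First path component of each member: 'mypkg/__init__.py' -> 'mypkg'
        top_level = {m.name.split("/", 1)[0] for m in tf.getmembers()}
        # Use the safer 'data' extraction filter where this Python supports it
        extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tf.extractall(local_dir, **extra)

    if local_dir not in sys.path:
        sys.path.insert(0, local_dir)
    importlib.invalidate_caches()  # clear finder caches so new files are seen

    names_to_import = []
    for entry in sorted(top_level):
        full = os.path.join(local_dir, entry)
        if entry.endswith(".py") and os.path.isfile(full):
            names_to_import.append(entry[:-3])  # single-file module
        elif os.path.isfile(os.path.join(full, "__init__.py")):
            names_to_import.append(entry)  # regular package

    for name in names_to_import:
        if name in sys.modules:
            importlib.reload(sys.modules[name])  # pick up edited code
        else:
            importlib.import_module(name)
    return names_to_import


# Usage: pack a tiny package on the "client" side, unpack it on the "worker" side.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    pkg = os.path.join(src, "issue_demo_pkg")
    os.makedirs(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write("VALUE = 42\n")

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        tf.add(pkg, arcname="issue_demo_pkg")

    imported = receive_upload(buf.getvalue(), dst)
    print(imported, sys.modules["issue_demo_pkg"].VALUE)  # ['issue_demo_pkg'] 42
```

Reloading modules that are already imported matters for the issue's use case, where the directory is edited and re-sent repeatedly.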
Issue Analytics
- Created 7 years ago
- Reactions: 2
- Comments: 15 (6 by maintainers)
Top GitHub Comments
Any update? I found this PR, https://github.com/dask/distributed/pull/939, but it was closed. I'm wondering if there's a way to upload a directory and import only the importable files in it.
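One way to approach the "import only importable files" part, sketched locally: `pkgutil.iter_modules` reports only what the import system recognises in a directory (modules such as `foo.py` and packages with an `__init__.py`), so it can serve as the filter. `importable_names`, `import_all` and the file names below are hypothetical; running this on each worker (for example with `Client.run`) after the directory has been copied there is an assumption of this sketch.

```python
import importlib
import os
import pkgutil
import sys
import tempfile


def importable_names(directory):
    """Names of modules/packages Python could import from `directory`.

    pkgutil.iter_modules skips data files and folders without __init__.py.
    """
    return sorted(info.name for info in pkgutil.iter_modules([directory]))


def import_all(directory):
    """Put `directory` on sys.path and import everything importable in it."""
    if directory not in sys.path:
        sys.path.insert(0, directory)
    importlib.invalidate_caches()
    return {name: importlib.import_module(name) for name in importable_names(directory)}


with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "issue_mod_a.py"), "w") as f:
        f.write("A = 1\n")
    os.makedirs(os.path.join(d, "issue_pkg"))
    open(os.path.join(d, "issue_pkg", "__init__.py"), "w").close()
    with open(os.path.join(d, "notes.txt"), "w") as f:
        f.write("not code\n")  # not a module: skipped
    os.makedirs(os.path.join(d, "scratch"))  # no __init__.py: skipped

    modules = import_all(d)

print(sorted(modules))  # ['issue_mod_a', 'issue_pkg']
```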
After thinking about this more, it does seem that a separate `upload_dir` function would be useful beyond just clarity and ergonomics. In my case, for example, the code lives under a `src/baz` hierarchy, and Python files import each other using `import src.baz.a`. When I upload files to dask workers, I want to upload only the `baz` folder but preserve the `src/baz` hierarchy.

I was thinking an API like `def upload_dir(path_to_dir, path_prefix="")` (kwarg naming tbd) would be helpful in this case. To go through some examples:

- `upload_dir('src/baz')` would upload just the `baz` directory and add `baz` to the Python path on the workers.
- `upload_dir('src/baz', path_prefix='src/')` would upload the `baz` directory to a `src/baz` directory on the workers and add `src/baz` to the Python path.

Edit:
On second thought, it might be even more intuitive to have `upload_dir('src/baz')` upload the contents of the `baz` folder with no parent directory, and `upload_dir('src/baz', path_prefix='src/baz')` upload the contents of `baz` into a `src/baz` folder on the remote workers.
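To make the two proposed `path_prefix` semantics concrete, here is a hypothetical sketch that only builds the archive (no dask involved) and lists where files would land relative to a worker's upload directory. `remote_root_v1` follows the first proposal, `remote_root_v2` the edited one; `pack_dir` and `archive_names` are illustrative helpers, not an existing dask API, and updating the Python path is left out.

```python
import io
import os
import posixpath
import tarfile
import tempfile


def remote_root_v1(path_to_dir, path_prefix=""):
    """First proposal: keep the directory itself, optionally under path_prefix."""
    name = os.path.basename(os.path.normpath(path_to_dir))  # 'src/baz' -> 'baz'
    return posixpath.join(path_prefix, name) if path_prefix else name


def remote_root_v2(path_to_dir, path_prefix=""):
    """Edited proposal: contents go straight under path_prefix (path_to_dir unused)."""
    return path_prefix.strip("/")  # '' means "no parent directory"


def pack_dir(path_to_dir, remote_root):
    """Tar the contents of path_to_dir so they extract under remote_root."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for entry in sorted(os.listdir(path_to_dir)):
            arcname = posixpath.join(remote_root, entry) if remote_root else entry
            tf.add(os.path.join(path_to_dir, entry), arcname=arcname)
    return buf.getvalue()


def archive_names(payload):
    """Paths the archive would create, relative to the extraction directory."""
    with tarfile.open(fileobj=io.BytesIO(payload), mode="r:gz") as tf:
        return sorted(tf.getnames())


with tempfile.TemporaryDirectory() as tmp:
    baz = os.path.join(tmp, "src", "baz")  # stands in for the issue's src/baz
    os.makedirs(baz)
    for fname in ("a.py", "b.py"):
        open(os.path.join(baz, fname), "w").close()

    layouts = {
        "v1": archive_names(pack_dir(baz, remote_root_v1(baz))),
        "v1_src": archive_names(pack_dir(baz, remote_root_v1(baz, "src/"))),
        "v2": archive_names(pack_dir(baz, remote_root_v2(baz))),
        "v2_src_baz": archive_names(pack_dir(baz, remote_root_v2(baz, "src/baz"))),
    }

for label, names in layouts.items():
    print(label, names)
# v1 ['baz/a.py', 'baz/b.py']
# v1_src ['src/baz/a.py', 'src/baz/b.py']
# v2 ['a.py', 'b.py']
# v2_src_baz ['src/baz/a.py', 'src/baz/b.py']
```

Both proposals can express the `src/baz` layout needed for `import src.baz.a`; they differ in whether the default keeps the `baz` folder or unpacks its contents at the top level.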