Trigger callback on dask.distributed.Variable.set
Hello, I’m cross-posting this from https://stackoverflow.com/questions/58648497/trigger-callback-on-dask-distributed-variable-set for better visibility, but please close this issue if it is more appropriate for discussion on Stack Overflow.
I am working on a Python app that requires data synchronization between a remote location (like EC2) and a local location (like my laptop). Because some of the data involves very large arrays and I’m already using dask successfully in other places in the app, I’d like to use dask for this process.
I’m currently trying to use a dask.distributed.Variable to sync the data, either scattering the data first if it is an array or just setting it directly. I can then run a get command to retrieve the data, but what I’d really like is to connect a callback that runs on my local machine whenever data is set on the remote machine, or vice versa.
For example something like:
# On remote
from dask.distributed import LocalCluster, Client, Variable
cluster = LocalCluster()
client = Client(cluster)
# Define variable
aa = Variable('a')
# On local
from dask.distributed import Client, Variable
# `cluster` only exists on the remote; pass its scheduler address here
client = Client(cluster.scheduler.address)
# Define variable
aa = Variable('a')
# Define my custom callback
def my_callback():
    print(aa.get())
# Do some connection magic ?????
and then, if I run
# On remote
aa.set(1)
my callback would be triggered on my local machine.
Is such a paradigm possible here?
I’ve also been exploring the Pub/Sub model described here: https://docs.dask.org/en/latest/futures.html#publish-subscribe but I’m still struggling to get it all working, and I don’t understand how to use await and async, though they might be important here.
Issue Analytics
- Created 4 years ago
- Comments:5 (3 by maintainers)
Top GitHub Comments
As you suspect, I think that you should probably use Pub/Sub here and learn a little bit about async/await.
Here is a small example:
You can run many of these coroutines at once. Your asyncio event loop could run in a separate thread, or it may be that you already have an event loop running somewhere for your other application, and you can add these directly in there (this is greatly preferred).
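To illustrate the "event loop in a separate thread" option, here is a minimal sketch using only the standard library. It is not the example from this comment and does not touch dask: the `listener` coroutine, the `my_callback` function, and the hard-coded message lists are hypothetical stand-ins for a coroutine that would iterate over a subscription and call your callback for each message.

```python
import asyncio
import threading

received = []  # everything the callback has seen so far


def my_callback(name, msg):
    # Hypothetical callback; a real one might update application state.
    received.append((name, msg))


async def listener(name, messages):
    # Stand-in for a coroutine that would await messages from a subscription.
    for msg in messages:
        await asyncio.sleep(0)  # yield so other listeners can run
        my_callback(name, msg)
    return len(messages)


# Run a dedicated event loop in a background thread.
loop = asyncio.new_event_loop()
thread = threading.Thread(target=loop.run_forever, daemon=True)
thread.start()

# Submit several coroutines to that loop from the main thread.
futures = [
    asyncio.run_coroutine_threadsafe(listener("a", [1, 2]), loop),
    asyncio.run_coroutine_threadsafe(listener("b", [3]), loop),
]
counts = [f.result(timeout=5) for f in futures]

# Shut the loop down cleanly once the listeners are finished.
loop.call_soon_threadsafe(loop.stop)
thread.join(timeout=5)
```

`asyncio.run_coroutine_threadsafe` is the thread-safe way to hand a coroutine to a loop owned by another thread; it returns a `concurrent.futures.Future` that the calling thread can block on or ignore.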
Right. Jupyter already has an event loop running, so you should use that. It may be that your application also has such an event loop; if so, you should be opportunistic about using it.
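As a rough sketch of reusing a loop that is already running, the snippet below schedules a listener as a task on the current loop instead of starting a new one. The names `listener` and `main` and the fixed message list are assumptions for illustration only, and `asyncio.run` here merely plays the role of the application that owns the loop. In a Jupyter cell, where a loop is already running, you would skip `asyncio.run` and create the task (or `await` the coroutine) directly.

```python
import asyncio


async def listener(name, messages, callback):
    # Stand-in for a coroutine that would await messages from a subscription.
    for msg in messages:
        await asyncio.sleep(0)  # give other tasks a chance to run
        callback(name, msg)


async def main():
    seen = []
    # Reuse the loop that is already running rather than creating another.
    loop = asyncio.get_running_loop()
    task = loop.create_task(
        listener("a", [1, 2], lambda name, msg: seen.append((name, msg)))
    )
    # The rest of the application would keep running here.
    await task
    return seen


result = asyncio.run(main())
```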
On Sat, Nov 2, 2019, 12:29 PM Nicholas Sofroniew notifications@github.com wrote: