Dask Client Computing Prior to Calling .compute()
Hi,
I’m having a problem that is almost surely a mistake on my part, but I haven’t been able to sort it out so I’m posting it as a bug here. I’m running the following code:
import webbrowser

from dask import dataframe as dd
from dask.distributed import Client

# Start a local cluster and open its dashboard in the browser
client = Client()
webbrowser.open("http://localhost:8787/status")
Next, I run data = dd.read_csv('data.csv') (a 12GB file). The call returns immediately, but a computation that takes a few minutes then starts on the client. Then I run data = data[data['X'] <= 180]. This second command triggers two computations on the client, and the first looks identical to the one that occurred when I ran the read_csv line, so that computation appears to be happening twice. Am I doing something obviously wrong here? I have many more commands that will follow, but I don’t want any of them to execute until the very end, when I call .compute() on a command that returns a Pandas DataFrame that will fit in my RAM.
Thanks and sorry if this is a dumb question!
Issue Analytics
- Created 2 years ago
- Comments: 10 (4 by maintainers)
Top GitHub Comments
Will do, thanks @jrbourbeau!
Thanks for following up @timhdesilva. Could you open up a GitHub issue over with the Spyder folks since this appears to be Spyder-related and not an issue with Dask itself?