
Dask cluster hangs while processing file using s3fs


So I have a function that loads nc files from S3 using s3fs, selects a specific region from each file, and then pushes the result to a different S3 bucket. Here's my function; I'm doing everything in JupyterLab.

import pandas as pd
import s3fs
import xarray as xr

def get_records(rec):
    # rec[-1] is a timestamp string; slice out year, month, day, hour, minute
    d = [rec[-1][0:4], rec[-1][4:6], rec[-1][6:8], rec[-1][9:11], rec[-1][11:13]]
    yr, mo, da, hr, mn = d

    ps = s3fs.S3FileSystem(anon=True)

    period = pd.Period(yr + '-' + mo + '-' + da, freq='D')
    dy = period.dayofyear
    print(dy)

    cc = [7, 8, 9, 10, 11, 12, 13, 14, 15, 16]  # look at the IR channels only for now
    dy = "{0:0=3d}".format(dy)

    # this loop is over the 10 IR channels
    for c in range(10):
        ch = "{0:0=2d}".format(cc[c])

        # opening 2 different time slices of the given record
        # (hr is already a zero-padded 2-character slice, so it is used as-is)
        matches = ps.glob('s3://noaa-goes16/ABI-L1b-RadF/' + yr + '/' + dy + '/'
                          + hr + '/' + 'OR_ABI-L1b-RadF-M3C' + ch + '*')
        F1 = xr.open_dataset(ps.open(matches[-2]))[['Rad']]
        F2 = xr.open_dataset(ps.open(matches[-1]))[['Rad']]

        # selecting data around the record's (x, y) coordinates
        G1 = F1.where((F1.x >= (rec[0]-0.005)) & (F1.x <= (rec[0]+0.005))
                      & (F1.y >= (rec[1]-0.005)) & (F1.y <= (rec[1]+0.005)), drop=True)
        G2 = F2.where((F2.x >= (rec[0]-0.005)) & (F2.x <= (rec[0]+0.005))
                      & (F2.y >= (rec[1]-0.005)) & (F2.y <= (rec[1]+0.005)), drop=True)

        # concatenating the 2 time slices together
        G = xr.concat([G1, G2], dim='time')

        # concatenating the different channels
        if c == 0:
            T = G
        else:
            T = xr.concat([T, G], dim='channel')

    # saving into an nc file and uploading it to S3
    # (fs and bucket are defined elsewhere in the notebook: a writable
    # filesystem and the destination bucket prefix)
    path = rec[-1] + '.nc'
    T.to_netcdf(path)
    fs.put(path, bucket + path)
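
One aside about the function above, as an assumption to check rather than something established here: s3fs caches filesystem instances per process, so worker processes forked from a parent that already created an S3FileSystem can inherit that cached instance together with its open connections. A minimal sketch of opting out of the cache inside each task, using fsspec's skip_instance_cache option:

import s3fs

# Build a fresh filesystem object per task instead of reusing a cached
# instance that may have been inherited across a fork (assumption: the
# hang relates to connection state shared between processes).
ps = s3fs.S3FileSystem(anon=True, skip_instance_cache=True)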

Now I want to use a Dask cluster to run this function in parallel, like this:

import dask

files = []
for i in range(0, 100):
    s3_ds = dask.delayed(get_records)(records[i])
    files.append(s3_ds)

files = dask.compute(*files)
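
For the delayed tasks to run on the cluster's workers rather than the local threaded scheduler, a distributed Client has to be connected first; a minimal sketch (the scheduler address below is hypothetical):

from dask.distributed import Client

# Connect to the running cluster; once a Client exists, dask.compute
# dispatches the delayed tasks to its workers.
client = Client("tcp://scheduler-address:8786")  # hypothetical address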

So this was running perfectly fine last month, but now when I try it again my workers stop processing files after a while. For example, if I give 100 files to 10 workers, they process 60-70 and then just sit idle and do nothing, even though they have memory left. Next I tried giving them just 50 files; they processed around 30 and then sat idle, again with memory to spare. I don't know whether I'm doing something wrong or whether it's a bug in a library. I upgraded all the libraries I was using (dask, s3fs, and fsspec), but still nothing works.

The main point is that all of this worked perfectly a few weeks ago, and now the same code no longer does.
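
For a hang like this, one way to see what the stalled workers are actually doing is to ask the scheduler directly; a diagnostic sketch, assuming the client connected above (processing() and call_stack() are standard distributed Client methods):

# Which tasks does the scheduler believe each worker is currently running?
print(client.processing())

# Call stacks of the tasks currently executing on the workers; a hung task
# typically shows up blocked inside a network read.
print(client.call_stack())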

Environment:

  • Dask version: 2021.2.0
  • Python version: 3.8.8
  • s3fs version: 0.6.0
  • fsspec version: 0.9.0

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 19 (8 by maintainers)

Top GitHub Comments

1 reaction
martindurant commented, May 2, 2021

But really the short story is: don’t use fork!

0 reactions
normanrz commented, Mar 19, 2022

But really the short story is: don’t use fork!

I think this would be a very important insight to be put in the documentation.
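
For reference, a minimal sketch of what avoiding fork can look like with a local cluster, assuming dask.distributed; distributed.worker.multiprocessing-method is the config setting that controls how worker processes are started, and the worker counts are illustrative:

import dask
from dask.distributed import Client, LocalCluster

# Start worker processes with "spawn" rather than "fork": forking a process
# that already holds open event loops or S3 connections can leave the
# children deadlocked.
dask.config.set({"distributed.worker.multiprocessing-method": "spawn"})

cluster = LocalCluster(n_workers=10, threads_per_worker=1)  # illustrative sizes
client = Client(cluster)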


