Limiting threads/cores used by xarray(/dask?)
I’m fairly new to xarray and I’m currently trying to leverage it to subset some NetCDFs. I’m running this on a shared server and would like to know how best to limit the processing power used by xarray so that it plays nicely with others. I’ve read through the dask and xarray documentation a bit but it doesn’t seem clear to me how to set a cap on cpus/threads. Here’s an example of a spatial subset:
import glob
import os
import xarray as xr
from multiprocessing.pool import ThreadPool
import dask

wd = os.getcwd()
test_data = os.path.join(wd, 'test_data')
lat_bnds = (43, 50)
lon_bnds = (-67, -80)
output = 'test_data_subset'

def subset_nc(ncfile, lat_bnds, lon_bnds, output):
    if not os.path.exists(output):
        os.makedirs(output)
    outfile = os.path.join(output, os.path.basename(ncfile).replace('.nc', '_subset.nc'))

    with dask.config.set(scheduler='threads', pool=ThreadPool(5)):
        ds = xr.open_dataset(ncfile, decode_times=False)
        ds_sub = ds.where(
            (ds.lon >= min(lon_bnds)) & (ds.lon <= max(lon_bnds)) & (ds.lat >= min(lat_bnds)) & (ds.lat <= max(lat_bnds)),
            drop=True)
        comp = dict(zlib=True, complevel=5)
        encoding = {var: comp for var in ds.data_vars}
        ds_sub.to_netcdf(outfile, format='NETCDF4', encoding=encoding)

list_files = glob.glob(os.path.join(test_data, '*'))
print(list_files)

for i in list_files:
    subset_nc(i, lat_bnds, lon_bnds, output)
I’ve tried a few variations on this by moving the ThreadPool configuration around, but I still see way too much activity in the server’s top output (>3000% CPU activity). I’m not sure where the issue lies.
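As a rough illustration of what `ThreadPool(5)` in the snippet above bounds, the following standard-library sketch measures how many tasks a pool of that size runs at once. The helper name `peak_concurrency` and the sleep-based stand-in task are hypothetical. The pool caps concurrent Python-level tasks; it does not by itself limit threads that a native library may start inside a single task, which is one possible (unconfirmed at this point in the thread) source of CPU use above the pool size.

```python
import threading
import time
from multiprocessing.pool import ThreadPool


def peak_concurrency(pool_size, n_tasks=20):
    """Run n_tasks on a ThreadPool and return the most tasks seen running at once."""
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def task(_):
        # Record that this task has started and update the observed peak.
        with lock:
            state["running"] += 1
            state["peak"] = max(state["peak"], state["running"])
        time.sleep(0.01)  # stand-in for real work
        with lock:
            state["running"] -= 1

    with ThreadPool(pool_size) as pool:
        pool.map(task, range(n_tasks))  # blocks until all tasks finish
    return state["peak"]


peak = peak_concurrency(5)
print(f"peak concurrent tasks with ThreadPool(5): {peak}")
```

The observed peak never exceeds the pool size, so CPU usage far above 500% would have to come from threads created outside the pool.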
Issue Analytics
- State:
- Created 5 years ago
- Comments: 9 (7 by maintainers)
Hi, my test code is running properly on 5 threads. Thanks for the help.
Hi @jhamman, please excuse the lateness of this reply. It turned out that in the end all I needed to do was set OMP_NUM_THREADS to a value based on the number of cores I want to use (2 threads/core) before launching my processes. Thanks for the help and for keeping this open. Feel free to close this thread.
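A minimal sketch of the same idea done from inside Python rather than in the shell (where it would be `export OMP_NUM_THREADS=4` before starting the script). The value `4`, the helper name `cap_native_threads`, and the two extra variables `OPENBLAS_NUM_THREADS` and `MKL_NUM_THREADS` are assumptions added for illustration; the thread itself only mentions `OMP_NUM_THREADS`. The variables generally need to be set before the numerical libraries are imported, since they are typically read when the native library initialises.

```python
import os

# Hypothetical cap; choose a value appropriate for the shared server.
MAX_THREADS = 4

# OMP_NUM_THREADS is the variable named in the thread. The other two are
# assumptions: they are read by OpenBLAS and Intel MKL respectively.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def cap_native_threads(n, env=os.environ):
    """Set each thread-count variable to n and return the values applied."""
    applied = {}
    for name in THREAD_ENV_VARS:
        env[name] = str(n)
        applied[name] = env[name]
    return applied


applied = cap_native_threads(MAX_THREADS)
print(applied)

# Import numpy, xarray, etc. only after this point so that the native
# libraries they load can see the cap when they start up.
```

Setting the variable in the launching shell, as described above, avoids any dependence on import order and is the approach the poster reports as working.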