
.to_xarray(): a 9 MB DataFrame requires 30 GB of RAM

See original GitHub issue
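import xarray as xr

# df holds lat, lon and one column per date
ds1 = df.set_index(['lat', 'lon']).stack()
ds1.index.names = ['lat', 'lon', 'time']
ds1 = ds1.sort_index()
ds1.name = 'T'  # a Series has a name, not columns

ds = xr.Dataset.from_dataframe(ds1.to_frame())  # dense lat x lon x time array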

I tried to convert a dataset with 2D latitude and longitude into an xarray Dataset, but the conversion failed with an out-of-memory error.
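One likely reason for the memory error, sketched with made-up toy data: converting a Series with a (lat, lon, time) MultiIndex to xarray creates one dense dimension per index level, so the result has (unique lat values) x (unique lon values) x (time steps) cells. On a curvilinear grid almost every point has its own lat and its own lon, so that product grows roughly with the square of the number of points. The DataFrame below is hypothetical and only pandas and NumPy are assumed.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the issue's CSV: 4 points on a
# rotated (curvilinear) grid, so no two points share a lat or a lon value.
n_points, n_times = 4, 3
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "lat": 37.5 + 0.001 * np.arange(n_points),
    "lon": 96.46 + 0.002 * np.arange(n_points),
})
for day in pd.date_range("2011-01-01", periods=n_times, freq="D"):
    df[str(day.date())] = rng.normal(size=n_points)

# Same reshaping as in the question: one value per (lat, lon, time).
s = df.set_index(["lat", "lon"]).stack()
s.index.names = ["lat", "lon", "time"]

# A dense conversion allocates one axis per index level, i.e. the full
# product of the level lengths, not just the rows that actually exist.
dense_shape = s.index.remove_unused_levels().levshape
dense_cells = int(np.prod(dense_shape))
print(dense_shape, len(s), dense_cells)  # (4, 4, 3) 12 48
```

Here only 12 of the 48 cells hold data; with tens of thousands of distinct lat and lon values the same pattern produces a mostly empty array far larger than the input.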

I also tried setting lat and lon directly as coordinates, but that made plotting and later geographic operations complicated. The dataset covers a non-rectangular area, so lat and lon cannot be replaced by the corner values.

In short, I would like to convert this data to xarray and resample it onto a regular rectangular grid that is easier to work with.

Any code or suggestions are welcome.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6

Top GitHub Comments

3 reactions
keewis commented, Jul 6, 2020

Thanks, that helps. First of all (unless I did something wrong with the read_csv call), there’s an "Unnamed: 0" column that has to be removed.
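An "Unnamed: 0" column usually appears when a DataFrame is saved with to_csv() using its default index=True and then read back with read_csv(): the index is written as a first column with an empty header, which pandas names "Unnamed: 0". A small sketch with a hypothetical inline CSV, showing both the drop() used in the session below and the alternative of passing index_col=0 while reading:

```python
import io

import pandas as pd

# Hypothetical CSV as written by DataFrame.to_csv() with index=True:
# the first header field is empty, so read_csv names it "Unnamed: 0".
csv_text = ",lat,lon,2011-01-01\n0,37.5,96.46,1.0\n1,37.6,96.47,2.0\n"

df = pd.read_csv(io.StringIO(csv_text))
print(list(df.columns))  # ['Unnamed: 0', 'lat', 'lon', '2011-01-01']

# Option 1: drop the stray column after reading.
df1 = df.drop("Unnamed: 0", axis=1)

# Option 2: use the first column as the index while reading.
df2 = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(df1.columns) == list(df2.columns))  # True
```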

Other than that, your data seems to be quite sparse, so it’s an ideal fit for the sparse package:

In [38]: %%time 
    ...: df = pd.read_csv("/tmp/data.csv") 
    ...: a = df.drop("Unnamed: 0", axis=1).set_index(["lat", "lon"]) 
    ...: a = a.stack() 
    ...: a.index.names = ["lat", "lon", "time"] 
    ...: a = a.sort_index() 
    ...: a.name = "T" 
    ...: xr.DataArray.from_series(a, sparse=True) 
    ...:  
    ...:
CPU times: user 606 ms, sys: 63.9 ms, total: 670 ms
Wall time: 670 ms
Out[38]: 
<xarray.DataArray 'T' (lat: 16100, lon: 29959, time: 31)>
<COO: shape=(16100, 29959, 31), dtype=float64, nnz=1003191, fill_value=nan>
Coordinates:
  * lat      (lat) float64 37.5 37.5 37.5 37.5 37.5 ... 43.1 43.1 43.1 43.1 43.1
  * lon      (lon) float64 96.46 96.46 96.46 96.47 ... 102.6 102.6 102.6 102.6
  * time     (time) object '2011-01-01 00:00:00' ... '2011-01-31 00:00:00'
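To put numbers on why the sparse result fits in memory, here is the arithmetic for the shape and nnz shown in the repr above. The dense estimate assumes float64 (8 bytes per cell). The COO estimate assumes one 8-byte integer coordinate per dimension plus the 8-byte value for each stored element; it approximates how sparse.COO stores data and is not a measured figure.

```python
import math

# Figures copied from the repr above.
shape = (16100, 29959, 31)  # (lat, lon, time)
nnz = 1_003_191             # stored (non-fill) elements
itemsize = 8                # float64

dense_cells = math.prod(shape)
dense_bytes = dense_cells * itemsize

# Rough COO size, assuming int64 coordinates: one per dimension plus the value.
coo_bytes = nnz * (len(shape) * 8 + itemsize)

print(dense_cells)                  # 14952536900
print(round(dense_bytes / 1e9, 1))  # 119.6  (GB, dense)
print(round(coo_bytes / 1e6, 1))    # 32.1   (MB, COO estimate)
print(f"{nnz / dense_cells:.1e}")   # 6.7e-05 (fraction of cells filled)
```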
0 reactions
Drfengze commented, Jul 8, 2020

That’s only the short repr; the values are not modified:

In [5]: da.lat
Out[5]: 
<xarray.DataArray 'lat' (lat: 16100)>
array([37.49944, 37.5004 , 37.50135, ..., 43.1014 , 43.10143, 43.10144])
Coordinates:
  * lat      (lat) float64 37.5 37.5 37.5 37.5 37.5 ... 43.1 43.1 43.1 43.1 43.1
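The gap between the one-line coordinate summary (37.5 37.5 ...) and the stored array is only display precision. The same effect can be reproduced with NumPy alone, using a few of the latitude values printed above; this is an analogy for xarray's summary formatting, not xarray's own formatting code.

```python
import numpy as np

# A few latitude values copied from the full array shown above.
lat = np.array([37.49944, 37.5004, 37.50135, 43.10144])

# Formatting with one decimal place mimics the shortened summary line...
short = np.array2string(lat, precision=1)
print(short)  # [37.5 37.5 37.5 43.1]

# ...but the stored values keep their full precision and stay distinct.
print(lat[0] == 37.49944, len(np.unique(lat)))  # True 4
```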

Thanks for the help! I found sparse grids are not easy to plot, so I changed my code following the Colab code, which is similar to the ‘rasm’ example in xarray. It would be helpful if the tutorial showed how to create example datasets like this one (beyond the toy weather example).
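A minimal sketch of a rasm-style layout, for readers in the same situation: the data variable is dense over logical (y, x) grid indices, and lat and lon are stored as 2D non-dimension coordinates instead of MultiIndex levels. Grid sizes and values are made up, xarray, NumPy and pandas are assumed to be installed, and this does not reproduce the Colab code mentioned above, which is not shown here.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical curvilinear grid: ny x nx cells whose lat/lon are 2D arrays,
# similar in layout to xarray's "rasm" tutorial dataset.
ny, nx, nt = 3, 4, 2
j, i = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")  # each (ny, nx)
lat2d = 37.5 + 0.01 * j + 0.002 * i   # rotated grid: lat varies along both axes
lon2d = 96.46 + 0.01 * i - 0.002 * j
times = pd.date_range("2011-01-01", periods=nt, freq="D")

ds = xr.Dataset(
    {"T": (("time", "y", "x"), np.zeros((nt, ny, nx)))},
    coords={
        "lat": (("y", "x"), lat2d),  # 2D, non-dimension coordinate
        "lon": (("y", "x"), lon2d),
        "time": times,
    },
)
print(ds["T"].shape)  # (2, 3, 4)
```

The array has only nt * ny * nx cells, one per grid cell and time step, so nothing like the lat x lon x time blow-up occurs. A field on such a grid can then be drawn on its geographic coordinates with something like ds["T"].isel(time=0).plot(x="lon", y="lat").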


