NaN values for variables when converting from a pandas DataFrame to xarray.Dataset
Code Sample, a copy-pastable example if possible
wind_surface hurs bui fwi
lat lon time
34.511383 16.467664 1971-01-10 12:00:00 29.658546 70.481293 ... 8.134300 7.409146
34.515558 16.723973 1971-01-10 12:00:00 30.896049 71.356644 ... 8.874528 8.399877
34.517359 16.852138 1971-01-10 12:00:00 31.514799 71.708603 ... 8.789351 8.763743
34.518970 16.980310 1971-01-10 12:00:00 32.105423 72.023773 ... 8.962551 9.125644
34.520391 17.108487 1971-01-10 12:00:00 32.724174 72.106110 ... 8.725038 9.249104
[5 rows x 10 columns]
In [81]: df.to_xarray()
Out[81]:
<xarray.Dataset>
Dimensions: (lat: 5, lon: 5, time: 1)
Coordinates:
* lat (lat) float64 34.51 34.52 34.52 34.52 34.52
* lon (lon) float64 16.47 16.72 16.85 16.98 17.11
* time (time) object '1971-01-10 12:00:00'
Data variables:
wind_surface (lat, lon, time) float64 29.658546 nan nan ... nan 32.724174
hurs (lat, lon, time) float64 70.48129 nan nan ... nan nan 72.10611
precip (lat, lon, time) float64 0.0 nan nan nan ... nan nan nan 0.0
tmax (lat, lon, time) float64 16.060822 nan nan ... nan 16.185822
ffmc (lat, lon, time) float64 83.58528 nan nan ... nan nan 84.05673
isi (lat, lon, time) float64 7.7641253 nan nan ... nan nan 9.64494
dmc (lat, lon, time) float64 6.797345 nan nan ... nan nan 7.90833
dc (lat, lon, time) float64 25.314878 nan nan ... nan 24.324644
bui (lat, lon, time) float64 8.1343 nan nan ... nan nan 8.725038
fwi (lat, lon, time) float64 7.409146 nan nan ... nan 9.2491045
Problem description
Hi, I get these NaN values for variables when I try to convert a pandas.DataFrame with a MultiIndex to an xarray.Dataset. The same happens if I build an xarray.Dataset and then unstack the MultiIndex, as shown below:
ds = xr.Dataset(df)
ds.unstack('dim_0')
<xarray.Dataset>
Dimensions: (lat: 5, lon: 5, time: 1)
Coordinates:
* lat (lat) float64 34.51 34.52 34.52 34.52 34.52
* lon (lon) float64 16.47 16.72 16.85 16.98 17.11
* time (time) object '1971-01-10 12:00:00'
Data variables:
wind_surface (lat, lon, time) float32 29.658546 nan nan ... nan 32.724174
hurs (lat, lon, time) float32 70.48129 nan nan ... nan nan 72.10611
precip (lat, lon, time) float32 0.0 nan nan nan ... nan nan nan 0.0
Maybe it’s not an issue. I don’t know. I’m lost. Any help is welcome.
Regards
Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3 (default, May 9 2019, 11:55:04)
[GCC 8.3.0]
python-bits: 64
OS: Linux
OS-release: 5.0.0-16-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2
xarray: 0.12.1
pandas: 0.24.2
numpy: 1.16.3
scipy: 1.3.0
netCDF4: 1.5.2
pydap: installed
h5netcdf: 0.7.3
h5py: 2.9.0
Nio: None
zarr: 2.3.1
cftime: 1.0.1
nc_time_axis: 1.1.0
PseudonetCDF: None
rasterio: 1.0.23
cfgrib: None
iris: 2.3.0dev0
bottleneck: 1.2.1
dask: 1.2.2
distributed: None
matplotlib: 3.1.0
cartopy: 0.17.1.dev168+
seaborn: 0.9.0
setuptools: 40.8.0
pip: 19.1.1
conda: None
pytest: None
IPython: 7.5.0
sphinx: 2.0.1
Issue Analytics
- State:
- Created 4 years ago
- Comments:5 (4 by maintainers)
Top GitHub Comments
I recently had a similar issue and found out the cause: when transforming a dataframe into an xarray object, xarray allocates memory for all possible combinations of the coordinates. In this particular case, you have 5 unique values each for latitude and longitude in your five rows, which means there are 5*5=25 possible combinations of lat/lon values. All missing values are then filled in as NaN.
Take just your data on latitude, longitude, wind_surface and hurs. For xarray, this means it will end up creating a 5x5 array, of which only 5 values are given along the diagonal. This is very clearly visible when showing just the DataArray for a single column.
However, as to_xarray() outputs a Dataset, each DataArray, i.e. column from the dataframe, is summarized as a 1D array, which makes it seem like a lot of data is just 'missing'.
So it works as intended, but can throw you for a loop if you don't realize it's creating an array the size of all possible index combinations.
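A minimal sketch of the behaviour described above, assuming pandas and xarray are installed. It rebuilds only the lat, lon, wind_surface and hurs values printed in the issue; the time level and the remaining columns are left out for brevity, so this is an illustration and not the code from the original comment.

```python
import numpy as np
import pandas as pd
import xarray as xr  # DataFrame.to_xarray() needs xarray to be installed

# Five rows, each with its own unique (lat, lon) pair, copied from the issue.
# Assumption: the time index level is dropped here to keep the example small.
df = pd.DataFrame(
    {
        "lat": [34.511383, 34.515558, 34.517359, 34.518970, 34.520391],
        "lon": [16.467664, 16.723973, 16.852138, 16.980310, 17.108487],
        "wind_surface": [29.658546, 30.896049, 31.514799, 32.105423, 32.724174],
        "hurs": [70.481293, 71.356644, 71.708603, 72.023773, 72.106110],
    }
).set_index(["lat", "lon"])

ds = df.to_xarray()

# 5 unique lats x 5 unique lons give a 5x5 grid per variable.
grid = ds["wind_surface"].values
print(grid.shape)                    # (5, 5)
print(int(np.isfinite(grid).sum()))  # 5 cells hold data, the other 20 are NaN
print(grid)                          # values sit on the diagonal only

# Going back to a DataFrame and dropping the NaN rows recovers the five rows.
roundtrip = ds.to_dataframe().dropna()
print(len(roundtrip))                # 5
```

Printing the single DataArray makes the diagonal layout visible, whereas the Dataset repr only shows a truncated, flattened preview of each variable.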
@shoyer can you close this issue?
Thanks @sjvrijn