Setting values in empty multi-dimensional dataframe with multi-dimensional data?
I am trying to find a clean, concise way of setting the values of a multi-dimensional, multi-indexed dataframe, using data of lower dimensionality. In this case, I am trying to use two-dimensional data to set values in a four-dimensional dataframe.
Unfortunately, the syntax I am using only works with 2D data if the keys I’m using are already in the dataframe’s index/columns. But empty dataframes do not have any keys (yet) in their indices/columns.
For single points (0D), this is not a problem. Pandas just adds the missing key(s) appropriately and sets the value. For anything else, the key must already be there, it seems, as the code below shows.
Code Sample, a copy-pastable example if possible
import pandas as pd
import numpy as np
# Create an empty 2-level mux (multi-index) for the index.
# The first level is run number ('r'). The second is x-axis values ('x').
mux = pd.MultiIndex(levels=[[]]*2,labels=[[]]*2,names=['r','x'])
# Create an empty 2-level mux for the column
# The first level is parameter value ('p'). The second is y-axis values ('y').
mux2 = pd.MultiIndex(levels=[[]]*2,labels=[[]]*2,names=['p','y'])
# Create the empty multi-indexed and multi-columned dataframe
df = pd.DataFrame(index=mux,columns=mux2)
# run number 0 (r=0), using parameter value 1.024 (p=1.024)...
# ... produces 2D data on an x-y grid.
data = np.array([[1,2,3],[4,5,6]])
ys = np.array([0,1,2])
xs = np.array([0,1])
# Now we want to set values in the 4D dataframe with our 2D data. Throws error.
df.loc[(0,list(xs)),(1.024,list(ys))] = data
Traceback (most recent call last):
KeyError: 0
But single points work fine.
# Single points automatically result in new keys
df.loc[(0,xs[0]),(1.024,ys[0])] = 1
df.loc[(0,xs[0]),(1.024,ys[1])] = 2
df.loc[(0,xs[1]),(1.024,ys[0])] = 3
df.loc[(0,xs[1]),(1.024,ys[1])] = 4
# Keys are now found, and this now works.
df.loc[(0,list(xs[0:2])),(1.024,list(ys[0:2]))] = ((5,6),(7,8))
# But this does not work. '1' is not currently a key.
df.loc[(1,list(xs[0:2])),(1.024,list(ys[0:2]))] = ((1,2),(3,4))
Traceback (most recent call last):
KeyError: 1L
Problem description
It seems the default behavior for setting single points (that is, auto-creation of keys) is different from the behavior for setting multiple points (no auto-creation of keys). This seems pretty arbitrary from my outsider perspective; I'm not sure why the behavior shouldn’t be identical.
If there is another way of accomplishing this, I would love to hear about it. But perhaps the point behavior should be extended to multiple dimensions.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None python: 2.7.12.final.0 python-bits: 64 OS: Windows OS-release: 10 machine: AMD64 processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel byteorder: little LC_ALL: None LANG: None LOCALE: None.None
pandas: 0.19.2 nose: 1.3.7 pip: 9.0.1 setuptools: 33.1.0.post20170122 Cython: 0.25.2 numpy: 1.10.4 scipy: 0.17.1 statsmodels: 0.8.0 xarray: 0.9.1 IPython: 5.2.2 sphinx: 1.5.2 patsy: 0.4.1 dateutil: 2.6.0 pytz: 2016.10 blosc: None bottleneck: 1.2.0 tables: 3.3.0 numexpr: 2.6.1 matplotlib: 2.0.0 openpyxl: 2.4.1 xlrd: 1.0.0 xlwt: 1.2.0 xlsxwriter: 0.9.6 lxml: 3.7.2 bs4: 4.5.3 html5lib: 0.999 httplib2: None apiclient: None sqlalchemy: 1.1.5 pymysql: None psycopg2: None jinja2: 2.8 boto: 2.45.0 pandas_datareader: 0.2.1
Top GitHub Comments
@joseortiz3 Using a simpler example (without multi-indexes), what you are trying to do is this:
And indeed, this is not supported by pandas at the moment.
BTW, if you don’t know the keys in advance to create the dataframe, my suggestion would be to gather the data in something else (eg append a list, or a few lists for the data, index, columns), and only create the dataframe at the end.
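One way to read the suggestion above, as a minimal sketch: build each run's 2D result as its own small dataframe, collect those in a list, and concatenate once at the end. The helper name `make_block` and the second run's values are hypothetical, made up for illustration; `pd.MultiIndex.from_product` and `pd.concat` are existing pandas functions.

```python
import numpy as np
import pandas as pd


def make_block(r, p, xs, ys, data):
    """Wrap one run's 2D data in a dataframe carrying the (r, x) / (p, y) keys.

    Hypothetical helper, not part of pandas.
    """
    index = pd.MultiIndex.from_product([[r], xs], names=["r", "x"])
    columns = pd.MultiIndex.from_product([[p], ys], names=["p", "y"])
    return pd.DataFrame(data, index=index, columns=columns)


xs = [0, 1]
ys = [0, 1, 2]

# Gather the per-run blocks in a plain list instead of an empty dataframe.
blocks = []
blocks.append(make_block(0, 1.024, xs, ys, np.array([[1, 2, 3], [4, 5, 6]])))
# Second run: illustrative values only.
blocks.append(make_block(1, 1.024, xs, ys, np.array([[7, 8, 9], [10, 11, 12]])))

# Create the full dataframe only once, at the end.
df = pd.concat(blocks)
print(df)
```

Runs that use a different parameter value would bring in new columns. By default `pd.concat` takes the union of the columns and leaves NaN where a run has no data, so it may be worth checking that this is the layout you want.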
Thanks for your help. But I don’t think I’m successfully conveying what the problem is. I provided a suggestion, as per the rules.
In the end, I just had to iterate point-by-point to get what I want. Inefficient, but my time is worth more.
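If the keys happen to be known before the runs start, the point-by-point loop can probably be avoided by creating the dataframe with those keys up front and then assigning whole 2D blocks, since block assignment with `.loc` works once the keys exist (as the original example showed on pandas 0.19.2). A minimal sketch, assuming two runs and a single parameter value; the second run's values are invented for illustration:

```python
import numpy as np
import pandas as pd

xs = [0, 1]
ys = [0, 1, 2]

# Assumption: all run numbers and parameter values are known in advance.
index = pd.MultiIndex.from_product([[0, 1], xs], names=["r", "x"])
columns = pd.MultiIndex.from_product([[1.024], ys], names=["p", "y"])

# Pre-allocate with NaN so that every key already exists.
df = pd.DataFrame(np.nan, index=index, columns=columns)

# Each run now sets a whole 2D block in one assignment.
df.loc[(0, xs), (1.024, ys)] = np.array([[1, 2, 3], [4, 5, 6]])
df.loc[(1, xs), (1.024, ys)] = np.array([[7, 8, 9], [10, 11, 12]])
print(df)
```

This does not help when the keys are only discovered while the runs are in progress; in that case gathering the data first and building the dataframe at the end is likely the simpler route.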