
Setting values in empty multi-dimensional dataframe with multi-dimensional data?

See original GitHub issue

I am trying to find a clean, concise way of setting the values of a multi-dimensional, multi-indexed dataframe, using data of lower dimensionality. In this case, I am trying to use two-dimensional data to set values in a four-dimensional dataframe.

Unfortunately, the syntax I am using only works with 2D data if the keys I’m using are already in the dataframe’s index/column. But empty dataframes do not have any keys (yet) in their indices/columns.

For single points (0D), this is not a problem: pandas adds the missing key(s) and sets the value. For anything of higher dimension, it seems the keys must already be present, as the code below shows.

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np

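# Create an empty 2-level mux (multi-index) for the index.
# The first level is run number ('r'). The second is x-axis values ('x').
# Note: this is the pandas 0.19.2 API; from pandas 0.24 on, the `labels` argument is named `codes`.
mux = pd.MultiIndex(levels=[[]]*2,labels=[[]]*2,names=['r','x'])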

# Create an empty 2-level mux for the column
# The first level is parameter value ('p'). The second is y-axis values ('y').
mux2 = pd.MultiIndex(levels=[[]]*2,labels=[[]]*2,names=['p','y'])

# Create the empty multi-indexed and multi-columned dataframe
df = pd.DataFrame(index=mux,columns=mux2)

# run number 0 (r=0), using parameter value 1.024 (p=1.024)...
# ... produces 2D data on an x-y grid.
data = np.array([[1,2,3],[4,5,6]])
ys = np.array([0,1,2])
xs = np.array([0,1])

# Now we want to set values in the 4D dataframe with our 2D data. Throws error.
df.loc[(0,list(xs)),(1.024,list(ys))] = data

Traceback (most recent call last):
    KeyError: 0

But single points work fine.

# Single points automatically result in new keys
df.loc[(0,xs[0]),(1.024,ys[0])] = 1
df.loc[(0,xs[0]),(1.024,ys[1])] = 2
df.loc[(0,xs[1]),(1.024,ys[0])] = 3
df.loc[(0,xs[1]),(1.024,ys[1])] = 4

# Keys are now found, and this now works.
df.loc[(0,list(xs[0:2])),(1.024,list(ys[0:2]))] = ((5,6),(7,8))

# But this does not work. '1' is not currently a key.
df.loc[(1,list(xs[0:2])),(1.024,list(ys[0:2]))] = ((1,2),(3,4))

Traceback (most recent call last):
    KeyError: 1L

Problem description

It seems the default behavior for setting single points (auto-creation of keys) differs from the behavior for setting multiple points (no auto-creation of keys). This seems pretty arbitrary from my outsider perspective; I am not sure why the behavior shouldn’t be identical.

If there is another way of accomplishing this, I would love to hear about it. But perhaps the point behavior should be extended to multiple dimensions.
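One possible alternative, sketched here as an assumption and not something confirmed in this thread: keep the data in long form as a Series with a four-level index (`r`, `x`, `p`, `y`), and only unstack the column levels when the 2D layout is needed. The helper name `block_to_series` and the second run's values are hypothetical.

```python
import numpy as np
import pandas as pd

def block_to_series(data, r, p, xs, ys):
    """Flatten one 2D block (rows = x, columns = y) into a 4-level Series."""
    index = pd.MultiIndex.from_product(
        [[r], xs, [p], ys], names=["r", "x", "p", "y"]
    )
    # ravel() is row-major, which matches the x-then-y order of from_product.
    return pd.Series(np.asarray(data).ravel(), index=index)

xs = np.array([0, 1])
ys = np.array([0, 1, 2])
data = np.array([[1, 2, 3], [4, 5, 6]])

# Two illustrative runs with different parameter values.
pieces = [
    block_to_series(data, r=0, p=1.024, xs=xs, ys=ys),
    block_to_series(data * 10, r=1, p=2.048, xs=xs, ys=ys),
]

long_form = pd.concat(pieces)

# Move 'p' and 'y' to the columns; combinations that were never set become NaN.
df = long_form.unstack(["p", "y"])
```

No keys need to exist in advance with this layout, because each block brings its own labels.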

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 33.1.0.post20170122
Cython: 0.25.2
numpy: 1.10.4
scipy: 0.17.1
statsmodels: 0.8.0
xarray: 0.9.1
IPython: 5.2.2
sphinx: 1.5.2
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.45.0
pandas_datareader: 0.2.1

Issue Analytics

State:
Created 6 years ago
Comments: 7 (7 by maintainers)

Top GitHub Comments

jorisvandenbossche commented, Apr 10, 2017

@joseortiz3 Using a simpler example (without multi-indexes), what you are trying to do is this:

In [14]: df = pd.DataFrame()

In [15]: df.loc[[1,2], [1,2,3]] = data
...
KeyError: '[1 2] not in index'

And indeed, this is not supported by pandas at the moment.
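For contrast, a minimal sketch of the same assignment when the keys already exist. It assumes the row and column labels are known up front, and `data` is the 2x3 array from the original post.

```python
import numpy as np
import pandas as pd

data = np.array([[1, 2, 3], [4, 5, 6]])

# Create the frame with its row and column keys up front; all values start as NaN.
df = pd.DataFrame(index=[1, 2], columns=[1, 2, 3], dtype=float)

# Every key in the list indexers now exists, so the 2D assignment succeeds.
df.loc[[1, 2], [1, 2, 3]] = data
```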

BTW, if you don’t know the keys in advance to create the dataframe, my suggestion would be to gather the data in something else (e.g. append to a list, or to a few lists for the data, index, and columns), and only create the dataframe at the end.
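A rough sketch of that suggestion, assuming every run shares the same parameter value `p` and the same `ys` grid. The per-run `data` values are placeholders, and the list names `blocks` and `row_keys` are made up for illustration.

```python
import numpy as np
import pandas as pd

xs = np.array([0, 1])
ys = np.array([0, 1, 2])
p = 1.024

blocks = []    # one 2D array per run
row_keys = []  # matching (r, x) labels, in the same order as the stacked rows

for r in range(2):
    # Placeholder for the 2D result of run r.
    data = np.array([[1, 2, 3], [4, 5, 6]]) + 10 * r
    blocks.append(data)
    row_keys.extend((r, x) for x in xs)

# Build the labels and the frame once, after all runs are collected.
index = pd.MultiIndex.from_tuples(row_keys, names=["r", "x"])
columns = pd.MultiIndex.from_product([[p], ys], names=["p", "y"])
df = pd.DataFrame(np.vstack(blocks), index=index, columns=columns)
```

Appending to Python lists is cheap, while growing a dataframe one key at a time typically copies data on each enlargement, so deferring construction tends to be both simpler and faster.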

joseortiz3 commented, Apr 9, 2017

Thanks for your help. But I don’t think I’m successfully conveying what the problem is. I provided a suggestion, as per the rules.

In the end, I just had to iterate point-by-point to get what I want. Inefficient, but my time is worth more.
