DataFrame.loc[n] = dict(..) fails with some type combinations

Code Sample, a copy-pastable example if possible

This one fails:

In [9]: d = pd.DataFrame(columns=['time', 'value'])                    
In [9]: d.loc[0] = dict(time=pd.to_timedelta(5, unit='s'), value='foo')
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-9-b557eb950858> in <module>()
----> 1 d.loc[0] = dict(time=pd.to_timedelta(5, unit='s'), value='foo')

/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/pandas/core/indexing.py in __setitem__(self, key, value)
    177             key = com._apply_if_callable(key, self.obj)
    178         indexer = self._get_setitem_indexer(key)
--> 179         self._setitem_with_indexer(indexer, value)
    180 
    181     def _has_valid_type(self, k, axis):

/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value)
    423                                        name=indexer)
    424 
--> 425                     self.obj._data = self.obj.append(value)._data
    426                     self.obj._maybe_update_cacher(clear=True)
    427                     return self.obj

/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/pandas/core/frame.py in append(self, other, ignore_index, verify_integrity)
   4628             other = DataFrame(other.values.reshape((1, len(other))),
   4629                               index=index,
-> 4630                               columns=combined_columns)
   4631             other = other._convert(datetime=True, timedelta=True)
   4632             if not self.columns.equals(combined_columns):

/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    304             else:
    305                 mgr = self._init_ndarray(data, index, columns, dtype=dtype,
--> 306                                          copy=copy)
    307         elif isinstance(data, (list, types.GeneratorType)):
    308             if isinstance(data, types.GeneratorType):

/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/pandas/core/frame.py in _init_ndarray(self, values, index, columns, dtype, copy)
    481             values = maybe_infer_to_datetimelike(values)
    482 
--> 483         return create_block_manager_from_blocks([values], [columns, index])
    484 
    485     @property

/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/pandas/core/internals.py in create_block_manager_from_blocks(blocks, axes)
   4294                                      placement=slice(0, len(axes[0])))]
   4295 
-> 4296         mgr = BlockManager(blocks, axes)
   4297         mgr._consolidate_inplace()
   4298         return mgr

/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/pandas/core/internals.py in __init__(self, blocks, axes, do_integrity_check, fastpath)
   2790                     raise AssertionError('Number of Block dimensions (%d) '
   2791                                          'must equal number of axes (%d)' %
-> 2792                                          (block.ndim, self.ndim))
   2793 
   2794         if do_integrity_check:

AssertionError: Number of Block dimensions (1) must equal number of axes (2)

But this one succeeds:

In [11]: d.loc[0] = dict(time=pd.to_timedelta(5, unit='s'), value=5)

In [12]: d
Out[12]: 
      time value
0 00:00:05     5

This one also succeeds:

In [13]: d = pd.DataFrame(columns=['time', 'value'])

In [14]: d.loc[0] = dict(time=3, value='foo')

In [15]: d
Out[15]: 
  time value
0    3   foo

Problem description

The current behavior is a problem because it is inconsistent: whether the assignment succeeds depends on the types of the values provided. Mixing timedelta with str fails, but timedelta with int works, as does int with str.
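
To check the three combinations described above on whatever pandas version is installed, a small harness like the following may help. It is only a sketch: the `CASES` labels and the `try_row_assignment` helper are hypothetical names introduced here, and each case uses a fresh empty frame rather than reusing `d` as in the session above. It records the exception type instead of assuming which cases fail, since the outcome depends on the pandas version.

```python
import pandas as pd

# The three value combinations from the report, keyed by a hypothetical label.
CASES = {
    "timedelta + str": dict(time=pd.to_timedelta(5, unit="s"), value="foo"),
    "timedelta + int": dict(time=pd.to_timedelta(5, unit="s"), value=5),
    "int + str": dict(time=3, value="foo"),
}


def try_row_assignment(row):
    """Assign `row` to label 0 of a fresh empty frame.

    Returns the exception class name if the assignment raises, else None.
    """
    frame = pd.DataFrame(columns=["time", "value"])
    try:
        frame.loc[0] = row
    except Exception as exc:  # broad on purpose: the report saw AssertionError
        return type(exc).__name__
    return None


results = {label: try_row_assignment(row) for label, row in CASES.items()}
for label, outcome in results.items():
    print(f"{label}: {outcome or 'ok'}")
```

On the pandas 0.20.1 setup from the report, the first case would be expected to print AssertionError; later comments on the issue suggest it succeeds on newer releases.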

I believe this is related to aggressive type inference previously noted in #13829.

Expected Output

Not crashing.
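
On an affected version, one possible workaround (not proposed in the issue itself, so treat it as an assumption) is to avoid row-by-row enlargement through .loc altogether: collect the row dicts in a plain list and build the frame once. The `rows` and `frame` names and the second row are illustrative only.

```python
import pandas as pd

# Collect rows as plain dicts instead of assigning them one by one via .loc.
rows = []
rows.append(dict(time=pd.to_timedelta(5, unit="s"), value="foo"))
rows.append(dict(time=pd.to_timedelta(7, unit="s"), value="bar"))

# Build the frame in a single step; column dtypes are inferred per column.
frame = pd.DataFrame(rows, columns=["time", "value"])
print(frame)
print(frame.dtypes)
```

Building the frame once also avoids the repeated copying that enlarging through .loc implies, at the cost of holding the rows in a list first.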

Output of pd.show_versions()

In [16]: pd.show_versions()
/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/xarray/core/formatting.py:16: FutureWarning: The pandas.tslib module is deprecated and will be removed in a future version.
  from pandas.tslib import OutOfBoundsDatetime

INSTALLED VERSIONS

commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-77-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 35.0.2
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: 0.9.5
IPython: 6.0.0
sphinx: 1.5.5
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.0
tables: None
numexpr: 2.6.0
feather: None
matplotlib: 2.0.1
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
sqlalchemy: 1.0.9
pymysql: None
psycopg2: None
jinja2: 2.9.5
s3fs: 0.1.0
pandas_gbq: None
pandas_datareader: None

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

jorisvandenbossche commented, Jun 19, 2017 (1 reaction)

Ah, yes, I see that as well (so when the label already exists). This already is raising in 0.19.2, so not a new bug …

phofl commented, Nov 15, 2020 (0 reactions)

Works now, setting once and setting twice
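
One reading of "setting once and setting twice" is: first assign to a label that does not exist yet, then assign to the same label again once it exists (the case mentioned in the earlier comment). The sketch below checks both steps under that assumption; `set_row_twice` is a hypothetical helper, and it reports outcomes rather than assuming success.

```python
import pandas as pd


def set_row_twice(first, second):
    """Assign `first` and then `second` to label 0 of an empty frame.

    Returns the frame and a list with one outcome per assignment:
    "ok" or the name of the exception that was raised.
    """
    frame = pd.DataFrame(columns=["time", "value"])
    outcomes = []
    for row in (first, second):
        try:
            # The first pass enlarges the frame; if that succeeded, the
            # second pass overwrites the now-existing label 0.
            frame.loc[0] = row
            outcomes.append("ok")
        except Exception as exc:
            outcomes.append(type(exc).__name__)
    return frame, outcomes


frame, outcomes = set_row_twice(
    dict(time=pd.to_timedelta(5, unit="s"), value="foo"),
    dict(time=pd.to_timedelta(6, unit="s"), value="bar"),
)
print(outcomes)
```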
