[featureRequest] pd.DataFrame.iterdicts()

Code

Can we have a method that returns an iterator of dictionaries instead of tuples? Something like:

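def iterdicts(self, **kwargs):
    # _asdict() is the public namedtuple API; namedtuples have no __dict__ in recent Python 3 versions
    return (x._asdict() for x in self.itertuples(**kwargs))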
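
For comparison, here is a minimal Python 3 sketch of the requested behaviour built only from existing pandas calls. The free function `iterdicts` and the sample frame are illustrative names, not part of pandas. It assumes the index is not wanted in the output and that column labels are unique.

```python
import pandas as pd


def iterdicts(df):
    """Yield one dict per row, keyed by column label (index not included)."""
    columns = list(df.columns)
    # name=None makes itertuples() yield plain tuples, so labels are never renamed
    for row in df.itertuples(index=False, name=None):
        yield dict(zip(columns, row))


df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
rows = list(iterdicts(df))
print(rows)
```

Zipping against `df.columns` keeps the original labels as keys, including tuple labels from a MultiIndex, which `_asdict()` on a namedtuple cannot do.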

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-143-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: None.None

pandas: 0.24.2
pytest: 3.2.3
pip: 19.0.3
setuptools: 40.8.0
Cython: 0.25.2
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 5.7.0
sphinx: 1.7.4
patsy: 0.5.1
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.3
openpyxl: None
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.5
lxml.etree: 3.5.0
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

Issue Analytics

Created: 4 years ago
Comments: 8 (7 by maintainers)

Top GitHub Comments

2 reactions

shouldsee commented, Apr 14, 2019

I did a bit of benchmarking. The results favour an extension of iterrows().

Update: I added my proposed implementation of iterdict() as well. I hope we can agree that adding such a method makes the code much simpler.

import itertools
import string

import numpy as np
import pandas as pd

xrg = range(0,3)
xrga = np.array(list(string.ascii_lowercase))[xrg]

index = pd.MultiIndex.from_product([xrga,xrg])
columns = pd.MultiIndex.from_product([xrg,xrga])


dfc = pd.DataFrame( index=index,columns=columns)
it  = itertools.product(index,columns)
for i,(idx,col) in enumerate(it):
    val = map(str,idx + col)
    val = ''.join(val)
    dfc.values.flat[i] = val
    
import json
def ppJson(d):
    d = {unicode(k):v for k,v in d.items()}
    print json.dumps(d,indent=4)


def test__iterows(dfc):
    it  = dfc.iterrows()
    res = next(it)[1].to_dict()
    return res

def test__itertuples(dfc):
    it = dfc.itertuples()
    return next(it)._asdict()


def test__to_dict(dfc):
    it  = dfc.to_dict(orient='records')
    it  = iter(it)
    return next(it)


#### Proposed solution
def iterdict(self, into=dict):
    it = self.iterrows()
    for index, series in it:
        d = series.to_dict(into=into)
        d['index'] = index
        yield d
        
def test__iterdict(dfc):
    it = iterdict(dfc)
    return next(it)

print dfc
for methodName in ['test__iterows',
                   'test__itertuples',
                   'test__to_dict',
                   'test__iterdict']:
    print ('\n[method]%s'%methodName)
    method = locals().get(methodName)
    %timeit method(dfc)
    ppJson(method(dfc))

results


        0                 1                 2            
        a     b     c     a     b     c     a     b     c
a 0  a00a  a00b  a00c  a01a  a01b  a01c  a02a  a02b  a02c
  1  a10a  a10b  a10c  a11a  a11b  a11c  a12a  a12b  a12c
  2  a20a  a20b  a20c  a21a  a21b  a21c  a22a  a22b  a22c
b 0  b00a  b00b  b00c  b01a  b01b  b01c  b02a  b02b  b02c
  1  b10a  b10b  b10c  b11a  b11b  b11c  b12a  b12b  b12c
  2  b20a  b20b  b20c  b21a  b21b  b21c  b22a  b22b  b22c
c 0  c00a  c00b  c00c  c01a  c01b  c01c  c02a  c02b  c02c
  1  c10a  c10b  c10c  c11a  c11b  c11c  c12a  c12b  c12c
  2  c20a  c20b  c20c  c21a  c21b  c21c  c22a  c22b  c22c

[method]test__iterows
1000 loops, best of 3: 300 µs per loop
{
    "(2, 'b')": "a02b", 
    "(1, 'c')": "a01c", 
    "(1, 'a')": "a01a", 
    "(2, 'a')": "a02a", 
    "(2, 'c')": "a02c", 
    "(0, 'a')": "a00a", 
    "(0, 'b')": "a00b", 
    "(0, 'c')": "a00c", 
    "(1, 'b')": "a01b"
}

[method]test__itertuples
100 loops, best of 3: 2.21 ms per loop
{
    "Index": [
        "a", 
        0
    ], 
    "_9": "a02c", 
    "_8": "a02b", 
    "_7": "a02a", 
    "_6": "a01c", 
    "_5": "a01b", 
    "_4": "a01a", 
    "_3": "a00c", 
    "_2": "a00b", 
    "_1": "a00a"
}

[method]test__to_dict
1000 loops, best of 3: 1.8 ms per loop
{
    "(2, 'b')": "a02b", 
    "(1, 'c')": "a01c", 
    "(1, 'a')": "a01a", 
    "(2, 'a')": "a02a", 
    "(2, 'c')": "a02c", 
    "(0, 'a')": "a00a", 
    "(0, 'b')": "a00b", 
    "(1, 'b')": "a01b", 
    "(0, 'c')": "a00c"
}

[method]test__iterdict
1000 loops, best of 3: 268 µs per loop
{
    "index": [
        "a", 
        0
    ], 
    "(1, 'c')": "a01c", 
    "(1, 'a')": "a01a", 
    "(2, 'a')": "a02a", 
    "(2, 'b')": "a02b", 
    "(2, 'c')": "a02c", 
    "(0, 'a')": "a00a", 
    "(0, 'b')": "a00b", 
    "(0, 'c')": "a00c", 
    "(1, 'b')": "a01b"
}
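
The benchmark above depends on Python 2 `print` statements and the IPython `%timeit` magic. A rough Python 3 sketch of the same comparison with the standard `timeit` module might look like the following. The function names and the small sample frame are illustrative, and absolute timings will differ from the figures quoted above.

```python
import timeit

import pandas as pd

df = pd.DataFrame({"a": range(3), "b": list("xyz")})


def first_via_iterrows(df):
    # first row as a dict, built from the (index, Series) pair
    return next(df.iterrows())[1].to_dict()


def first_via_itertuples(df):
    # first row as a dict, built from a namedtuple (includes "Index")
    return next(df.itertuples())._asdict()


def first_via_to_dict(df):
    # first row as a dict, after converting every row up front
    return next(iter(df.to_dict(orient="records")))


timings = {}
for fn in (first_via_iterrows, first_via_itertuples, first_via_to_dict):
    # best of 3 repeats of 100 calls each, converted to seconds per call
    best = min(timeit.repeat(lambda: fn(df), number=100, repeat=3)) / 100
    timings[fn.__name__] = best
    print("%s: %.1f us per call" % (fn.__name__, best * 1e6))
```

Each function fetches only the first row, as in the original tests, so the numbers mostly reflect setup cost rather than per-row throughput.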

0 reactions

jreback commented, Oct 29, 2021

yep the corner cases make this a -1
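
The comment does not say which corner cases are meant. As an assumption, two that can be reproduced from the code in this thread are sketched below in Python 3: a column named "index" colliding with the key the proposal adds, and `itertuples()` renaming labels that are not valid identifiers (the `_1` ... `_9` keys in the output above). The `iterdict` function is a port of the proposal, not a pandas API.

```python
import pandas as pd


def iterdict(df, into=dict):
    # Python 3 port of the proposal quoted earlier in the thread
    for index, series in df.iterrows():
        d = series.to_dict(into=into)
        d["index"] = index
        yield d


# Case 1: a column literally named "index" is overwritten by the row label.
df1 = pd.DataFrame({"index": ["keep me"], "x": [1]}, index=["row0"])
first = next(iterdict(df1))
print(first["index"])  # the row label, not the column value

# Case 2: itertuples() replaces labels that are not valid identifiers
# with positional names, so _asdict() loses the original column label.
df2 = pd.DataFrame({"my col": [1], "x": [2]})
fields = next(df2.itertuples())._fields
print(fields)
```

Any dict-per-row method has to decide how to handle such collisions and renames, which is one plausible reading of the objection.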
