Appending Pandas dataframes in for loop results in ValueError

See original GitHub issue

I recently posted this on StackOverflow. It seems to be a bug so I am posting here as well.

I want to generate a dataframe by appending several separate dataframes generated in a for loop. Each individual dataframe consists of a name column, a range of integers and a column identifying the category to which each integer belongs (e.g. quintile 1 to 5). If I generate each dataframe individually and then append one to the other to create a ‘master’ dataframe, there are no problems. However, when I use a loop to create each individual dataframe, trying to append a dataframe to the master dataframe results in:

ValueError: incompatible categories in categorical concat

A work-around (suggested by jezrael) involved appending each dataframe to a list of dataframes and concatenating them using pd.concat.

I’ve written a simplified loop to illustrate:

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

# Define column names
colNames = ('a','b','c')

# Define a dataframe with the required column names
masterDF = pd.DataFrame(columns = colNames)

# A list of the group names
names = ['Group1','Group2','Group3']

# Create a dataframe for each group
for i in names:
    tempDF = pd.DataFrame(columns = colNames)
    tempDF['a'] = np.arange(1,11,1)
    tempDF['b'] = i
    tempDF['c'] = pd.cut(np.arange(1,11,1),
                        bins = np.linspace(0,10,6),
                        labels = [1,2,3,4,5])
    print(tempDF)
    print('\n')

    # Try to append temporary DF to master DF
    masterDF = masterDF.append(tempDF,ignore_index=True)

print(masterDF)

Expected Output

     a       b  c
 0   1  Group1  1
 1   2  Group1  1
 2   3  Group1  2
 3   4  Group1  2
 4   5  Group1  3
 5   6  Group1  3
 6   7  Group1  4
 7   8  Group1  4
 8   9  Group1  5
 9  10  Group1  5
10  11  Group2  1
11  12  Group2  1
12  13  Group2  2
13  14  Group2  2
...
28  29  Group3  5
29  30  Group3  5
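
For reference, the work-around mentioned above (collect each dataframe in a list, then call pd.concat once) might look like the minimal sketch below. It rebuilds the same three groups as the loop in the code sample; the variable name frames is illustrative and not taken from the original answer.

```python
import numpy as np
import pandas as pd

names = ['Group1', 'Group2', 'Group3']

# Collect one dataframe per group instead of appending to an empty master frame
frames = []
for name in names:
    values = np.arange(1, 11)
    frames.append(pd.DataFrame({
        'a': values,
        'b': name,
        # Same binning as the original loop: five equal-width bins over (0, 10]
        'c': pd.cut(values, bins=np.linspace(0, 10, 6), labels=[1, 2, 3, 4, 5]),
    }))

# Concatenate once at the end; every piece carries the same categorical dtype
masterDF = pd.concat(frames, ignore_index=True)
print(masterDF)
```

Because no empty, dtype-less frame takes part in the concatenation, column c keeps its categorical dtype in the result.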

output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.4.1.final.0
python-bits: 64
OS: Darwin
OS-release: 15.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8

pandas: 0.18.1
nose: None
pip: 1.5.6
setuptools: 20.1.1
Cython: None
numpy: 1.11.0
scipy: 0.16.1
statsmodels: None
xarray: None
IPython: 4.1.1
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.0
openpyxl: 2.3.2
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: 0.7.4.None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 13 (11 by maintainers)

Top GitHub Comments

1 reaction
jorisvandenbossche commented, Jun 29, 2016

Well, if we say that an empty series is ordered=False, then it should actually raise an error instead of changing the order of the result 😃 But actually, in this case, you don’t have an empty categorical, but just an empty frame without dtype info, so in this case it should ignore the fact that that part is ordered or not.
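
The "empty frame without dtype info" can be inspected directly. This is a small sketch that assumes the master dataframe is built as in the code sample, with column names only; the variable name empty is illustrative.

```python
import pandas as pd

# Same construction as masterDF in the code sample: column names, no data
empty = pd.DataFrame(columns=('a', 'b', 'c'))

# None of the columns carries a categorical dtype, so there is no
# ordered/unordered information to compare against the appended frame
print(empty.dtypes)
print(len(empty))
```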

0 reactions
sinhrks commented, Jul 19, 2016

I met the same problem in #13626 and wrote a short summary of Series / Index differences.

How about the following spec:

  • concat 2 categories -> use the rule of union_categorical
  • concat category and other dtype (which values are all in the category, including empty) -> category
    • this rule is applied regardless of order (if there is at least one category in concatenating values)
    • the property like ordered should be preserved.
  • concat category and other dtype (which values are not in the category) -> not category (dtype is inferred)
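
As a point of reference for the first rule, the sketch below shows what pandas' union_categoricals does with two unordered categoricals. It assumes a pandas version in which the function is importable from pandas.api.types, and it illustrates only that helper, not the behaviour that was eventually adopted for concat.

```python
import pandas as pd
from pandas.api.types import union_categoricals

# Two unordered categoricals with overlapping but different categories
left = pd.Categorical(['a', 'b'])
right = pd.Categorical(['b', 'c'])

# The result is categorical, with categories taken from both inputs
combined = union_categoricals([left, right])
print(combined.categories.tolist())
print(combined.tolist())
```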

