categorical does not work with Pandas DataFrames

In [9]: sm.categorical(pd.DataFrame({'a':[1,2,12], 'b':['a','b','a']}), col='a')
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-9-5966a3ee6951> in <module>()
----> 1 sm.categorical(pd.DataFrame({'a':[1,2,12], 'b':['a','b','a']}), col='a')

/usr/local/lib/python2.7/dist-packages/statsmodels-0.6.0-py2.7-linux-x86_64.egg/statsmodels/tools/tools.pyc in categorical(data, col, dictnames, drop)
    143     #TODO: add a NameValidator function
    144     # catch recarrays and structured arrays
--> 145     if data.dtype.names or data.__class__ is np.recarray:
    146         if not col and np.squeeze(data).ndim > 1:
    147             raise IndexError("col is None and the input array is not 1d")

/usr/local/lib/python2.7/dist-packages/pandas-0.13.0_285_gfcfaa7d-py2.7-linux-x86_64.egg/pandas/core/generic.pyc in __getattr__(self, name)
   1802                 return self[name]
   1803             raise AttributeError("'%s' object has no attribute '%s'" %
-> 1804                                  (type(self).__name__, name))
   1805 
   1806     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'dtype'
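The traceback comes down to an attribute mismatch: a DataFrame exposes per-column dtypes but no single dtype, so the data.dtype.names check shown at line 145 raises AttributeError. The following is a minimal sketch of that difference, assuming a reasonably recent pandas and NumPy. The conversion with to_records is included only to illustrate what the check expects (a structured or record array); it is not presented as the fix adopted in statsmodels.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 12], 'b': ['a', 'b', 'a']})

# A DataFrame has `dtypes` (one dtype per column) but no `dtype` attribute,
# which is why `data.dtype.names` fails in the traceback above.
has_dtype = hasattr(df, "dtype")
has_dtypes = hasattr(df, "dtypes")

# A record array does carry `dtype.names`, the attribute the check looks for.
rec = df.to_records(index=False)
rec_names = rec.dtype.names

print(has_dtype, has_dtypes, rec_names)
```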

An alternative is to use pd.get_dummies:

pd.get_dummies(pd.DataFrame({'a': [1, 2, 12], 'b': ['a', 'b', 'a']}).b)
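As a slightly fuller sketch of the get_dummies workaround, the snippet below builds the dummy columns for b and attaches them to the original frame. The prefix and dtype arguments are choices made here for illustration (dtype requires a pandas version that supports it in get_dummies), not something the issue prescribes.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 12], 'b': ['a', 'b', 'a']})

# One indicator column per distinct value of `b`, stored as floats.
dummies = pd.get_dummies(df['b'], prefix='b', dtype=float)

# Attach the indicator columns next to the original data.
result = pd.concat([df, dummies], axis=1)
print(result)
```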

Issue Analytics

  • State: closed
  • Created 10 years ago
  • Comments: 11 (6 by maintainers)

Top GitHub Comments

1 reaction
josef-pkt commented, Apr 6, 2018

categorical still doesn’t work with a DataFrame, which is a bit annoying when updating older code, e.g. test_glm #4432. categorical is still nice when we quickly want to get an exog without going through the formulas or the full pandas categorical type.
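For the "quickly get an exog" use case, one possible stand-in that works directly on a DataFrame is sketched below. The helper name quick_exog, the const column, and the choice to drop the first level are all assumptions made for this example; none of them are part of statsmodels or taken from the thread.

```python
import pandas as pd

def quick_exog(df, col, drop_first=True):
    """Hypothetical helper: dummy-encode `col` and return a design matrix."""
    dummies = pd.get_dummies(df[col], prefix=col,
                             drop_first=drop_first, dtype=float)
    # Keep the remaining columns and append the indicator columns.
    exog = pd.concat([df.drop(columns=[col]), dummies], axis=1)
    # Add an intercept column at the front.
    exog.insert(0, 'const', 1.0)
    return exog

df = pd.DataFrame({'a': [1, 2, 12], 'b': ['a', 'b', 'a']})
exog = quick_exog(df, 'b')
print(exog)
```

Dropping the first level avoids perfect collinearity with the intercept column; pass drop_first=False to keep every level.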

1 reaction
bfcondon commented, Oct 16, 2016

I have submitted PR #3203 for this issue. It should handle DataFrames as well as other data formats.
