Can not handle Categorical in FAMD
I am using the latest versions of pandas and Prince. When I run the following example, it fails:
    import pandas as pd
    import prince

    df = pd.DataFrame({
        'variable_1': [4, 5, 6, 7, 11, 2, 52],
        'variable_2': [10, 20, 30, 40, 10, 74, 10],
        'variable_3': [100, 50, 30, 50, 19, 29, 20],
        'color': ['red', 'blue', 'green', 'blue', 'red', 'red', 'blue'],
    })
    # Casting to the category dtype is what triggers the error
    df['color'] = df['color'].astype('category')

    model = prince.FAMD(
        n_components=2,
        copy=True,
        check_input=True,
        engine='auto',
        random_state=1,
    )
    model.fit(df)
ValueError: Not all columns in "Categorical" group are of the same type
I have also analyzed the reason why it occurs.
When the fit method of MFA is called, it checks whether each group is categorical or not with the following code:
    for name, cols in sorted(self.groups.items()):
        all_num = all(pd.api.types.is_numeric_dtype(X[c]) for c in cols)
        all_cat = all(pd.api.types.is_string_dtype(X[c]) for c in cols)
        if not (all_num or all_cat):
            raise ValueError('Not all columns in "{}" group are of the same type'.format(name))
This was fine with earlier versions of pandas. But now the check all_cat = all(pd.api.types.is_string_dtype(X[c]) for c in cols) no longer works for categorical data, only for object data.
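To see how these dtype checks treat an object column versus a category column on a given pandas install, a minimal check along the following lines may help. The `describe` helper is hypothetical and only for illustration. The `is_string` result for the category column depends on the pandas version, so no expected output is shown.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype, is_string_dtype

colors = ['red', 'blue', 'green', 'blue', 'red', 'red', 'blue']
as_object = pd.Series(colors, dtype='object')
as_category = as_object.astype('category')


def describe(series):
    """Return the dtype checks relevant to the group validation for one column.

    Hypothetical helper, not part of pandas or Prince.
    """
    return {
        'is_numeric': bool(is_numeric_dtype(series)),
        'is_string': bool(is_string_dtype(series)),
        'is_categorical': isinstance(series.dtype, pd.CategoricalDtype),
    }


report = {
    'object': describe(as_object),
    'category': describe(as_category),
}
for name, checks in report.items():
    # The 'is_string' value for the category column varies across pandas versions
    print(name, checks)
```

If `is_string` comes back False for the category column, the group check quoted above would reject that column, which matches the error reported here.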
So the check probably needs to be corrected as follows:
    for name, cols in sorted(self.groups.items()):
        all_num = all(pd.api.types.is_numeric_dtype(X[c]) for c in cols)
        all_obj = all(pd.api.types.is_string_dtype(X[c]) for c in cols)
        all_cat = all(pd.api.types.is_categorical_dtype(X[c]) for c in cols)
        if not (all_num or all_obj or all_cat):
            raise ValueError('Not all columns in "{}" group are of the same type'.format(name))
Am I right?
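For anyone who wants to try the proposed check outside of Prince, here is a standalone sketch. The function name `check_group_types` and its signature are assumptions made for illustration and are not Prince's API. It also swaps `is_categorical_dtype` for an `isinstance` test against `pd.CategoricalDtype`, since newer pandas releases mark `is_categorical_dtype` as deprecated, and it records `all_num` per group, as a comment further down this thread recommends.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype, is_string_dtype


def check_group_types(X, groups):
    """Validate that each group holds columns of a single kind.

    Hypothetical standalone version of the proposed check.
    Returns a dict mapping group name -> True if the group is numeric.
    """
    all_nums = {}
    for name, cols in sorted(groups.items()):
        all_num = all(is_numeric_dtype(X[c]) for c in cols)
        all_obj = all(is_string_dtype(X[c]) for c in cols)
        # isinstance test used here instead of is_categorical_dtype
        all_cat = all(isinstance(X[c].dtype, pd.CategoricalDtype) for c in cols)
        if not (all_num or all_obj or all_cat):
            raise ValueError(
                'Not all columns in "{}" group are of the same type'.format(name)
            )
        all_nums[name] = all_num
    return all_nums


df = pd.DataFrame({
    'variable_1': [4, 5, 6, 7, 11, 2, 52],
    'variable_2': [10, 20, 30, 40, 10, 74, 10],
    'color': ['red', 'blue', 'green', 'blue', 'red', 'red', 'blue'],
})
df['color'] = df['color'].astype('category')

groups = {'Numerical': ['variable_1', 'variable_2'], 'Categorical': ['color']}
result = check_group_types(df, groups)
print(result)
```

A group that mixes a numeric column with the category column still raises the ValueError, so the original intent of the check is preserved.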

Same issue here with `pandas 1.2.4` and `prince 0.7.1`.

If someone is wondering, a temporary fix is to switch from `dtype category` to `dtype object`:

    df['color'] = df['color'].astype('object')

Nice solution. After implementing it, I got an exception due to the for-loop directly after this in mfa.py.
You referenced this in your issue: https://github.com/MaxHalford/prince/blob/988f7fe01b6e4c9476517d1939f5fe0e13deb158/prince/mfa.py#L45-L52
I implemented your fix (it is important to keep `self.all_nums_[name] = all_num`) and got a new exception referencing the for-loop after this, here: https://github.com/MaxHalford/prince/blob/988f7fe01b6e4c9476517d1939f5fe0e13deb158/prince/mfa.py#L54-L75

The problem is that when running FAMD, `self.all_nums_` is a dictionary with only 'Numerical' as a key, if I understand the code correctly. However, `self.groups`, which was created in famd.py, is a dictionary with both 'Numerical' and 'Categorical' as keys. So when it runs the check `if self.all_nums_[name]` with `name` set to 'Categorical', it throws a KeyError, since the dictionary only has 'Numerical' as a key.

I changed `if self.all_nums_[name]:` to `if name == 'Numerical':`. This, however, is only a quick fix and probably not a robust way of fixing it.
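The KeyError described above can be reproduced with two plain dictionaries, independent of Prince. The sketch below is an illustration only: `is_numeric_group` is a hypothetical helper, and treating a missing group as non-numeric is an assumption that could hide a real bookkeeping error, so recording an entry for every group is likely the sounder fix.

```python
def is_numeric_group(all_nums, name):
    """Look up whether a group is numeric, defaulting to False when absent.

    Hypothetical helper; the default for missing groups is an assumption.
    """
    return all_nums.get(name, False)


groups = {
    'Numerical': ['variable_1', 'variable_2', 'variable_3'],
    'Categorical': ['color'],
}
# The state described in the comment above: only 'Numerical' was recorded
all_nums = {'Numerical': True}

# Direct indexing fails for the group that was never recorded
try:
    all_nums['Categorical']
    lookup_failed = False
except KeyError:
    lookup_failed = True

# A lookup with a default avoids hard-coding the group name
kinds = {
    name: 'numeric' if is_numeric_group(all_nums, name) else 'categorical'
    for name in sorted(groups)
}
print(lookup_failed, kinds)
```

Unlike `if name == 'Numerical':`, this does not depend on the literal group names chosen in famd.py.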