MultiIndex - Comparison with Mixed Frequencies (and other FUBAR)
Setup:
import pandas as pd
index = pd.Index(['PCE']*4, name='Variable')
data = [
pd.Period('2018Q2'),
pd.Period('2021', freq='5A-Dec'),
pd.Period('2026', freq='10A-Dec'),
pd.Period('2017Q2')
]
ser = pd.Series(data, index=index, name='Period')
In the real-life version of this issue, ‘Period’ is a column in a DataFrame and I need to append it as a new level to the index. The snippets here show the problem(s) in both py2 and py3, but for reasons unknown df.set_index('Period', append=True) goes through fine in py2.
The large majority of Period values are quarterly-frequency.
py2
>>> pd.__version__
'0.20.2'
>>> ser.sort_values()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/site-packages/pandas/core/series.py", line 1710, in sort_values
argsorted = _try_kind_sort(arr[good])
File "/usr/local/lib/python2.7/site-packages/pandas/core/series.py", line 1696, in _try_kind_sort
return arr.argsort(kind=kind)
File "pandas/_libs/period.pyx", line 725, in pandas._libs.period._Period.__richcmp__ (pandas/_libs/period.c:11842)
pandas._libs.period.IncompatibleFrequency: Input has different freq=10A-DEC from Period(freq=Q-DEC)
>>> ser.to_frame()
          Period
Variable
PCE       2018Q2
PCE         2021
PGDP        2026
PGDP      2017Q2
>>> ser.to_frame().set_index('Period', append=True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 2836, in set_index
index = MultiIndex.from_arrays(arrays, names=names)
File "/usr/local/lib/python2.7/site-packages/pandas/core/indexes/multi.py", line 1100, in from_arrays
labels, levels = _factorize_from_iterables(arrays)
File "/usr/local/lib/python2.7/site-packages/pandas/core/categorical.py", line 2193, in _factorize_from_iterables
return map(list, lzip(*[_factorize_from_iterable(it) for it in iterables]))
File "/usr/local/lib/python2.7/site-packages/pandas/core/categorical.py", line 2165, in _factorize_from_iterable
cat = Categorical(values, ordered=True)
File "/usr/local/lib/python2.7/site-packages/pandas/core/categorical.py", line 310, in __init__
raise NotImplementedError("> 1 ndim Categorical are not "
NotImplementedError: > 1 ndim Categorical are not supported at this time
No idea why it thinks Categorical is relevant here. That doesn’t happen in py3.
For the purposes of sort_values, refusing to sort might make sense. But when all I care about is set_index, I’m pretty indifferent to the ordering.
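If ordering really does not matter, one possible workaround (a sketch, not something proposed in the issue) is to replace each Period with its string form before calling set_index, so that no cross-frequency comparison is ever attempted. The 'Value' column and the monthly Period below are assumptions made for illustration; the monthly Period stands in for the multi-year frequencies in the setup. The cost is that the new level holds strings rather than Period objects.

```python
import pandas as pd

index = pd.Index(['PCE'] * 3, name='Variable')
periods = [
    pd.Period('2018Q2'),             # quarterly
    pd.Period('2021-01', freq='M'),  # monthly, stands in for the odd frequencies
    pd.Period('2017Q2'),             # quarterly
]
# 'Value' is a hypothetical data column so the frame is not empty after set_index.
df = pd.DataFrame({'Period': periods, 'Value': [1.0, 2.0, 3.0]}, index=index)

# Replace each Period with its string form; strings compare without
# any frequency check, so building the new index level cannot fail on that.
df['Period'] = df['Period'].map(str)
out = df.set_index('Period', append=True)

print(out.index.names)
```

The original Period objects can be kept in a separate column if they are still needed after the index is built.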
py3
>>> pd.__version__
'0.20.2'
>>> ser.sort_values()
pandas._libs.period.IncompatibleFrequency: Input has different freq=Q-DEC from Period(freq=5A-DEC)
During handling of the above exception, another exception occurred:
SystemError: <built-in function isinstance> returned a result with an error set
[...]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.5/site-packages/pandas/core/series.py", line 1710, in sort_values
argsorted = _try_kind_sort(arr[good])
File "/usr/local/lib/python3.5/site-packages/pandas/core/series.py", line 1696, in _try_kind_sort
return arr.argsort(kind=kind)
File "pandas/_libs/period.pyx", line 723, in pandas._libs.period._Period.__richcmp__ (pandas/_libs/period.c:11713)
File "/usr/local/lib/python3.5/site-packages/pandas/tseries/offsets.py", line 375, in __ne__
return not self == other
File "/usr/local/lib/python3.5/site-packages/pandas/tseries/offsets.py", line 364, in __eq__
if isinstance(other, compat.string_types):
SystemError: <built-in function isinstance> returned a result with an error set
>>> ser.to_frame().set_index('Period', append=True)
pandas._libs.period.IncompatibleFrequency: Input has different freq=Q-DEC from Period(freq=5A-DEC)
During handling of the above exception, another exception occurred:
SystemError: <built-in function isinstance> returned a result with an error set
[...]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.5/site-packages/pandas/core/frame.py", line 2836, in set_index
index = MultiIndex.from_arrays(arrays, names=names)
File "/usr/local/lib/python3.5/site-packages/pandas/core/indexes/multi.py", line 1100, in from_arrays
labels, levels = _factorize_from_iterables(arrays)
File "/usr/local/lib/python3.5/site-packages/pandas/core/categorical.py", line 2193, in _factorize_from_iterables
return map(list, lzip(*[_factorize_from_iterable(it) for it in iterables]))
File "/usr/local/lib/python3.5/site-packages/pandas/core/categorical.py", line 2193, in <listcomp>
return map(list, lzip(*[_factorize_from_iterable(it) for it in iterables]))
File "/usr/local/lib/python3.5/site-packages/pandas/core/categorical.py", line 2165, in _factorize_from_iterable
cat = Categorical(values, ordered=True)
File "/usr/local/lib/python3.5/site-packages/pandas/core/categorical.py", line 298, in __init__
codes, categories = factorize(values, sort=True)
File "/usr/local/lib/python3.5/site-packages/pandas/core/algorithms.py", line 567, in factorize
assume_unique=True)
File "/usr/local/lib/python3.5/site-packages/pandas/core/algorithms.py", line 486, in safe_sort
sorter = values.argsort()
File "pandas/_libs/period.pyx", line 723, in pandas._libs.period._Period.__richcmp__ (pandas/_libs/period.c:11713)
File "/usr/local/lib/python3.5/site-packages/pandas/tseries/offsets.py", line 375, in __ne__
return not self == other
File "/usr/local/lib/python3.5/site-packages/pandas/tseries/offsets.py", line 364, in __eq__
if isinstance(other, compat.string_types):
SystemError: <built-in function isinstance> returned a result with an error set
I have no idea what to make of this.
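The IncompatibleFrequency at the bottom of both tracebacks can be reproduced in isolation, without Series or MultiIndex involved. The sketch below assumes a reasonably recent pandas and uses a monthly Period in place of '5A-Dec', since the spelling of the annual alias has changed between versions. It catches ValueError because IncompatibleFrequency is a subclass of it.

```python
import pandas as pd

q = pd.Period('2018Q2')             # quarterly, freq Q-DEC
m = pd.Period('2021-01', freq='M')  # monthly

try:
    q < m  # ordering comparison across different frequencies
    raised = False
    error_name = None
except ValueError as exc:
    raised = True
    error_name = type(exc).__name__

print(raised, error_name)
```

Any code path that sorts the values, including the factorize step inside set_index shown in the traceback, ends up making this kind of comparison.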
A problem that I have not been able to replicate with a copy/pasteable subset of the data:
>>> mi = pd.MultiIndex.from_arrays([period.index, period])
>>> mi
[... prints roughly what we'd expect...]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.5/site-packages/pandas/core/base.py", line 800, in shape
return self._values.shape
File "/usr/local/lib/python3.5/site-packages/pandas/core/base.py", line 860, in _values
return self.values
File "/usr/local/lib/python3.5/site-packages/pandas/core/indexes/multi.py", line 667, in values
self._tuples = lib.fast_zip(values)
File "pandas/_libs/lib.pyx", line 549, in pandas._libs.lib.fast_zip (pandas/_libs/lib.c:10513)
ValueError: all arrays must be same length
>>> mi.names
FrozenList(['Variable', None])
>>> mi[0]
('CPROF', 'Period')
>>> mi[1]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.5/site-packages/pandas/core/indexes/multi.py", line 1377, in __getitem__
if lab[key] == -1:
IndexError: index 1 is out of bounds for axis 0 with size 1
AFAICT it took the name ‘Period’ and made that the only value in the new level of the MultiIndex. Really no idea what’s going on here.
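One way a column name can end up as the only value of a level: iterating over a DataFrame yields its column labels, not its rows, while iterating over a Series yields its values. The sketch below shows only that difference; the frame contents are made up for illustration (only 'CPROF' and 'Period' come from the output above), and it deliberately does not call from_arrays, whose handling of mismatched lengths may differ between pandas versions.

```python
import pandas as pd

# Hypothetical single-column frame resembling the real-life data.
frame = pd.DataFrame(
    {'Period': [pd.Period('2018Q2'), pd.Period('2017Q2')]},
    index=pd.Index(['CPROF', 'CPROF'], name='Variable'),
)

from_frame = list(frame)             # iterating a DataFrame yields column labels
from_series = list(frame['Period'])  # iterating a Series yields the values

print(from_frame)
print(len(from_series))
```

Selecting the column (frame['Period']) before handing it to anything that expects an array-like avoids the one-element result.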
Top GitHub Comments
That’s the real-life column (2770 rows). Looks a lot like ser from the snippet, but I haven’t figured out a snippet that demonstrates the problem. It looks like it’s caused by passing a single-column DataFrame to from_arrays:
On the plus side it’s clearly a user error (this guy). Ideally it’d be caught in __init__ though.
Unchanged.
This results in the same error: