
MultiIndex - Comparison with Mixed Frequencies (and other FUBAR)


Setup:

import pandas as pd

index = pd.Index(['PCE']*4, name='Variable')
data = [
    pd.Period('2018Q2'),
    pd.Period('2021', freq='5A-Dec'),
    pd.Period('2026', freq='10A-Dec'),
    pd.Period('2017Q2'),
]
ser = pd.Series(data, index=index, name='Period')

In the real-life version of this issue, ‘Period’ is a column in a DataFrame and I need to append it as a new level to the index. The snippets here reproduce the problem(s) in both py2 and py3, although with the real data df.set_index('Period', append=True) goes through fine in py2, for reasons unknown.

The large majority of Period values are quarterly-frequency.
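Everything below hinges on how Period objects compare. A minimal sketch, assuming a recent pandas release (0.20.x behaved differently, as the py3 tracebacks show) and using quarterly and monthly Periods as stand-ins for the multi-year frequencies in the setup: equality between Periods of different frequency returns False, while ordering comparisons, which any sort needs, raise IncompatibleFrequency. The sketch catches ValueError and TypeError rather than importing IncompatibleFrequency from pandas' private _libs module.

```python
import pandas as pd

quarterly = pd.Period("2018Q2")            # freq Q-DEC
monthly = pd.Period("2018-05", freq="M")   # freq M

# Equality across frequencies does not raise in recent pandas; it is just False.
same = quarterly == monthly
print(same)

# Ordering needs a common frequency, so '<' raises IncompatibleFrequency.
ordering_error = None
try:
    quarterly < monthly
except (ValueError, TypeError) as exc:  # caught via its base class
    ordering_error = exc
print(type(ordering_error).__name__, ordering_error)
```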

py2

>>> pd.__version__
'0.20.2'
>>> ser.sort_values()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/pandas/core/series.py", line 1710, in sort_values
    argsorted = _try_kind_sort(arr[good])
  File "/usr/local/lib/python2.7/site-packages/pandas/core/series.py", line 1696, in _try_kind_sort
    return arr.argsort(kind=kind)
  File "pandas/_libs/period.pyx", line 725, in pandas._libs.period._Period.__richcmp__ (pandas/_libs/period.c:11842)
pandas._libs.period.IncompatibleFrequency: Input has different freq=10A-DEC from Period(freq=Q-DEC)

>>> ser.to_frame()
         Period
Variable       
PCE      2018Q2
PCE        2021
PGDP       2026
PGDP     2017Q2
>>> ser.to_frame().set_index('Period', append=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 2836, in set_index
    index = MultiIndex.from_arrays(arrays, names=names)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/indexes/multi.py", line 1100, in from_arrays
    labels, levels = _factorize_from_iterables(arrays)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/categorical.py", line 2193, in _factorize_from_iterables
    return map(list, lzip(*[_factorize_from_iterable(it) for it in iterables]))
  File "/usr/local/lib/python2.7/site-packages/pandas/core/categorical.py", line 2165, in _factorize_from_iterable
    cat = Categorical(values, ordered=True)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/categorical.py", line 310, in __init__
    raise NotImplementedError("> 1 ndim Categorical are not "
NotImplementedError: > 1 ndim Categorical are not supported at this time

No idea why it thinks Categorical is relevant here. That doesn’t happen in py3.
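The Categorical frames come from MultiIndex.from_arrays: per the tracebacks, it factorizes each level through Categorical(values, ordered=True), which calls factorize(values, sort=True). A minimal sketch of that step with pd.factorize on a hypothetical object array of mixed-frequency Periods: unsorted factorization only needs hashing and equality, which work across frequencies, while sort=True must order the uniques, which is the same cross-frequency comparison that breaks sort_values. How sort=True fails can vary between pandas versions, so the sketch only records the outcome.

```python
import numpy as np
import pandas as pd

# Object array: Periods with different frequencies cannot share one PeriodDtype.
values = np.array(
    [pd.Period("2018Q2"), pd.Period("2018-05", freq="M"),
     pd.Period("2018Q2"), pd.Period("2017Q2")],
    dtype=object,
)

# Unsorted factorization uses hashing and equality only.
codes, uniques = pd.factorize(values, sort=False)
print(codes)          # codes follow order of first appearance
print(list(uniques))

# Sorted factorization has to order the uniques across frequencies.
try:
    pd.factorize(values, sort=True)
    sorted_ok = True
except (ValueError, TypeError) as exc:
    sorted_ok = False
    print("sort=True failed:", type(exc).__name__)
```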

For the purposes of sort_values, refusing to sort might make sense. But when all I care about is set_index, I’m pretty indifferent to the ordering.
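Two possible workarounds, sketched here rather than taken from the thread, with monthly and daily Periods standing in for the 5A/10A values in the setup. For sort_values, pass an explicit sort key (the key argument assumes pandas 1.1 or later) that maps each Period to a comparable Timestamp via start_time. For set_index, where ordering does not matter, convert the Periods to their string labels first so that building the MultiIndex never has to compare Periods across frequencies.

```python
import pandas as pd

index = pd.Index(["PCE"] * 4, name="Variable")
data = [
    pd.Period("2018Q2"),
    pd.Period("2021-03", freq="M"),
    pd.Period("2026-01-15", freq="D"),
    pd.Period("2017Q2"),
]
ser = pd.Series(data, index=index, name="Period")  # object dtype: mixed freqs

# Workaround 1: sort by each Period's start timestamp instead of comparing Periods.
by_start = ser.sort_values(key=lambda s: s.map(lambda p: p.start_time))
print([str(p) for p in by_start])

# Workaround 2: store string labels so set_index never has to order Periods.
df = ser.to_frame()
df["Period"] = df["Period"].astype(str)
indexed = df.set_index("Period", append=True)
print(indexed.index.tolist())
```

Ordering by start_time is one choice among several (end_time would order overlapping periods differently), and the string conversion gives up Period semantics in the new index level.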

py3

>>> pd.__version__
'0.20.2'
>>> ser.sort_values()
pandas._libs.period.IncompatibleFrequency: Input has different freq=Q-DEC from Period(freq=5A-DEC)

During handling of the above exception, another exception occurred:
SystemError: <built-in function isinstance> returned a result with an error set
[...]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/site-packages/pandas/core/series.py", line 1710, in sort_values
    argsorted = _try_kind_sort(arr[good])
  File "/usr/local/lib/python3.5/site-packages/pandas/core/series.py", line 1696, in _try_kind_sort
    return arr.argsort(kind=kind)
  File "pandas/_libs/period.pyx", line 723, in pandas._libs.period._Period.__richcmp__ (pandas/_libs/period.c:11713)
  File "/usr/local/lib/python3.5/site-packages/pandas/tseries/offsets.py", line 375, in __ne__
    return not self == other
  File "/usr/local/lib/python3.5/site-packages/pandas/tseries/offsets.py", line 364, in __eq__
    if isinstance(other, compat.string_types):
SystemError: <built-in function isinstance> returned a result with an error set

>>> ser.to_frame().set_index('Period', append=True)
pandas._libs.period.IncompatibleFrequency: Input has different freq=Q-DEC from Period(freq=5A-DEC)

During handling of the above exception, another exception occurred:
SystemError: <built-in function isinstance> returned a result with an error set
[...]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/site-packages/pandas/core/frame.py", line 2836, in set_index
    index = MultiIndex.from_arrays(arrays, names=names)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/indexes/multi.py", line 1100, in from_arrays
    labels, levels = _factorize_from_iterables(arrays)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/categorical.py", line 2193, in _factorize_from_iterables
    return map(list, lzip(*[_factorize_from_iterable(it) for it in iterables]))
  File "/usr/local/lib/python3.5/site-packages/pandas/core/categorical.py", line 2193, in <listcomp>
    return map(list, lzip(*[_factorize_from_iterable(it) for it in iterables]))
  File "/usr/local/lib/python3.5/site-packages/pandas/core/categorical.py", line 2165, in _factorize_from_iterable
    cat = Categorical(values, ordered=True)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/categorical.py", line 298, in __init__
    codes, categories = factorize(values, sort=True)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/algorithms.py", line 567, in factorize
    assume_unique=True)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/algorithms.py", line 486, in safe_sort
    sorter = values.argsort()
  File "pandas/_libs/period.pyx", line 723, in pandas._libs.period._Period.__richcmp__ (pandas/_libs/period.c:11713)
  File "/usr/local/lib/python3.5/site-packages/pandas/tseries/offsets.py", line 375, in __ne__
    return not self == other
  File "/usr/local/lib/python3.5/site-packages/pandas/tseries/offsets.py", line 364, in __eq__
    if isinstance(other, compat.string_types):
SystemError: <built-in function isinstance> returned a result with an error set

I have no idea what to make of this.

A problem that I have not been able to replicate with a copy/pasteable subset of the data:

>>> mi = pd.MultiIndex.from_arrays([period.index, period])
>>> mi
[... prints roughly what we'd expect...]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/site-packages/pandas/core/base.py", line 800, in shape
    return self._values.shape
  File "/usr/local/lib/python3.5/site-packages/pandas/core/base.py", line 860, in _values
    return self.values
  File "/usr/local/lib/python3.5/site-packages/pandas/core/indexes/multi.py", line 667, in values
    self._tuples = lib.fast_zip(values)
  File "pandas/_libs/lib.pyx", line 549, in pandas._libs.lib.fast_zip (pandas/_libs/lib.c:10513)
ValueError: all arrays must be same length

>>> mi.names
FrozenList(['Variable', None])
>>> mi[0]
('CPROF', 'Period')
>>> mi[1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/site-packages/pandas/core/indexes/multi.py", line 1377, in __getitem__
    if lab[key] == -1:
IndexError: index 1 is out of bounds for axis 0 with size 1

AFAICT it took the name ‘Period’ and made that the only value in the new level of the MultiIndex. Really no idea what’s going on here.

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 10 (10 by maintainers)

Top GitHub Comments

2 reactions
jbrockmendel commented, Jul 29, 2017

> (not sure what period is)

That’s the real-life column (2770 rows). Looks a lot like ser from the snippet, but I haven’t figured out a snippet that demonstrates the problem. It looks like it’s caused by passing a single-column DataFrame to from_arrays:

import pandas as pd

index = pd.Index(['CPROF', 'HOUSING', 'INDPROD', 'NGDP', 'PGDP'])
data = [pd.Period('1968Q4')]*5
df = pd.DataFrame(data, index=index, columns=['Period'])
mi = pd.MultiIndex.from_arrays([df.index, df])

>>> mi
MultiIndex(levels=[['CPROF', 'HOUSING', 'INDPROD', 'NGDP', 'PGDP'], ['Period']],
           labels=[[0, 1, 2, 3, 4], [0]])
>>> mi.shape
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/site-packages/pandas/core/base.py", line 800, in shape
    return self._values.shape
  File "/usr/local/lib/python3.5/site-packages/pandas/core/base.py", line 860, in _values
    return self.values
  File "/usr/local/lib/python3.5/site-packages/pandas/core/indexes/multi.py", line 667, in values
    self._tuples = lib.fast_zip(values)
  File "pandas/_libs/lib.pyx", line 549, in pandas._libs.lib.fast_zip (pandas/_libs/lib.c:10513)
ValueError: all arrays must be same length

On the plus side it’s clearly a user error (this guy). Ideally it’d be caught in __init__ though.
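A sketch of the fix for that user error, reusing the data from the snippet above. Iterating over a DataFrame yields its column labels, which is consistent with from_arrays ending up with ['Period'] as the entire second level; passing the column as a Series (df['Period']) gives a 1-D array of the right length instead. The helper checked_from_arrays is hypothetical, not a pandas API; it only illustrates the kind of up-front validation the comment wishes __init__ performed.

```python
import pandas as pd

index = pd.Index(["CPROF", "HOUSING", "INDPROD", "NGDP", "PGDP"])
df = pd.DataFrame([pd.Period("1968Q4")] * 5, index=index, columns=["Period"])

# Iterating a DataFrame yields column labels, not row values.
print(list(df))


def checked_from_arrays(arrays, names=None):
    """Hypothetical wrapper: reject 2-D inputs and length mismatches up front."""
    for arr in arrays:
        if getattr(arr, "ndim", 1) != 1:
            raise ValueError(f"expected 1-D array-like, got ndim={arr.ndim}")
    lengths = {len(arr) for arr in arrays}
    if len(lengths) > 1:
        raise ValueError(f"all arrays must be same length, got {sorted(lengths)}")
    return pd.MultiIndex.from_arrays(arrays, names=names)


# Passing the column as a Series gives a level with one entry per row.
mi = checked_from_arrays([df.index, df["Period"]])
print(len(mi), mi[0])

# Passing the whole DataFrame is rejected before any MultiIndex is built.
try:
    checked_from_arrays([df.index, df])
    rejected = False
except ValueError as exc:
    rejected = True
    print(exc)
```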

> Can you try first upgrading to 0.20.3 and see if anything changes on that end?

Unchanged.

0 reactions
toobaz commented, Aug 26, 2017

This results in the same error:

In [2]: pd.Index([pd.Timestamp('2000-01-03 00:00:00', freq='B'),
                  pd.Period('2000-01-03', 'B'),
                  pd.Period('2000-01-03', 'B')]).sort_values()
[...]
SystemError: <built-in function isinstance> returned a result with an error set
