MWE

from __future__ import print_function

import os

import numpy as np
import pandas as pd

print("Pandas version:", pd.__version__)
print("+++++++++++++++++++++++++++++++++++")
pd.show_versions()  # prints its own report and returns None
print("+++++++++++++++++++++++++++++++++++")

####################################################
# Config
####################################################

pd.options.display.max_rows = 10
pd.options.display.float_format = '{:.1f}'.format

####################################################
# Read data (prefer a local copy if one exists)
####################################################

file = "/tmp/california_housing_train.csv"
if os.path.exists(file):
    dataset = file
else:
    dataset = "https://download.mlcc.google.com/mledu-datasets/california_housing_train.csv"

california_housing_dataframe = pd.read_csv(dataset, sep=",")

####################################################
# Reorder
####################################################

# Shuffle the row labels, then reindex so the rows follow the new order
newOrder = np.random.permutation(california_housing_dataframe.index)
california_housing_dataframe_reordered = california_housing_dataframe.reindex(newOrder)

####################################################
# Merge and show diff of the heads
####################################################

# Take the heads of both datasets and compare them.
# They should differ in (almost) all rows.

head1 = california_housing_dataframe.head(10)
head2 = california_housing_dataframe_reordered.head(10)

# Outer merge on all shared columns; the _merge column reports
# whether each row is left_only, right_only or in both heads
# @see https://stackoverflow.com/a/36893675/605890
merged = head1.merge(head2, indicator=True, how='outer')
print(merged)

Run on Colab

I created a Colab notebook for the MWE; it runs pandas 0.22.0:

https://colab.research.google.com/drive/19uDE_H4AtpLaEL6INrRrDMXkdANsNr69#scrollTo=CzxuGppV26Rt

If you run it, the output shows (unless, by chance, the same row ends up in both heads):

  • 10x left_only
  • 10x right_only
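For context on why those counts appear: with no `on=` argument, `DataFrame.merge` joins on every column the two frames share and ignores the index, and `indicator=True` adds a `_merge` column labelling each row `left_only`, `right_only` or `both`. A minimal sketch, using made-up toy frames `left` and `right` rather than the housing data:

```python
import pandas as pd

# Hypothetical toy frames; they share only the value 3 in column "x"
left = pd.DataFrame({'x': [1, 2, 3]})
right = pd.DataFrame({'x': [3, 4, 5]})

# Outer merge on the common column(s); _merge records where each row came from
merged = left.merge(right, indicator=True, how='outer')
counts = merged['_merge'].value_counts()
print(counts)
```

Applied to the MWE, two heads whose rows really were shuffled share (almost) no rows, so nearly every row is `left_only` or `right_only`; two identical heads give `both` throughout.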

Run with Docker containers

Now run the same MWE (saved as /tmp/tf/Bug.py) in two different Docker containers, both of which use pandas 0.23.4.

Both return:

  • 10x both

This means both heads are identical, i.e. reindex does not have any effect.

Python Docker container (Python 3.6.6)

docker run --rm -it -v /tmp/tf/:/tmp/ python:3.6.6 /bin/bash -c "pip install pandas && python /tmp/Bug.py"

TensorFlow Docker container (TensorFlow 1.11.0)

docker run --rm -it -v /tmp/tf/:/tmp/ tensorflow/tensorflow:1.11.0-py3 python /tmp/Bug.py 

TLDR

The following code does not have any effect in pandas 0.23.4:

california_housing_dataframe_reordered = california_housing_dataframe.reindex(newOrder)

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 16 (8 by maintainers)

Top GitHub Comments

jschendel commented, Oct 12, 2018

This looks to be the root cause of the issue: https://github.com/numpy/numpy/issues/11975, which should be fixed in the next numpy release (1.15.3).

In [1]: import numpy as np; np.__version__
Out[1]: '1.15.2'

In [2]: import pandas as pd; pd.__version__
Out[2]: '0.23.4'

In [3]: df = pd.DataFrame({'a': range(5), 'b': np.arange(0, 0.5, 0.1)})

In [4]: df
Out[4]:
   a    b
0  0  0.0
1  1  0.1
2  2  0.2
3  3  0.3
4  4  0.4

In [5]: new_order = np.random.permutation(df.index)

In [6]: df
Out[6]:
   a    b
1  0  0.0
2  1  0.1
0  2  0.2
3  3  0.3
4  4  0.4
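The session above shows that `np.random.permutation(df.index)` reordered the labels of `df.index` itself. A minimal diagnostic sketch, assuming only public NumPy and pandas calls (the helper name `permutation_mutates_index` is hypothetical), snapshots the labels, runs a permutation, and checks whether the original index changed. It uses a seeded `RandomState` instead of the global generator so the check is repeatable; both expose the same `permutation` method. On a NumPy release that contains the fix (1.15.3 per the comment above) it should report `False`.

```python
import numpy as np
import pandas as pd


def permutation_mutates_index(index, seed=0):
    """Return True if np.random.permutation shuffled `index` in place.

    Hypothetical helper. A permutation that happens to be the identity
    cannot be detected, so use an index with several labels.
    """
    # Snapshot the labels as an independent NumPy array
    before = np.asarray(index).copy()
    rng = np.random.RandomState(seed)
    rng.permutation(index)  # the shuffled result itself is not needed
    # Re-read the labels and compare them with the snapshot
    after = np.asarray(index)
    return not np.array_equal(before, after)


df = pd.DataFrame({'a': range(5), 'b': np.arange(0, 0.5, 0.1)})
print(permutation_mutates_index(df.index))
```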

The workaround of using df.index.values as suggested in the issue appears to work:

In [7]: df = pd.DataFrame({'a': range(5), 'b': np.arange(0, 0.5, 0.1)})

In [8]: df
Out[8]:
   a    b
0  0  0.0
1  1  0.1
2  2  0.2
3  3  0.3
4  4  0.4

In [9]: new_order = np.random.permutation(df.index.values)

In [10]: df
Out[10]:
   a    b
0  0  0.0
1  1  0.1
2  2  0.2
3  3  0.3
4  4  0.4

In [11]: df.reindex(new_order)
Out[11]:
   a    b
0  0  0.0
4  4  0.4
3  3  0.3
2  2  0.2
1  1  0.1
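Wrapped in a reusable function, the workaround might look like the sketch below. It is an illustration, not pandas API: `shuffled_rows` is a hypothetical helper, and it goes one step beyond `df.index.values` by taking an explicit copy of the labels, so an affected NumPy could only shuffle that private copy.

```python
import numpy as np
import pandas as pd


def shuffled_rows(df, seed=None):
    """Return a copy of df with its rows in random order (hypothetical helper).

    Permutes an explicit NumPy copy of the labels, so df.index is never
    passed to permutation directly.
    """
    rng = np.random.RandomState(seed)
    labels = np.array(df.index.values)  # independent copy of the row labels
    new_order = rng.permutation(labels)
    return df.reindex(new_order)


df = pd.DataFrame({'a': range(5), 'b': np.arange(0, 0.5, 0.1)})
out = shuffled_rows(df, seed=0)
print(out)
```

Because `reindex` looks rows up by label, each row keeps its data: column `a` still equals the index in every row, and only the row order changes.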

boldt commented, Oct 12, 2018

My code:

import pandas as pd
import numpy as np

df = pd.DataFrame({'a': range(5), 'b': np.arange(0, 0.5, 0.1)})
print(df)

print('++++++++++++++')

new_order = np.random.permutation(df.index)
print(df.reindex(new_order))

And the result from the TensorFlow container:

   a    b
0  0  0.0
1  1  0.1
2  2  0.2
3  3  0.3
4  4  0.4
++++++++++++++
   a    b
4  0  0.0
3  1  0.1
1  2  0.2
2  3  0.3
0  4  0.4

But it should be the following for the reindex, right?

   a    b
4  4  0.4
3  3  0.3
1  1  0.1
2  2  0.2
0  0  0.0
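Yes: `reindex` selects rows by label, so every row keeps its own values and only the order changes. A deterministic sketch that hard-codes the order `[4, 3, 1, 2, 0]` from the container output, instead of drawing a random permutation, should print the rows in the expected order shown above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': range(5), 'b': np.arange(0, 0.5, 0.1)})

# Fixed order taken from the container output, so the result is reproducible
new_order = [4, 3, 1, 2, 0]
result = df.reindex(new_order)
print(result)
```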