Reindex broken
MWE
from __future__ import print_function
import pandas as pd
import numpy as np
print("Pandas version:", pd.__version__)
print("+++++++++++++++++++++++++++++++++++")
pd.show_versions()  # prints its report itself; wrapping it in print() adds a stray "None"
print("+++++++++++++++++++++++++++++++++++")
####################################################
# Config
####################################################
pd.options.display.max_rows = 10
pd.options.display.float_format = '{:.1f}'.format
####################################################
# Read data
####################################################
file = "/tmp/california_housing_train.csv"
# Use the local copy if it exists, otherwise download the dataset
if np.DataSource().exists(file):
    dataset = file
else:
    dataset = "https://download.mlcc.google.com/mledu-datasets/california_housing_train.csv"
sep = ","
california_housing_dataframe = pd.read_csv(dataset, sep=sep)
####################################################
# Reorder
####################################################
newOrder = np.random.permutation(california_housing_dataframe.index)
california_housing_dataframe_reordered = california_housing_dataframe.reindex(newOrder)
####################################################
# Merge and show diff of the heads
####################################################
# Take the heads of both datasets and compare them
# They should differ in (almost) all rows
head1 = california_housing_dataframe.head(10)
head2 = california_housing_dataframe_reordered.head(10)
# @see https://stackoverflow.com/a/36893675/605890
merged = head1.merge(head2, indicator=True, how='outer')
print(merged)
Run on colab
I created a colab for the MWE, which is based on pandas 0.22.0:
https://colab.research.google.com/drive/19uDE_H4AtpLaEL6INrRrDMXkdANsNr69#scrollTo=CzxuGppV26Rt
If you run it, the output shows (unless a row happens to land in both heads by chance):
- 10x left_only
- 10x right_only
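To make the left_only / right_only / both labels concrete, here is a minimal sketch of how an outer merge with indicator=True marks rows. The two small frames are hypothetical and only stand in for head1 and head2:

```python
import pandas as pd

# Hypothetical stand-ins for head1 and head2
left = pd.DataFrame({"a": [1, 2, 3]})
right = pd.DataFrame({"a": [2, 3, 4]})

# indicator=True adds a "_merge" column saying where each row was found
merged = left.merge(right, how="outer", indicator=True)

# Count how many rows carry each label
counts = merged["_merge"].astype(str).value_counts().to_dict()
print(merged)
print(counts)
```

Rows found only in the first frame are labelled left_only, rows found only in the second right_only, and rows found in both are labelled both. Two heads that share no rows therefore give 10x left_only and 10x right_only.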
Run with docker containers
Now run the same MWE (located at /tmp/tf/Bug.py) in the two docker containers listed below, both of which use pandas 0.23.4.
Both return:
- 10x both
This means both heads are identical, so reindex does not have any effect.
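A more direct way to narrow this down is to check the two steps separately: did the permutation change the order at all, and did reindex follow the order it was given? This is a rough sketch on a small hypothetical frame (df stands in for california_housing_dataframe):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for california_housing_dataframe
df = pd.DataFrame({"value": np.arange(100)})

new_order = np.random.permutation(df.index)
reordered = df.reindex(new_order)

# Step 1: did the permutation actually change the order?
permutation_shuffled = not np.array_equal(new_order, df.index.values)

# Step 2: did reindex follow the order it was handed?
reindex_followed_order = np.array_equal(reordered.index.values, new_order)

print("permutation shuffled:  ", permutation_shuffled)
print("reindex followed order:", reindex_followed_order)
```

If the first check prints False while the second prints True, the unchanged head comes from the permutation step and not from reindex itself.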
Python docker container (python 3.6.6)
docker run --rm -it -v /tmp/tf/:/tmp/ python:3.6.6 /bin/bash -c "pip install pandas && python /tmp/Bug.py"
tensorflow docker container (tensorflow 1.11.0)
docker run --rm -it -v /tmp/tf/:/tmp/ tensorflow/tensorflow:1.11.0-py3 python /tmp/Bug.py
TLDR
The following code does not have any effect in pandas 0.23.4:
california_housing_dataframe_reordered = california_housing_dataframe.reindex(newOrder)
Issue Analytics
- State:
- Created 5 years ago
- Comments:16 (8 by maintainers)
Top GitHub Comments
This looks to be the root cause of the issue: https://github.com/numpy/numpy/issues/11975, which should be fixed in the next numpy release (1.15.3).
The workaround of using df.index.values, as suggested in the issue, appears to work.
My code:
and the result from the tensorflow container:
But it should be the following for the reindex, right?
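For reference, the df.index.values workaround mentioned in the first comment might look like this. It is a sketch on a small hypothetical frame (df stands in for california_housing_dataframe), not the code originally posted in the comments:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for california_housing_dataframe
df = pd.DataFrame({"value": np.arange(100)})

# Workaround: hand np.random.permutation the underlying NumPy array
# instead of the pandas Index object
new_order = np.random.permutation(df.index.values)
reordered = df.reindex(new_order)

print(reordered.head(10))
```

With the workaround in place, the index of reordered should match new_order, so its head should no longer be identical to the head of the original frame.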