BUG: Indexes still include values that have been deleted
Using pandas 0.10. If we create a DataFrame with a MultiIndex, then delete all the rows with value X, we’d expect the index to no longer show value X. But it does. Note the apparent inconsistency between “index” and “index.levels”: one shows the values have been deleted but the other doesn’t.
import pandas
x = pandas.DataFrame([['deleteMe', 1, 9], ['keepMe', 2, 9], ['keepMeToo', 3, 9]], columns=['first', 'second', 'third'])
x = x.set_index(['first', 'second'], drop=False)
x = x[x['first'] != 'deleteMe']  # Chop off all the 'deleteMe' rows
print(x.index)  # Good: Index no longer has any rows with 'deleteMe'. But....
print(x.index.levels)  # Bad: index still shows the "deleteMe" values are there. But why? We deleted them.
x.groupby(level='first').sum() #Bad: it's creating a dummy row for the rows we deleted!
We don’t want the deleted values to show up in that groupby. Can we eliminate them?
Issue Analytics
- State:
- Created 11 years ago
- Comments:34 (24 by maintainers)
I think this can be closed: the default behavior is as intended, and the method
MultiIndex.remove_unused_levels()
has been added as a simple fix for whoever doesn’t like the default behavior.
The pandas API doesn’t fit in my head anymore. For reference,
df.index.get_level_values
might be relevant for whatever use case this was a problem for. It does the right thing.
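For anyone landing here, the two suggestions above can be sketched roughly as follows. This is a minimal illustration, not an official recipe: it reuses the frame from the original report, assumes a pandas version recent enough to have MultiIndex.remove_unused_levels(), and the variable names (filtered, cleaned, and so on) are made up for the example.

```python
import pandas as pd

# Same data as in the original report.
df = pd.DataFrame(
    [["deleteMe", 1, 9], ["keepMe", 2, 9], ["keepMeToo", 3, 9]],
    columns=["first", "second", "third"],
).set_index(["first", "second"], drop=False)

# Drop the 'deleteMe' rows; the index levels are not recomputed.
filtered = df[df["first"] != "deleteMe"]

# .levels still lists every value the level was built with.
stale_levels = list(filtered.index.levels[0])

# get_level_values() reports only the labels present in the remaining rows.
present = list(filtered.index.get_level_values("first"))

# remove_unused_levels() returns a new MultiIndex; assign it back to use it.
cleaned = filtered.copy()
cleaned.index = cleaned.index.remove_unused_levels()
clean_levels = list(cleaned.index.levels[0])

print(stale_levels)
print(present)
print(clean_levels)
```

Note that remove_unused_levels() does not modify the index in place, so the result has to be assigned back. If the only goal is to see which labels remain, get_level_values() is enough and leaves the index untouched.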