Feature request: pandas.Series.query()
I would like to request a pandas.Series.query() method that works identically to pandas.DataFrame.query().
I have a large multi-indexed series that I would like to split into training and testing data for an ML research project. A minimal working example would be:
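import pandas as pd

years = range(2002, 2018)
fields = range(1, 5)
index = pd.MultiIndex.from_product(
    [years, fields], names=['year', 'field'])
# Explicit dtype: with no data given, the default dtype differs across pandas versions
series = pd.Series(index=index, dtype=float)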
What I would like to be able to do is split this series into 2010 data and non-2010 data. Accessing the 2010 data is very easy:
test_data = series[2010]
Accessing the non-2010 data is much harder. This is the shortest method I’ve found so far:
train_data = series.to_frame().query('year != 2010')[0]
Since pandas.Series doesn’t support query(), I have to convert it to a DataFrame and then back into a Series. Is there any reason why pandas.Series doesn’t support query() directly?
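Until something like this exists, the round trip through a DataFrame can be wrapped in a small helper. The following is only a sketch: query_series and the temporary column name "_value" are hypothetical names chosen for illustration and are not part of pandas.

```python
import pandas as pd


def query_series(series, expr):
    """Hypothetical helper: filter a Series using DataFrame.query() syntax."""
    # A fixed temporary column name makes the round trip work for both
    # named and unnamed Series.
    frame = series.to_frame(name="_value")
    result = frame.query(expr)["_value"]
    # Restore the original name (None for an unnamed Series).
    result.name = series.name
    return result


years = range(2002, 2018)
fields = range(1, 5)
index = pd.MultiIndex.from_product([years, fields], names=["year", "field"])
series = pd.Series(0.0, index=index)

# Index level names such as "year" can be referenced in the expression.
train_data = query_series(series, "year != 2010")
```

The helper still builds an intermediate DataFrame, so it only hides the conversion; it does not avoid it.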
Issue Analytics
- Created 5 years ago
- Reactions:8
- Comments:16 (12 by maintainers)
Top GitHub Comments
I agree that the masking solution works just fine, but I still think that having Series and DataFrame support different methods is quite non-intuitive. This is especially true coming from numpy, where there is no distinction between 1D and 2D arrays: they all support identical operations.
Perhaps a better solution to your problem would be to use a mask based off of the index values, so:
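The snippet that followed this comment is not preserved here. As a minimal sketch of what an index-based mask could look like (an assumption, not the commenter's original code), the 'year' level of the MultiIndex can be compared directly:

```python
import pandas as pd

years = range(2002, 2018)
fields = range(1, 5)
index = pd.MultiIndex.from_product([years, fields], names=["year", "field"])
series = pd.Series(0.0, index=index)

# Boolean mask built from the 'year' level of the index.
is_2010 = series.index.get_level_values("year") == 2010

test_data = series[is_2010]    # rows where year == 2010
train_data = series[~is_2010]  # all remaining rows
```

Unlike series[2010], which drops the 'year' level from the result, the masked selection keeps both index levels, so the test and training sets share the same index structure.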