Feature request: pandas.Series.query()

I would like to request a pandas.Series.query() method that works identically to pandas.DataFrame.query().

I have a large multi-indexed series that I would like to split into training and testing data for an ML research project. A minimal working example would be:

import pandas as pd

years = range(2002, 2018)
fields = range(1, 5)

index = pd.MultiIndex.from_product(
    [years, fields], names=['year', 'field'])

# Explicit dtype avoids the ambiguous default dtype of an empty, index-only Series
series = pd.Series(index=index, dtype=float)

What I would like to be able to do is split this series into 2010 data and non-2010 data. Accessing the 2010 data is very easy:

test_data = series[2010]

Accessing the non-2010 data is very hard. This is the shortest method I’ve found so far:

train_data = series.to_frame().query('year != 2010')[0]

Since pandas.Series doesn’t support query(), I have to convert it to a DataFrame and then back into a Series. Is there any reason why pandas.Series doesn’t support query directly?
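Until something like this exists, the round trip described above can be wrapped in a small function. The following is only a sketch: `series_query` and the temporary column name `_value` are hypothetical names made up for illustration, not part of pandas, and the example fills the series with placeholder numbers so the split is visible.

```python
import numpy as np
import pandas as pd


def series_query(series: pd.Series, expr: str) -> pd.Series:
    """Hypothetical helper: filter a Series using DataFrame.query syntax."""
    # Convert to a one-column frame under a fixed, temporary column name.
    column = "_value"
    frame = series.to_frame(name=column)
    # DataFrame.query can refer to named index levels such as `year`.
    result = frame.query(expr)[column].copy()
    # Restore the original Series name (None for an unnamed Series).
    result.name = series.name
    return result


years = range(2002, 2018)
fields = range(1, 5)
index = pd.MultiIndex.from_product([years, fields], names=["year", "field"])
# Placeholder values, assumed here only so the example has data to filter.
series = pd.Series(np.arange(len(index), dtype=float), index=index)

train_data = series_query(series, "year != 2010")
test_data = series_query(series, "year == 2010")
```

Unlike `series[2010]`, the `year == 2010` result keeps both index levels, because `query` filters rows rather than selecting a cross-section.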

Issue Analytics

  • State: open
  • Created 5 years ago
  • Reactions: 8
  • Comments: 16 (12 by maintainers)

Top GitHub Comments

3 reactions
adamjstewart commented, Aug 14, 2018

I agree that the masking solution works just fine, but I still think that having Series and DataFrame support different methods is quite unintuitive, especially coming from numpy, where there is no distinction between 1D and 2D arrays: they all support identical operations.

3 reactions
WillAyd commented, Aug 14, 2018

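Perhaps a better solution to your problem would be to use a mask based on the index values, so:

# True for every row whose 'year' level equals 2010
mask = index.get_level_values('year') == 2010
# 2010 is the held-out test year; everything else is training data
test = series[mask]
train = series[~mask]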
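For reference, the masking approach can be run end to end as below. This is a sketch under a few assumptions: the placeholder values are invented for illustration, 2010 is treated as the test year as in the original question, and `Series.drop` with `level=` is shown as one possible alternative rather than as the recommended answer from the thread.

```python
import numpy as np
import pandas as pd

years = range(2002, 2018)
fields = range(1, 5)
index = pd.MultiIndex.from_product([years, fields], names=["year", "field"])
# Placeholder values, assumed here only so the example has data to split.
series = pd.Series(np.arange(len(index), dtype=float), index=index)

# Boolean array: True where the 'year' index level equals 2010.
is_2010 = series.index.get_level_values("year") == 2010
test = series[is_2010]     # only the 2010 rows
train = series[~is_2010]   # every other year

# Alternative for the "not 2010" half: drop the label on the 'year' level.
train_via_drop = series.drop(2010, level="year")
```

Both routes keep the full two-level index on the result, whereas `series[2010]` returns the 2010 rows indexed by `field` only.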

