bug: invalid values returned by .first().over(w) or .last().over(w) when using the pandas backend

Minimal reproducer:

import ibis
import pandas as pd

df = pd.DataFrame(
    {
        "g": ["a", "a", "a", "a", "a"],
        "x": [0, 1, 2, 3, 4],
        "y": [3, 2, 0, 1, 1],
    }
)
df.to_parquet("test.parquet")
t_pandas = ibis.pandas.connect({"t": df}).table("t")
t_duckdb = ibis.duckdb.connect().register("test.parquet", table_name="t")


def simple_window_ops(t):
    w = ibis.window(
        group_by=t.g,
        order_by=[t.x, t.y],
        preceding=1,
        following=0,
    )
    return t.mutate(
        x_first=t.x.first().over(w),
        x_last=t.x.last().over(w),
        y_first=t.y.first().over(w),
        y_last=t.y.last().over(w),
    )

With this setup, the pandas backend does not seem to take the preceding and following window boundaries into account; every row gets the first and last value of the whole group:

>>> print(simple_window_ops(t_pandas).execute())
   g  x  y  x_first  x_last  y_first  y_last
0  a  0  3        0       4        3       1
1  a  1  2        0       4        3       1
2  a  2  0        0       4        3       1
3  a  3  1        0       4        3       1
4  a  4  1        0       4        3       1

while duckdb works as expected:

>>> print(simple_window_ops(t_duckdb).execute())
   g  x  y  x_first  x_last  y_first  y_last
0  a  0  3        0       0        3       3
1  a  1  2        0       1        3       2
2  a  2  0        1       2        2       0
3  a  3  1        2       3        0       1
4  a  4  1        3       4        1       1
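As a cross-check of the expected values, here is a minimal sketch in plain pandas (no ibis involved) of what first/last over a row-based window with preceding=1, following=0 should return. The helper name expected_first_last is hypothetical and this is not how either backend implements windows; it only assumes that the frame ends at the current row and is truncated at the start of each group.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "g": ["a", "a", "a", "a", "a"],
        "x": [0, 1, 2, 3, 4],
        "y": [3, 2, 0, 1, 1],
    }
)


def expected_first_last(df, preceding=1):
    """Hypothetical helper: first/last over ROWS BETWEEN `preceding` PRECEDING
    AND CURRENT ROW, partitioned by g and ordered by (x, y)."""
    ordered = df.sort_values(["g", "x", "y"], kind="stable")
    out = ordered.copy()
    for col in ["x", "y"]:
        grouped = ordered.groupby("g")[col]
        # First value of the frame: the value `preceding` rows back in the group.
        # Near the start of a group the frame is truncated, so fall back to the
        # first row of the group.
        first = grouped.shift(preceding).fillna(grouped.transform("first"))
        out[f"{col}_first"] = first.astype(ordered[col].dtype)
        # following=0 means the frame ends at the current row, so the last
        # value of the frame is the current row's own value.
        out[f"{col}_last"] = ordered[col]
    return out


result = expected_first_last(df)
print(result)
```

On the reproducer data this yields the same x_first, x_last, y_first and y_last columns as the duckdb output above, which supports reading the pandas backend output as the incorrect one.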

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 8 (7 by maintainers)

Top GitHub Comments

saulpw commented, Oct 19, 2022 (1 reaction)

Thanks for reporting this, @ogrisel. This is definitely two bugs. We’ll try to get pandas to return valid results, or at the very least, raise a meaningful error if it can’t.

ogrisel commented, Nov 10, 2022 (0 reactions)

BTW, what about the second problem documented in https://github.com/ibis-project/ibis/issues/4676#issuecomment-1283756388?

Do you want me to open a dedicated issue, or are both problems likely to be solved by the same PR?
