[BUG-REPORT] to_pandas_df throws exception on filtered data frame on virtual column, with empty result

Description: to_pandas_df throws an exception when it is called on a filtered DataFrame whose filter matches no rows. This happens only when the DataFrame has a virtual column created by a transformation. It does not happen if col1 is left unmodified (see the code below).

Software information

  • Vaex version (import vaex; vaex.__version__): {'vaex': '4.12.0', 'vaex-core': '4.12.0', 'vaex-viz': '0.5.3', 'vaex-hdf5': '0.12.3', 'vaex-server': '0.8.1', 'vaex-astro': '0.9.1', 'vaex-jupyter': '0.8.0', 'vaex-ml': '0.18.0'}
  • Vaex was installed via: pip
  • OS: Arch (similar behavior on Ubuntu)

Additional information: code to reproduce. The first two prints work; the third one throws an exception.

import pandas as pd
import vaex as vx

df = vx.from_pandas(pd.DataFrame(data={'col1':['chr1','chr2'],'col2':[3,4]}))
print(df[df['col1']=='3'].extract().to_pandas_df()) # Works
df['col1'] = df.col1.astype('str').str.replace('^chr','',regex=True)
print(df[df['col1']=='3'].extract()) # Works
print(df[df['col1']=='3'].extract().to_pandas_df()) # Throws exception

The third print fails with an AssertionError on assert filter.sum() == expected_length (dataset.py, line 1018, in slice).
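Every filter in the reproduction matches zero rows, both before and after the transformation, so what differs between the working and failing calls is whether the empty result involves the transformed column. The value change can be checked with Python's standard re module. This sketch assumes vaex's str.replace(..., regex=True) applies the pattern '^chr' the same way re.sub does:

```python
import re

# Original values of col1 in the reproduction
values = ["chr1", "chr2"]

# Same pattern as the reproduction: strip a leading "chr".
# Assumption: vaex's regex replace behaves like re.sub for this pattern.
stripped = [re.sub("^chr", "", v) for v in values]
print(stripped)  # ['1', '2']

# Neither the original nor the transformed column contains '3',
# so both filters select zero rows
print(sum(v == "3" for v in values))    # 0
print(sum(v == "3" for v in stripped))  # 0
```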

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 8 (7 by maintainers)

Top GitHub Comments

2 reactions
maartenbreddels commented, Dec 2, 2022

OK, so with an extract, that DataFrame is a bit in limbo: some things can work, depending on whether you use the virtual column or not. I think #2284 is the better way for now.

0 reactions
JovanVeljanoski commented, Nov 29, 2022

Did it work in the case of virtual columns (that are not materialized, etc.)? I don't really see how it could have worked, since we do not know the dtype before evaluation (if I am not mistaken).
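The point about dtypes can be illustrated with a toy model. This is not vaex's implementation, and LazyColumn and strip_chr are hypothetical names. In this model a virtual column stores an expression rather than values, so its type can only be read off evaluated values, and an empty input leaves nothing to infer it from:

```python
from typing import Callable, List, Optional


class LazyColumn:
    """Hypothetical toy stand-in for a virtual column: stores a function, not values."""

    def __init__(self, source: List[str], func: Callable[[str], object]):
        self.source = source
        self.func = func

    def evaluate(self) -> list:
        # Values exist only once the expression is evaluated
        return [self.func(v) for v in self.source]

    def infer_dtype(self) -> Optional[type]:
        # In this toy model the type is read off evaluated values;
        # with zero rows there is nothing to inspect, so None is returned
        values = self.evaluate()
        return type(values[0]) if values else None


def strip_chr(s: str) -> str:
    # Hypothetical helper mirroring the '^chr' replacement in the reproduction
    return s[3:] if s.startswith("chr") else s


full = LazyColumn(["chr1", "chr2"], strip_chr)
empty = LazyColumn([], strip_chr)

print(full.infer_dtype())   # <class 'str'>
print(empty.infer_dtype())  # None
```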

Top Results From Across the Web

Empty dataframe when filtering - python - Stack Overflow
Details: the column 'PZAE' contains str starting and finishing by ' that's why you have to include them in the condition.

zero-row (result) filter throws an error · Issue #928 · vaexio/vaex
Description My original CSV data was converted to HDF5 before opening with vaex. All columns are data type of string.

Filtering Dask DataFrames with loc - Coiled.io
This post explains how to filter Dask DataFrames based on the DataFrame index and on column values using loc.

How to Filter Rows in Pandas: 6 Methods to Power Data ...
The output of executing this code and printing the result is below. Filtered DataFrame showing two rows with value 2 under column "a"...

Data filtering in Pandas - Towards Data Science
Filtering data from a data frame is one of the most common ... As shown above, the result is a DataFrame object containing...