[BUG-REPORT] to_pandas_df throws exception on filtered data frame on virtual column, with empty result
Description
to_pandas_df throws an exception when invoked on a filtered data frame whose filter matches no rows. This happens only when the filter uses a virtual column with a transformation (it does not happen if col1 is left unmodified; see the code below).
Software information
- Vaex version (import vaex; vaex.__version__): {'vaex': '4.12.0', 'vaex-core': '4.12.0', 'vaex-viz': '0.5.3', 'vaex-hdf5': '0.12.3', 'vaex-server': '0.8.1', 'vaex-astro': '0.9.1', 'vaex-jupyter': '0.8.0', 'vaex-ml': '0.18.0'}
- Vaex was installed via: pip
- OS: Arch (similar behavior on Ubuntu)
Additional information
Code to reproduce. The first two prints work; the third one throws an exception.
import pandas as pd
import vaex as vx
df = vx.from_pandas(pd.DataFrame(data={'col1':['chr1','chr2'],'col2':[3,4]}))
print(df[df['col1']=='3'].extract().to_pandas_df()) # Works
df['col1'] = df.col1.astype('str').str.replace('^chr','',regex=True)
print(df[df['col1']=='3'].extract()) # Works
print(df[df['col1']=='3'].extract().to_pandas_df()) # Throws exception
The third print results in an AssertionError raised by assert filter.sum() == expected_length, in dataset.py, line 1018, in slice.
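Until this is fixed, one possible stopgap is to skip the conversion when the filter matches nothing. The sketch below is not taken from vaex or from this thread: to_pandas_or_empty is a hypothetical helper name, and it assumes that len() on the filtered frame evaluates the filter without hitting the assertion above, which has not been verified here.

```python
import pandas as pd


def to_pandas_or_empty(filtered_df):
    """Convert a filtered vaex DataFrame to pandas, guarding the empty case.

    Hypothetical helper: assumes len(filtered_df) returns the number of rows
    that pass the filter and does not itself raise the AssertionError.
    """
    if len(filtered_df) == 0:
        # Nothing matched: build an empty pandas frame with the same column
        # names instead of calling to_pandas_df() on the empty extract.
        # Note that the column dtypes are not preserved here.
        return pd.DataFrame(columns=filtered_df.get_column_names())
    return filtered_df.extract().to_pandas_df()
```

With the reproduction above, the failing line would become print(to_pandas_or_empty(df[df['col1']=='3'])). The empty result keeps the column names but not the dtypes, so downstream code that relies on dtypes may need an explicit cast.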
Issue Analytics
- State:
- Created a year ago
- Comments:8 (7 by maintainers)
OK, so after an extract the dataframe is a bit in limbo: some things work and others do not, depending on whether you use the virtual column. I think #2284 is the better way for now.
Did it work in the case of virtual columns (ones that are not materialized, etc.)? I don’t really see how it could have worked, since we do not know the dtype before evaluation (if I am not mistaken).