[FEATURE-REQUEST] Opposite String Startswith Search in VAEX Dataframe


Description

Say we have a string to match against a column in a database. We can do that with a simple select and evaluate in Vaex:

df_vaex.select(df_vaex["name"].str.startswith(search_string))

However, this checks whether each database entry starts with search_string, not whether search_string starts with a database entry.

Can this be performed using Vaex?

Is your feature request related to a problem? Please describe.

search_string1 = "ASTHA MAT" 
search_string2 = "ASTHA MATERIALS INDIA" 

df_vaex.select(df_vaex["name"].str.startswith(search_string1))
df_vaex.evaluate(df_vaex["name"], selection=True)
# ASTHA MATERIALS

df_vaex.select(search_string2.startswith(df_vaex["name"]))
# TypeError: startswith first arg must be str or a tuple of str, not Expression

Additional context

It would be great to have a vectorized reverse search for quick lookups, as in pandas.Series.isin!
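The comparison with isin suggests one way to frame the request: a name matches exactly when it equals some prefix of the search string, so the reversed check reduces to a membership test. Below is a minimal plain-Python sketch of that idea, not a Vaex implementation. The helper `prefixes` and the sample `names` list are hypothetical, introduced only for illustration.

```python
def prefixes(text):
    """Return every non-empty prefix of text, shortest first."""
    return [text[:i] for i in range(1, len(text) + 1)]


# Hypothetical sample data standing in for the "name" column
names = ["ASTHA MATERIALS", "LOREM IPSUM"]
search_string = "ASTHA MATERIALS INDIA"

# A name passes the reversed startswith check when it is one of the prefixes.
# Empty names are excluded here, unlike str.startswith(""), which is True.
allowed = set(prefixes(search_string))
mask = [name in allowed for name in names]
print(mask)
```

The prefix list has only as many entries as the search string has characters, so it stays small even for a large column, and it could presumably be handed to an isin-style membership test on the column.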

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 18 (7 by maintainers)

Top GitHub Comments

2 reactions
Ben-Epstein commented, Jul 13, 2022

@khanfarhan10 I would think regex is your best bet. If you want something a bit more specific, you could write a registered function:

import vaex
import pyarrow as pa

# Sample data; the real frame has other columns as well
dict_data = dict(name=["ASTHA MATERIALS", "LOREM IPSUM"], locationID=[5454, 6767])

df = vaex.from_dict(dict_data)
search_string = "ASTHA MATERIALS INDIA"


@vaex.register_function()
def str_contains_col(col_vals, str_search):
    # True where the search string starts with the column value
    return pa.array([str_search.startswith(v) for v in col_vals.to_pylist()])


# Apply the reversed startswith check to the name column
df.func.str_contains_col(df["name"], search_string)
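For the regex route mentioned at the top of this comment, one possible sketch is to build a pattern that matches exactly the non-empty prefixes of the search string by nesting optional groups. This uses only the standard re module. The helper name `reverse_prefix_pattern` is hypothetical, and whether the resulting pattern plugs into a regex-based column method in Vaex is an assumption that is not tested here.

```python
import re


def reverse_prefix_pattern(text):
    """Compile a regex matching exactly the non-empty prefixes of text."""
    pattern = ""
    # Wrap from the last character inward, e.g. "abc" -> a(?:b(?:c)?)?
    for ch in reversed(text):
        tail = f"(?:{pattern})?" if pattern else ""
        pattern = re.escape(ch) + tail
    return re.compile(pattern)


search_string = "ASTHA MATERIALS INDIA"
prefix_re = reverse_prefix_pattern(search_string)

# fullmatch is required: the whole name must be a prefix of the search string
names = ["ASTHA MATERIALS", "LOREM IPSUM", "ASTHA MATERIALS INDIA LTD"]
matches = [prefix_re.fullmatch(name) is not None for name in names]
print(matches)
```

The pattern must be anchored at both ends (here via fullmatch), otherwise a name that merely shares a first character with the search string would also match.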
2 reactions
maartenbreddels commented, Jun 24, 2022

Looks that way… although our apply should be parallel (multiprocessing). It’s a good point; I’ll see if we can support the reverse without too many changes.
