Appetite for PR to accommodate `filters` with strings and `in` operator?

When using `ParquetFile()` or `read_parquet()`, there is a use case for `filters=` with sets of strings as the values in the filters, particularly using the `in` or equality operators. For example:

filters = [('planet','in','jedha'), ('planet','in','scarif')]

This isn’t currently accommodated. Instead, as seen in fastparquet.api.filter_val(), there is a lexicographic comparison involving max/min operators. (There is also no warning for this behavior, and this use of filters is not proscribed by the docs.)

Can I make a PR to modify filter_val() for this use case?
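
To make the reported behavior concrete, here is a minimal pure-Python sketch. It does not call fastparquet; `naive_skip`, `normalize_in_values`, and `can_skip_row_group` are hypothetical names, and the assumption that the check applies `min`/`max` directly to the filter value is taken from the description above rather than from the library source. It shows how `min`/`max` over a bare string compares single characters, and one possible way to treat the filter value as a collection of whole strings instead.

```python
def naive_skip(val, vmin, vmax):
    """Assumed shape of a min/max check for 'in' (hypothetical, not library code).

    If `val` is a bare string, min(val) and max(val) are single characters.
    """
    return min(val) > vmax or max(val) < vmin


def normalize_in_values(val):
    """Wrap a bare string so it is treated as one candidate value."""
    if isinstance(val, str):
        return [val]
    return list(val)


def can_skip_row_group(values, vmin, vmax):
    """Return True when no candidate string lies within [vmin, vmax].

    vmin and vmax stand in for a row group's column statistics.
    """
    candidates = normalize_in_values(values)
    return not any(vmin <= v <= vmax for v in candidates)


# Character-level min/max on a bare string:
print(min("jedha"), max("jedha"))

# A row group whose values span "alderaan".."hoth" cannot contain "jedha",
# yet the character-level check does not allow it to be skipped.
print(naive_skip("jedha", "alderaan", "hoth"))
print(can_skip_row_group("jedha", "alderaan", "hoth"))
```

Statistics-based checks like this can only rule row groups out; rows inside a retained row group would still need to be filtered individually.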

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

martindurant commented, Aug 23, 2018

The best thing to do would be to look at similar tests and copy the style, e.g., https://github.com/dask/fastparquet/blob/master/fastparquet/test/test_api.py#L378. The only magic here is `tempdir` (the function argument), which automatically creates a space to put temporary files and cleans them up when the test is done.

You will want to fork this repo to your GitHub account (button in top-right) and set it as a remote URL (`git remote add`), before putting changes in a new branch locally and pushing to the new remote. You then create the PR from your copy of the repo on GitHub, and all the tests will run automatically on Travis CI.
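
For readers unfamiliar with the temporary-directory pattern mentioned above, the following is a rough standard-library sketch of the idea: create a scratch directory, hand it to the test body, and remove it afterwards. `scratch_dir` and `write_and_check` are hypothetical names used only for illustration; the project's own `tempdir` fixture may well be implemented differently.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def scratch_dir():
    """Hypothetical stand-in for a `tempdir`-style test fixture."""
    path = tempfile.mkdtemp()
    try:
        yield path
    finally:
        # Remove the directory and everything written into it.
        shutil.rmtree(path, ignore_errors=True)


def write_and_check():
    """Write a file in the scratch directory.

    Returns the file path and whether the file existed while the
    directory was still alive.
    """
    with scratch_dir() as d:
        fn = os.path.join(d, "example.txt")
        with open(fn, "w") as f:
            f.write("jedha\nscarif\n")
        existed_inside = os.path.exists(fn)
    return fn, existed_inside
```

After `write_and_check()` returns, the file it reports should no longer exist on disk, which is the cleanup behavior a test relies on.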

martindurant commented, Aug 24, 2018

You may include fresh parquet data files, if most convenient, so long as they are small. You will see that there are a number of binary files in the repo already, under test-data/.

However, most tests generate their test data files on-the-fly in the given temporary directory. If possible, that would be preferable - it depends on whether fastparquet can conveniently produce the type of output you want to read.

(note: row_group_offsets is documented here in the docstring)

Before starting, you would do well to write a function that shows that the current implementation for "in" doesn’t do what you think it should. Perhaps it’s better to fix that than to introduce a new operator?
