Appetite for PR to accommodate `filters` with strings and `in` operator?
When using `ParquetFile()` or `read_parquet()`, there is a use case for `filters=` with sets of strings as the values in the filters, particularly using the `in` or equality operators. For example:
filters = [('planet','in','jedha'), ('planet','in','scarif')]
This isn’t currently accommodated. Instead, as seen in `fastparquet.api.filter_val()`, there is a lexicographic comparison involving `max`/`min` operators. (There is also no warning for this behavior, and this use of filters is not proscribed by the docs.)
Can I make a PR to modify `filter_val()` for this use case?
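To make the concern concrete, here is a minimal sketch of one way a `min`/`max` style comparison can go wrong when the filter value is a bare string. It is not fastparquet's code: `as_candidates`, `row_group_may_match`, and the row-group statistics are hypothetical names and values chosen for illustration. It relies only on the fact that Python iterates a string character by character.

```python
def as_candidates(value):
    """Wrap a bare string so it is treated as one value, not as characters."""
    if isinstance(value, str):
        return {value}
    return set(value)


def row_group_may_match(values, stat_min, stat_max):
    """Return True if any candidate lies in [stat_min, stat_max] (lexicographic)."""
    return any(stat_min <= v <= stat_max for v in values)


# Hypothetical min/max statistics for one row group of a string column.
stat_min, stat_max = "alderaan", "hoth"

# Passing the bare string iterates over "j", "e", "d", "h", "a";
# the character "e" sorts between the two bounds, so the group is kept.
naive = row_group_may_match("jedha", stat_min, stat_max)

# Wrapping the string first compares "jedha" as a whole,
# which sorts after "hoth", so the group can be skipped.
wrapped = row_group_may_match(as_candidates("jedha"), stat_min, stat_max)

print(naive, wrapped)
```

Under these assumptions the two calls disagree, which is the kind of silent mismatch a warning or a type check on the filter value could surface.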
Issue Analytics
- State:
- Created 5 years ago
- Comments:6 (3 by maintainers)
Top GitHub Comments
The best thing to do would be to look at similar tests and copy the style, e.g., https://github.com/dask/fastparquet/blob/master/fastparquet/test/test_api.py#L378. The only magic here is `tempdir` (the function argument), which automatically creates a space to put temporary files and clears them up when the test is done.
You will want to fork this repo to your GitHub account (button in top-right) and set it as a remote URL (`git remote add`), before putting changes in a new branch locally and pushing to the new remote. You then create the PR from your copy of the repo on GitHub, and all the tests will run automatically on Travis CI.
You may include fresh parquet data files, if most convenient, so long as they are small. You will see that there are a number of binary files in the repo already, under test-data/.
However, most tests generate their test data files on-the-fly in the given temporary directory. If possible, that would be preferable - it depends on whether fastparquet can conveniently produce the type of output you want to read.
(Note: `row_group_offsets` is documented here in the docstring.)
Before starting, you would do well to write a function that shows that the current implementation for `"in"` doesn’t do what you think it should. Perhaps it’s better to fix that than to introduce a new operator?