Query builder / providing SQL clauses to the dialect
I’m working on a very similar project: a DEVT framework like frictionless-py, but with a focus on open government data.
Instead of reinventing the wheel, I decided to experiment with integrating frictionless-py. So far it has worked pretty well for the “Describe” part, but I’m not sure whether frictionless-py has good support for data filtering.
As I understand it, filtering is currently provided by transformations; for example, there is a Filter Rows step. I don’t fully understand how that works, what <formula> is, and, more importantly, how it is implemented under the hood.
I will have to deal with very large tables, so it is important to know whether filtering happens at the SQL database level rather than at the Python level. If filtering is done in Python, it will take forever to complete.
The same applies to joins. Are joins performed at the SQL database level, or does frictionless-py load everything into memory and do the joins on the Python side?
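To make the distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than frictionless-py itself. The table names, columns, and rows are made up for illustration. Both approaches return the same result, but in the first one every row crosses into Python before being filtered, while in the second the database does the filtering and the join.

```python
import sqlite3

# In-memory database with made-up sample data (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE region (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dataset (
        id INTEGER PRIMARY KEY,
        title TEXT,
        region_id INTEGER,
        row_count INTEGER
    );
    INSERT INTO region VALUES (1, 'north'), (2, 'south');
    INSERT INTO dataset VALUES
        (1, 'budget', 1, 900),
        (2, 'roads', 1, 20),
        (3, 'schools', 2, 700);
    """
)

# Python-level filtering: all rows are fetched, then filtered in Python.
all_rows = conn.execute(
    "SELECT title, row_count FROM dataset ORDER BY id"
).fetchall()
python_filtered = [title for title, row_count in all_rows if row_count > 500]

# SQL-level filtering: only matching rows ever leave the database.
sql_filtered = [
    title
    for (title,) in conn.execute(
        "SELECT title FROM dataset WHERE row_count > ? ORDER BY id", (500,)
    )
]

# SQL-level join: the database combines the tables and filters in one query.
joined = conn.execute(
    """
    SELECT dataset.title, region.name
    FROM dataset
    JOIN region ON region.id = dataset.region_id
    WHERE dataset.row_count > ?
    ORDER BY dataset.id
    """,
    (500,),
).fetchall()
```

With three rows the difference is invisible; with very large tables the first approach has to transfer and inspect every row in Python, which is the cost I am worried about.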
Please preserve this line to notify @roll (lead of this repository)
Issue Analytics
- State:
- Created 3 years ago
- Reactions:3
- Comments:5 (4 by maintainers)
Top GitHub Comments
very much. Started playing around, looks like the basics should be easy to add within the current structure. Will give it a try and do a PR.
@sirex Spinta is really interesting! (cc @lwinfree)
Under the hood of all the transformations, we use the battle-tested PETL framework (https://petl.readthedocs.io/en/stable/). It usually guarantees that memory is handled properly, e.g. for big data it buffers to disk. More about transforms in Frictionless: https://frictionlessdata.github.io/frictionless-py/docs/guides/transforming-data
As for the other question, it’s true that performance might be an issue for very large data compared to pure SQL. ATM I see two main options to resolve it if you need a declarative way at the frictionless level:
If you don’t need it to be fully declarative, we can just add some sqlalchemy options, such as a prebuilt query, as an option for SQL Storage.
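As a rough illustration of what such an option could look like, here is a minimal sketch. It is not frictionless-py's or SQLAlchemy's API: `SqlSource`, its `query` and `params` fields, and `iter_rows` are hypothetical names, and the built-in sqlite3 module stands in for a SQLAlchemy connection so the example stays self-contained. The idea is only that, when a prebuilt query is supplied, it is sent to the database as-is and rows are streamed back, so filtering happens on the SQL side.

```python
import sqlite3
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple


@dataclass
class SqlSource:
    """Hypothetical description of a SQL-backed resource (illustration only)."""

    table: str
    query: Optional[str] = None  # hypothetical option: a prebuilt SELECT
    params: Tuple = ()  # bound parameters for the prebuilt query


def iter_rows(conn: sqlite3.Connection, source: SqlSource) -> Iterator[tuple]:
    """Stream rows; use the prebuilt query if given, else read the whole table."""
    sql = source.query or f'SELECT * FROM "{source.table}"'
    # Iterating over the cursor fetches rows lazily instead of loading them all.
    yield from conn.execute(sql, source.params)


# Made-up sample data for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, price INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, 5), (2, 50), (3, 500)])

whole_table = SqlSource(table="items")
filtered = SqlSource(
    table="items",
    query="SELECT id, price FROM items WHERE price >= ? ORDER BY id",
    params=(50,),
)

whole_rows = sorted(iter_rows(conn, whole_table))
filtered_rows = list(iter_rows(conn, filtered))
```

The trade-off is the one mentioned above: a raw query is not declarative, but it lets the database do the heavy lifting for large tables.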