Using webargs to parse more complex query arguments (string or comparison operators,...)
I would like my API to offer basic query features: not only getting an item by ID, but also applying operators to item attributes:
GET /resource/?name=chuck&surname__contains=orris&age__lt=69
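Whatever tool parses the arguments, each key ends up carrying both a field name and an operator. Below is a minimal stdlib sketch of that split. It assumes that a key with no double-underscore suffix means plain equality and that field names themselves contain no double underscore; parse_query and the implicit 'eq' operator are hypothetical, not part of webargs.

```python
from urllib.parse import parse_qsl


def parse_query(query_string):
    """Split each key on its last '__' into (field, operator, raw_value).

    Hypothetical helper: keys without an operator suffix get the implicit
    operator 'eq'. Values stay strings; type conversion is the schema's job.
    """
    triples = []
    for key, value in parse_qsl(query_string, keep_blank_values=True):
        field, sep, operator = key.rpartition("__")
        if not sep:
            # No '__' in the key: plain equality filter
            field, operator = key, "eq"
        triples.append((field, operator, value))
    return triples


print(parse_query("name=chuck&surname__contains=orris&age__lt=69"))
# [('name', 'eq', 'chuck'), ('surname', 'contains', 'orris'), ('age', 'lt', '69')]
```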
Proof of concept
Here is what I’ve done so far. I did not modify webargs code. I’m only subclassing in my own application.
I’ve been searching around and didn’t find a universally accepted standard specifying such a query language. In my implementation, I’m using the double underscore syntax and a subset of operators from MongoEngine.
Basically, I have two families of operators: some operating on numbers, others on strings.
```python
NUMBER_OPERATORS = ('ne', 'gt', 'gte', 'lt', 'lte')
STRING_OPERATORS = (
    'contains', 'icontains', 'startswith', 'istartswith',
    'endswith', 'iendswith', 'iexact'
)
QUERY_OPERATORS = {
    'number': NUMBER_OPERATORS,
    'string': STRING_OPERATORS,
}
```
Those lists could probably be extended. Operator meanings should be easy to grasp. Examples:
- age__lt=69 means I expect the API to return records whose age attribute is lower than 69.
- surname__contains=orris means the surname attribute contains the string “orris”.
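One possible reading of these operators, written as Python predicates. This sketch assumes MongoEngine-like semantics (an i prefix meaning case-insensitive) and values already converted to the right type by the schema; OPERATORS and matches are hypothetical names.

```python
import operator

# Hypothetical mapping from the operator names above to Python predicates,
# each called as predicate(attribute_value, query_value).
OPERATORS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "contains": lambda attr, value: value in attr,
    "icontains": lambda attr, value: value.lower() in attr.lower(),
    "startswith": lambda attr, value: attr.startswith(value),
    "istartswith": lambda attr, value: attr.lower().startswith(value.lower()),
    "endswith": lambda attr, value: attr.endswith(value),
    "iendswith": lambda attr, value: attr.lower().endswith(value.lower()),
    "iexact": lambda attr, value: attr.lower() == value.lower(),
}


def matches(record, key, value):
    """Return True if record satisfies the filter key=value, e.g. 'age__lt'."""
    field, sep, op = key.rpartition("__")
    if not sep:
        field, op = key, "eq"
    return OPERATORS[op](record[field], value)


chuck = {"name": "chuck", "surname": "Norris", "age": 42}
print(matches(chuck, "age__lt", 69))                  # True
print(matches(chuck, "surname__contains", "orris"))   # True
print(matches(chuck, "surname__istartswith", "NOR"))  # True
```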
To let webargs parse such parameters, I need to modify the Schema so that, for chosen fields, the Marshmallow field is duplicated (deep-copied) into the needed variants. For instance, the field age is duplicated into age__lt, age__gt, …
This is an opt-in feature. For each field I want to expose this way, I need to specify in Meta which operator categories it should use (currently only two categories: number and string). Auto-detection is complicated, as I don’t know how I would handle custom fields, and there may be other categories some day.
In the Schema, I add:
```python
class Meta:
    fields_filters = {
        'name': ('string',),
        'surname': ('string',),
        'age': ('number',),
    }
```
For each field, I’m passing a list of categories, since one could imagine several categories applying to a field, but currently I have no example of a field that would use both number and string operators.
And the “magic” takes place here:
```python
from copy import deepcopy

import marshmallow as ma

# QUERY_OPERATORS is the dict defined earlier


class SchemaOpts(ma.SchemaOpts):
    def __init__(self, meta):
        super(SchemaOpts, self).__init__(meta)
        # Add a new meta field to pass the list of filters
        self.fields_filters = getattr(meta, 'fields_filters', None)


class SchemaMeta(ma.schema.SchemaMeta):
    """Metaclass for `ModelSchema`."""

    @classmethod
    def get_declared_fields(mcs, klass, *args, **kwargs):
        # Create empty dict using the provided dict class
        # (marshmallow passes it as the `dict_cls` keyword argument)
        declared_fields = kwargs.get('dict_cls', dict)()
        # Add base fields
        base_fields = super(SchemaMeta, mcs).get_declared_fields(
            klass, *args, **kwargs
        )
        declared_fields.update(base_fields)
        # Get allowed filters from Meta and create filters
        opts = klass.opts
        fields_filters = getattr(opts, 'fields_filters', None)
        if fields_filters:
            filter_fields = {}
            for field_name, field_filters in fields_filters.items():
                field = base_fields.get(field_name, None)
                if field is not None:
                    for filter_category in field_filters:
                        for operator in QUERY_OPERATORS.get(
                                filter_category, ()):
                            # One independent copy per "<field>__<operator>"
                            filter_fields[
                                '{}__{}'.format(field_name, operator)
                            ] = deepcopy(field)
            declared_fields.update(filter_fields)
        return declared_fields


# ma.compat.with_metaclass is a marshmallow 2.x helper; on marshmallow 3,
# write `class QueryArgsSchema(ma.Schema, metaclass=SchemaMeta)` instead.
class QueryArgsSchema(ma.compat.with_metaclass(SchemaMeta, ma.Schema)):
    OPTIONS_CLASS = SchemaOpts
```
And finally, I use the Schema to parse the query arguments:
```python
# ObjectSchema: a QueryArgsSchema subclass declaring Meta.fields_filters
@use_args(ObjectSchema)
def get(self, args):
    ...
```
Questions
This raises a few points.
- Is there some sort of convention I missed when searching for a query language?
- Is this out-of-scope for webargs or could it be a useful enhancement?
- Is there no need for that? I’ve been investigating both the flask-restful and marshmallow/webargs/… ecosystems, along with @frol’s invaluable flask-restplus-server-example, and saw nothing close to this, so I’m thinking maybe people just don’t do that. Or maybe they only expose a few filters, and they do it in specific routes.
- Should this be in @touilleMan’s marshmallow-mongoengine? I do use this library, but although the query language is inspired^Wshamelessly copied from MongoEngine (which allows me to pass the query arguments straight into the QuerySet filters…), the whole thing has no dependency on MongoEngine and this should be a generic feature.
- My QueryArgsSchema also has sort and pagination fields, but I didn’t expose them here as this needs nothing fancy on Marshmallow’s side. Maybe a real “query parameters” feature would integrate these as well.
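A sort field indeed needs no metaclass work; parsing and applying it can be sketched in plain Python. This assumes a comma-separated spec where a leading “-” means descending (the name,-age syntax also mentioned in a later comment); parse_sort and sort_records are hypothetical helpers, and a real implementation would translate the keys into a database ordering rather than sorting in memory.

```python
def parse_sort(spec):
    """Turn 'name,-age' into [('name', False), ('age', True)] (field, descending)."""
    keys = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as 'name,'
        if part.startswith("-"):
            keys.append((part[1:], True))
        else:
            keys.append((part, False))
    return keys


def sort_records(records, spec):
    """Sort dicts by several keys, each ascending or descending.

    Python's sort is stable (also with reverse=True), so sorting by the least
    significant key first and the most significant key last gives the
    combined order.
    """
    result = list(records)
    for field, descending in reversed(parse_sort(spec)):
        result.sort(key=lambda r: r[field], reverse=descending)
    return result
```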
Feedback greatly appreciated. It seems to work right now, but in the long run, I might discover it was poorly designed from the start.
Thanks.
Created 7 years ago.
Top GitHub Comments
Hi.
No problem. Here is fine.
Unless I’m missing something, what you’re seeing is normal. post_load is called after the Schema loads the input data in webargs, and post_dump after the Schema dumps the return data in the view (I suppose jsonify is a method you added to do dump + jsonify?). When doing @use_kwargs(deliveries_schema.fields), you don’t use the Schema, only its fields, so it’s no surprise post_load is not called.
Yes.
When doing REST, in the general case, I use the model Schema (DeliverySchema) to load POST/PUT request payloads and dump GET/POST/PUT response payloads (location = json/body). And I use a specific handmade schema (DeliveryQueryArgsSchema) to parse GET arguments (location = querystring).
In this handmade Schema, I manually add the fields for the parameters I want to use. It smells like duplication: if I wanted to allow filtering by any field, I would need to add all the fields manually. But in practice I came to realize that I never wanted that. For instance, filtering by string fields such as name or address was not really realistic. IDs are meant to be used as filters. Names could be used by complex pattern-searching features for partial string matching, but as plain filters they kinda suck, as the user can’t be expected to enter the real name with no typo, the proper case, accents, etc. Otherwise there’d be no need for an ID (this is totally peremptory and arguable, MongoDB typically uses plain strings for IDs in their tutorials, but you get the point). Floats are a better example. There is no point filtering by float, since float equality is almost impossible technically and makes no sense in real life. You’d rather use a gt/lt interval.
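To make the float point concrete: in binary floating point, 0.1 + 0.2 is not equal to 0.3, so an equality filter would miss it, whereas an interval test (what a pair of query arguments such as price__gt and price__lt amounts to) still matches. price and in_range are hypothetical names used only for illustration.

```python
def in_range(value, gt=None, lt=None):
    """Hypothetical interval filter with open bounds; either bound may be omitted."""
    if gt is not None and not value > gt:
        return False
    if lt is not None and not value < lt:
        return False
    return True


price = 0.1 + 0.2
print(price == 0.3)                       # False: binary floating-point rounding
print(in_range(price, gt=0.29, lt=0.31))  # True
```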
In a project, we also crafted a sort field that deserializes inputs such as name,-age into a query sorting by name, then by age in reverse order.
There could be ways to avoid doing everything manually. You could create a function that would generate that DeliveryQueryArgsSchema the clever way, given DeliverySchema. It could use some fields as filters: maybe only some types of fields, or maybe only those to which you would have passed use_as_filter=True in the Schema declaration. It could add __lt, __gt and such to all number and datetime fields. Note there may be a lot of them, with lte, gte, etc.
And so on.
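The generator idea can be sketched without marshmallow by working on plain field descriptions. FieldSpec, its kind and use_as_filter attributes, query_arg_names and the delivery fields below are all hypothetical; a real version would read the marshmallow fields of DeliverySchema and build the query Schema class from the resulting names.

```python
from dataclasses import dataclass

RANGE_OPERATORS = ("lt", "lte", "gt", "gte")
RANGE_KINDS = {"number", "datetime"}


@dataclass
class FieldSpec:
    """Hypothetical stand-in for a model-schema field declaration."""
    kind: str
    use_as_filter: bool = False


def query_arg_names(model_fields):
    """List the query-argument names a generated query schema would expose.

    Only fields flagged use_as_filter=True are exposed; number and datetime
    fields additionally get range variants such as 'weight__lt'.
    """
    names = []
    for name, spec in model_fields.items():
        if not spec.use_as_filter:
            continue
        names.append(name)
        if spec.kind in RANGE_KINDS:
            names.extend(f"{name}__{op}" for op in RANGE_OPERATORS)
    return names


delivery_fields = {
    "id": FieldSpec("string", use_as_filter=True),
    "address": FieldSpec("string"),
    "weight": FieldSpec("number", use_as_filter=True),
    "shipped_at": FieldSpec("datetime", use_as_filter=True),
}
print(query_arg_names(delivery_fields))
```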
Also, note @frol’s advice above:
If you need complex querying (I mean non-trivial; I totally agree that what you’re talking about is not a complex API and sounds rather like everyone’s needs), maybe you’re better off using a String field in webargs, documenting it as “my custom query string” in the docs (assuming you’re documenting your API with a tool like marshmallow’s apispec) and doing the query parsing in a custom function.
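As an illustration of that advice, here is a minimal parser for a hypothetical custom syntax carried in a single String argument, such as q=age<69,name=chuck (URL-encoded in a real request). The grammar, the regex and parse_custom_query are assumptions for this sketch; it maps each clause back onto the field__operator keys used in the issue.

```python
import re

# Hypothetical grammar: comma-separated "field<op>value" clauses.
# Two-character operators come first in the alternation so that "<=" is not
# read as "<" followed by a value starting with "=".
CLAUSE = re.compile(r"^\s*(\w+)\s*(<=|>=|!=|<|>|=)\s*(.*?)\s*$")
SYMBOL_TO_OPERATOR = {
    "=": "eq", "!=": "ne", "<": "lt", "<=": "lte", ">": "gt", ">=": "gte",
}


def parse_custom_query(text):
    """Parse the raw string received through a single String query argument."""
    filters = {}
    for clause in filter(None, text.split(",")):  # skip empty clauses
        match = CLAUSE.match(clause)
        if match is None:
            raise ValueError(f"invalid clause: {clause!r}")
        field, symbol, value = match.groups()
        op = SYMBOL_TO_OPERATOR[symbol]
        # Plain equality keeps the bare field name, like name=chuck
        key = field if op == "eq" else f"{field}__{op}"
        filters[key] = value
    return filters


print(parse_custom_query("age<69,name=chuck"))
# {'age__lt': '69', 'name': 'chuck'}
```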
Beware of the quest for the silver bullet. There may be no point having a framework that creates an API allowing the user to do everything on the data. Sometimes there are things you don’t want him to do, so you need to be able to add exceptions, like removing part of the automatically generated fields. And some other things are harmless but useless. Maybe you only need 100% of it.
YMMV. In our case, we realized that we didn’t need that many options in the query, so we removed the PQL code (see above) from our application and just added the fields manually for the filters we wanted to offer the user. Overall, there were not that many fields to add.
Hope this helps…
Well, I like to have new_item in a single variable. If I used use_kwargs, the view func would look like this, I suppose:
And what if the object has an attribute named item_id? Alright, this shouldn’t happen. The object attributes would be mixed with the ID in the function signature, which doesn’t convey the intent.
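The difference being described can be shown with two toy decorators. use_args_like, use_kwargs_like, payload and the view names are hypothetical stand-ins, not webargs code, and real webargs may place the injected arguments differently depending on its version; they only mimic whether the parsed dict arrives as one argument or is spread into keyword arguments.

```python
import functools


# Toy stand-ins, NOT webargs: they only mimic how the parsed dict reaches the view.
def use_args_like(parsed):
    """Pass the parsed payload as one extra positional argument."""
    def decorator(view):
        @functools.wraps(view)
        def wrapper(*args, **kwargs):
            return view(*args, parsed, **kwargs)
        return wrapper
    return decorator


def use_kwargs_like(parsed):
    """Spread the parsed payload into keyword arguments."""
    def decorator(view):
        @functools.wraps(view)
        def wrapper(*args, **kwargs):
            return view(*args, **parsed, **kwargs)
        return wrapper
    return decorator


payload = {"name": "chuck", "age": 42}


@use_args_like(payload)
def put_with_args(item_id, new_item):
    # The whole item stays together in one variable.
    return item_id, new_item


@use_kwargs_like(payload)
def put_with_kwargs(item_id, name, age):
    # Item attributes sit next to item_id in the signature.
    return item_id, {"name": name, "age": age}
```

With the keyword variant, an item attribute named item_id would collide with the ID parameter and raise a TypeError in this toy version, which is the concern raised above.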
Well, that’s my understanding. I have no strong opinion about that. And maybe I’m just wrong about use_kwargs.