
Using webargs to parse more complex query arguments (string or comparison operators,...)


I would like my API to offer basic query features: not only getting an item by ID, but also filtering with operators on item attributes:

GET /resource/?name=chuck&surname__contains=orris&age__lt=69

Proof of concept

Here is what I’ve done so far. I did not modify webargs code. I’m only subclassing in my own application.

I’ve been searching around and didn’t find a universally accepted norm specifying such a query language. In my implementation, I’m using the double underscore syntax and a subset of operators from MongoEngine.
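The double-underscore convention boils down to splitting each query key into a field name and an optional operator. Here is a minimal stdlib sketch of that step, applied to the example URL above. `split_query_key` is a hypothetical helper, not part of webargs or MongoEngine, and it assumes a key is only split when its suffix is a known operator. Note that the values are still strings at this stage; turning `'69'` into an int is the schema's job.

```python
from urllib.parse import parse_qsl

KNOWN_OPERATORS = {
    'ne', 'gt', 'gte', 'lt', 'lte',                        # number operators
    'contains', 'icontains', 'startswith', 'istartswith',  # string operators
    'endswith', 'iendswith', 'iexact',
}


def split_query_key(key):
    """Split 'age__lt' into ('age', 'lt'); plain keys give ('name', None).

    Hypothetical helper: a suffix that is not a known operator is treated
    as part of the field name.
    """
    field, sep, operator = key.rpartition('__')
    if sep and operator in KNOWN_OPERATORS:
        return field, operator
    return key, None


# Raw query string from the example request; values stay strings here
query = dict(parse_qsl('name=chuck&surname__contains=orris&age__lt=69'))
print([split_query_key(key) for key in query])
# [('name', None), ('surname', 'contains'), ('age', 'lt')]
```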

Basically, I have two families of operators: some operate on numbers, others on strings.

NUMBER_OPERATORS = ('ne', 'gt', 'gte', 'lt', 'lte')

STRING_OPERATORS = (
    'contains', 'icontains', 'startswith', 'istartswith',
    'endswith', 'iendswith', 'iexact'
)

QUERY_OPERATORS = {
    'number': NUMBER_OPERATORS,
    'string': STRING_OPERATORS,
}

Those lists could probably be extended. The operators’ meanings should be easy to grasp. Examples:

  • age__lt=69 means I expect the API to return records whose age attribute is less than 69
  • surname__contains=orris means the surname attribute contains the string “orris”
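To make those meanings concrete, here is a sketch of one possible mapping from operator names to Python predicates, evaluated against in-memory records. It only approximates MongoEngine, which translates these operators into database queries rather than evaluating them in Python; `OPERATOR_FUNCS`, `matches` and the sample records are hypothetical. Values are assumed to be already deserialized (69 as an int, not `'69'`).

```python
import operator

# Hypothetical predicate table: each entry takes (record_value, query_value)
OPERATOR_FUNCS = {
    # Number operators
    'ne': operator.ne,
    'gt': operator.gt,
    'gte': operator.ge,
    'lt': operator.lt,
    'lte': operator.le,
    # String operators; the 'i' prefix means case-insensitive
    'contains': lambda value, arg: arg in value,
    'icontains': lambda value, arg: arg.lower() in value.lower(),
    'startswith': lambda value, arg: value.startswith(arg),
    'istartswith': lambda value, arg: value.lower().startswith(arg.lower()),
    'endswith': lambda value, arg: value.endswith(arg),
    'iendswith': lambda value, arg: value.lower().endswith(arg.lower()),
    'iexact': lambda value, arg: value.lower() == arg.lower(),
}


def matches(record, filters):
    """Return True if record satisfies every 'field__operator' filter.

    Keys without a known operator suffix are compared for equality.
    A missing attribute never satisfies an operator filter.
    """
    for key, arg in filters.items():
        field, sep, op = key.rpartition('__')
        if not sep or op not in OPERATOR_FUNCS:
            field, op = key, None
        value = record.get(field)
        if op is None:
            if value != arg:
                return False
        elif value is None or not OPERATOR_FUNCS[op](value, arg):
            return False
    return True


people = [
    {'name': 'chuck', 'surname': 'Norris', 'age': 68},
    {'name': 'chuck', 'surname': 'Morris', 'age': 70},
    {'name': 'bruce', 'surname': 'Lee', 'age': 32},
]
filters = {'name': 'chuck', 'surname__contains': 'orris', 'age__lt': 69}
print([p['surname'] for p in people if matches(p, filters)])  # ['Norris']
```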

To let webargs parse such parameters, I need to modify the Schema so that for chosen fields, the Marshmallow field is duplicated (deepcopied) into needed variants. For instance, the field age is duplicated into age__lt, age__gt,…

This is an opt-in feature. For each field I want to expose this way, I need to specify in Meta which operator categories it should use (currently only two categories: number and string). Auto-detection is complicated, as I don’t know how I would handle custom fields, and there may be other categories some day.

In the Schema, I add:

    class Meta:
        fields_filters = {
            'name': ('string',),
            'surname': ('string',),
            'age': ('number',),
        }

For each field, I’m passing a list of categories, as one could imagine several categories applying to a field, but currently I have no example of a field that would use both number and string operators.

And the “magic” takes place here:

from copy import deepcopy

import marshmallow as ma


class SchemaOpts(ma.SchemaOpts):
    def __init__(self, meta, *args, **kwargs):
        # Forward extra arguments (e.g. marshmallow 3 passes `ordered`)
        super(SchemaOpts, self).__init__(meta, *args, **kwargs)
        # Add a new meta field to pass the list of filters
        self.fields_filters = getattr(meta, 'fields_filters', None)


class SchemaMeta(ma.schema.SchemaMeta):
    """Metaclass for `QueryArgsSchema`."""

    @classmethod
    def get_declared_fields(mcs, klass, *args, **kwargs):

        # Create empty dict using the class marshmallow passes as `dict_cls`
        declared_fields = kwargs.get('dict_cls', dict)()

        # Add base fields
        base_fields = super(SchemaMeta, mcs).get_declared_fields(
            klass, *args, **kwargs
        )
        declared_fields.update(base_fields)

        # Get allowed filters from Meta and create filters
        # (marshmallow sets klass.opts before calling this method)
        opts = klass.opts
        fields_filters = getattr(opts, 'fields_filters', None)

        if fields_filters:
            filter_fields = {}
            for field_name, field_filters in fields_filters.items():
                field = base_fields.get(field_name, None)
                if field is not None:
                    for filter_category in field_filters:
                        for operator in QUERY_OPERATORS.get(
                                filter_category, ()):
                            # Each variant gets its own field instance
                            filter_fields[
                                '{}__{}'.format(field_name, operator)
                            ] = deepcopy(field)
            declared_fields.update(filter_fields)

        return declared_fields


# Python 3 syntax; on Python 2 with marshmallow 2, use
# ma.compat.with_metaclass(SchemaMeta, ma.Schema) as the base instead
class QueryArgsSchema(ma.Schema, metaclass=SchemaMeta):
    OPTIONS_CLASS = SchemaOpts

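And finally, I use a schema deriving from QueryArgsSchema (here ObjectSchema, the one holding the Meta above) to parse the query arguments:

    # webargs >= 6 parses the JSON body by default, so the query string
    # location must be explicit (older versions used locations=('query',))
    @use_args(ObjectSchema, location='query')
    def get(self, args):
        ...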

Questions

This raises a few points.

  • Is there some sort of convention I missed when searching for a query language?
  • Is this out of scope for webargs, or could it be a useful enhancement?
  • Is there no need for that? I’ve been investigating both the flask-restful and marshmallow/webargs/… ecosystems, along with @frol’s invaluable flask-restplus-server-example, and saw nothing close to this, so I’m thinking maybe people just don’t do that. Or maybe they only expose a few filters, and they do it in specific routes.
  • Should this be in @touilleMan’s marshmallow-mongoengine? I do use this library, but although the query language is inspired^Wshamelessly copied from MongoEngine (which allows me to pass the query arguments straight into the QuerySet filters…), the whole thing has no dependency on MongoEngine and this should be a generic feature.
  • My QueryArgsSchema also has sort and pagination fields, but I didn’t expose them here as this needs nothing fancy on Marshmallow’s side. Maybe a real “query parameters” feature would integrate these as well.

Feedback greatly appreciated. It seems to work right now, but in the long run, I might discover it was poorly designed from the start.

Thanks.

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Reactions: 3
  • Comments: 13 (9 by maintainers)

Top GitHub Comments

lafrech commented, Jun 30, 2019 (1 reaction)

Hi.

> unless you think another place is best for the discussion

No problem. Here is fine.

Unless I’m missing something, what you’re seeing is normal. post_load is called after the Schema loads the input data in webargs, and post_dump after the Schema dumps the return data in the view (I suppose jsonify is a method you added to do dump + jsonify?).

When doing @use_kwargs(deliveries_schema.fields), you don’t use the Schema, only its fields, so it’s no surprise post_load is not called.

> This worked, but it won’t work any more. Not surprising, as it’s a brittle design that naively assumes that I want to pass all query string parameters to filter_by as is.

Yes.

When doing REST, in the general case, I use the model Schema (DeliverySchema) to load POST/PUT request payload and dump GET/POST/PUT response payload (location = json/body). And I use a specific handmade schema (DeliveryQueryArgsSchema) to parse GET arguments (location = querystring).

In this handmade Schema, I manually add the fields for the parameters I want to use. It smells like duplication: if I wanted to allow filtering by any field, I would need to add all the fields manually. But in practice, I came to realize that I never wanted that. For instance, filtering by string fields such as name, address and so on was not really realistic. IDs are meant to be used as filters. Names could be used by complex pattern-searching features for partial string matching, but as plain filters they kinda suck, as the user can’t be expected to enter the real name with no typo, the proper case and accents, etc. (otherwise there’d be no need for an ID; this is totally peremptory and arguable, MongoDB typically uses plain strings for IDs in their tutorials, but you get the point). Floats are a better example. There’s no point filtering by float, since float equality is almost impossible technically and makes no sense in real life. You’d rather use a gt/lt interval.
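The float remark is easy to check in plain Python: exact equality on a computed float can fail, while a gte/lte interval still selects the intended record. The `prices` values and the `price__gte`/`price__lte` parameter names are made up for illustration.

```python
prices = [0.1 + 0.2, 0.5, 1.0]  # hypothetical stored values

# Exact match, as a plain '?price=0.3' filter would do: finds nothing,
# because 0.1 + 0.2 is stored as 0.30000000000000004
exact = [p for p in prices if p == 0.3]

# Interval, as '?price__gte=0.29&price__lte=0.31' would do
interval = [p for p in prices if 0.29 <= p <= 0.31]

print(exact, interval)  # [] [0.30000000000000004]
```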

In a project, we also crafted a sort field that deserializes inputs such as name,-age into a query sorting by name and reversed age.
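Such a sort parameter can be sketched as a small parsing function. The exact structure that project produced isn't shown, so this sketch assumes a list of `(field, descending)` pairs; `parse_sort` and `apply_sort` are hypothetical names. In a marshmallow-based setup, the parsing part could live in a custom field's `_deserialize` method.

```python
def parse_sort(value):
    """Turn 'name,-age' into [('name', False), ('age', True)].

    A leading '-' means descending order; empty items are ignored.
    """
    spec = []
    for item in value.split(','):
        item = item.strip()
        if not item:
            continue
        if item.startswith('-'):
            spec.append((item[1:], True))
        else:
            spec.append((item, False))
    return spec


def apply_sort(records, spec):
    """Sort dicts by several keys, mixing ascending and descending order.

    Python's sort is stable, so sorting by the last key first and the
    first key last yields the combined ordering.
    """
    result = list(records)
    for field, descending in reversed(spec):
        result.sort(key=lambda r: r[field], reverse=descending)
    return result


people = [
    {'name': 'bruce', 'age': 32},
    {'name': 'chuck', 'age': 68},
    {'name': 'bruce', 'age': 40},
]
spec = parse_sort('name,-age')
print(spec)  # [('name', False), ('age', True)]
print([(p['name'], p['age']) for p in apply_sort(people, spec)])
# [('bruce', 40), ('bruce', 32), ('chuck', 68)]
```

With a database backend, the same spec would instead be translated into the ORM's ordering call rather than sorted in memory.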

There could be ways to avoid doing everything manually. You could create a function that would generate such a DeliveryQueryArgsSchema the clever way, given DeliverySchema.

  • It could use some fields as filters, maybe only some types of fields, or maybe only those to which you would have passed use_as_filter=True in the Schema declaration.

  • It could add _lt and _gt and such to all number and datetime fields. Note there may be a lot of them, with lte, gte, etc.

  • And so on.
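The generation idea can be sketched without marshmallow, using a plain mapping from field names to Python types plus an opt-in flag. In a real implementation, the types would come from the Schema's field classes (e.g. `fields.Number` or `fields.DateTime` subclasses) and the flag from field metadata; `FIELDS`, `build_filter_params` and the `use_as_filter` flag are all hypothetical. The suffixes follow the double-underscore convention from the issue.

```python
import datetime

# Hypothetical model schema description: name -> (python type, use_as_filter)
FIELDS = {
    'id': (str, True),
    'name': (str, False),
    'weight': (float, True),
    'delivered_at': (datetime.datetime, True),
}

# Orderable types get range variants (datetime.datetime subclasses date)
RANGE_TYPES = (int, float, datetime.date)
RANGE_OPERATORS = ('lt', 'lte', 'gt', 'gte')


def build_filter_params(fields):
    """Return the query parameter names generated from a field description.

    Only opted-in fields are exposed. Numbers and dates get range
    variants; other fields get a plain equality filter. bool is a
    subclass of int in Python, so it is excluded from range handling.
    """
    params = []
    for name, (type_, use_as_filter) in fields.items():
        if not use_as_filter:
            continue
        if issubclass(type_, RANGE_TYPES) and not issubclass(type_, bool):
            params.extend('{}__{}'.format(name, op) for op in RANGE_OPERATORS)
        else:
            params.append(name)
    return params


print(build_filter_params(FIELDS))
# ['id', 'weight__lt', 'weight__lte', 'weight__gt', 'weight__gte',
#  'delivered_at__lt', 'delivered_at__lte', 'delivered_at__gt', 'delivered_at__gte']
```

Even three opted-in fields yield nine parameters here, which shows how quickly the generated documentation can grow.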

Also, note @frol’s advice above:

> While suffixes are fine from URL point of view, they would look bloated in Swagger config since you will have every parameter expanded into 6 or 8 parameters (depends on the parameter type: number or string). This is why I probably won’t implement querying this way.

If you need complex querying (I mean non-trivial; I totally agree that what you’re talking about is not a complex API and sounds rather like everyone’s needs), maybe you’re better off using a String field in webargs, documenting it as “my custom query string” in the docs (assuming you’re documenting your API with a tool like marshmallow’s apispec) and doing the query parsing in a custom function.
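As a sketch of that approach, the whole filter could travel in a single string parameter and be parsed by a custom function. The syntax below (`?q=age:lt:69;surname:contains:orris`), the `ALLOWED` whitelist and the `parse_query` name are invented for illustration; the point is only that the grammar, the allowed fields and the allowed operators all stay under the application’s control.

```python
ALLOWED = {
    # Hypothetical whitelist: field -> (converter, allowed operators)
    'age': (int, {'eq', 'ne', 'lt', 'lte', 'gt', 'gte'}),
    'surname': (str, {'eq', 'contains', 'icontains'}),
}


def parse_query(q):
    """Parse 'field:op:value;...' into a list of (field, op, value) tuples.

    Raises ValueError on malformed clauses, unknown fields, disallowed
    operators or unconvertible values; a view could turn that into an
    error response, for instance a 422.
    """
    filters = []
    for clause in filter(None, q.split(';')):
        try:
            field, op, raw = clause.split(':', 2)
        except ValueError:
            raise ValueError('Malformed clause: {!r}'.format(clause)) from None
        if field not in ALLOWED:
            raise ValueError('Unknown field: {!r}'.format(field))
        convert, operators = ALLOWED[field]
        if op not in operators:
            raise ValueError('Operator {!r} not allowed on {!r}'.format(op, field))
        filters.append((field, op, convert(raw)))
    return filters


print(parse_query('age:lt:69;surname:contains:orris'))
# [('age', 'lt', 69), ('surname', 'contains', 'orris')]
```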

Beware of the quest for the silver bullet. There may be no point having a framework that creates an API allowing the user to do everything on the data. Sometimes there are things you don’t want the user to do, so you need to be able to add exceptions, like removing part of the automatically generated fields. And some other things are harmless but useless. Maybe you don’t need 100% of it.

YMMV. In our case, we realized that we didn’t need that many options in the query, so we removed the PQL code (see above) from our application and just added the fields manually for the filters we wanted to provide the user. Overall, there were not that many fields to add.

Hope this helps…

lafrech commented, Jul 5, 2019

Well, I like to have new_item in a single variable. If I used use_kwargs, the view func would look like this, I suppose:

@blp.route('/<objectid:item_id>')
class BuildingsById(MethodView):

    [...]

    @blp.arguments(BuildingSchema)
    @blp.response(BuildingSchema)
    def put(self, item_id, **new_item):
        """Update an existing building"""
        item = Building.get_by_id(item_id)
        # **new_item gathered the payload fields into a single dict
        item.update(new_item, BuildingSchema)
        item.save()
        return item

And what if the object has an attribute named item_id? Alright, this shouldn’t happen.

The object attributes would be mixed with the ID in the function signature, which doesn’t convey the intent.
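The name-clash concern can be reproduced without Flask. In this sketch, `put` mimics a view whose URL parameter is passed as a keyword argument alongside the unpacked payload, roughly what a kwargs-style decorator does (the exact call the framework makes is an assumption here), while `put_with_args` mimics the single-dict style. The payload values are hypothetical.

```python
def put(item_id, **new_item):
    """kwargs style: URL parameter and payload fields share one namespace."""
    return item_id, new_item


def put_with_args(new_item, item_id):
    """args style: the payload stays one dict, so no clash is possible."""
    return item_id, new_item


url_kwargs = {'item_id': '5d1f'}
payload = {'name': 'HQ', 'item_id': 'oops'}  # payload with a clashing key

clashed = False
try:
    put(**url_kwargs, **payload)
except TypeError:
    # Python refuses two values for the same keyword argument
    clashed = True

print(clashed)                              # True
print(put_with_args(payload, **url_kwargs)) # ('5d1f', {'name': 'HQ', 'item_id': 'oops'})
```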

Well that’s my understanding. I have no strong opinion about that. And maybe I’m just wrong about use_kwargs.

