Reporting field errors from whole schema validator

Sometimes validating one field at a time is not enough, and you want to validate a combination of fields, which could be deeply nested inside your structure. In that case you create a @validates_schema validator and do all the checks there. In the end, though, you want to report errors not for the whole schema but for particular fields. The documentation for ValidationError states that it accepts:

String, list, or dictionary of error messages. If a dict, the keys will be field names and the values will be lists of messages

We can probably extrapolate from the last statement that dictionary values could contain not only lists of messages but also dictionaries of nested field errors.

But in reality, when you raise a ValidationError with a dictionary, it is treated as just another error message (one that happens to be a dictionary rather than a string). Instead, this error dictionary should be deep-merged with the errors reported by individual fields.
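To make "deep-merged" concrete, here is a minimal pure-Python sketch of one way such a merge could work. merge_errors and its rules are assumptions for illustration, not marshmallow's API and not the algorithm mentioned later in this issue: dicts are merged key by key, two strings or lists are concatenated into one list, and a plain message meeting a dict of field errors is filed under '_schema'.

```python
SCHEMA = "_schema"  # key for errors about the object itself


def _as_list(messages):
    # Treat a single message as a one-element list.
    return messages if isinstance(messages, list) else [messages]


def merge_errors(base, extra):
    """Deep-merge two error structures (hypothetical helper).

    A structure is a string, a list of strings, or a dict mapping
    field names to nested structures.
    """
    if not base:
        return extra
    if not extra:
        return base
    if isinstance(base, dict) and isinstance(extra, dict):
        merged = dict(base)
        for key, value in extra.items():
            # Recurse when both sides report errors for the same field.
            merged[key] = merge_errors(merged[key], value) if key in merged else value
        return merged
    if isinstance(base, dict):
        # A plain message next to field errors describes the whole object.
        return merge_errors(base, {SCHEMA: extra})
    if isinstance(extra, dict):
        return merge_errors({SCHEMA: base}, extra)
    # Two strings/lists: concatenate into one flat list of messages.
    return _as_list(base) + _as_list(extra)


# Hypothetical field-level errors plus the dict raised by a schema validator.
field_errors = {"bar": {"baz": ["Must be positive."]}}
validator_errors = {"bar": {"baz": "Non-matching value"}, "bam": "Value should be less than foo"}
print(merge_errors(field_errors, validator_errors))
# {'bar': {'baz': ['Must be positive.', 'Non-matching value']}, 'bam': 'Value should be less than foo'}
```

Filing stray strings under '_schema' is just one possible answer to the corner cases of deep merging; other policies are equally defensible.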

I have an implementation for that, BUT there is one test case (tests.test_decorators.TestValidatesSchemaDecorator.test_passing_original_data) that, on its own, tests a different thing but showcases passing ValidationError a dictionary that is not a field-to-error mapping but rather a complex error-message data structure:

    raise ValidationError({'code': 'invalid_field'})
    ...
    assert '_schema' in errors
    assert len(errors['_schema']) == 1
    assert errors['_schema'][0] == {'code': 'invalid_field'}

I believe this usage is wrong, since the documentation states that error messages should be strings (see the quote above). The test should be altered to use a simple string as the error message, and the library should be extended to support reporting errors for multiple fields from a whole-schema validator.

Here is a sample test of what I’m trying to achieve:

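from marshmallow import Schema, ValidationError, fields, validates_schema


def test_allow_reporting_field_errors_in_schema_validator(self):
    # Desired behaviour: a dict raised from a schema validator is reported per (nested) field.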
    class NestedSchema(Schema):
        baz = fields.Int(required=True)

    class MySchema(Schema):
        foo = fields.Int(required=True)
        bar = fields.Nested(NestedSchema, required=True)
        bam = fields.Int(required=True)

        @validates_schema(skip_on_field_errors=True)
        def consistency_validation(self, data):
            errors = {}
            if data['bar']['baz'] != data['foo']:
                errors['bar'] = {'baz': 'Non-matching value'}

            if data['bam'] > data['foo']:
                errors['bam'] = 'Value should be less than foo'

            if errors:
                raise ValidationError(errors)

    schema = MySchema()
    errors = schema.validate({'foo': 2, 'bar': {'baz': 5}, 'bam': 6})
    assert 'bar' in errors
    assert 'baz' in errors['bar']
    assert errors['bar']['baz'] == 'Non-matching value'
    assert 'bam' in errors
    assert errors['bam'] == 'Value should be less than foo'

What do you think?

P.S. I understand that deep-merging errors has corner cases, and I already have an algorithm that, IMO, handles them in a sane way.

Issue Analytics

Created 7 years ago; 5 reactions; 9 comments (7 by maintainers).

Top GitHub Comments

cpoppema commented, Jan 18, 2017 (1 reaction)

I believe I ran into the same issue. I have fields that depend on each other, and when validating a single field with @validates() I don't have access to the other fields' data. That is why I verify the combination of this data using @validates_schema. In this method I build a dictionary (errors) mapping specific field names to error lists, expecting the Unmarshaller to treat this input as field-specific error messages in its own error dict instead of as schema-level errors.

My use case seems to be solved by combining two things: instantiating the ValidationError that I raise inside the @validates_schema method with the kwarg field_names=errors.keys(), so that the Unmarshaller does not default to field_names = [SCHEMA], and the following change I propose:

--- a/marshmallow/marshalling.py
+++ b/marshmallow/marshalling.py
@@ -206,7 +206,10 @@ class Unmarshaller(ErrorStore):
                     else:
                         errors.setdefault(field_name, []).extend(err.messages)
                 elif isinstance(err.messages, dict):
-                    errors.setdefault(field_name, []).append(err.messages)
+                    if field_name in err.messages:
+                        errors.setdefault(field_name, []).extend(err.messages.get(field_name))
+                    else:
+                        errors.setdefault(field_name, []).append(err.messages)
                 else:
                     errors.setdefault(field_name, []).append(text_type(err))

Is it best to put this in a separate pull request referencing this issue? Locally I've already run tox with py27 and py34, and both passed.
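The patch shows only part of the error-storing branch, so here is a standalone simulation of how the proposed dict handling would route messages. store_schema_error is a hypothetical name, and this is a simplification (the list-of-dicts case from the surrounding code is omitted, and str(messages) stands in for text_type(err)), not marshmallow's Unmarshaller. One detail is handled differently on purpose: calling list.extend() on a bare string adds one entry per character, so the sketch wraps a non-list value in a list first.

```python
SCHEMA = "_schema"


def store_schema_error(errors, messages, field_names=None):
    """Hypothetical stand-in for the patched branch (not marshmallow code).

    errors: dict of field name -> list of messages, updated in place.
    messages: what was passed to ValidationError.
    field_names: what was passed as field_names=...; defaults to [SCHEMA].
    """
    for field_name in field_names or [SCHEMA]:
        if isinstance(messages, dict):
            if field_name in messages:
                # Proposed change: pull out only this field's messages.
                value = messages[field_name]
                if not isinstance(value, list):
                    value = [value]  # avoid extend() splitting a string into characters
                errors.setdefault(field_name, []).extend(value)
            else:
                # Old behaviour: the whole dict is stored as one message.
                errors.setdefault(field_name, []).append(messages)
        elif isinstance(messages, list):
            errors.setdefault(field_name, []).extend(messages)
        else:
            errors.setdefault(field_name, []).append(str(messages))
    return errors


validator_errors = {"foo": ["Must match bar"], "bar": ["Must match foo"]}
print(store_schema_error({}, validator_errors, field_names=validator_errors.keys()))
# {'foo': ['Must match bar'], 'bar': ['Must match foo']}
print(store_schema_error({}, {"code": "invalid_field"}))
# {'_schema': [{'code': 'invalid_field'}]}
```

When field_names is left out, a dict such as {'code': 'invalid_field'} still ends up as a single entry under '_schema', which matches what test_passing_original_data asserts.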

maximkulkin commented, Jul 2, 2016 (1 reaction)

Here is the documentation for the messages argument of ValidationError:

messages = None
String, list, or dictionary of error messages. If a dict, the keys will be field names and the values will be lists of messages.

First of all, this explanation is inconsistent in many ways. Why would dict values only be lists? What values are allowed inside lists? It reads as “string of error messages”, “list of error messages” or “dict of error messages”, but “string of error messages” does not make sense. So what the author obviously meant is that a “string” IS an error message, and MANY error messages can come in the form of either lists or dictionaries. I agree that this allows putting dicts inside lists. But if you look at how (deeply) nested error messages work, they always come in the form of dicts. Having two ways to report deeply nested errors (either a combination of dicts and lists, or just dicts all the way down to a particular field, where you can have either a string or a list of strings) makes the API inconsistent and inconvenient. Consider this:

{'_schema': [{'foo': {'bar': {'_schema': 'error 1'}}}]}

and

{'foo': {'bar': 'error 1'}}

The inner ‘_schema’ key is not there because the only reason to have it is to be able to refer to the whole object (at the top level there is no other way to do this), so on inner levels, if there is only a schema-level error, you do not need ‘_schema’. If you had inner errors for ‘bar’, then you would need ‘_schema’:

{'foo': {'bar': {'_schema': 'error 1', 'baz': 'error 2'}}}

The top-level ‘_schema’ is not needed since you're only reporting nested errors for the top-level object.
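One way to read the normalization rule described above, as a minimal sketch: on inner levels, a dict whose only key is '_schema' collapses to its value, and a top-level {'_schema': [ {...} ]} that merely wraps a single field-error dict is unwrapped. to_nested_form is a hypothetical helper written for this illustration, not part of marshmallow.

```python
SCHEMA = "_schema"


def to_nested_form(errors, top_level=True):
    """Rewrite errors into the 'dicts all the way down' form (hypothetical helper)."""
    if not isinstance(errors, dict):
        return errors
    if top_level and set(errors) == {SCHEMA}:
        wrapped = errors[SCHEMA]
        # {'_schema': [{...}]}: a single field-error dict hidden in the schema list.
        if isinstance(wrapped, list) and len(wrapped) == 1 and isinstance(wrapped[0], dict):
            return to_nested_form(wrapped[0], top_level=True)
        return errors  # a real top-level schema error keeps its key
    result = {key: to_nested_form(value, top_level=False) for key, value in errors.items()}
    # Inner level: a lone '_schema' entry is just the object's own error.
    if not top_level and set(result) == {SCHEMA}:
        return result[SCHEMA]
    return result


print(to_nested_form({"_schema": [{"foo": {"bar": {"_schema": "error 1"}}}]}))
# {'foo': {'bar': 'error 1'}}
print(to_nested_form({"foo": {"bar": {"_schema": "error 1", "baz": "error 2"}}}))
# {'foo': {'bar': {'_schema': 'error 1', 'baz': 'error 2'}}}
```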

So I think the only sane and consistent interpretation of the error structure is this: an error is a string, a list of strings (if you have more than one error for a particular field), or a dict of error messages whose keys are field names and whose values are, again, strings, lists of strings, or dicts of error messages.
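The proposed grammar is small enough to check recursively. Below is a minimal sketch with a hypothetical helper name; it encodes exactly the three cases listed above, so a list containing a dict (as in the current test_passing_original_data expectation) is rejected.

```python
def is_error_structure(value):
    """Check the proposed error grammar (hypothetical helper).

    error := str | list[str] | dict[str, error]
    """
    if isinstance(value, str):
        return True
    if isinstance(value, list):
        # Lists hold only plain string messages, never nested dicts.
        return all(isinstance(item, str) for item in value)
    if isinstance(value, dict):
        # Keys are field names; values recurse into the same grammar.
        return all(isinstance(key, str) and is_error_structure(item) for key, item in value.items())
    return False


print(is_error_structure({"foo": {"bar": "error 1"}}))  # True
print(is_error_structure({"_schema": [{"code": "invalid_field"}]}))  # False: dict inside a list
```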