Reporting field errors from whole schema validator
Sometimes validating one field at a time is not enough and you want to validate a combination of fields. The fields could be deeply nested inside your structure. In that case you create a validates_schema
validator and do all checks there. At the end you want to report errors not for the whole schema, but for particular fields. The documentation for ValidationError states that it accepts:
String, list, or dictionary of error messages. If a dict, the keys will be field names and the values will be lists of messages
We can probably extrapolate from the last statement that dictionary values could contain not only lists of messages, but also dictionaries of nested field errors.
But in reality, when you raise a ValidationError with a dictionary, it is treated as just another error message (which happens to be a dictionary, not a string). Instead, this error dictionary should be deeply merged with the errors reported by individual fields.
I have an implementation for that, BUT there is one test case (tests.test_decorators.TestValidatesSchemaDecorator.test_passing_original_data) which on its own tests a different thing, but which showcases passing a dictionary to ValidationError that is not a field-to-error mapping, but rather a complex error message data structure:
raise ValidationError({'code': 'invalid_field'})
...
assert '_schema' in errors
assert len(errors['_schema']) == 1
assert errors['_schema'][0] == {'code': 'invalid_field'}
I believe this usage is wrong (since the documentation states that error messages should be strings, see the quote above). The test should be altered to use a simple string as an error message, and the library should be extended to support reporting errors for multiple fields from a whole-schema validator.
Here is a sample test of what I’m trying to achieve:
def test_allow_reporting_field_errors_in_schema_validator(self):
    class NestedSchema(Schema):
        baz = fields.Int(required=True)

    class MySchema(Schema):
        foo = fields.Int(required=True)
        bar = fields.Nested(NestedSchema, required=True)
        bam = fields.Int(required=True)

        @validates_schema(skip_on_field_errors=True)
        def consistency_validation(self, data):
            errors = {}
            if data['bar']['baz'] != data['foo']:
                errors['bar'] = {'baz': 'Non-matching value'}
            if data['bam'] > data['foo']:
                errors['bam'] = 'Value should be less than foo'
            if errors:
                raise ValidationError(errors)

    schema = MySchema()
    errors = schema.validate({'foo': 2, 'bar': {'baz': 5}, 'bam': 6})
    assert 'bar' in errors
    assert 'baz' in errors['bar']
    assert errors['bar']['baz'] == 'Non-matching value'
    assert 'bam' in errors
    assert errors['bam'] == 'Value should be less than foo'
What do you think?
PS: I understand that deep merging of errors has corner cases, and I already have an algorithm which IMO does that in a sane way.
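To make the deep-merge idea concrete, here is a minimal sketch in plain Python. It is not the implementation mentioned above and not the library's code; the names merge_errors, as_list and SCHEMA_KEY are hypothetical. It assumes one particular answer to the main corner case: when a plain message meets a dict of field errors, the message is filed under '_schema' inside that dict.

```python
SCHEMA_KEY = "_schema"  # assumed key for whole-object errors, as in the test above


def as_list(value):
    """Wrap a single message in a list; copy lists as they are."""
    return list(value) if isinstance(value, list) else [value]


def merge_errors(left, right):
    """Deeply merge two error structures (str, list of str, or dict).

    Hypothetical helper: a sketch of one possible merge policy.
    """
    if not left:
        return right
    if not right:
        return left
    if isinstance(left, dict) and isinstance(right, dict):
        # Field-to-error mappings are merged key by key, recursively.
        merged = dict(left)
        for key, value in right.items():
            merged[key] = merge_errors(merged.get(key), value)
        return merged
    if isinstance(left, dict):
        # Corner case: a plain message joins a dict of field errors,
        # so it is stored as an error about the object as a whole.
        merged = dict(left)
        merged[SCHEMA_KEY] = merge_errors(merged.get(SCHEMA_KEY), right)
        return merged
    if isinstance(right, dict):
        merged = dict(right)
        merged[SCHEMA_KEY] = merge_errors(left, merged.get(SCHEMA_KEY))
        return merged
    # Two plain messages (or lists of messages) are concatenated.
    return as_list(left) + as_list(right)


# Errors collected field by field, then errors raised by the schema validator.
field_errors = {"foo": ["Not a valid integer."]}
schema_errors = {
    "bar": {"baz": "Non-matching value"},
    "bam": "Value should be less than foo",
}
merged = merge_errors(field_errors, schema_errors)
```

With this policy, errors from the schema validator end up next to the per-field errors instead of being appended to '_schema' as one opaque message. Other policies for the mixed case are possible; this one is only an assumption made for the sketch.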
Issue Analytics
- State:
- Created 7 years ago
- Reactions: 5
- Comments: 9 (7 by maintainers)
Top GitHub Comments
I believe I ran into the same issue. I have fields that depend on each other, and when validating a single field with @validates() I don’t have access to other field data. That is why I verify the combination of this data using @validates_schema. In this method I build a dictionary (errors) with specific field names/error lists, expecting that the Unmarshaller builds its own error dict with this input as field-specific error messages instead of schema-level errors.
My use case seems to be solved by instantiating the ValidationError that I raise inside the @validates_schema method with the kwarg field_names=errors.keys(), so that Unmarshaller will not default to field_names = [SCHEMA], in combination with a change I propose:
Is it best if I put this in a separate pull request, referencing this issue? Locally I’ve already run tox with py27 and py34 and both were OK.
Here is the documentation for the messages argument of ValidationError:
First of all, this explanation is inconsistent in many ways. Why would dict values only be lists? What values are allowed inside lists? It reads as “string of error messages”, “list of error messages” or “dict of error messages”. But “string of error messages” does not make sense. So, obviously, what the author wanted to say is that a “string” IS an error message, and that MANY error messages could come in the form of either lists or dictionaries.
I agree that this allows putting dicts inside lists. But if you look at how (deeply) nested error messages work, they always come in the form of dicts. Having two ways to report deeply nested errors (either a combination of dicts and lists, or just dicts all the way down to the particular field, where you could have either a string or a list of strings) makes the API inconsistent and inconvenient. Consider this:
and
The inner ‘_schema’ key is not there because the only reason to have it is to be able to refer to the whole object (at the top level you cannot do this), so at inner levels, if there is only a schema-level error, you do not need ‘_schema’. If you had inner errors for ‘bar’, then you would need to have ‘_schema’:
The top-level ‘_schema’ is not needed since you’re only reporting nested errors for the top-level object.
So I think the only sane and consistent interpretation of the errors structure is this: a string error, a list of strings (if you have more than one error for a particular field), or a dict of error messages where keys are field names and values are again strings, lists of strings, or dicts of error messages.
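The interpretation above is a small recursive grammar, and it can be written down as a check. The sketch below is plain Python and does not use any library API; the function name is_error_structure is hypothetical and exists only for illustration.

```python
def is_error_structure(value):
    """Return True if value follows the proposed shape:

    a string, a list of strings, or a dict whose keys are field names
    and whose values are again valid error structures.
    """
    if isinstance(value, str):
        return True
    if isinstance(value, list):
        # Lists may only hold plain messages, never dicts.
        return all(isinstance(item, str) for item in value)
    if isinstance(value, dict):
        return all(
            isinstance(key, str) and is_error_structure(item)
            for key, item in value.items()
        )
    return False


# Nested field errors expressed with dicts all the way down are accepted.
nested_ok = is_error_structure(
    {"bar": {"baz": "Non-matching value"}, "bam": ["too big", "not even"]}
)

# A dict used as a message inside a list is rejected under this reading.
dict_in_list_ok = is_error_structure({"_schema": [{"code": "invalid_field"}]})
```

Under this reading, the error data produced by test_passing_original_data (a dict stored as a list item under '_schema') would not count as a valid structure, which is the inconsistency the comment argues against.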