Validate on export?
Question
First, thanks for this awesome library. It’s great!
TL;DR - Would you consider accepting a PR to optionally validate data when calling `dict()` or `json()`?
More context:
One of my main concerns about pydantic is that we might end up exporting invalid data if we are not careful when creating the instances.
Our workflow often follows this pattern:
- Parse / create model instances
- Modify those instances
- Export data
Ideally we would love to validate before exporting to make sure we export valid data. Let’s say our code contains a bug that introduces an invalid element: I would like to detect it at some point, at least before we export it with `dict()`.
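The parse / modify / export workflow above can be sketched as follows. This is only an illustration, assuming the pydantic v1-style API discussed in this thread (`dict()`); `User` and `export_checked` are hypothetical names, and this is not the implementation proposed for the PR.

```python
from pydantic import BaseModel, ValidationError


class User(BaseModel):
    name: str
    age: int


def export_checked(model: BaseModel) -> dict:
    """Hypothetical helper: export the model, then re-validate the exported data."""
    data = model.dict()
    # Building a fresh instance re-runs validation on the exported data;
    # it raises ValidationError if the data no longer fits the schema.
    type(model)(**data)
    return data


user = User(name="Ada", age=36)  # 1. parse / create (validated)
user.age = "unknown"             # 2. modify (a bug: plain assignment is not validated by default)

try:
    export_checked(user)         # 3. export
    caught = False
except ValidationError:
    caught = True                # the invalid value is detected before it leaves the program
```

Re-validating this way runs every validator a second time, so it pays the full validation cost on each export.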
I have been able to implement this in several ways, and I have a solution that is quite good in terms of performance. I would be willing to open a PR if you think this is an interesting feature.
BTW I’m not sure whether this is already implemented and I just missed the feature - but I doubt it, because I don’t see any logic for this within the `dict()` / `_iter()` / `_get_value` methods.
I found that we could use `validate_assignment`; however, this is not “perfect” because in-place changes to a mutable element are not validated (e.g. you could append an element that does not comply with the schema to an existing list). Also, the performance of `validate_assignment` is far from good for our use cases.
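A minimal sketch of that gap, assuming the pydantic v1-style `class Config` syntax; the `Basket` model is a hypothetical example, not code from our project:

```python
from typing import List

from pydantic import BaseModel, ValidationError


class Basket(BaseModel):
    items: List[int]

    class Config:
        validate_assignment = True


basket = Basket(items=[1, 2])

# Re-assigning the field goes through validation and is rejected.
try:
    basket.items = ["not-an-int"]
    assignment_rejected = False
except ValidationError:
    assignment_rejected = True

# Mutating the existing list in place never touches the model's
# attribute assignment, so nothing is validated.
basket.items.append("not-an-int")
exported = basket.dict()  # the invalid element is exported silently
```

After the `append`, `exported` contains the string alongside the integers, even though the field is declared as `List[int]`.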
I realize that having mypy on a project might make this somewhat unnecessary (regarding type safety) but we would miss full model validations anyway.
- Pydantic version (`import pydantic; print(pydantic.VERSION)`): 1.2
Issue Analytics
- Created 4 years ago
- Comments:9 (2 by maintainers)
@dmontagu regarding your comment:
I didn’t consider this initially. I did a test and it’s really bad for the data if the modifications are not idempotent, e.g.:
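A minimal sketch of that failure mode (not the test from the thread), assuming the pydantic v1-style `validator` decorator; the `Tag` model and its prefixing validator are hypothetical:

```python
from pydantic import BaseModel, validator


class Tag(BaseModel):
    label: str

    @validator("label")
    def add_prefix(cls, value):
        # Not idempotent: every validation pass adds another prefix.
        return "tag:" + value


first = Tag(label="red")
# Re-validating the exported data runs the validator a second time.
second = Tag(**first.dict())

print(first.label)   # tag:red
print(second.label)  # tag:tag:red
```

An idempotent variant would only add the prefix when it is missing, in which case re-validating on export would leave the data unchanged.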
Having said that, I guess it is bad design/practice for a validator to modify the original value instead of working on a copy, or at least instead of applying only idempotent transformations. So for me this wouldn’t be a blocker for the feature.
Regarding the fact that `Config` options are a burden to maintain: I agree, and that’s why I used the `getattr`, to reduce that burden.
I also agree the performance penalty is something that might be of concern if your priority is top performance. I benchmarked with this code on 1M objects:
The results:
For my current use case I don’t worry that much about the performance of `dict()` because our bottleneck is in a different place, and we work with a lower order of magnitude of objects (so ~30 ms is negligible in our case). However, I understand it’s not a good thing for pydantic at the moment, unless the community shows broader interest in this feature.
Also, as @samuelcolvin has stated on many occasions, pydantic was designed around the idea of eager validation, rather than lazy validation. Given that performance is such a priority for pydantic, I think it makes sense to prioritize eager validation and have the user take responsibility for ensuring models remain valid. It seems to me that for most use cases, this could be checked by mypy, or at least the model could be refactored so it could be checked by mypy.
If you have a case you think differs on this point, it would help the discussion if you could share some code / detail.