
Validate on export?

See original GitHub issue

Question

First, thanks for this awesome library. It’s great!

TL;DR - Would you consider accepting a PR to optionally validate data when calling dict() or json()?

More context:

One of my main concerns about pydantic is the fact that we might be exporting invalid data if we are not careful when creating the instances.

Our workflow often follows this pattern:

  • Parse / create model instances
  • Modify those instances
  • Export data

Ideally we would love to validate before exporting, to make sure we export valid data. Let’s say our code contains a bug that introduces an invalid element; I would like to detect it at some point, at least before we export it with dict().
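
As a minimal sketch of what this would look like from the caller’s side (assuming pydantic 1.x; the Item/Order models and the export_validated helper are made up for illustration, and the “validation” is just a round-trip through the model class):

from typing import List

from pydantic import BaseModel, ValidationError

class Item(BaseModel):
    name: str
    quantity: int

class Order(BaseModel):
    items: List[Item]

def export_validated(model: BaseModel) -> dict:
    # Re-run full validation by reconstructing the model from its own dict().
    return type(model)(**model.dict()).dict()

order = Order(items=[Item(name="widget", quantity=1)])
order.items.append("not an item")  # a bug: this mutation bypasses validation

try:
    export_validated(order)
except ValidationError as exc:
    print(exc)  # the invalid element is caught before the data is exported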

I have been able to implement this in several ways, and I have a solution that is quite good in terms of performance. I would be willing to open a PR if you think this is an interesting feature.

BTW I’m not sure whether this is already implemented and I just missed the feature - but I doubt it because I don’t see any logic for this within dict() / _iter() / _get_value methods.

I found that we could use validate_assignment; however, this is not “perfect”, because if you change a mutable element it is not going to be validated (e.g. you could add an element that does not comply with the schema to an existing list). Also, the performance of validate_assignment is far from good for our use cases.
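
A short sketch of that gap, assuming pydantic 1.x (the Model class below is illustrative):

from typing import List

from pydantic import BaseModel, ValidationError

class Model(BaseModel):
    values: List[int]

    class Config:
        validate_assignment = True

m = Model(values=[1, 2, 3])

try:
    m.values = ["not an int"]  # caught: re-assigning the field is re-validated
except ValidationError:
    print("assignment rejected")

m.values.append("not an int")  # not caught: the in-place mutation is never seen
print(m.values)                # [1, 2, 3, 'not an int']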

I realize that having mypy on a project might make this somewhat unnecessary (regarding type safety), but we would still miss full model validations.
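
To make that concrete: a constraint enforced by a validator is invisible to mypy, because the offending value still has the right type (the Account model below is a made-up illustration):

from pydantic import BaseModel, validator

class Account(BaseModel):
    balance: int

    @validator('balance')
    def non_negative(cls, v):
        if v < 0:
            raise ValueError('balance must be non-negative')
        return v

acc = Account(balance=10)
acc.balance = -5   # fine for mypy (it is an int), invalid for the model
print(acc.dict())  # exports invalid data: {'balance': -5}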

  • Pydantic version (import pydantic; print(pydantic.VERSION)): 1.2

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 9 (2 by maintainers)

Top GitHub Comments

2 reactions
javiertejero commented, Dec 6, 2019

@dmontagu regarding your comment:

One thing to consider: validators can perform modifications to input data that are not idempotent. What would you want to happen if the results were re-validated with a validator that modifies the value?

I didn’t consider this initially. I did a test, and it is really bad for the data if the modifications are not idempotent, e.g.:

# MyCustomBaseModel is the commenter's own base class with the proposed
# validate-on-export behaviour enabled (its definition is not shown in the
# thread), so every dict() call re-runs the validators.
from pydantic import validator

def test_validator_changing_immutable_values():
    class ValidatorModifyValues(MyCustomBaseModel):
        foo: str
        bar: list

        @validator('*')
        def should_be_shorter_than_3(cls, v):
            return v[:3]

        @validator('bar')
        def pop_a_value(cls, v):
            if len(v):
                v.pop()  # modifies the mutable input list in place
            return v

    data = {'foo': 'abcdedf', 'bar': [1, 2, 3, 4, 5]}
    vmv = ValidatorModifyValues(**data)
    assert data == {'foo': 'abcdedf', 'bar': [1, 2, 3, 4]}  # the caller's list was mutated
    assert vmv.dict() == {'foo': 'abc', 'bar': [1, 2]}  # each export pops again ...
    assert vmv.dict() == {'foo': 'abc', 'bar': [1]}     # ... so the data keeps shrinking
    assert vmv.dict() == {'foo': 'abc', 'bar': []}
    assert vmv.dict() == {'foo': 'abc', 'bar': []}

Having said that, I guess it is bad design/practice for a validator to modify the original value instead of working on a copy, or at least to apply non-idempotent transformations. So for me this wouldn’t be a blocker for the feature.
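
For illustration (this is not from the thread), the validators above could be rewritten so that re-running them on export is harmless, by returning new objects and keeping the transformations idempotent:

from pydantic import BaseModel, validator

class SafeModel(BaseModel):
    foo: str
    bar: list

    @validator('*')
    def should_be_shorter_than_3(cls, v):
        return v[:3]  # slicing builds a new object, and re-applying it changes nothing

    @validator('bar')
    def keep_only_ints(cls, v):
        # idempotent alternative to pop(): filter into a new list
        return [item for item in v if isinstance(item, int)]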

Regarding the fact that Config options are a burden to maintain: I agree, and that’s why I used getattr, to reduce that burden.
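
The PR code itself is not shown in this thread, but the getattr pattern being referred to is presumably along these lines: read the new flag with a default, so that Config classes which never mention it keep today’s behaviour:

from pydantic import BaseModel

def should_validate_on_export(model: BaseModel) -> bool:
    # Falls back to False for models whose Config does not define the flag.
    return getattr(model.__config__, 'validate_on_export', False)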

I also agree that the performance penalty might be a concern if your priority is top performance. I benchmarked with this code on 1M objects:

import time
from typing import List

from pydantic import BaseModel

def test_benchmark_base_model():
    # validate_on_export is the flag proposed in this issue (it is not part
    # of released pydantic); disabling it here times the plain dict() path.
    BaseModel.Config.validate_on_export = False

    class Foo(BaseModel):
        i: int

    class Bar(BaseModel):
        foos: List[Foo]

    size = 1000000
    bar = Bar(foos=[Foo(i=x) for x in range(size)])

    start = time.perf_counter()
    bar.dict()
    dict_time = int((time.perf_counter() - start) * 1000)
    print(f"dict() time: {dict_time} ms")

The results:

For my current use case I don’t worry that much about the performance of dict(), because our bottleneck is elsewhere and we work with a lower order of magnitude of objects (so ~30 ms is negligible for us). However, I understand it’s not a good fit for pydantic at the moment, unless the community shows broader interest in this feature.

1 reaction
dmontagu commented, Dec 4, 2019

Also, as @samuelcolvin has stated on many occasions, pydantic was designed around eager validation rather than lazy validation. Given that performance is such a priority for pydantic, I think it makes sense to prioritize eager validation and have the user take responsibility for ensuring models remain valid. It seems to me that for most use cases this could be checked by mypy, or at least the model could be refactored so that it could be.

If you have a case you think differs on this point, it would help the discussion if you could share some code or details.
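
For example (illustrative, reusing the models from the benchmark above), mypy already flags the kind of in-place mutation that would otherwise slip through to dict():

from typing import List

from pydantic import BaseModel

class Foo(BaseModel):
    i: int

class Bar(BaseModel):
    foos: List[Foo]

bar = Bar(foos=[Foo(i=1)])
bar.foos.append("oops")  # mypy: incompatible type "str"; expected "Foo"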
