Consistent approach for propagating validation errors from GEOS

This came out of the discussion in #144.

In short, it would be good for pygeos to have a consistent approach to propagating validation errors from GEOS.

In some cases, GEOS returns fairly clear exception messages, e.g., when passing an out-of-range number to the densify parameter of GEOSFrechetDistanceDensify_r:

pygeos.GEOSException: IllegalArgumentException: Fraction is not in range (0.0 - 1.0]

In other cases, it might not. My personal observation is that a lack of validation of user inputs results in a not-insignificant maintenance burden for open source projects; the gaps are uncovered by bug reports over time rather than anticipated upfront. That said, it is entirely possible to overdo validation, and we have lots of other things to focus our attention on here.

The latest guidance from @caspervdw is to avoid replicating validation and exceptions from GEOS if they are good enough, to reduce complexity in pygeos.

There are a couple of ways we could approach this:

1. Use GEOS exceptions but ensure these are good enough within our tests. All this would require is that we add enough tests to our suite to check for invalid inputs, and ensure that the error messages are sufficiently clear and remain consistent across GEOS versions.

If we discover that some invalid inputs result in segfaults or poor exception messages from GEOS, we should handle those on a case-by-case basis, including possibly adding a validation check in pygeos prior to calling GEOS code.

pros:

simplicity
avoid replicating validation logic and associated maintenance burdens

cons:

inconsistent exception types: invalid inputs may raise pygeos.GEOSException, ValueError, or other exception types we define in pygeos for invalid inputs.
outsources validation to GEOS, which means some risks of segfaults for unexpected inputs
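
As a rough illustration of option 1, the sketch below checks that a set of invalid inputs raises an exception whose message matches an expected pattern. It does not import pygeos: `GEOSException` and `frechet_distance_stub` are hypothetical stand-ins that only mimic the message quoted above, and `raises_with_message` is an illustrative helper. In a real test suite, `pytest.raises(..., match=...)` would do the same job against the actual pygeos function.

```python
import re


class GEOSException(Exception):
    """Hypothetical stand-in for pygeos.GEOSException."""


def frechet_distance_stub(densify):
    """Hypothetical stand-in for a GEOS-backed call; mimics the GEOS message."""
    if not 0.0 < densify <= 1.0:
        raise GEOSException(
            "IllegalArgumentException: Fraction is not in range (0.0 - 1.0]"
        )
    return 0.0


def raises_with_message(func, args, exc_type, pattern):
    """Return True if func(*args) raises exc_type with a message matching pattern."""
    try:
        func(*args)
    except exc_type as exc:
        return re.search(pattern, str(exc)) is not None
    return False


# Invalid inputs we expect the underlying library to reject with a clear message.
INVALID_DENSIFY = [0.0, -0.5, 1.5]

results = [
    raises_with_message(frechet_distance_stub, (value,), GEOSException, r"not in range")
    for value in INVALID_DENSIFY
]
```

Matching on a short, stable fragment of the message (here `not in range`) rather than the full text makes such a test less likely to break if the wording shifts slightly between GEOS versions.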

2. Add validation checks and raise appropriate exceptions in pygeos. This would require anticipating a wide variety of invalid inputs and testing those specifically.

pros:

consistent validation checks and associated exception types / messages
ensures that pygeos.GEOSException errors are reserved for internal use or indicate exceptions that we should be catching first through validation in pygeos before calling GEOS.

cons:

more effort to validate inputs and create associated tests
replicates validation logic in GEOS
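
A minimal sketch of what option 2 could look like for the densify example, assuming the valid range is (0.0, 1.0] as stated in the GEOS message above and that `None` means "no densification". The names `validate_densify` and `frechet_distance_checked` are hypothetical and not part of pygeos; `geos_func` stands for whatever GEOS-backed callable would be wrapped.

```python
import math


def validate_densify(densify):
    """Return densify as a float, or None; raise ValueError if out of range.

    Assumes the valid range is (0.0, 1.0], mirroring the GEOS message.
    """
    if densify is None:
        return None
    value = float(densify)
    if math.isnan(value) or not 0.0 < value <= 1.0:
        raise ValueError(
            f"densify must be in the range (0.0, 1.0], got {densify!r}"
        )
    return value


def frechet_distance_checked(geos_func, a, b, densify=None):
    """Validate in Python first, then delegate to the GEOS-backed callable."""
    return geos_func(a, b, validate_densify(densify))
```

With this shape, callers always see a `ValueError` for a bad densify value, and the wrapped function is never called with it. The cost is the one listed above: the range check now lives in two places.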

Top GitHub Comments

jorisvandenbossche commented, Sep 16, 2020

I think a “developers” / “internals” / “design” document to capture some of those discussions would indeed be useful (e.g. also in general describing the missing values handling, the GEOS context handling, …). Putting such a page in the docs would be fine, I think.

caspervdw commented, Sep 15, 2020

I think this discussion came to a consensus. Maybe we could start keeping these design decisions somewhere? @jorisvandenbossche @brendan-ward do you have any experience with that?
