Consistent approach for propagating validation errors from GEOS
This came out of discussion from #144.
In short, it would be good for pygeos to have a consistent approach to propagating validation errors from GEOS.
In some cases, GEOS returns fairly clear exception messages, e.g., when passing an out-of-range number to the `densify` parameter of `GEOSFrechetDistanceDensify_r`:
pygeos.GEOSException: IllegalArgumentException: Fraction is not in range (0.0 - 1.0]
In other cases, it might not. My personal observation is that a lack of validation of user inputs results in a not-insignificant maintenance burden for open source projects; these gaps tend to surface through bug reports over time rather than being anticipated upfront. That said, it is entirely possible to overdo validation, and we have lots of other things to focus our attention on here.
The latest guidance from @caspervdw is to avoid replicating validation and exceptions from GEOS if they are good enough, to reduce complexity in pygeos.
There are a couple of ways we could approach this:
1. Use GEOS exceptions but ensure these are good enough within our tests. All this would require is that we add enough tests to our suite to check for invalid inputs, and ensure that the error messages are sufficiently clear - and remain consistent over GEOS versions.
If we discover that some invalid inputs result in segfaults or poor exception messages from GEOS, we should handle those on a case-by-case basis, including possibly adding a validation check in pygeos prior to calling GEOS code.
pros:
- simplicity
- avoid replicating validation logic and associated maintenance burdens
cons:
- inconsistent exception types: sometimes invalid inputs may result in `pygeos.GEOSException`, `ValueError`, or other exception types we define in pygeos for invalid inputs.
- outsources validation to GEOS, which means some risks of segfaults for unexpected inputs
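A minimal sketch of what the tests for option 1 could look like. So that it runs without pygeos or GEOS installed, `GEOSException` and `frechet_distance_stub` below are hypothetical stand-ins (the stub only reproduces the message quoted earlier in this issue), and `check_error_message` is an illustrative helper, not an existing pygeos utility.

```python
import re


class GEOSException(Exception):
    """Hypothetical stand-in for pygeos.GEOSException."""


def frechet_distance_stub(a, b, densify=None):
    """Hypothetical stand-in for a GEOS-backed call.

    It mimics the exception message quoted in this issue.
    """
    if densify is not None and not (0.0 < densify <= 1.0):
        raise GEOSException(
            "IllegalArgumentException: Fraction is not in range (0.0 - 1.0]"
        )
    return 0.0  # placeholder result, not a real distance


def check_error_message(func, args, kwargs, exc_type, pattern):
    """Call func; return the message if it raises exc_type matching pattern."""
    try:
        func(*args, **kwargs)
    except exc_type as exc:
        message = str(exc)
        if not re.search(pattern, message):
            raise AssertionError(f"unclear error message: {message!r}")
        return message
    raise AssertionError(f"{exc_type.__name__} was not raised")


# Invalid inputs we expect to be rejected with a readable message.
invalid_densify_values = [0.0, -0.5, 1.5]
messages = [
    check_error_message(
        frechet_distance_stub, (None, None), {"densify": value},
        GEOSException, r"not in range",
    )
    for value in invalid_densify_values
]
```

In the real test suite the same check would more likely be written with `pytest.raises(pygeos.GEOSException, match=...)` against the actual pygeos function, run across the supported GEOS versions to catch message changes.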
2. Add validation checks and raise appropriate exceptions in pygeos. This would require anticipating a wide variety of invalid inputs and testing those specifically.
pros:
- consistent validation checks and associated exception types / messages
- ensures that `pygeos.GEOSException` errors are reserved for internal use or indicate exceptions that we should be catching first through validation in pygeos before calling GEOS.
cons:
- more effort to validate inputs and create associated tests
- replicates validation logic in GEOS
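A minimal sketch of option 2, assuming a hypothetical helper (`validate_densify` is not part of pygeos) that runs before the GEOS call and raises `ValueError`, so that invalid input always maps to one exception type. The accepted range mirrors the `(0.0 - 1.0]` range from the GEOS message quoted earlier.

```python
def validate_densify(densify):
    """Hypothetical pre-check: return densify as a float or raise ValueError.

    None means "no densification" and is passed through unchanged.
    """
    if densify is None:
        return None
    try:
        value = float(densify)
    except (TypeError, ValueError):
        raise ValueError(
            f"densify must be a number, got {densify!r}"
        ) from None
    # NaN fails the comparison below, so it is rejected here as well.
    if not (0.0 < value <= 1.0):
        raise ValueError(
            f"densify must be in the range (0.0, 1.0], got {value}"
        )
    return value
```

A wrapper would call `validate_densify` first and only then hand the value to GEOS. The cost is the one listed above: the range check now lives in two places and has to be kept in sync with GEOS.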
Top GitHub Comments
I think a “developers” / “internals” / “design” document to capture some of those discussions would indeed be useful (e.g., also describing in general the missing values handling, the GEOS context handling, …). Putting such a page in the docs would be fine, I think.
I think this discussion came to a consensus. Maybe we could start keeping these design decisions somewhere? @jorisvandenbossche @brendan-ward do you have any experience with that?