QA points & polygons
We need to help the data team do QA of the datasets and also ensure that enrichment works well enough.
So, the objectives of this issue are to create a notebook that allows us to:
- validate a DO dataset (dataset, geography, variables, …)
- validate enrichment in the “worst case” scenario
Top GitHub Comments
Yes, just left a big comment here about it: https://github.com/CartoDB/data-observatory/issues/442#issuecomment-570647587
So far, the script runs and prints its results to a file. It gives each dataset a pass/fail, with messages for each of the tests explaining why it failed, if it fails.
After a talk with @cmongut and @xavipereztr, I’d like to clarify the role each team will play in QA.
Because of ☝️ I’m assigning @andy-esch here. The idea is that the data team will provide the scripts inside a Python notebook, and after that the backend team will turn these scripts into end-to-end tests. The notebook scripts will be temporary; the data team must use the end-to-end tests to validate each dataset in the DO.
Since we’ve already worked on some Python scripts, @oleurud, please share them with @andy-esch so he can reuse some of your work.
At the end of this task, we need to provide a Python notebook with the following features:
Validate the metadata (Catalog)
We need to create a test function to validate the metadata of a dataset or geography and their child entities (variables).
We must check that a dataset or geography has the minimum required fields filled in, and that its variables are well defined (descriptions are not null, aggregation functions are present, etc…).
These test functions must use the CARTOframes Catalog.
@oleurud has code around this that you should work on extending.
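As a minimal sketch, assuming the cartoframes 1.x Catalog API (`Dataset.get`, `dataset.variables`, and the variables’ `description`/`agg_method` attributes), such a check could look like this. The list of required fields is an illustrative assumption, not the definitive minimum:

```python
# Minimal sketch of a metadata test, assuming the cartoframes 1.x
# Catalog API. REQUIRED_DATASET_FIELDS is an illustrative assumption,
# not the definitive list of minimum fields.
from cartoframes.data.observatory import Dataset

REQUIRED_DATASET_FIELDS = ['name', 'description', 'country', 'geography']

def test_metadata(dataset_id):
    """Return (passed, messages) for a single DO dataset."""
    messages = []
    dataset = Dataset.get(dataset_id)

    # Check the dataset itself has the minimum fields filled in
    for field in REQUIRED_DATASET_FIELDS:
        if not getattr(dataset, field, None):
            messages.append('{}: missing required field "{}"'.format(dataset_id, field))

    # Check every child variable is well defined
    for variable in dataset.variables:
        if not variable.description:
            messages.append('{}: description is null'.format(variable.id))
        if not variable.agg_method:
            messages.append('{}: no aggregation function defined'.format(variable.id))

    return len(messages) == 0, messages
```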
Geographies validation
A Geography needs to be validated: we need to check that its geometries are OK.
In the past, we’ve experienced some performance issues caused by geodesic problems when the data was uploaded to BQ. @arredond has more info about this.
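A minimal sketch of such a check with GeoPandas and Shapely, assuming the geography has already been downloaded into a GeoDataFrame (how it is fetched is out of scope here):

```python
# Minimal sketch of a geometry sanity check over a GeoDataFrame.
import geopandas as gpd
from shapely.validation import explain_validity

def test_geometries(gdf):
    """Return (passed, messages) listing empty or invalid geometries."""
    messages = []
    for idx, geom in gdf.geometry.items():
        if geom is None or geom.is_empty:
            messages.append('row {}: empty geometry'.format(idx))
        elif not geom.is_valid:
            # explain_validity reports e.g. a self-intersection and its location
            messages.append('row {}: {}'.format(idx, explain_validity(geom)))
    return len(messages) == 0, messages
```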
Enrichment
Enrichment is today our main operation, and we don’t have an automated workflow to validate it. Let’s try to write these scripts client-side using GeoPandas; if these tests don’t depend on PostGIS, our life will be easier 😄.
Enrichment by points
To validate an enrichment by points, we could generate a dataset of N points inside the geography we want to enrich against:
`generate_points(geography_id, n_points)`
returns a GeoPandas DataFrame with a distribution of points inside the geography. Points should be spread across the whole geography; we need to avoid having points only in a small area.
Using this function, we need to write a test function
`test_enrichment_point(dataset_id, n_points)`
that takes the dataset, calls generate_points, and runs an enrichment using all the variables defined in the dataset. For the moment we’ll set the value of `n_points` by hand; in the future we’ll try to automate it through a quick analysis of the geography.
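A hypothetical sketch of both functions follows. `get_geography_gdf` is an assumed helper (not part of any existing API) that downloads the geography as a GeoDataFrame; the `Dataset`/`Enrichment` calls assume the cartoframes 1.x API and that CARTO credentials are already configured. Rejection sampling keeps the accepted points uniform over the whole geography, not just a small area:

```python
# Hypothetical sketches of generate_points and test_enrichment_point.
# get_geography_gdf is an assumed helper; the Dataset/Enrichment calls
# assume the cartoframes 1.x API and that credentials were set with
# cartoframes.auth.set_default_credentials.
import numpy as np
import geopandas as gpd
from shapely.geometry import Point
from cartoframes.data.observatory import Dataset, Enrichment

def generate_points(geography_id, n_points, seed=0):
    """Sample n_points uniformly distributed inside the geography."""
    gdf = get_geography_gdf(geography_id)  # assumed helper
    boundary = gdf.geometry.unary_union
    minx, miny, maxx, maxy = boundary.bounds
    rng = np.random.default_rng(seed)

    points = []
    while len(points) < n_points:
        # Sample uniformly in the bounding box; rejecting points that
        # fall outside keeps the accepted ones uniform over the geography.
        candidate = Point(rng.uniform(minx, maxx), rng.uniform(miny, maxy))
        if boundary.contains(candidate):
            points.append(candidate)

    return gpd.GeoDataFrame(geometry=points, crs=gdf.crs)

def test_enrichment_point(dataset_id, n_points):
    """Enrich the generated points with every variable of the dataset."""
    dataset = Dataset.get(dataset_id)
    points = generate_points(dataset.geography, n_points)
    enriched = Enrichment().enrich_points(points, dataset.variables)
    # Basic check (illustrative): every variable should come back as a column
    missing = [v.column_name for v in dataset.variables
               if v.column_name not in enriched.columns]
    return len(missing) == 0, missing
```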
Enrichment by polygons
Similar to points, but using polygons.
To validate an enrichment by polygons, we could generate a dataset of N polygons inside the geography we want to enrich against:
`generate_polygons(geography_id, n_polygons)`
returns a GeoPandas DataFrame with a distribution of polygons inside the geography. I think we can use generate_points and then build a Voronoi diagram from those points (see the sketch after this section).
Using this function, we need to write a test function
`test_enrichment_polygon(dataset_id, n_points)`
that takes the dataset and runs an enrichment using all the variables defined in the dataset. The aggregation functions should be fetched from the metadata. For the moment we’ll set the value of `n_polygons` by hand; in the future we’ll try to automate it through a quick analysis of the geography.
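A hypothetical sketch of the Voronoi approach, assuming Shapely >= 1.8 (which provides `shapely.ops.voronoi_diagram`) and reusing `generate_points` and the assumed `get_geography_gdf` helper from above:

```python
# Hypothetical sketch of generate_polygons: Voronoi cells built from
# generated points and clipped to the geography. Assumes Shapely >= 1.8
# and the assumed helpers above (generate_points, get_geography_gdf).
import geopandas as gpd
from shapely.geometry import MultiPoint
from shapely.ops import voronoi_diagram

def generate_polygons(geography_id, n_polygons):
    """Partition the geography into n_polygons Voronoi cells."""
    points_gdf = generate_points(geography_id, n_polygons)
    boundary = get_geography_gdf(geography_id).geometry.unary_union

    cells = voronoi_diagram(MultiPoint(points_gdf.geometry.tolist()))
    # Voronoi cells extend beyond the geography; clip each one to it
    # and drop any cell that ends up empty.
    clipped = [cell.intersection(boundary) for cell in cells.geoms]
    clipped = [cell for cell in clipped if not cell.is_empty]

    return gpd.GeoDataFrame(geometry=clipped, crs=points_gdf.crs)
```

test_enrichment_polygon would then mirror test_enrichment_point above, but call `Enrichment().enrich_polygons` (cartoframes 1.x) and take the aggregation from each variable’s `agg_method` in the metadata rather than hard-coding it.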