Spain’s cadastral data seems to be open by decree. Unfortunately, in its easiest-to-use forms it is gated by client-side certs that are only available to Spanish citizens who are using eServices.
Luckily, as part of the INSPIRE process, Spain has released an ungated version here. Less luckily, it consists of 7,000+ GML files, where the street name is made available through an XLINK record that ogr2ogr does not seem to resolve correctly.
For now I am working under the assumption that this data is licensed under the same terms as the non-INSPIRE dataset. Some custom scripting will be necessary to resolve the street names and make this data processable by OA.
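A minimal sketch of what that custom scripting could look like, assuming each address element points at its street through an `xlink:href` fragment that matches the `gml:id` of another element in the same file. The element names in `SAMPLE` (`Address`, `designator`, `component`, `ThoroughfareName`, `text`) are simplified placeholders, not the real INSPIRE schema; only the `gml` and `xlink` namespace URIs are standard.

```python
import xml.etree.ElementTree as ET

# Standard namespace-qualified attribute names for gml:id and xlink:href.
GML_ID = "{http://www.opengis.net/gml/3.2}id"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical, heavily simplified stand-in for one cadastral GML file.
SAMPLE = """<root xmlns:gml="http://www.opengis.net/gml/3.2"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <Address gml:id="AD.1">
    <designator>12</designator>
    <component xlink:href="#TN.7"/>
  </Address>
  <ThoroughfareName gml:id="TN.7">
    <text>CL COSTANA</text>
  </ThoroughfareName>
</root>"""


def resolve_street_names(gml_text):
    """Return one dict per address with its xlink-referenced street name."""
    root = ET.fromstring(gml_text)
    # Index every element that carries a gml:id so references can be looked up.
    by_id = {el.get(GML_ID): el for el in root.iter() if el.get(GML_ID)}
    rows = []
    for addr in root.iter("Address"):
        street = None
        for comp in addr.iter("component"):
            # Keep only the fragment after '#', whether or not a URL precedes it.
            target_id = comp.get(XLINK_HREF, "").rpartition("#")[2]
            target = by_id.get(target_id)
            if target is not None and target.tag == "ThoroughfareName":
                street = target.findtext("text")
        rows.append({"number": addr.findtext("designator"), "street": street})
    return rows


print(resolve_street_names(SAMPLE))
```

The real files would need the actual namespaced element paths substituted in, and references that point outside the current file would come back as `None` here.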
NOTE: OSM talk-es users have pointed me to some concerns about the quality of the cadastral dataset: specifically, house numbers getting paired with the wrong street names. I’m not yet sure how big of a deal this is. Any Spanish speakers who can give the following discussion a closer read, by all means do so (I’m working off of Google Translate).
http://gis.19327.n5.nabble.com/Catastro-duda-nombres-de-calles-td5777261.html
Issue Analytics
- Created 9 years ago
- Comments: 15 (11 by maintainers)
Top GitHub Comments
@EamonKeane yes, lieu should be able to handle that fine on the next build once I’ve made the change to match without thoroughfare types like “CL”. We have fairly comprehensive dictionaries for Spanish in terms of the types of phrases that can be removed from the OA street names.
The only requirements are a lat/lon, a street, and a house number; lieu ignores place names. As far as precedence, lieu treats the inputs as one big concatenated list of GeoJSON features, and prefers whichever record came first as the canonical record (everything else is a dupe of that record). So if the CartoCity file is passed in first in the arguments list, the CartoCity lat/lons would be preferred over OA, while still allowing for records that were in OA and not in CartoCity to be included.
All of the other steps could be done as pre/post-processing.
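To make the precedence rule concrete, here is a toy sketch of "first record wins" over concatenated inputs. It is not lieu's implementation: lieu does fuzzy matching, whereas this uses an exact match on an uppercased (street, number) key, and the `street` and `number` property names and the sample coordinates are assumptions made for illustration.

```python
def dedupe_first_wins(*sources):
    """Concatenate feature lists in argument order; keep the first record per key."""
    seen = set()
    canonical = []
    for features in sources:
        for feat in features:
            props = feat["properties"]
            # Exact-match key; a real deduper would match far more loosely.
            key = (props["street"].strip().upper(), props["number"].strip().upper())
            if key in seen:
                continue  # a dupe of an earlier, canonical record
            seen.add(key)
            canonical.append(feat)
    return canonical


def feature(street, number, lon, lat):
    """Build a minimal GeoJSON point feature (hypothetical property names)."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"street": street, "number": number},
    }


cartocity = [feature("COSTANA", "1", -3.70, 40.41)]
oa = [feature("Costana", "1", -3.71, 40.42), feature("COSTANA", "3", -3.72, 40.43)]

# CartoCity is passed first, so its coordinates win for COSTANA 1,
# while COSTANA 3 (present only in OA) is still included.
merged = dedupe_first_wins(cartocity, oa)
for feat in merged:
    print(feat["properties"], feat["geometry"]["coordinates"])
```

Swapping the argument order to `dedupe_first_wins(oa, cartocity)` would make the OA coordinates canonical instead.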
@sbma44 the raw data is here (5 GB compressed, contains multiple datasets such as roads and building outlines for each province in addition to the addresses - the ‘PORTAL_PK.shp’ layer): https://www.dropbox.com/s/vcqkhl4x22kfs7y/Spain_CartoCity.zip?dl=0
The algorithm to get the data is:
A rough script for doing so is here: https://gist.github.com/EamonKeane/16ce83ad702125224a723e677a363b09
A postgis dump is here (3.3GB - schema could be polished up): https://www.dropbox.com/s/nmfdq2z8p33t48z/carto_ciudad.sql?dl=0
Deduping I think will be tricky. There are no place names, only house numbers and streets. The existing dataset includes street modifiers like ‘CL COSTANA’ as @thatdatabaseguy mentions, while the new one does not have them, just e.g. ‘COSTANA’.
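One possible pre-processing step for that mismatch is to strip a leading thoroughfare-type abbreviation before comparing street names. This is only a sketch: the abbreviation set below is a small illustrative assumption (only "CL" appears in this thread), and a real run would need the full list used by the cadastral data.

```python
# Illustrative, non-exhaustive set of leading thoroughfare-type abbreviations.
# Only "CL" is confirmed by the examples in this thread; the rest are assumed.
THOROUGHFARE_TYPES = {"CL", "AV", "PZ"}


def strip_thoroughfare_type(street):
    """Uppercase a street name and drop a leading type token such as 'CL'."""
    tokens = street.strip().upper().split()
    # Keep single-token names intact so a street literally named "CL" survives.
    if len(tokens) > 1 and tokens[0] in THOROUGHFARE_TYPES:
        tokens = tokens[1:]
    return " ".join(tokens)


print(strip_thoroughfare_type("CL COSTANA"))
print(strip_thoroughfare_type("COSTANA"))
```

With both datasets passed through the same function, 'CL COSTANA' and 'COSTANA' compare equal.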
An algorithm for deduping might be: