Automatic geocoding should always be done with an account's specified provider
Context
Please explain here below what you were doing when the issue happened
We tested this with two separate CARTO accounts configured in Superadmin to use Here.com geocoding today and symptoms were identical.
We had a spreadsheet (can share if needed) that, among other columns, has an `address` and a `city` column of store locations in the USA. We uploaded this by dragging and dropping onto the browser window (“Maps” page). The file was uploaded, automatically geocoded, and the Builder map created. We grew skeptical when we saw points in New York and Alabama that should have been in Tennessee, and I investigated further.
I then took the same file, un-checked the “Let CARTO automatically guess data types and content on import.” check-box, and uploaded it. This time, presumably because the auto-guess box was unchecked, the file was imported and not geocoded, with the `the_geom` column as `null` for all records. I then created a map with this second file in Builder, applied the Georeference analysis with the appropriate columns, and spot-checked the results. All locations that were supposed to be in Tennessee were, this time, in Tennessee as expected.
Working conclusions:
Even if your account is configured to use Here as the geocoding provider, if you upload a spreadsheet file with the “Let CARTO automatically guess data types and content on import.” check-box checked, and your file has some of our auto-recognized column names like `address`, your file will be automatically geocoded using Mapzen and not Here, and the results will therefore most likely be of poor quality. The work-around is to upload your file with that “automatically guess” check-box de-selected, and then apply a Georeference analysis in Builder. That is the only way you can be sure your data will be geocoded with Here.
Steps to Reproduce
Please break down here below all the needed steps to reproduce the issue
- Using an account with Here configured as geocoding provider, upload a spreadsheet with an `address` column and the “auto-guess” checkbox checked.
- Examine the results and look for a high rate of incorrect geocodes (easy to do this with a widget if you have another “state” column) that point to a likelihood this file was auto-geocoded with Mapzen and not Here.
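The spot-check in the second step can also be scripted instead of eyeballed. The following is only a sketch under stated assumptions: the `Row` type, the `outside_bbox_rate` helper, and the Tennessee bounding box (approximate values) are all hypothetical and not part of CARTO. It assumes the imported table has been exported with a `state` column and the longitude/latitude of `the_geom`.

```python
from typing import Iterable, NamedTuple, Tuple


class Row(NamedTuple):
    """One exported record (hypothetical shape, not a CARTO type)."""
    address: str
    state: str   # state the store is known to be in, e.g. "TN"
    lon: float   # longitude taken from the_geom
    lat: float   # latitude taken from the_geom


# Approximate bounding box for Tennessee: (min_lon, min_lat, max_lon, max_lat).
# Rough values for illustration only; a real check would use state polygons.
TN_BBOX = (-90.31, 34.98, -81.65, 36.68)


def outside_bbox_rate(rows: Iterable[Row], state: str,
                      bbox: Tuple[float, float, float, float]) -> float:
    """Fraction of rows labelled `state` whose point falls outside `bbox`."""
    min_lon, min_lat, max_lon, max_lat = bbox
    labelled = [r for r in rows if r.state == state]
    if not labelled:
        return 0.0
    outside = [
        r for r in labelled
        if not (min_lon <= r.lon <= max_lon and min_lat <= r.lat <= max_lat)
    ]
    return len(outside) / len(labelled)
```

Running this on both imports (auto-geocoded and Georeference analysis) and comparing the two rates would give a rough, provider-independent measure of how many Tennessee stores ended up elsewhere.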
Current Result
Please describe here below the current result you got
I can’t be certain (as we don’t provide metadata yet, per @kevin-reilly's #12371), but I’m confident beyond a reasonable doubt that this file is being auto-geocoded with Mapzen, even though the superadmin setting is “heremaps”:
Expected result
Please describe here below what should be the expected behaviour
I would think/hope any account configured to use Here for geocoding would use Here, in all contexts. One possible exception to this might be our geocoding “search box” that appears on maps, which I know is 100% Mapzen across the board for all accounts, but that is perhaps another business decision we should reconsider too.
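The expected behaviour can be stated as a single rule: the provider is a function of the account, not of the code path that triggers geocoding. The sketch below only illustrates that rule. `Account`, `geocoder_provider`, `provider_for`, and the context names are hypothetical and do not reflect how the CARTO importer is actually implemented; the `"mapzen"` fallback is assumed from the behaviour suspected in this issue.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed fallback when an account has no provider configured (hypothetical).
DEFAULT_PROVIDER = "mapzen"


@dataclass(frozen=True)
class Account:
    """Minimal stand-in for an account record (not CARTO's real model)."""
    username: str
    geocoder_provider: Optional[str] = None  # e.g. "heremaps"


def provider_for(account: Account, context: str) -> str:
    """Return the geocoding provider to use for `account`.

    `context` (e.g. "import_guessing", "georeference_analysis") is accepted
    but deliberately ignored: every context resolves to the same provider.
    """
    return account.geocoder_provider or DEFAULT_PROVIDER
```

With this rule, `provider_for(acct, "import_guessing")` and `provider_for(acct, "georeference_analysis")` cannot disagree, which is the property the report found to be violated.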
Issue Analytics
- State:
- Created 6 years ago
- Comments:10 (6 by maintainers)
Top GitHub Comments
I have some other proposals:
Some remarks:
Content guessing limits: say you have the perfect geocoding provider that is 100% accurate and has 100% coverage. Taking a sample from the whole dataset and with the best statistical method (insert ML or neural networks or whatever you want here) there will always be uncertainty in the results, that is, the sample can always mislead the decision about the column contents of the whole dataset.
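To put a number on that uncertainty, here is a small illustrative calculation, not a description of how content guessing is implemented. It assumes the guesser inspects a uniform random sample of rows, and asks how likely it is that the sample contains none of the rows that would contradict the guess (for instance, rows in an `address` column that are not addresses). The function name is made up for this example.

```python
from math import comb


def prob_sample_sees_no_exceptions(total_rows: int, exception_rows: int,
                                   sample_size: int) -> float:
    """Probability that a uniform sample (without replacement) of
    `sample_size` rows contains none of the `exception_rows` rows.

    This is the hypergeometric probability of drawing zero exceptions:
    C(total - exceptions, n) / C(total, n).
    """
    if not 0 <= exception_rows <= total_rows:
        raise ValueError("exception_rows must be between 0 and total_rows")
    if not 0 <= sample_size <= total_rows:
        raise ValueError("sample_size must be between 0 and total_rows")
    return (comb(total_rows - exception_rows, sample_size)
            / comb(total_rows, sample_size))


if __name__ == "__main__":
    # Example: 10,000 rows, 500 of which are not real addresses, sample of 50.
    print(prob_sample_sees_no_exceptions(10_000, 500, 50))
```

The probability only reaches zero when the sample is larger than the number of conforming rows, so for any realistic sample size some chance of a misleading sample remains, regardless of how good the geocoding provider is.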
Metadata: for a given query instead of just returning a geometry, we’d need to return several other things. That means changes in the API but also changes in the UX. E.g: is the metadata to be added to the columns of the analysis? do users want to have results below X accuracy or Y accuracy?
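As a way to make the metadata question concrete, the sketch below shows one possible shape for a richer result and the kind of accuracy filter a user might ask for. Everything here is hypothetical: the `GeocodeResult` fields, the 0 to 1 accuracy scale, and `keep_at_least` are assumptions for discussion, not an existing or proposed CARTO API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class GeocodeResult:
    """A geometry plus the metadata needed to judge it (hypothetical)."""
    geometry: Optional[Tuple[float, float]]  # (lon, lat), or None if no match
    provider: str                            # e.g. "heremaps" or "mapzen"
    accuracy: float                          # assumed scale: 0.0 (worst) to 1.0 (best)


def keep_at_least(results: List[GeocodeResult],
                  min_accuracy: float) -> List[GeocodeResult]:
    """Keep only matched results whose accuracy is at least `min_accuracy`."""
    return [
        r for r in results
        if r.geometry is not None and r.accuracy >= min_accuracy
    ]
```

Even this small structure raises the UX questions listed above: whether `provider` and `accuracy` become columns of the analysis output, and who chooses `min_accuracy`.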
Internal geocoder: we all know it can be improved. Are we willing to prioritize work on that? It is the provider used for content guessing. It has its limitations, but it lets us do some things without incurring costs.
@saleiva and @kevin-reilly I beg you, please: if you really want the situation to improve we’d need a proper feature doc with high level requirements broken down into smaller requirements with a guarantee of consistency and completeness. And then prioritize the feature.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.