discussion: housenumber extraction regexen
hey all, I came across this issue when investigating https://github.com/openaddresses/openaddresses/issues/2070
it seems like most of the regexen (is that the British form of regexes? 😃) don't support alphanumeric house numbers (e.g. 1a) or address ranges (e.g. 1-10).
the most common form of regex seems to be ^([0-9]+), which would probably be better off written as something like ^([0-9]+[a-zA-Z]?|[0-9]+-[0-9]+).
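A quick way to check the suggestion is to run the patterns against a few sample addresses. The Node.js sketch below compares the current pattern, the proposed one, and a hypothetical reordered variant; the housenumber helper and the reordered pattern are illustrations, not part of any existing source config. It shows one subtlety: used on its own, the proposed pattern still cuts 416-18 short at 416, because alternatives are tried left to right and [0-9]+[a-zA-Z]? already succeeds on 416. Putting the range alternative first avoids that.

```javascript
// Number patterns from the discussion.
const current = /^([0-9]+)/;
const proposed = /^([0-9]+[a-zA-Z]?|[0-9]+-[0-9]+)/;
// Hypothetical variant: range alternative first, so "416-18" is not truncated to "416".
const reordered = /^([0-9]+-[0-9]+|[0-9]+[a-zA-Z]?)/;

// Hypothetical helper: return capture group 1, or null when nothing matches.
function housenumber(re, address) {
  const m = address.match(re);
  return m ? m[1] : null;
}

for (const address of ["416 GRAND ST", "1a MAIN ST", "416-18 GRAND ST"]) {
  console.log(
    address,
    "|",
    housenumber(current, address),
    housenumber(proposed, address),
    housenumber(reordered, address)
  );
}
```

For 416-18 GRAND ST the three patterns yield 416, 416 and 416-18; for 1a MAIN ST they yield 1, 1a and 1a. In the combined pattern further down the ordering happens not to matter, because the trailing \s+ fails after 416 (the next character is "-") and forces the engine to backtrack into the range alternative.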
as a result, the street regexen can end up doing unexpected things, such as this:
-74.035221,40.7429476,416-18 GRAND ST,416-18 Grand Street,,,,,,,
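One way rows like this can arise is visible in the most common street pattern from the tally further down, ^(?:[0-9]+ )(.*): it requires the digits to be followed directly by a space, so a ranged number such as 416-18 (or an alphanumeric one such as 1a) gives no match at all, and there is no street capture to use. A minimal Node.js check; the streetOf helper is hypothetical:

```javascript
// Most common street pattern in the sources tally.
const streetPattern = /^(?:[0-9]+ )(.*)/;

// Hypothetical helper: capture group 1, or null when the pattern does not match.
function streetOf(address) {
  const m = address.match(streetPattern);
  return m ? m[1] : null;
}

console.log(streetOf("416 GRAND ST"));    // → GRAND ST
console.log(streetOf("416-18 GRAND ST")); // → null ("-" follows the digits, not a space)
console.log(streetOf("1a MAIN ST"));      // → null ("a" follows the digits)
```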
I can go ahead and fix it for US/NJ, but it seems to also affect a bunch of other files.
to match allthethings, we could use the following, which should leave the street name unaffected in cases where a house number match isn't possible:
"416-18 GRAND ST".match(/^(([0-9]+[a-zA-Z]?|[0-9]+-[0-9]+)\s+)(.*)$/)
["416-18 GRAND ST", "416-18 ", "416-18", "GRAND ST"]
the matches would then always be $2 for housenumber and $3 for street name
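Wrapped in a small helper, the combined pattern gives both fields in one pass, and an address without a leading number falls through with the whole string kept as the street. This is a Node.js sketch: splitAddress and its return shape are hypothetical, and the pattern is the one from the match example above, unchanged.

```javascript
// Combined pattern from the example above: $2 = house number, $3 = street.
const PREFIX_NUMBER = /^(([0-9]+[a-zA-Z]?|[0-9]+-[0-9]+)\s+)(.*)$/;

// Hypothetical helper. When no house number is found, the street is left
// as the original string rather than being emptied or truncated.
function splitAddress(address) {
  const m = address.match(PREFIX_NUMBER);
  if (!m) return { number: null, street: address };
  return { number: m[2], street: m[3] };
}

console.log(splitAddress("416-18 GRAND ST")); // { number: '416-18', street: 'GRAND ST' }
console.log(splitAddress("1a MAIN ST"));      // { number: '1a', street: 'MAIN ST' }
console.log(splitAddress("GRAND ST"));        // { number: null, street: 'GRAND ST' }
```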
note: the ordering would need to be flipped for Germanic addresses
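For the Germanic ordering, the same number alternation can be anchored at the end of the string instead, with a lazy street capture in front. A sketch under the same assumptions (Node.js; splitPostfixedAddress is a hypothetical name and the sample addresses are illustrative):

```javascript
// Street first, number last, e.g. "Hauptstraße 12a". The lazy (.*?) grows only
// until the rest of the string is whitespace plus a house number.
// $1 = street, $2 = house number.
const POSTFIX_NUMBER = /^(.*?)\s+([0-9]+[a-zA-Z]?|[0-9]+-[0-9]+)$/;

// Hypothetical helper with the same fallback as the prefix version.
function splitPostfixedAddress(address) {
  const m = address.match(POSTFIX_NUMBER);
  if (!m) return { number: null, street: address };
  return { number: m[2], street: m[1] };
}

console.log(splitPostfixedAddress("Hauptstraße 12-14")); // { number: '12-14', street: 'Hauptstraße' }
console.log(splitPostfixedAddress("Am Markt 3"));        // { number: '3', street: 'Am Markt' }
```

Because the pattern is anchored with $, the first alternative cannot stop early at 12 in 12-14 here; the order of the alternatives only matters when nothing constrains what follows the number.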
thoughts?
$ find sources -type f -iname "*.json" | xargs grep \"pattern\" | cut -d\" -f4 | sort | uniq -c | sort -n -r
376 ^(?:[0-9]+ )(.*)
311 ^([0-9]+)
67 ^([0-9]+)( .*)
49 ^.* ((Unit|Apt) [0-9A-Za-z])$
16 ^(?:\\S+ )(.*)
15 ^(\\S+)
8 ([0-9]*) (.*)
6 function
5 ^(?:[0-9]+ )?(.*?),(?: )?(|[^,]+),?(?: )(.*?),(?: )(.*)(?: )([0-9]+)$
5 ^(?:[0-9]+) (.*)
3 ([0-9]+)(.*)~(.*), IL ([0-9]+)
... etc
Issue Analytics
- State:
- Created 7 years ago
- Comments:14 (14 by maintainers)
Top GitHub Comments
Maybe this is an opportunity for a new function in the core tag set that’s a pre-baked, reliable regex for street numbers?
Thanks @trescube! Thanks for your good work on the new functions!