discussion: housenumber extraction regexen

hey all, I came across this issue when investigating https://github.com/openaddresses/openaddresses/issues/2070

it seems like most of the regexen (is that the British form of regexes? 😃) don’t support alphanumeric house numbers (e.g. 1a) or address ranges (e.g. 1-10).

the most common form of regex seems to be ^([0-9]+) which would probably be better off written as something like ^([0-9]+[a-zA-Z]?|[0-9]+-[0-9]+).
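
A quick sketch of how these patterns behave on a few sample strings. One caveat it illustrates: when nothing follows the capture group, alternation order matters, because the engine accepts the first branch that succeeds. The `rangeFirst` variant and the `firstGroup` helper are assumptions made for illustration, not something taken from the sources.

```javascript
// The pattern most sources use today: digits only, anchored at the start.
const common = /^([0-9]+)/;

// The alternatives as proposed above, plain-number branch first.
const numberFirst = /^([0-9]+[a-zA-Z]?|[0-9]+-[0-9]+)/;

// Assumed variant: same branches, range first. With nothing after the
// group, the engine has no reason to backtrack into a later branch.
const rangeFirst = /^([0-9]+-[0-9]+|[0-9]+[a-zA-Z]?)/;

// Hypothetical helper: return capture group 1, or null when there is no match.
function firstGroup(pattern, text) {
  const m = text.match(pattern);
  return m ? m[1] : null;
}

const samples = ["416-18 GRAND ST", "1a MAIN ST", "22 HIGH ST"];
for (const s of samples) {
  console.log(
    s, "|",
    firstGroup(common, s), "|",
    firstGroup(numberFirst, s), "|",
    firstGroup(rangeFirst, s)
  );
}
```

For "416-18 GRAND ST" the first two patterns capture only 416, while the range-first variant captures 416-18. Once the group is followed by something that must also match (such as `\s+`), either ordering should work, since a failed whitespace match forces the engine to try the other branch.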

as a result the street regi can end up doing unexpected things such as this:

-74.035221,40.7429476,416-18 GRAND ST,416-18 Grand Street,,,,,,,

I can go ahead and fix it for US/NJ but it seems to also affect a bunch of other files.

to match allthethings we could use the following, which captures the house number and the street name separately. when no house number is present the pattern doesn't match at all (match returns null), so the street name can be left as-is in that case:

"416-18 GRAND ST".match(/^(([0-9]+[a-zA-Z]?|[0-9]+-[0-9]+)\s+)(.*)$/)
["416-18 GRAND ST", "416-18 ", "416-18", "GRAND ST"]

the matches would then always be $2 for housenumber and $3 for street name
note: the ordering would need to be flipped for Germanic addresses
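
A minimal sketch of how the split could be wrapped up, assuming the caller wants an explicit fallback when no house number is found. `splitAddress` and the `TRAILING` pattern are hypothetical names made up for this example; they are not part of any existing OpenAddresses tooling, and the number-last pattern is only one guess at what "flipped" could look like.

```javascript
// Leading house number (plain, single-letter suffix, or range), then the street.
// This is the pattern from the example above, unchanged.
const LEADING = /^(([0-9]+[a-zA-Z]?|[0-9]+-[0-9]+)\s+)(.*)$/;

// Assumed mirror image for number-last formats, e.g. "Hauptstrasse 12a".
const TRAILING = /^(.*?)(\s+([0-9]+-[0-9]+|[0-9]+[a-zA-Z]?))$/;

// Hypothetical helper: split an address into house number and street name.
function splitAddress(address, numberLast = false) {
  const m = address.match(numberLast ? TRAILING : LEADING);
  if (!m) {
    // No recognisable house number: keep the street untouched.
    return { number: "", street: address };
  }
  return numberLast
    ? { number: m[3], street: m[1] }
    : { number: m[2], street: m[3] };
}

console.log(splitAddress("416-18 GRAND ST"));
console.log(splitAddress("1a MAIN ST"));
console.log(splitAddress("GRAND ST"));
console.log(splitAddress("Hauptstrasse 12a", true));
```

The explicit `if (!m)` branch is what keeps the street name unaffected; the regex on its own simply fails to match when there is no leading number.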

thoughts?

$ find sources -type f -iname "*.json" | xargs grep \"pattern\" | cut -d\" -f4 | sort | uniq -c | sort -n -r
    376 ^(?:[0-9]+ )(.*)
    311 ^([0-9]+)
     67 ^([0-9]+)( .*)
     49 ^.* ((Unit|Apt) [0-9A-Za-z])$
     16 ^(?:\\S+ )(.*)
     15 ^(\\S+)
      8 ([0-9]*) (.*)
      6 function
      5 ^(?:[0-9]+ )?(.*?),(?: )?(|[^,]+),?(?: )(.*?),(?: )(.*)(?: )([0-9]+)$
      5 ^(?:[0-9]+) (.*)
      3 ([0-9]+)(.*)~(.*), IL ([0-9]+)
... etc

Issue Analytics

  • State: open
  • Created 7 years ago
  • Comments: 14 (14 by maintainers)

Top GitHub Comments

1 reaction
migurski commented, Oct 20, 2016

Maybe this is an opportunity for a new function in the core tag set that’s a pre-baked, reliable regex for street numbers?

0 reactions
migurski commented, Nov 29, 2016

Thanks @trescube! Thanks for your good work on the new functions!
