Performance issues
I’ve been using IKVM to “cross-compile” the libphonenumber jars to a .NET DLL for a long time. Unfortunately, IKVM is no longer maintained (and hasn’t been for quite a while now). Today I replaced the cross-compiled DLL with this library, which was a breeze since the public API is very compatible, apart from .NET conventions like `getInstance()` -> `GetInstance()` and a single incompatibility with `TruncateTooLongNumber` requiring a builder. All was well, all my unit tests passed. 🎉🥳 So thank you, and kudos for the great work on making this such an easy transition.
However, then I started benchmarking and, it turns out, this lib is slower by a factor of 30. When handling a lot of phone numbers, that adds up quickly. I have browsed the existing issues and found some on this topic, and I’m fairly convinced the culprit is the regex(es) used in this lib. I tested it on .NET 5.0 in a Windows console application.
I wish I could dive in deeper, profile the issue(s) and put in some work to help alleviate this, but I’m not in a position to right now - too busy. However, should I find some spare time: which of the (many) branches should I base my work on? What would be my best bet? I can’t make any promises but I would love to help improve this library.
Issue Analytics
- Created 2 years ago
- Comments: 9 (9 by maintainers)
Top GitHub Comments
A very quick’n’dirty experiment resulted in 1401 unique regexes in `PhoneNumberMetadata.xml` by my count, which came to around 10MB of memory usage if I instantiate each of them as `new Regex(<regex>, RegexOptions.Compiled)`. As each `PhoneRegex` can have up to 3 variants of each regex, let’s assume 3 x 10 = 30. Things get complicated though, since I can’t imagine a situation where you have all three variants of all 1401 regexes in use. But if you do, the worst case is around 30MB of memory.
Being born in the 70’s, having grown up with a 16KB Acorn and later a C64 with its 64KB of memory, I wholeheartedly agree that 30MB is a lot. Then again, that is:
Having said that, more limited devices like IoT MCUs etc. are a different story - but how many of them run .NET, how many of those use this library and how many of those run into the worst-case scenario?
So I’m on the fence; on one hand I think “30MB, worst case, who cares?” and on the other hand I think being a little frugal with memory is the right thing to do. So… let’s discuss?
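To make the “not everything is in use at once” point concrete, here is a minimal sketch, assuming a hypothetical `LazyRegexCache` helper (the name and shape are mine, not something the library ships). It compiles a pattern only the first time it is requested, so memory would grow with the regexes actually exercised rather than with all 1401 up front.

```csharp
using System;
using System.Collections.Concurrent;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of the library.
public static class LazyRegexCache
{
    // Lazy<Regex> ensures each pattern is compiled at most once,
    // even when several threads ask for it at the same time.
    private static readonly ConcurrentDictionary<string, Lazy<Regex>> Cache =
        new ConcurrentDictionary<string, Lazy<Regex>>();

    // Returns the cached compiled regex, compiling it on first use.
    public static Regex Get(string pattern) =>
        Cache.GetOrAdd(
            pattern,
            p => new Lazy<Regex>(() => new Regex(p, RegexOptions.Compiled))).Value;

    // Number of distinct patterns requested so far.
    public static int Count => Cache.Count;
}
```

Calling `LazyRegexCache.Get(pattern)` twice with the same pattern returns the same `Regex` instance. Whether the trade-off (a dictionary lookup per call versus compiling everything eagerly) is worth it would need measuring.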
Edit 1: Most notably, `TestFixedLine` goes from 3.2 seconds to 8 milliseconds.
~Edit 2: you can see my benchmark project, result and (preliminary) conclusion here. For now, it’s time to go enjoy my weekend. I hope to continue this work soon.~ I clearly needed my weekend: the benchmarks made no sense and had too many hidden variables and unknowns, so I deleted the project for now. The unit test improvements (10x faster) still stand though.
Either way: I haven’t even gotten to the part where we could (potentially) profit from the improved Regex performance in .NET 7, which is why I picked this issue back up today in the first place.
8.13.1 seems to be ready 😇 Edit: And released.