get rid of String.intern
InternCache is used in Jackson only for JSON property names, and it solves the problem of extra memory footprint.
At the same time, it uses String.intern. That is not a bug as such, but rather a misuse of it: String.intern serves JVM-specific purposes, mainly maintaining the string literal pool (and covering internal JVM cases).
There are several known drawbacks to using intern:
- The string pool is a hash table that is not resized. Once it holds more entries than it has buckets, lookups suffer from hash collisions, and the more strings are interned, the worse it gets.
- GC gets involved: the intern pool is also processed during minor collections, so the application implicitly pays for String.intern during GC phases.
There are no use cases that check string equality with ==, and therefore no reason to use String.intern.
- The main benefit, string deduplication, is already achieved by InternCache itself.
Patch
===================================================================
--- src/main/java/com/fasterxml/jackson/core/util/InternCache.java (revision 489becbfb28a41980f0d5147d6069b30fa3b5864)
+++ src/main/java/com/fasterxml/jackson/core/util/InternCache.java (revision )
@@ -58,7 +58,7 @@
}
}
}
- result = input.intern();
+ result = input;
put(result, result);
return result;
}
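To make the effect of the patch concrete, here is a simplified sketch of a map-based name cache with the patch applied. It is not Jackson's actual class: the name SimpleNameCache, the MAX_ENTRIES cap of 180 and the flush-on-full policy are assumptions chosen for illustration. It shows that deduplication still holds without String.intern, because the map hands back the instance it stored first for each name.

```java
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical, simplified stand-in for a map-based name cache with the
 * patch applied: the map itself canonicalizes names and String.intern()
 * is never called.
 */
public final class SimpleNameCache extends ConcurrentHashMap<String, String> {
    private static final long serialVersionUID = 1L;

    // Assumed size cap: when reached, the whole cache is flushed to bound memory.
    private static final int MAX_ENTRIES = 180;

    private final Object lock = new Object();

    public String intern(String input) {
        String result = get(input);
        if (result != null) {
            return result; // hit: hand back the instance stored earlier
        }
        if (size() >= MAX_ENTRIES) {
            synchronized (lock) {
                if (size() >= MAX_ENTRIES) {
                    clear();
                }
            }
        }
        // Patched line: previously `result = input.intern();`
        result = input;
        put(result, result); // later lookups of an equal name return this instance
        return result;
    }

    public static void main(String[] args) {
        SimpleNameCache cache = new SimpleNameCache();
        String a = new String("someQName0");
        String b = new String("someQName0");
        System.out.println(a == b);                             // false: two distinct objects
        System.out.println(cache.intern(a) == cache.intern(b)); // true: deduplicated by the map
    }
}
```

The JVM-wide string pool is never touched here, so whatever GC and hash-table costs the intern pool carries do not apply to this cache.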
Test case
The general idea behind the synthetic JSON used in the test (a flat structure with 10,000 properties whose names share the prefix someQName) is to provide many distinct property names, so that String.intern is actually called from within InternCache.
This is close to real application use cases, where the number of unique property names ranges from hundreds to thousands.
{
"someQName0": 0,
"someQName1": 1,
....
"someQName9999": 9999
}
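The input can be generated rather than written by hand. A minimal sketch, assuming the flat layout shown above; the class name SyntheticJson and the method build are hypothetical:

```java
public class SyntheticJson {
    // Builds {"someQName0": 0, "someQName1": 1, ..., "someQName<n-1>": n-1}
    public static String build(int n) {
        StringBuilder sb = new StringBuilder(n * 24);
        sb.append("{\n");
        for (int i = 0; i < n; i++) {
            sb.append("  \"someQName").append(i).append("\": ").append(i);
            if (i < n - 1) {
                sb.append(','); // no trailing comma after the last property
            }
            sb.append('\n');
        }
        sb.append('}');
        return sb.toString();
    }

    public static void main(String[] args) {
        String json = build(10_000);
        System.out.println(json.length());
    }
}
```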
JMH test
Let's measure the performance of a single JSON parse with and without interning: PerfInternCache.java
Benchmark                 Mode  Cnt     Score     Error  Units
PerfInternCache.intern    avgt    5  4098.696 ± 164.484  us/op
PerfInternCache.noIntern  avgt    5  2320.159 ± 204.301  us/op
GC
Another test measures the implicit cost of intern via GC: it processes the same JSON as in the previous test 10,000 times (a use case very close to real applications such as web services and microservices): InternCache GC timings java test
Run it with -verbose:gc -XX:+PrintGCDetails -Xloggc:gc.log
and then compute the total GC pause time:
$ grep "GC" gc.log | grep "Times: " | awk '{S+=$8}END{print S}'
with intern:     0.1907254 ± 0.00469 sec
without intern:  0.07665 ± 0.00498 sec
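As a cross-check that does not depend on awk field positions, the following Java sketch sums the real= wall-clock value of every Times: entry on lines containing GC. It measures wall-clock time per event rather than whatever sits in field 8, so totals may differ slightly from the awk output. It assumes the JDK 8 -XX:+PrintGCDetails line format with '.' as the decimal separator; the class name GcPauseSum is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcPauseSum {
    // Captures N from a "[Times: user=... sys=..., real=N secs]" entry.
    private static final Pattern REAL =
            Pattern.compile("\\[Times:.*?real=([0-9]+(?:\\.[0-9]+)?) secs\\]");

    public static double sumRealSeconds(List<String> lines) {
        double total = 0.0;
        for (String line : lines) {
            if (!line.contains("GC")) {
                continue; // same pre-filter as grep "GC"
            }
            Matcher m = REAL.matcher(line);
            if (m.find()) {
                total += Double.parseDouble(m.group(1));
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path log = Paths.get(args.length > 0 ? args[0] : "gc.log");
        System.out.println(sumRealSeconds(Files.readAllLines(log)));
    }
}
```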
Conclusion
Using String.intern harms application performance both explicitly, through a more expensive InternCache.intern,
and implicitly, through GC. Removing it keeps the memory footprint at the same low level.
Created 7 years ago.
Comments
There is JsonFactory.Feature.INTERN_FIELD_NAMES to use if you prefer no intern()ing.
@cowtowncoder I know about that flag. The problem is that it is enabled by default, and this kind of optimisation harms the application more than having it switched off.
What's the point of keeping such a degrading flag instead of fixing the issue and making apps that use Jackson (bearing in mind how widespread JSON is) perform a bit better?