Guava ImmutableList (and others) offer awful performance in some cases due to size-optimized specializations
Original issue created by travis.downs on 2013-01-27 at 05:39 AM
Many of the Guava Immutable collections have a cute trick where they have specializations for zero-element (EmptyImmutableList) and one-element (SingletonImmutableList) collections. These specializations take the form of subclasses of ImmutableList, to go along with the “Regular” implementation and a few other specializations like ReverseImmutableList, SubList, etc.
Unfortunately, the result is that when these subclasses mix at some call site, the call is megamorphic, and performance is awful compared to classes without these specializations (worse by a factor of 20 or more).
Here’s a simple benchmark that shows ImmutableList vs ArrayList - given that ImmutableList is a simple ArrayList, without the need for double bounds checks on the upper bound, you’d expect it to at least be comparable:
benchmark minSize ns linear runtime
ArrayList 0 60.9 =
ArrayList 1 60.6 =
ArrayList 2 60.6 =
ArrayList 3 60.7 =
ImmutableList 0 1169.0 ==============================
ImmutableList 1 107.4 ==
ImmutableList 2 90.7 ==
ImmutableList 3 90.7 ==
The benchmark just calls size() repeatedly on 100 ArrayLists or ImmutableLists. The sizes of the lists are evenly distributed in [minSize, 4].
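As a rough illustration of that setup (not the attached benchmark), here is a minimal sketch. The classes `SketchList`, `EmptySketchList`, `SingletonSketchList`, `RegularSketchList` and `SizeCallSketch` are hypothetical stand-ins that only mimic the shape of Guava's empty/singleton/regular specializations, and the timing loop is a crude hand-rolled one rather than a proper benchmark harness, so treat any numbers it prints as indicative at best.

```java
import java.util.AbstractList;

// Hypothetical stand-ins for the empty / singleton / regular specializations.
abstract class SketchList extends AbstractList<Integer> {
    static SketchList of(int size) {
        if (size == 0) return new EmptySketchList();
        if (size == 1) return new SingletonSketchList(0);
        Integer[] elements = new Integer[size];
        for (int i = 0; i < size; i++) elements[i] = i;
        return new RegularSketchList(elements);
    }
}

final class EmptySketchList extends SketchList {
    @Override public int size() { return 0; }
    @Override public Integer get(int index) { throw new IndexOutOfBoundsException(); }
}

final class SingletonSketchList extends SketchList {
    private final Integer element;
    SingletonSketchList(Integer element) { this.element = element; }
    @Override public int size() { return 1; }
    @Override public Integer get(int index) {
        if (index != 0) throw new IndexOutOfBoundsException();
        return element;
    }
}

final class RegularSketchList extends SketchList {
    private final Integer[] elements;
    RegularSketchList(Integer[] elements) { this.elements = elements; }
    @Override public int size() { return elements.length; }
    @Override public Integer get(int index) { return elements[index]; }
}

class SizeCallSketch {
    // Builds 100 lists whose sizes cycle through minSize..4.
    static SketchList[] makeLists(int minSize) {
        SketchList[] lists = new SketchList[100];
        int span = 4 - minSize + 1;
        for (int i = 0; i < lists.length; i++) {
            lists[i] = SketchList.of(minSize + i % span);
        }
        return lists;
    }

    // The call site under test: list.size() sees 1, 2 or 3 receiver classes
    // depending on minSize.
    static int sumSizes(SketchList[] lists) {
        int total = 0;
        for (SketchList list : lists) total += list.size();
        return total;
    }

    public static void main(String[] args) {
        // One minSize per JVM run: the type profile of the size() call site
        // is shared, so mixing configurations in one process would skew results.
        int minSize = args.length > 0 ? Integer.parseInt(args[0]) : 0;
        SketchList[] lists = makeLists(minSize);
        int passes = 200_000;
        long sink = 0;
        for (int i = 0; i < passes; i++) sink += sumSizes(lists); // crude warm-up
        long start = System.nanoTime();
        for (int i = 0; i < passes; i++) sink += sumSizes(lists);
        long nsPerPass = (System.nanoTime() - start) / passes;
        System.out.println("minSize=" + minSize + " ns/pass=" + nsPerPass + " sink=" + sink);
    }
}
```

Running it once per `minSize` value (0 through 3), each in a fresh JVM, keeps the call-site profiles from contaminating each other.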
You can see that when all lists have at least 2 elements, performance is comparable (~91 ns vs ~61 ns). The remaining difference can be attributed to class hierarchy analysis (CHA): in the ArrayList case the compiler can prove that the ArrayList class is effectively final and can skip the inline type check (so even in the >= 2 element case, the specializations hurt).
With 1 element lists present, the call is bimorphic, so it can still be inlined and optimized, but the extra check costs a bit.
With 0 element lists, performance tanks. The call is megamorphic and can’t be well optimized. The cost of the call is ~20 times worse than ArrayList.
The penalty applies to every List<> call, not just size().
In the above benchmark, the type of the array of List was ArrayList[] and ImmutableList[] respectively.
If you change that to be List[] in both cases, the performance degrades even more:
benchmark minSize ns linear runtime
ArrayList 0 90.6 =
ArrayList 1 90.8 =
ArrayList 2 90.7 =
ArrayList 3 90.8 =
ImmutableList 0 2061.3 ==============================
ImmutableList 1 115.2 =
ImmutableList 2 90.6 =
ImmutableList 3 90.7 =
CHA isn’t in play now, because the type of the array is List, which has multiple implementations, so ArrayList degrades to ~91 ns, just like ImmutableList.
The worst-case ImmutableList performance has been cut roughly in half (~2061 ns vs ~1169 ns), however, since the call is now a megamorphic invokeinterface rather than an invokevirtual, ugh.
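To make the invokevirtual vs invokeinterface distinction concrete, here is a small sketch; `DispatchKindSketch` and its methods are names made up for illustration and are not part of the benchmark. The only difference between the two methods is the static type of the array, which decides which call instruction javac emits for `size()`.

```java
import java.util.ArrayList;
import java.util.List;

class DispatchKindSketch {
    // Static receiver type is the class ArrayList: javac emits invokevirtual.
    static int sumViaClass(ArrayList<?>[] lists) {
        int total = 0;
        for (ArrayList<?> list : lists) total += list.size();
        return total;
    }

    // Static receiver type is the interface List: javac emits invokeinterface.
    static int sumViaInterface(List<?>[] lists) {
        int total = 0;
        for (List<?> list : lists) total += list.size();
        return total;
    }

    // Builds n ArrayLists holding 0, 1, ..., n-1 elements respectively.
    static ArrayList<?>[] makeLists(int n) {
        ArrayList<?>[] lists = new ArrayList<?>[n];
        for (int i = 0; i < n; i++) {
            ArrayList<Integer> list = new ArrayList<>();
            for (int j = 0; j < i; j++) list.add(j);
            lists[i] = list;
        }
        return lists;
    }

    public static void main(String[] args) {
        ArrayList<?>[] lists = makeLists(3);
        // Same objects, same result; only the call instruction differs.
        System.out.println(sumViaClass(lists) + " " + sumViaInterface(lists));
    }
}
```

Disassembling the compiled class with `javap -c` should show the two different call instructions for `size()`.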
This kind of scenario is not at all far-fetched in real-world code - it is natural to have lists of length 0 and 1 (which is probably why these specializations were created in the first place), and it is not unusual to find them mixed at a single call site - often in a hot method that takes a List<> as input. Unlike something like branch prediction, the worst-case behavior becomes permanent once the specializations have ever been seen at the call site. Even if 0 or 1 element lists are very uncommon, once you see one of each, you are hosed until you restart the VM (at least in every JIT that I know of).
The benchmark doesn’t even test the worst case - the pattern of list sizes is totally regular, so the branches inherent in the bimorphic and megamorphic dispatch will be well predicted. Randomize it and it will get worse (especially the bimorphic case, because it is pretty fast and so has a lot further to fall).
The same issue could occur with SubList and perhaps ReverseImmutableList also, although probably less frequently, and there may be no good alternative there.
A reasonable compromise here would be to keep only the Singleton list implementation, and use a global static instance of that for the 0-element case also, with a null element array, and special-case all the methods to do the right thing for a null array. This will get you most of the memory and GC benefit of the current implementation (only a slight increase for 0-element lists), while making the worst case bimorphic, which isn’t too bad.
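One way that compromise might look, as a minimal sketch: `ZeroOrOneList` is a hypothetical class, not Guava code, and it uses a single nullable element field rather than a null array, which is just one reading of the suggestion. It relies on the fact that Guava's immutable collections reject null elements, so a null field can safely mark the shared empty instance.

```java
import java.util.AbstractList;

// Hypothetical class: one implementation serving both 0- and 1-element lists,
// so a call site that sees it plus the regular list stays bimorphic.
final class ZeroOrOneList<E> extends AbstractList<E> {
    // Shared instance for every empty list; its element field is null.
    private static final ZeroOrOneList<Object> EMPTY = new ZeroOrOneList<>(null);

    private final E element; // null only in EMPTY

    private ZeroOrOneList(E element) { this.element = element; }

    @SuppressWarnings("unchecked") // safe: the empty list never yields an element
    static <E> ZeroOrOneList<E> of() { return (ZeroOrOneList<E>) EMPTY; }

    static <E> ZeroOrOneList<E> of(E element) {
        if (element == null) throw new NullPointerException("element");
        return new ZeroOrOneList<>(element);
    }

    // Every method special-cases the null field instead of using a subclass.
    @Override public int size() { return element == null ? 0 : 1; }

    @Override public E get(int index) {
        if (index != 0 || element == null) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return element;
    }
}
```

The cost is an extra null check in each method and one unused field on the shared empty instance, in exchange for one fewer receiver class at call sites.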
Alternatively, you could keep only the 0-element case and group 1-element lists into the general case. This will increase memory use for 1-element lists by 2.5x (16 bytes vs 40 bytes on HotSpot, according to my back-of-napkin calcs), and probably give somewhat worse runtime performance for heavy use of 1-element lists.
Benchmark attached.
Top GitHub Comments
@thespags - the wayback machine has it.
I’ve attached it here (because I can’t attach it to the first message in this thread):
ArrayListVsImmutableList.zip
Is the benchmark that was run still available with the move to github?