HashSpi allocates single byte arrays on the single byte update path
While benchmarking ACCP on code that is more or less:
MessageDigest digest = MessageDigest.getInstance("MD5");
for (<many iterations>)
{
...
digest.update(<single byte>);
}
I noticed that ACCP was generating significantly more garbage and was a lot slower than expected. I believe this is because TemplateHashSpi allocates a new single-byte array on the hot path of single-byte updates:
https://github.com/corretto/amazon-corretto-crypto-provider/blob/b204b018f6aa5d42b4fee0d0a94a93994bede081/template-src/com/amazon/corretto/crypto/provider/TemplateHashSpi.java#L119-L121
Since the Spi contract is inherently not thread-safe, performance on use cases such as the above could be improved significantly by caching a single-byte buffer, as TemplateHmacSpi does: https://github.com/corretto/amazon-corretto-crypto-provider/blob/b204b018f6aa5d42b4fee0d0a94a93994bede081/template-src/com/amazon/corretto/crypto/provider/TemplateHmacSpi.java#L299-L304
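For illustration, a minimal sketch of that pattern (the field and method shown here are illustrative, not the actual ACCP source):

    private final byte[] singleByteArray = new byte[1];

    @Override
    protected void engineUpdate(byte val)
    {
        // The SPI contract is single-threaded per instance, so reusing one
        // cached buffer avoids allocating a new byte[1] on every call.
        singleByteArray[0] = val;
        engineUpdate(singleByteArray, 0, 1);
    }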
An easy reproduction is to run the following with and without ACCP:
import java.security.MessageDigest;

public final class Test
{
    // Feeds the four big-endian bytes of val to the digest via single-byte updates.
    public static void updateWithInt(MessageDigest digest, int val)
    {
        digest.update((byte) ((val >>> 24) & 0xFF));
        digest.update((byte) ((val >>> 16) & 0xFF));
        digest.update((byte) ((val >>> 8) & 0xFF));
        digest.update((byte) ((val >>> 0) & 0xFF));
    }

    public static void main(String[] args) throws Exception
    {
        int numRounds = 100000000;
        if (args.length > 0) {
            numRounds = Integer.parseInt(args[0]);
        }

        System.out.println("Burn test of MD5");
        MessageDigest digest = MessageDigest.getInstance("MD5");
        System.out.println("Using Digest: " + digest.toString());

        long start = System.currentTimeMillis();
        for (int i = 0; i < numRounds; i++) {
            updateWithInt(digest, i);
        }
        long end = System.currentTimeMillis();

        System.out.println("Result: " + digest.digest());
        System.out.println("Time(ms): " + (end - start));
    }
}
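For the ACCP run below, the provider is registered via the -Djava.security.properties override file. A minimal override that puts ACCP first in the provider list might look like the following, per ACCP's installation instructions (illustrative; the actual bundled .security file may differ):

    security.provider.1=com.amazon.corretto.crypto.provider.AmazonCorrettoCryptoProvider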
time java -Djava.security.properties=/path/to/amazon-corretto-crypto-provider.security -cp AmazonCorrettoCryptoProvider-1.1.0-linux-x86_64.jar:. Test 100000000
Burn test of MD5
Using Digest: MD5 Message Digest from AmazonCorrettoCryptoProvider, <initialized>
Result: [B@6166e06f
Time(ms): 10358
java -cp AmazonCorrettoCryptoProvider-1.1.0-linux-x86_64.jar:. Test 100000000 12.91s user 0.25s system 117% cpu 11.227 total
vs
time java -cp . Test 100000000
Burn test of MD5
Using Digest: MD5 Message Digest from SUN, <initialized>
Result: [B@2a139a55
Time(ms): 3878
java -cp . Test 100000000 3.99s user 0.02s system 101% cpu 3.945 total
Also, using sjk (Swiss Java Knife) we can see that the Corretto version is allocating close to 900 MiB/s:
$ sjk ttop -o ALLOC -p $(pgrep -f Test)
2019-08-03T23:38:31.019-0700 Process summary
process cpu=107.39%
application cpu=101.04% (user=99.38% sys=1.65%)
other: cpu=6.35%
thread count: 12
GC time=0.23% (young=0.23%, old=0.00%)
heap allocation rate 842mb/s
safe point rate: 1.6 (events/s) avg. safe point pause: 1.65ms
safe point sync time: 0.01% processing time: 0.25% (wallclock time)
[000001] user=98.21% sys= 1.50% alloc= 842mb/s - main
[000016] user= 1.17% sys= 0.01% alloc= 324kb/s - RMI TCP Connection(1)-127.0.0.1
[000018] user= 0.00% sys= 0.13% alloc= 4461b/s - JMX server connection timeout 18
[000002] user= 0.00% sys= 0.00% alloc= 0b/s - Reference Handler
[000003] user= 0.00% sys= 0.00% alloc= 0b/s - Finalizer
[000004] user= 0.00% sys= 0.00% alloc= 0b/s - Signal Dispatcher
[000011] user= 0.00% sys= 0.00% alloc= 0b/s - ForkJoinPool.commonPool-worker-1
[000012] user= 0.00% sys= 0.00% alloc= 0b/s - ForkJoinPool.commonPool-worker-2
[000013] user= 0.00% sys= 0.01% alloc= 0b/s - Native reference cleanup thread
[000014] user= 0.00% sys= 0.00% alloc= 0b/s - Attach Listener
[000015] user= 0.00% sys= 0.00% alloc= 0b/s - RMI TCP Accept-0
[000017] user= 0.00% sys= 0.00% alloc= 0b/s - RMI Scheduler(0)
vs the JDK version that allocates basically nothing:
2019-08-03T23:39:41.936-0700 Process summary
process cpu=104.07%
application cpu=100.77% (user=100.42% sys=0.35%)
other: cpu=3.30%
thread count: 9
heap allocation rate 252kb/s
safe point rate: 0.8 (events/s) avg. safe point pause: 0.12ms
safe point sync time: 0.00% processing time: 0.01% (wallclock time)
[000013] user= 0.59% sys= 0.23% alloc= 248kb/s - RMI TCP Connection(1)-127.0.0.1
[000015] user= 0.00% sys= 0.04% alloc= 4257b/s - JMX server connection timeout 15
[000001] user=99.83% sys= 0.08% alloc= 0b/s - main
[000002] user= 0.00% sys= 0.00% alloc= 0b/s - Reference Handler
[000003] user= 0.00% sys= 0.00% alloc= 0b/s - Finalizer
[000004] user= 0.00% sys= 0.00% alloc= 0b/s - Signal Dispatcher
[000010] user= 0.00% sys= 0.00% alloc= 0b/s - Attach Listener
[000012] user= 0.00% sys= 0.00% alloc= 0b/s - RMI TCP Accept-0
[000014] user= 0.00% sys= 0.00% alloc= 0b/s - RMI Scheduler(0)
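As a rough sanity check on where that allocation is coming from (assuming roughly 24 bytes per byte[1] instance on a 64-bit HotSpot JVM, i.e. a 16-byte array header plus padding): 10^8 iterations x 4 single-byte updates x ~24 bytes is about 9.6 GB over the ~10.4 s run, on the order of 900 MB/s, which is consistent with the observed 842 MB/s allocation rate.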
JVM version information:
java -version
java version "1.8.0_201"
Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)
Top GitHub Comments
@SalusaSecondus thank you very much for the quick patch! I just rolled 1.1.1 out in our load-testing Cassandra clusters and am already seeing significant improvements. It appears we're reducing the on-CPU time of our digesting functions during quorum reads by up to 50% (going from 20% on-CPU time to 10% on-CPU time according to flamegraphs). I've also been able to enable AES-GCM without any noticeable increase in CPU load, which is an achievement in itself.

Version 1.1.1 released with this fix.