HashSpi allocates single byte arrays on the single byte update path

While benchmarking ACCP on code that is more or less:

MessageDigest digest = MessageDigest.getInstance("MD5");
for (<many iterations>)
{
...
    digest.update(<single byte>);
}

I noticed that ACCP was generating significantly more garbage and was a lot slower than expected. I believe this is because TemplateHashSpi allocates a new one-element byte array on the hot path of every single-byte update: https://github.com/corretto/amazon-corretto-crypto-provider/blob/b204b018f6aa5d42b4fee0d0a94a93994bede081/template-src/com/amazon/corretto/crypto/provider/TemplateHashSpi.java#L119-L121

Since the Spi contract is inherently not thread-safe, performance in use cases such as the one above could be improved significantly by caching a single-byte buffer, as TemplateHmacSpi does: https://github.com/corretto/amazon-corretto-crypto-provider/blob/b204b018f6aa5d42b4fee0d0a94a93994bede081/template-src/com/amazon/corretto/crypto/provider/TemplateHmacSpi.java#L299-L304
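
As a rough illustration of the buffer-caching idea (a minimal sketch, not the ACCP source): the hypothetical class `SingleByteBufferSketch` below wraps a `MessageDigest` and reuses one preallocated one-element array for every single-byte update. The class, field, and method names are assumptions made for this example, and like any Spi instance it is not thread-safe.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration of caching a single-byte buffer.
// This is NOT the ACCP implementation; names are made up for the example.
final class SingleByteBufferSketch {
    private final MessageDigest delegate;

    // Allocated once per instance and reused by every single-byte update.
    // Safe only because the instance is not shared between threads.
    private final byte[] oneByte = new byte[1];

    SingleByteBufferSketch(String algorithm) throws NoSuchAlgorithmException {
        this.delegate = MessageDigest.getInstance(algorithm);
    }

    // Single-byte path: no per-call allocation, just a store into the
    // cached array followed by the array-based update.
    void update(byte b) {
        oneByte[0] = b;
        delegate.update(oneByte, 0, 1);
    }

    byte[] digest() {
        return delegate.digest();
    }
}
```

The trade-off is one extra byte array per instance in exchange for removing an allocation from every single-byte call.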

An easy reproduction is to run the following with and without ACCP:

import java.security.MessageDigest;

public final class Test
{
    public static void updateWithInt(MessageDigest digest, int val)
    {
        digest.update((byte) ((val >>> 24) & 0xFF));
        digest.update((byte) ((val >>> 16) & 0xFF));
        digest.update((byte) ((val >>>  8) & 0xFF));
        digest.update((byte) ((val >>> 0) & 0xFF));
    }

    public static void main(String[] args) throws Exception
    {
        int numRounds = 100000000;
        if (args.length  > 0) {
            numRounds = Integer.parseInt(args[0]);
        }
        System.out.println("Burn test of MD5");
        MessageDigest digest = MessageDigest.getInstance("MD5");
        System.out.println("Using Digest: " + digest.toString());
        long start = System.currentTimeMillis();
        for (int i = 0; i < numRounds ; i ++) {
            updateWithInt(digest, i);
        }
        long end = System.currentTimeMillis();

        System.out.println("Result:   " + digest.digest());
        System.out.println("Time(ms): " + (end - start));
    }
}

time java -Djava.security.properties=/path/to/amazon-corretto-crypto-provider.security -cp AmazonCorrettoCryptoProvider-1.1.0-linux-x86_64.jar:. Test 100000000
Burn test of MD5
Using Digest: MD5 Message Digest from AmazonCorrettoCryptoProvider, <initialized>

Result:   [B@6166e06f
Time(ms): 10358
java  -cp AmazonCorrettoCryptoProvider-1.1.0-linux-x86_64.jar:. Test 10000000  12.91s user 0.25s system 117% cpu 11.227 total

vs

time java -cp . Test 100000000 
Burn test of MD5
Using Digest: MD5 Message Digest from SUN, <initialized>

Result:   [B@2a139a55
Time(ms): 3878
java -cp . Test 100000000  3.99s user 0.02s system 101% cpu 3.945 total

Using sjk, we can also see that the ACCP version allocates roughly 840 MB/s on the main thread:

$ sjk ttop -o ALLOC -p $(pgrep -f Test)
2019-08-03T23:38:31.019-0700 Process summary 
  process cpu=107.39%
  application cpu=101.04% (user=99.38% sys=1.65%)
  other: cpu=6.35% 
  thread count: 12
  GC time=0.23% (young=0.23%, old=0.00%)
  heap allocation rate 842mb/s
  safe point rate: 1.6 (events/s) avg. safe point pause: 1.65ms
  safe point sync time: 0.01% processing time: 0.25% (wallclock time)
[000001] user=98.21% sys= 1.50% alloc=  842mb/s - main
[000016] user= 1.17% sys= 0.01% alloc=  324kb/s - RMI TCP Connection(1)-127.0.0.1
[000018] user= 0.00% sys= 0.13% alloc=  4461b/s - JMX server connection timeout 18
[000002] user= 0.00% sys= 0.00% alloc=     0b/s - Reference Handler
[000003] user= 0.00% sys= 0.00% alloc=     0b/s - Finalizer
[000004] user= 0.00% sys= 0.00% alloc=     0b/s - Signal Dispatcher
[000011] user= 0.00% sys= 0.00% alloc=     0b/s - ForkJoinPool.commonPool-worker-1
[000012] user= 0.00% sys= 0.00% alloc=     0b/s - ForkJoinPool.commonPool-worker-2
[000013] user= 0.00% sys= 0.01% alloc=     0b/s - Native reference cleanup thread
[000014] user= 0.00% sys= 0.00% alloc=     0b/s - Attach Listener
[000015] user= 0.00% sys= 0.00% alloc=     0b/s - RMI TCP Accept-0
[000017] user= 0.00% sys= 0.00% alloc=     0b/s - RMI Scheduler(0)

vs the JDK version that allocates basically nothing:

2019-08-03T23:39:41.936-0700 Process summary 
  process cpu=104.07%
  application cpu=100.77% (user=100.42% sys=0.35%)
  other: cpu=3.30% 
  thread count: 9
  heap allocation rate 252kb/s
  safe point rate: 0.8 (events/s) avg. safe point pause: 0.12ms
  safe point sync time: 0.00% processing time: 0.01% (wallclock time)
[000013] user= 0.59% sys= 0.23% alloc=  248kb/s - RMI TCP Connection(1)-127.0.0.1
[000015] user= 0.00% sys= 0.04% alloc=  4257b/s - JMX server connection timeout 15
[000001] user=99.83% sys= 0.08% alloc=     0b/s - main
[000002] user= 0.00% sys= 0.00% alloc=     0b/s - Reference Handler
[000003] user= 0.00% sys= 0.00% alloc=     0b/s - Finalizer
[000004] user= 0.00% sys= 0.00% alloc=     0b/s - Signal Dispatcher
[000010] user= 0.00% sys= 0.00% alloc=     0b/s - Attach Listener
[000012] user= 0.00% sys= 0.00% alloc=     0b/s - RMI TCP Accept-0
[000014] user= 0.00% sys= 0.00% alloc=     0b/s - RMI Scheduler(0)
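
The allocation rates above were sampled externally with sjk. As a rough in-process cross-check, the sketch below reads the current thread's allocated-byte counter before and after a loop of single-byte updates. It assumes a HotSpot-based JVM, where the platform `ThreadMXBean` can be cast to `com.sun.management.ThreadMXBean`; the class name `AllocationProbe` and its method are hypothetical, and the number it reports will vary with the provider, JIT state, and JVM version.

```java
import java.lang.management.ManagementFactory;
import java.security.MessageDigest;

// Hypothetical helper for this example; assumes a HotSpot-based JVM.
final class AllocationProbe {
    // Returns the number of bytes the current thread allocated while
    // performing `rounds` single-byte updates on the given digest.
    static long allocatedBytesDuringUpdates(String algorithm, int rounds) throws Exception {
        com.sun.management.ThreadMXBean threads =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        MessageDigest digest = MessageDigest.getInstance(algorithm);
        long threadId = Thread.currentThread().getId();

        long before = threads.getThreadAllocatedBytes(threadId);
        for (int i = 0; i < rounds; i++) {
            digest.update((byte) i); // the single-byte path under test
        }
        long after = threads.getThreadAllocatedBytes(threadId);

        digest.digest(); // consume the state so the loop is not dead code
        return after - before;
    }

    public static void main(String[] args) throws Exception {
        long bytes = allocatedBytesDuringUpdates("MD5", 10_000_000);
        System.out.println("Allocated bytes during updates: " + bytes);
    }
}
```

Running it once with and once without the ACCP security properties file should show whether the single-byte path allocates per call.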

JVM version information:

java -version
java version "1.8.0_201"
Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 12 (7 by maintainers)

Top GitHub Comments

jolynch commented, Sep 16, 2019 (5 reactions)

@SalusaSecondus thank you very much for the quick patch! I just rolled 1.1.1 out in our load testing Cassandra clusters and am already seeing significant improvements. It appears we’re reducing the on-CPU time of our digesting functions during quorum reads by up to 50% (so we’re going from 20% on-CPU time to 10% on-CPU time according to flamegraphs). I’ve also been able to enable AES-GCM without any noticeable increase in CPU load, which is an achievement in itself.

SalusaSecondus commented, Sep 14, 2019 (0 reactions)

Version 1.1.1 released with this fix.
