[3.3.0] Performance regression
This issue was originally created in fs2, so the version numbers below refer to fs2.
I’ve observed a performance degradation in some scenarios on version 3.2.3.
The throughput on small byte streams remained the same.
The throughput on bigger byte streams decreased by roughly 20%.
The memory allocation rate decreased by 10-15% on bigger byte streams, although per-operation allocation went up slightly (see the GC figures below).
See the benchmarks below for more details.
Stream usage
The project utilizes a TCP socket from the fs2-io module. I cannot share many details due to an NDA, but the generalized usage of the socket is as follows:
    val streamDecoder: StreamDecoder[Structure] =
      StreamDecoder.many(StructureDecoder)

    socket.reads
      .through(streamDecoder.toPipeByte)
      .evalMap(structure => queue.offer(Right(structure)))
      .compile
      .background
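For reference, here is a self-contained sketch of roughly the same pipeline. Everything in it beyond the lines quoted above (the toy Structure type, the toy decoder, the queue wiring and the readLoop helper) is an assumption standing in for the NDA'd code, not the actual implementation:

    // Hypothetical sketch only; Structure, StructureDecoder and readLoop are
    // made-up stand-ins for the real (NDA'd) types.
    import cats.effect.{IO, Outcome, Resource}
    import cats.effect.std.Queue
    import fs2.io.net.Socket
    import fs2.interop.scodec.StreamDecoder // fs2-scodec module (fs2 >= 3.2)
    import scodec.Decoder
    import scodec.codecs.int64

    object ReadLoopSketch {

      final case class Structure(id: Long) // toy message type
      val StructureDecoder: Decoder[Structure] = // toy decoder: a single 64-bit field
        int64.map(Structure.apply)

      val streamDecoder: StreamDecoder[Structure] =
        StreamDecoder.many(StructureDecoder)

      // Reads bytes from the socket, decodes them into Structures and hands them
      // to a consumer via the queue, running the whole loop on a background fiber.
      def readLoop(
          socket: Socket[IO],
          queue: Queue[IO, Either[Throwable, Structure]]
      ): Resource[IO, IO[Outcome[IO, Throwable, Unit]]] =
        socket.reads
          .through(streamDecoder.toPipeByte)
          .evalMap(structure => queue.offer(Right(structure)))
          .compile
          .drain      // compile the stream down to IO[Unit] first...
          .background // ...then run it as a background fiber (a Resource)
    }

The decoded structures are handed off through a Queue so the caller can consume them independently of the socket read loop, which is presumably why the reader is compiled and run in the background.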
Consumed bytes per single invocation:
- createOne: 167 bytes
- returnRandomUUID: 135 bytes
- return100Record: 2683 bytes
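The tables below are standard JMH output: average-time mode with 25 samples per benchmark (Cnt = 25), and the allocation rows come from JMH's GC profiler. A minimal harness producing that kind of output could look like the sketch below; only the class and method names come from the results, while the bodies are placeholders for the NDA'd driver calls:

    // Hypothetical JMH skeleton (e.g. via sbt-jmh); method bodies are placeholders.
    import java.util.concurrent.TimeUnit
    import org.openjdk.jmh.annotations._

    @State(Scope.Benchmark)
    @BenchmarkMode(Array(Mode.AverageTime))
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    class DriverBenchmark {

      @Setup
      def setup(): Unit = {
        // open the TCP connection / start the driver here
      }

      @Benchmark
      def createOne(): Unit = {
        // request consuming ~167 bytes per invocation
      }

      @Benchmark
      def returnRandomUUID(): Unit = {
        // request consuming ~135 bytes per invocation
      }

      @Benchmark
      def return100Records(): Unit = {
        // request consuming ~2683 bytes per invocation
      }
    }

The gc.* rows are JMH "secondary results", produced by running with the GC profiler enabled, e.g. `jmh:run -prof gc` under sbt-jmh.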
3.2.2
Operation average time:
| Benchmark | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|
| DriverBenchmark.createOne | avgt | 25 | 4.725 | ± 0.396 | ms/op |
| DriverBenchmark.return100Records | avgt | 25 | 8.613 | ± 0.488 | ms/op |
| DriverBenchmark.returnRandomUUID | avgt | 25 | 3.542 | ± 1.156 | ms/op |
Memory allocation:
| Benchmark | Score | Error | Units |
|---|---|---|---|
| DriverBenchmark.return100Records:·gc.alloc.rate | 201.323 | ± 11.576 | MB/sec |
| DriverBenchmark.return100Records:·gc.alloc.rate.norm | 3117144.635 | ± 13036.746 | B/op |
| DriverBenchmark.return100Records:·gc.churn.G1_Eden_Space | 199.083 | ± 23.093 | MB/sec |
| DriverBenchmark.return100Records:·gc.churn.G1_Eden_Space.norm | 3076695.652 | ± 283692.848 | B/op |
| DriverBenchmark.return100Records:·gc.churn.G1_Old_Gen | 0.074 | ± 0.113 | MB/sec |
| DriverBenchmark.return100Records:·gc.churn.G1_Old_Gen.norm | 1266.657 | ± 1981.461 | B/op |
| DriverBenchmark.return100Records:·gc.churn.G1_Survivor_Space | 0.484 | ± 0.444 | MB/sec |
| DriverBenchmark.return100Records:·gc.churn.G1_Survivor_Space.norm | 7614.493 | ± 6998.312 | B/op |
| DriverBenchmark.return100Records:·gc.count | 72.000 | | counts |
| DriverBenchmark.return100Records:·gc.time | 300.000 | | ms |
3.2.3
Operation average time:
| Benchmark | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|
| DriverBenchmark.createOne | avgt | 25 | 4.862 | ± 0.414 | ms/op |
| DriverBenchmark.return100Records | avgt | 25 | 11.008 | ± 0.356 | ms/op |
| DriverBenchmark.returnRandomUUID | avgt | 25 | 3.068 | ± 0.299 | ms/op |
Memory allocation:
| Benchmark | Score | Error | Units |
|---|---|---|---|
| DriverBenchmark.return100Records:·gc.alloc.rate | 169.961 | ± 6.376 | MB/sec |
| DriverBenchmark.return100Records:·gc.alloc.rate.norm | 3383113.516 | ± 12432.017 | B/op |
| DriverBenchmark.return100Records:·gc.churn.G1_Eden_Space | 175.288 | ± 24.073 | MB/sec |
| DriverBenchmark.return100Records:·gc.churn.G1_Eden_Space.norm | 3484911.125 | ± 432781.648 | B/op |
| DriverBenchmark.return100Records:·gc.churn.G1_Old_Gen | 0.061 | ± 0.075 | MB/sec |
| DriverBenchmark.return100Records:·gc.churn.G1_Old_Gen.norm | 1224.015 | ± 1529.533 | B/op |
| DriverBenchmark.return100Records:·gc.churn.G1_Survivor_Space | 0.484 | ± 0.566 | MB/sec |
| DriverBenchmark.return100Records:·gc.churn.G1_Survivor_Space.norm | 9489.291 | ± 11071.370 | B/op |
| DriverBenchmark.return100Records:·gc.count | 56.000 | | counts |
| DriverBenchmark.return100Records:·gc.time | 433.000 | | ms |
Comments: 45 (45 by maintainers)
Looking at the time delta, this seems about in line with what I would expect for tracing on code which is mostly compute bound: about a 25% difference. Fully compute bound would more than likely be about 30%.
The really interesting thing here, though, is the GC time. The tracing benchmark hit fewer GC iterations, but GC still took almost twice as long. I wonder if that means we could optimize tracing a bit further in practice by streamlining GC costs?
Edit: actually that shift only happened in 3.3.0, so it’s almost certainly being caused by the weak bag shenanigans. We’re forcing the GC to work harder in order to avoid bogging down the critical path.
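For reference, a quick back-of-the-envelope check against the return100Records numbers above (assuming those are the figures the time delta refers to):

    // Rough arithmetic on the return100Records figures quoted above
    val before = 8.613                       // ms/op on 3.2.2
    val after  = 11.008                      // ms/op on 3.2.3
    val slowdown = (after - before) / before // ≈ 0.278, i.e. roughly 25-30% slower

    // GC: fewer collections on 3.2.3, but each one costs noticeably more
    val gcPerCollection322 = 300.0 / 72      // ≈ 4.2 ms per collection
    val gcPerCollection323 = 433.0 / 56      // ≈ 7.7 ms per collection

So even though there are fewer collections on 3.2.3, each one costs nearly twice as much, which would fit the weak-bag explanation above: more work pushed onto the collector in exchange for keeping it off the hot path.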