Runtime increases

Type of issue: bug report

I’ve got a design based on the SiFive Freedom E310. We’re doing some customization of the platform, and at some point we noticed our FIRRTL runtimes becoming much longer; they are now the long pole in our simulation compile iterations. Chisel/sbt runs still take about 40s, but FIRRTL runtime has spiked from 40s to 6-8 minutes.

I’ve done some digging and it appears to be related to a bus master module we’ve written to expose TileLink master ports at the top of the resulting Verilog netlist (for use with other things outside the chip which are mastering the SBus). I initially thought it might be related to the updates we took to FIRRTL and rocket-chip, but varying the revisions of rocket-chip and FIRRTL appears to make no difference.

This is the Chisel for the module:

case object PeripheryMyBusMasterKey extends Field[Seq[MyBusMasterParams]]
trait HasMyBusMasterBundleContents extends Bundle {
  def params: MyBusMasterParams
}

trait HasPeripheryMyBusMaster { this: BaseSubsystem =>
  val busmasters = p(PeripheryMyBusMasterKey).map { params =>
    val busmaster = LazyModule(new MyBusMaster(params))
    // This was taken from rocket-chip/.../subsystem/Ports.scala
    sbus.fromMaster(Some(params.name), buffer = BufferParams.default) {
      TLSourceShrinker(1 << 14) := TLWidthWidget(busmaster.beatBytes)
    } := busmaster.node
    busmaster
  }
}

class MyBusMaster(params: MyBusMasterParams)(implicit p: Parameters)  extends LazyModule
{
  val beatBytes = 4
  val node = TLClientNode(Seq(TLClientPortParameters(Seq(TLClientParameters(name = params.name)))))
  lazy val module = new LazyModuleImp(this) {
    val (tl_out, edge) = node.out(0)
    val sinksourceBits = 20 //log2Ceil(edge.manager.endSinkId)
    val addressBits = log2Ceil(edge.manager.maxAddress)
    // Expose the address and data tilelink channels to the outside world:
    val busparams = TLBundleParameters(addressBits=addressBits,dataBits=32,sourceBits=sinksourceBits,sinkBits=sinksourceBits,sizeBits=2)
    val busmaster_a_io = IO(Decoupled(new TLBundleA(busparams)).flip)
    val busmaster_d_io = IO(Decoupled(new TLBundleD(busparams)))
    tl_out.a <> busmaster_a_io
    busmaster_d_io <> tl_out.d
    // Tie off unused channels
    tl_out.b.valid := Bool(false)
    tl_out.c.ready := Bool(true)
    tl_out.e.ready := Bool(true)
  }
}

trait HasPeripheryMyBusMasterModuleImp extends LazyModuleImp {
  val outer: HasPeripheryMyBusMaster
}

Further on up the stack (in Platform.scala and System.scala) we expose the busmaster_a/d_io bundles to the outside world.

The FIRRTL module generated from one of these modules (we have a couple) looks very basic:

  module MyBusMaster :
    input clock : Clock
    input reset : UInt<1>
    output auto : {out : {a : {flip ready : UInt<1>, valid : UInt<1>, bits : {opcode : UInt<3>, param : UInt<3>, size : UInt<3>, source : UInt<1>, address : UInt<32>, mask : UInt<4>, data : UInt<32>, corrupt : UInt<1>}}, flip b : {flip ready : UInt<1>, valid : UInt<1>, bits : {opcode : UInt<3>, param : UInt<2>, size : UInt<3>, source : UInt<1>, address : UInt<32>, mask : UInt<4>, data : UInt<32>, corrupt : UInt<1>}}, c : {flip ready : UInt<1>, valid : UInt<1>, bits : {opcode : UInt<3>, param : UInt<3>, size : UInt<3>, source : UInt<1>, address : UInt<32>, data : UInt<32>, corrupt : UInt<1>}}, flip d : {flip ready : UInt<1>, valid : UInt<1>, bits : {opcode : UInt<3>, param : UInt<2>, size : UInt<3>, source : UInt<1>, sink : UInt<1>, denied : UInt<1>, data : UInt<32>, corrupt : UInt<1>}}, e : {flip ready : UInt<1>, valid : UInt<1>, bits : {sink : UInt<1>}}}}
    input busmaster_a_io : {flip ready : UInt<1>, valid : UInt<1>, bits : {opcode : UInt<3>, param : UInt<3>, size : UInt<2>, source : UInt<20>, address : UInt<32>, mask : UInt<4>, data : UInt<32>, corrupt : UInt<1>}}
    output busmaster_d_io : {flip ready : UInt<1>, valid : UInt<1>, bits : {opcode : UInt<3>, param : UInt<2>, size : UInt<2>, source : UInt<20>, sink : UInt<20>, denied : UInt<1>, data : UInt<32>, corrupt : UInt<1>}}

    clock is invalid
    reset is invalid
    auto is invalid
    busmaster_a_io is invalid
    busmaster_d_io is invalid
    wire tl_out : {a : {flip ready : UInt<1>, valid : UInt<1>, bits : {opcode : UInt<3>, param : UInt<3>, size : UInt<3>, source : UInt<1>, address : UInt<32>, mask : UInt<4>, data : UInt<32>, corrupt : UInt<1>}}, flip b : {flip ready : UInt<1>, valid : UInt<1>, bits : {opcode : UInt<3>, param : UInt<2>, size : UInt<3>, source : UInt<1>, address : UInt<32>, mask : UInt<4>, data : UInt<32>, corrupt : UInt<1>}}, c : {flip ready : UInt<1>, valid : UInt<1>, bits : {opcode : UInt<3>, param : UInt<3>, size : UInt<3>, source : UInt<1>, address : UInt<32>, data : UInt<32>, corrupt : UInt<1>}}, flip d : {flip ready : UInt<1>, valid : UInt<1>, bits : {opcode : UInt<3>, param : UInt<2>, size : UInt<3>, source : UInt<1>, sink : UInt<1>, denied : UInt<1>, data : UInt<32>, corrupt : UInt<1>}}, e : {flip ready : UInt<1>, valid : UInt<1>, bits : {sink : UInt<1>}}} @[Nodes.scala 332:76]
    tl_out is invalid @[Nodes.scala 332:76]
    auto.out <- tl_out @[LazyModule.scala 173:49]
    tl_out.a <- busmaster_a_io @[MyBusMaster.scala 52:14]
    busmaster_d_io <- tl_out.d @[MyBusMaster.scala 53:20]
    tl_out.b.valid <= UInt<1>("h00") @[MyBusMaster.scala 56:20]
    tl_out.c.ready <= UInt<1>("h01") @[MyBusMaster.scala 57:20]
    tl_out.e.ready <= UInt<1>("h01") @[MyBusMaster.scala 58:20]

When I profile the FIRRTL run with and without (respectively) this module included I see the following:

(with the problematic module included)

CPU SAMPLES BEGIN (total = 44620) Wed Nov  7 04:49:01 2018
rank   self  accum   count trace method
   1  7.34%  7.34%    3275 304705 firrtl.Utils$.$anonfun$get_flip$2
   2  6.92% 14.26%    3087 304742 firrtl.Utils$.$anonfun$get_flip$2
   3  2.16% 16.42%     963 307091 scala.collection.TraversableOnce.nonEmpty
   4  2.14% 18.56%     955 304746 scala.collection.TraversableOnce.nonEmpty
   5  2.09% 20.64%     931 304730 scala.collection.TraversableOnce.nonEmpty
   6  2.08% 22.73%     929 307019 scala.collection.TraversableOnce.nonEmpty
   7  2.07% 24.80%     924 304670 scala.collection.TraversableOnce.nonEmpty
   8  2.07% 26.86%     923 304428 scala.collection.TraversableOnce.nonEmpty
   9  2.05% 28.92%     916 306943 scala.collection.TraversableOnce.nonEmpty
  10  2.00% 30.91%     891 306946 scala.collection.TraversableOnce.nonEmpty
  11  1.99% 32.91%     890 306931 scala.collection.TraversableOnce.nonEmpty
  12  1.99% 34.90%     888 307007 scala.collection.TraversableOnce.nonEmpty
  13  1.93% 36.83%     860 304324 scala.collection.TraversableOnce.nonEmpty
  14  1.91% 38.73%     851 307023 scala.collection.TraversableOnce.nonEmpty
  15  1.84% 40.58%     823 304743 scala.collection.TraversableOnce.$anonfun$foldLeft$1
  16  1.81% 42.39%     807 304709 scala.collection.TraversableOnce.nonEmpty
  17  1.80% 44.19%     804 304706 scala.collection.TraversableOnce.$anonfun$foldLeft$1
  18  1.54% 45.73%     687 304316 scala.collection.TraversableOnce.nonEmpty
  19  1.41% 47.14%     630 304745 firrtl.Utils$.$anonfun$get_flip$2
  20  1.36% 48.51%     609 304334 scala.collection.mutable.ListBuffer.$plus$eq

(without the problematic module)

CPU SAMPLES BEGIN (total = 7680) Wed Nov  7 04:37:14 2018
rank   self  accum   count trace method
   1  0.99%  0.99%      76 307960 java.security.AccessController.doPrivileged
   2  0.87%  1.86%      67 307971 java.security.AccessController.doPrivileged
   3  0.82%  2.68%      63 304549 firrtl.Utils$.$anonfun$get_flip$2
   4  0.47%  3.15%      36 304547 scala.collection.TraversableOnce.nonEmpty
   5  0.39%  3.54%      30 304195 scala.collection.TraversableOnce.nonEmpty
   6  0.27%  3.82%      21 302501 firrtl.passes.InferWidths$.$anonfun$run$9
   7  0.26%  4.08%      20 308043 java.io.FileOutputStream.close0
   8  0.25%  4.32%      19 307061 firrtl.transforms.ConstantPropagation.constPropPrim
   9  0.23%  4.56%      18 305884 scala.collection.TraversableOnce.nonEmpty
  10  0.23%  4.79%      18 305894 scala.collection.AbstractTraversable.<init>
  11  0.23%  5.03%      18 307977 java.security.AccessController.doPrivileged
  12  0.22%  5.25%      17 306785 firrtl.transforms.RemoveWires.onStmt$1
  13  0.20%  5.44%      15 302507 firrtl.passes.InferWidths$.get_constraints_s$1
  14  0.20%  5.64%      15 304193 scala.collection.TraversableOnce.nonEmpty
  15  0.20%  5.83%      15 304225 firrtl.Utils$.$anonfun$get_flip$2
  16  0.20%  6.03%      15 307048 scala.collection.mutable.HashTable.resize
  17  0.20%  6.22%      15 307072 firrtl.Mappers$ExprMagnet$$anon$6.map
  18  0.20%  6.42%      15 307080 scala.collection.mutable.HashTable.findOrAddEntry
  19  0.18%  6.60%      14 304551 scala.collection.TraversableOnce.nonEmpty
  20  0.18%  6.78%      14 305890 scala.collection.AbstractTraversable.<init>

If I’m reading this correctly, it indicates that the problematic module causes a dramatic increase in the number of calls to a handful of functions, dwarfing the amount of CPU spent on any other single thing in a normal run.
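Because the same method appears under many different trace IDs, the per-row percentages understate how much time each method takes overall. The following is a minimal sketch, not part of the original report, that sums the `count` column per method for rows in the format shown above. The object name `HprofSummary` and its helpers are hypothetical, and only a handful of rows from the first table are included for illustration.

```scala
object HprofSummary {
  // Matches data rows such as
  //   "   1  7.34%  7.34%    3275 304705 firrtl.Utils$.$anonfun$get_flip$2"
  // and captures the sample count and the method name.
  private val Row = """\s*\d+\s+[\d.]+%\s+[\d.]+%\s+(\d+)\s+\d+\s+(\S+)\s*""".r

  // Keep only lines that look like data rows; headers are skipped.
  def parse(lines: Seq[String]): Seq[(String, Int)] =
    lines.collect { case Row(count, method) => (method, count.toInt) }

  // Sum the sample counts of all traces that end in the same method,
  // largest total first.
  def countsByMethod(samples: Seq[(String, Int)]): Seq[(String, Int)] =
    samples
      .groupBy { case (method, _) => method }
      .map { case (method, rows) => (method, rows.map(_._2).sum) }
      .toSeq
      .sortBy { case (_, count) => -count }

  def main(args: Array[String]): Unit = {
    // A few rows copied from the "with the problematic module" table.
    val rows = Seq(
      "rank   self  accum   count trace method",
      "   1  7.34%  7.34%    3275 304705 firrtl.Utils$.$anonfun$get_flip$2",
      "   2  6.92% 14.26%    3087 304742 firrtl.Utils$.$anonfun$get_flip$2",
      "   3  2.16% 16.42%     963 307091 scala.collection.TraversableOnce.nonEmpty",
      "  15  1.84% 40.58%     823 304743 scala.collection.TraversableOnce.$anonfun$foldLeft$1",
      "  19  1.41% 47.14%     630 304745 firrtl.Utils$.$anonfun$get_flip$2"
    )
    val total = 44620 // from the "CPU SAMPLES BEGIN (total = 44620)" header
    countsByMethod(parse(rows)).foreach { case (method, count) =>
      println(f"${100.0 * count / total}%6.2f%%  $count%6d  $method")
    }
  }
}
```

Applied to all twenty rows of the first table, this kind of grouping attributes 6992 samples (roughly 16% of the run) to `get_flip` and 12415 samples (roughly 28%) to `TraversableOnce.nonEmpty`, counting only the traces that made the top 20.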

At the very least I thought this might be of interest to the FIRRTL devs.

I’m going to try rewriting the master module to avoid these runtime problems, and if anyone could suggest a different way (which hopefully doesn’t trip the same runtime blowout) that’d be appreciated.

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction
jackkoenig commented, Nov 7, 2018

The slowdown is still unfortunate. We are experiencing fairly long runtimes for large designs (~10min) so I can truly say I feel your pain @juliusbaxter. I hope to make some more performance improvements in the nearish future, stay tuned!

1 reaction
juliusbaxter commented, Nov 27, 2018

Thanks @azidar, it does reduce the runtime to about 2m40s now, which is a good improvement, but that is still double the runtime without the module, which took 1m20s on my run just then.

The profiling output looks fairly different for the same input FIRRTL (after updating master, of course):

With module (2m40s):

 CPU SAMPLES BEGIN (total = 18730) Wed Nov  7 05:56:23 2018
 rank   self  accum   count trace method
    1  1.43%  1.43%     268 311123 scala.collection.IndexedSeqOptimized.prefixLengthImpl
    2  1.32%  2.75%     247 311144 java.util.regex.Pattern$5.isSatisfiedBy
    3  1.25%  4.00%     235 304509 scala.collection.TraversableOnce.nonEmpty
    4  1.20%  5.21%     225 311125 java.util.regex.Pattern$5.isSatisfiedBy
    5  1.20%  6.40%     224 311124 scala.collection.IndexedSeqOptimized.prefixLengthImpl
    6  1.16%  7.57%     218 311120 java.util.regex.Pattern$5.isSatisfiedBy
    7  1.11%  8.68%     208 304495 scala.collection.TraversableOnce.nonEmpty
    8  1.11%  9.79%     208 304520 scala.collection.TraversableOnce.nonEmpty
    9  1.09% 10.88%     204 311116 java.util.regex.Pattern$5.isSatisfiedBy
   10  1.02% 11.90%     191 304533 scala.collection.TraversableOnce.nonEmpty
   11  0.74% 12.64%     139 304532 scala.collection.AbstractTraversable.<init>
   12  0.68% 13.32%     127 304512 scala.collection.AbstractTraversable.<init>
   13  0.68% 13.99%     127 311582 java.security.AccessController.doPrivileged
   14  0.60% 14.59%     112 304494 scala.collection.AbstractTraversable.<init>
   15  0.59% 15.18%     111 304521 scala.collection.AbstractTraversable.<init>
   16  0.53% 15.72%     100 311592 java.security.AccessController.doPrivileged
   17  0.34% 16.06%      64 311145 java.util.regex.Pattern$CharProperty.match
   18  0.34% 16.40%      63 311131 java.util.regex.Pattern$CharProperty.match
   19  0.28% 16.67%      52 311132 java.util.regex.Pattern$CharProperty.match
   20  0.25% 16.92%      47 311146 scala.collection.IndexedSeqOptimized.segmentLength

Profile without module (1m20s):

 CPU SAMPLES BEGIN (total = 10168) Wed Nov  7 06:04:51 2018
 rank   self  accum   count trace method
    1  2.24%  2.24%     228 308361 java.util.regex.Pattern$5.isSatisfiedBy
    2  2.15%  4.40%     219 308367 java.util.regex.Pattern$5.isSatisfiedBy
    3  1.97%  6.36%     200 308370 java.util.regex.Pattern$5.isSatisfiedBy
    4  1.93%  8.29%     196 308359 java.util.regex.Pattern$5.isSatisfiedBy
    5  1.66%  9.95%     169 308382 scala.collection.IndexedSeqOptimized.prefixLengthImpl
    6  1.47% 11.42%     149 308399 scala.collection.IndexedSeqOptimized.prefixLengthImpl
    7  0.91% 12.33%      93 308387 scala.collection.IndexedSeqOptimized.segmentLength
    8  0.79% 13.12%      80 308362 scala.collection.IndexedSeqOptimized.segmentLength
    9  0.62% 13.74%      63 308799 java.security.AccessController.doPrivileged
   10  0.55% 14.29%      56 308791 java.security.AccessController.doPrivileged
   11  0.43% 14.72%      44 308388 java.util.regex.Pattern$CharProperty.match
   12  0.34% 15.07%      35 308384 java.util.regex.Pattern$CharProperty.match
   13  0.33% 15.40%      34 307464 firrtl.transforms.ConstantPropagation.constPropPrim
   14  0.33% 15.74%      34 308402 java.util.regex.Pattern$CharProperty.match
   15  0.29% 16.02%      29 308404 java.util.regex.Pattern$5.isSatisfiedBy
   16  0.27% 16.29%      27 308391 java.util.regex.Pattern$CharProperty.match
   17  0.25% 16.53%      25 308357 scala.collection.generic.Growable.$anonfun$$plus$plus$eq$1
   18  0.24% 16.77%      24 308397 java.util.regex.Pattern$5.isSatisfiedBy
   19  0.21% 16.97%      21 308386 scala.collection.generic.Growable.$anonfun$$plus$plus$eq$1
   20  0.20% 17.17%      20 308433 java.util.regex.Pattern$5.isSatisfiedBy

Given this was zero effort, I think I’ll take it and not worry about rewriting my module.

Thanks again!
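As a rough cross-check of the timings quoted in this thread, the sample totals in the profile headers can be compared directly. The sketch below is an illustration only: the object name `SampleTotals` is hypothetical, and it assumes every profile was taken with the same sampling interval, which the thread does not state.

```scala
object SampleTotals {
  // Finds "CPU SAMPLES BEGIN (total = N)" anywhere in a header line.
  private val Header = """CPU SAMPLES BEGIN \(total = (\d+)\)""".r.unanchored

  // Extract the total sample count from a profile header, if present.
  def total(headerLine: String): Option[Int] = headerLine match {
    case Header(n) => Some(n.toInt)
    case _         => None
  }

  // Ratio of samples with the module to samples without it.
  def ratio(withModule: String, withoutModule: String): Option[Double] =
    for (w <- total(withModule); wo <- total(withoutModule)) yield w.toDouble / wo

  def main(args: Array[String]): Unit = {
    // Header lines copied from the four profiles in this thread.
    val original = ratio(
      "CPU SAMPLES BEGIN (total = 44620) Wed Nov  7 04:49:01 2018",
      "CPU SAMPLES BEGIN (total = 7680) Wed Nov  7 04:37:14 2018")
    val updated = ratio(
      "CPU SAMPLES BEGIN (total = 18730) Wed Nov  7 05:56:23 2018",
      "CPU SAMPLES BEGIN (total = 10168) Wed Nov  7 06:04:51 2018")
    original.foreach(r => println(f"original FIRRTL: $r%.2fx samples with the module"))
    updated.foreach(r => println(f"updated FIRRTL:  $r%.2fx samples with the module"))
  }
}
```

The totals give a ratio of about 5.8x for the original profiles and about 1.8x after updating, which is in line with the 2m40s versus 1m20s wall-clock times reported above.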
