Runtime increases
Type of issue: bug report
I’ve got a design based on the SiFive Freedom E310. We’re doing some customization of the platform, and at some point we noticed our FIRRTL runtimes becoming much longer. FIRRTL is now the longest pole in our simulation compile iterations. Chisel/sbt runs still take about 40s, but FIRRTL runtime has spiked from 40s to 6-8 minutes.
I’ve done some digging, and the slowdown appears to be related to a bus master module we’ve written to expose TileLink master ports at the top level of the resulting Verilog netlist. (These are for use by agents outside the chip that master the SBus.) I initially thought it might be related to the FIRRTL and rocket-chip updates we had taken, but varying the revisions of rocket-chip and FIRRTL appears to make no difference.
This is the Chisel for the module:
case object PeripheryMyBusMasterKey extends Field[Seq[MyBusMasterParams]]

trait HasMyBusMasterBundleContents extends Bundle {
  def params: MyBusMasterParams
}

trait HasPeripheryMyBusMaster { this: BaseSubsystem =>
  val busmasters = p(PeripheryMyBusMasterKey).map { params =>
    val busmaster = LazyModule(new MyBusMaster(params))
    // This was taken from rocket-chip/.../subsystem/Ports.scala
    sbus.fromMaster(Some(params.name), buffer = BufferParams.default) {
      TLSourceShrinker(1 << 14) := TLWidthWidget(busmaster.beatBytes)
    } := busmaster.node
    busmaster
  }
}

class MyBusMaster(params: MyBusMasterParams)(implicit p: Parameters) extends LazyModule
{
  val beatBytes = 4
  val node = TLClientNode(Seq(TLClientPortParameters(Seq(TLClientParameters(name = params.name)))))

  lazy val module = new LazyModuleImp(this) {
    val (tl_out, edge) = node.out(0)
    val sinksourceBits = 20 // log2Ceil(edge.manager.endSinkId)
    val addressBits = log2Ceil(edge.manager.maxAddress)

    // Expose the address and data TileLink channels to the outside world:
    val busparams = TLBundleParameters(addressBits = addressBits, dataBits = 32,
      sourceBits = sinksourceBits, sinkBits = sinksourceBits, sizeBits = 2)
    val busmaster_a_io = IO(Decoupled(new TLBundleA(busparams)).flip)
    val busmaster_d_io = IO(Decoupled(new TLBundleD(busparams)))

    tl_out.a <> busmaster_a_io
    busmaster_d_io <> tl_out.d

    // Tie off unused channels
    tl_out.b.valid := Bool(false)
    tl_out.c.ready := Bool(true)
    tl_out.e.ready := Bool(true)
  }
}

trait HasPeripheryMyBusMasterModuleImp extends LazyModuleImp {
  val outer: HasPeripheryMyBusMaster
}
Further on up the stack (in Platform.scala and System.scala) we expose the busmaster_a/d_io bundles to the outside world.
The FIRRTL module generated from one of these modules (we have a couple) looks very basic:
module MyBusMaster :
    input clock : Clock
    input reset : UInt<1>
    output auto : {out : {a : {flip ready : UInt<1>, valid : UInt<1>, bits : {opcode : UInt<3>, param : UInt<3>, size : UInt<3>, source : UInt<1>, address : UInt<32>, mask : UInt<4>, data : UInt<32>, corrupt : UInt<1>}}, flip b : {flip ready : UInt<1>, valid : UInt<1>, bits : {opcode : UInt<3>, param : UInt<2>, size : UInt<3>, source : UInt<1>, address : UInt<32>, mask : UInt<4>, data : UInt<32>, corrupt : UInt<1>}}, c : {flip ready : UInt<1>, valid : UInt<1>, bits : {opcode : UInt<3>, param : UInt<3>, size : UInt<3>, source : UInt<1>, address : UInt<32>, data : UInt<32>, corrupt : UInt<1>}}, flip d : {flip ready : UInt<1>, valid : UInt<1>, bits : {opcode : UInt<3>, param : UInt<2>, size : UInt<3>, source : UInt<1>, sink : UInt<1>, denied : UInt<1>, data : UInt<32>, corrupt : UInt<1>}}, e : {flip ready : UInt<1>, valid : UInt<1>, bits : {sink : UInt<1>}}}}
    input busmaster_a_io : {flip ready : UInt<1>, valid : UInt<1>, bits : {opcode : UInt<3>, param : UInt<3>, size : UInt<2>, source : UInt<20>, address : UInt<32>, mask : UInt<4>, data : UInt<32>, corrupt : UInt<1>}}
    output busmaster_d_io : {flip ready : UInt<1>, valid : UInt<1>, bits : {opcode : UInt<3>, param : UInt<2>, size : UInt<2>, source : UInt<20>, sink : UInt<20>, denied : UInt<1>, data : UInt<32>, corrupt : UInt<1>}}

    clock is invalid
    reset is invalid
    auto is invalid
    busmaster_a_io is invalid
    busmaster_d_io is invalid
    wire tl_out : {a : {flip ready : UInt<1>, valid : UInt<1>, bits : {opcode : UInt<3>, param : UInt<3>, size : UInt<3>, source : UInt<1>, address : UInt<32>, mask : UInt<4>, data : UInt<32>, corrupt : UInt<1>}}, flip b : {flip ready : UInt<1>, valid : UInt<1>, bits : {opcode : UInt<3>, param : UInt<2>, size : UInt<3>, source : UInt<1>, address : UInt<32>, mask : UInt<4>, data : UInt<32>, corrupt : UInt<1>}}, c : {flip ready : UInt<1>, valid : UInt<1>, bits : {opcode : UInt<3>, param : UInt<3>, size : UInt<3>, source : UInt<1>, address : UInt<32>, data : UInt<32>, corrupt : UInt<1>}}, flip d : {flip ready : UInt<1>, valid : UInt<1>, bits : {opcode : UInt<3>, param : UInt<2>, size : UInt<3>, source : UInt<1>, sink : UInt<1>, denied : UInt<1>, data : UInt<32>, corrupt : UInt<1>}}, e : {flip ready : UInt<1>, valid : UInt<1>, bits : {sink : UInt<1>}}} @[Nodes.scala 332:76]
    tl_out is invalid @[Nodes.scala 332:76]
    auto.out <- tl_out @[LazyModule.scala 173:49]
    tl_out.a <- busmaster_a_io @[MyBusMaster.scala 52:14]
    busmaster_d_io <- tl_out.d @[MyBusMaster.scala 53:20]
    tl_out.b.valid <= UInt<1>("h00") @[MyBusMaster.scala 56:20]
    tl_out.c.ready <= UInt<1>("h01") @[MyBusMaster.scala 57:20]
    tl_out.e.ready <= UInt<1>("h01") @[MyBusMaster.scala 58:20]
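The report doesn’t say how the profiles below were collected. Their layout, a "CPU SAMPLES BEGIN (total = N)" header followed by rank/self/accum/count/trace/method columns, matches the CPU-sampling output of the JDK 8 hprof agent. The following build.sbt fragment is a minimal sketch for reproducing them. It assumes FIRRTL is launched from sbt on a JDK 8 runtime, because hprof was removed in JDK 9.

```scala
// build.sbt fragment (sketch, assumes a JDK 8 runtime): fork a separate JVM for `run`
// so that javaOptions are applied to it, then attach the hprof agent in
// CPU-sampling mode.
fork := true
javaOptions += "-agentlib:hprof=cpu=samples"
```

By default, hprof writes its text report to java.hprof.txt in the forked JVM’s working directory when that JVM exits.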
When I profile the FIRRTL run with this module included and then without it, I see the following:
(with the problematic module included)
CPU SAMPLES BEGIN (total = 44620) Wed Nov 7 04:49:01 2018
rank self accum count trace method
1 7.34% 7.34% 3275 304705 firrtl.Utils$.$anonfun$get_flip$2
2 6.92% 14.26% 3087 304742 firrtl.Utils$.$anonfun$get_flip$2
3 2.16% 16.42% 963 307091 scala.collection.TraversableOnce.nonEmpty
4 2.14% 18.56% 955 304746 scala.collection.TraversableOnce.nonEmpty
5 2.09% 20.64% 931 304730 scala.collection.TraversableOnce.nonEmpty
6 2.08% 22.73% 929 307019 scala.collection.TraversableOnce.nonEmpty
7 2.07% 24.80% 924 304670 scala.collection.TraversableOnce.nonEmpty
8 2.07% 26.86% 923 304428 scala.collection.TraversableOnce.nonEmpty
9 2.05% 28.92% 916 306943 scala.collection.TraversableOnce.nonEmpty
10 2.00% 30.91% 891 306946 scala.collection.TraversableOnce.nonEmpty
11 1.99% 32.91% 890 306931 scala.collection.TraversableOnce.nonEmpty
12 1.99% 34.90% 888 307007 scala.collection.TraversableOnce.nonEmpty
13 1.93% 36.83% 860 304324 scala.collection.TraversableOnce.nonEmpty
14 1.91% 38.73% 851 307023 scala.collection.TraversableOnce.nonEmpty
15 1.84% 40.58% 823 304743 scala.collection.TraversableOnce.$anonfun$foldLeft$1
16 1.81% 42.39% 807 304709 scala.collection.TraversableOnce.nonEmpty
17 1.80% 44.19% 804 304706 scala.collection.TraversableOnce.$anonfun$foldLeft$1
18 1.54% 45.73% 687 304316 scala.collection.TraversableOnce.nonEmpty
19 1.41% 47.14% 630 304745 firrtl.Utils$.$anonfun$get_flip$2
20 1.36% 48.51% 609 304334 scala.collection.mutable.ListBuffer.$plus$eq
(without the problematic module)
CPU SAMPLES BEGIN (total = 7680) Wed Nov 7 04:37:14 2018
rank self accum count trace method
1 0.99% 0.99% 76 307960 java.security.AccessController.doPrivileged
2 0.87% 1.86% 67 307971 java.security.AccessController.doPrivileged
3 0.82% 2.68% 63 304549 firrtl.Utils$.$anonfun$get_flip$2
4 0.47% 3.15% 36 304547 scala.collection.TraversableOnce.nonEmpty
5 0.39% 3.54% 30 304195 scala.collection.TraversableOnce.nonEmpty
6 0.27% 3.82% 21 302501 firrtl.passes.InferWidths$.$anonfun$run$9
7 0.26% 4.08% 20 308043 java.io.FileOutputStream.close0
8 0.25% 4.32% 19 307061 firrtl.transforms.ConstantPropagation.constPropPrim
9 0.23% 4.56% 18 305884 scala.collection.TraversableOnce.nonEmpty
10 0.23% 4.79% 18 305894 scala.collection.AbstractTraversable.<init>
11 0.23% 5.03% 18 307977 java.security.AccessController.doPrivileged
12 0.22% 5.25% 17 306785 firrtl.transforms.RemoveWires.onStmt$1
13 0.20% 5.44% 15 302507 firrtl.passes.InferWidths$.get_constraints_s$1
14 0.20% 5.64% 15 304193 scala.collection.TraversableOnce.nonEmpty
15 0.20% 5.83% 15 304225 firrtl.Utils$.$anonfun$get_flip$2
16 0.20% 6.03% 15 307048 scala.collection.mutable.HashTable.resize
17 0.20% 6.22% 15 307072 firrtl.Mappers$ExprMagnet$$anon$6.map
18 0.20% 6.42% 15 307080 scala.collection.mutable.HashTable.findOrAddEntry
19 0.18% 6.60% 14 304551 scala.collection.TraversableOnce.nonEmpty
20 0.18% 6.78% 14 305890 scala.collection.AbstractTraversable.<init>
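If I’m reading this correctly, the problematic module causes a dramatic increase in CPU time spent in a handful of functions, chiefly firrtl.Utils get_flip and TraversableOnce.nonEmpty. That time dwarfs the CPU spent on any other single thing in a normal run. The total sample count rises from 7680 to 44620. The three get_flip rows alone account for roughly 15.7% of all samples, versus about 1% without the module.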
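To make that comparison concrete, here is a small sketch that tallies the sample counts copied from the two top-20 listings. It only sees the top 20 rows of each profile, so every share it computes is a lower bound. The name ProfileComparison is just for illustration.

```scala
// Sample counts copied from the top-20 rows of the two hprof profiles above.
// Rows below rank 20 are not included, so every share computed here is a lower bound.
object ProfileComparison {
  // "total" from each CPU SAMPLES BEGIN header.
  val totalWith: Int    = 44620 // with MyBusMaster
  val totalWithout: Int = 7680  // without MyBusMaster

  // "count" column of the firrtl.Utils$.$anonfun$get_flip$2 rows.
  val getFlipWith: Seq[Int]    = Seq(3275, 3087, 630)
  val getFlipWithout: Seq[Int] = Seq(63, 15)

  // "count" column of the scala.collection.TraversableOnce.nonEmpty rows.
  val nonEmptyWith: Seq[Int] =
    Seq(963, 955, 931, 929, 924, 923, 916, 891, 890, 888, 860, 851, 807, 687)
  val nonEmptyWithout: Seq[Int] = Seq(36, 30, 18, 15, 14)

  // Percentage of all samples in a run that fall into the given rows.
  def share(counts: Seq[Int], total: Int): Double = 100.0 * counts.sum / total

  def main(args: Array[String]): Unit = {
    println(f"total samples: ${totalWith.toDouble / totalWithout}%.1fx more with the module")
    println(f"get_flip: ${share(getFlipWith, totalWith)}%.1f%% vs ${share(getFlipWithout, totalWithout)}%.1f%%")
    println(f"nonEmpty: ${share(nonEmptyWith, totalWith)}%.1f%% vs ${share(nonEmptyWithout, totalWithout)}%.1f%%")
  }
}
```

With these numbers, main reports about 5.8x more samples overall. The get_flip rows take about 15.7% of samples with the module versus 1.0% without it, and the nonEmpty rows take about 27.8% versus 1.5%.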
At the very least, I thought this might be of interest to the FIRRTL devs.
I’m going to try rewriting the master module to avoid these runtime problems. If anyone can suggest a different approach (one that hopefully doesn’t trip the same runtime blowout), I’d appreciate it.
Top GitHub Comments
The slowdown is still unfortunate. We are experiencing fairly long runtimes for large designs (~10 min), so I can truly say I feel your pain, @juliusbaxter. I hope to do some more performance work in the near future, so stay tuned!
Thanks @azidar, it does reduce the runtime to about 2m40s now, which is a good improvement. It’s still double the runtime without the module, though, which took 1m20s on my run just now.
The profiling output looks fairly different for the same input FIRRTL (after updating master, of course):
With module (2m40s):
Profile without module (1m20s):
Given this was zero effort, I think I’ll take it and not worry about rewriting my module.
Thanks again!