Vec.reduceTree produces suboptimal reduction tree


Type of issue: bug report

Impact: no functional change; more efficient generated Verilog

What is the current behavior? Vec[Bool].reduceTree produces suboptimal reductions.

// Chisel
import chisel3._

class OrReduce(n: Int) extends Module {
  val io = IO(new Bundle {
    val in = Input(Vec(n, Bool()))
    val out = Output(Bool())
  })
  // OR all n inputs together using Vec.reduceTree
  io.out := io.in.reduceTree(_ | _)
}

// generated Verilog
module OrReduce(
  input   clock,
  input   reset,
  input   io_in_0,
  input   io_in_1,
  input   io_in_2,
  input   io_in_3,
  input   io_in_4,
  input   io_in_5,
  input   io_in_6,
  input   io_in_7,
  output  io_out
);
  assign io_out = io_in_0 | io_in_1 | (io_in_2 | io_in_3) | (io_in_4 | io_in_5 | (io_in_6 | io_in_7)); // @[OrReduce.scala 13:32]
endmodule

This OR reduction has 6 gates of delay.
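Whether the emitted expression really has 6 gate delays depends on how the unparenthesized `|` chains group. The Verilog standard (IEEE 1364) specifies that binary operators of equal precedence associate left to right, so `a | b | c` means `(a | b) | c`. The following is a minimal pure-Scala sketch, with no Chisel dependency, that models the emitted expression and counts the depth of 2-input OR gates under both groupings. `Expr`, `In`, `Chain`, `chain`, `depthLeft` and `depthRight` are hypothetical names for this illustration, not Chisel or FIRRTL APIs.

```scala
// Hypothetical model of an OR expression as the emitter prints it: a chain
// of terms joined by `|` with no parentheses between them, where each term
// is either a single input or a parenthesized sub-chain.
sealed trait Expr
final case class In(name: String) extends Expr
final case class Chain(terms: List[Expr]) extends Expr

def chain(terms: Expr*): Chain = Chain(terms.toList)

// Gate depth when an unparenthesized chain groups left to right,
// i.e. a | b | c means (a | b) | c. Verilog uses this rule for `|`.
def depthLeft(e: Expr): Int = e match {
  case In(_) => 0
  case Chain(terms) =>
    terms.tail.foldLeft(depthLeft(terms.head)) { (acc, t) =>
      math.max(acc, depthLeft(t)) + 1
    }
}

// Gate depth if the same chain were grouped right to left instead,
// i.e. a | b | c read as a | (b | c).
def depthRight(e: Expr): Int = e match {
  case In(_) => 0
  case Chain(terms) =>
    terms.init.foldRight(depthRight(terms.last)) { (t, acc) =>
      math.max(depthRight(t), acc) + 1
    }
}

val ins = (0 until 8).map(i => In(s"io_in_$i"))
// io_in_0 | io_in_1 | (io_in_2 | io_in_3) | (io_in_4 | io_in_5 | (io_in_6 | io_in_7))
val emitted = chain(ins(0), ins(1), chain(ins(2), ins(3)),
  chain(ins(4), ins(5), chain(ins(6), ins(7))))

println(depthLeft(emitted))  // 3
println(depthRight(emitted)) // 6
```

Under this model, the figure of 6 corresponds to reading the chains right to left. With Verilog's left-to-right rule, the emitted expression groups exactly like the expected output given in the issue, and its depth is 3.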

What is the expected behavior? The produced reduction should be as follows:

assign io_out = ((io_in_0 | io_in_1) | (io_in_2 | io_in_3)) | ((io_in_4 | io_in_5) | (io_in_6 | io_in_7));

This has only 3 gate delays.
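The expected output corresponds to a layer-by-layer pairwise reduction: combine neighbours (0,1), (2,3), ..., then repeat on the results until one value remains, which takes ceil(log2 n) levels. Below is a small sketch of that scheme in plain Scala, assuming nothing about Chisel's actual `reduceTree` source; `reduceLayers` and `Node` are hypothetical names for illustration. It builds the expression text and tracks the gate depth of each intermediate result.

```scala
// Hypothetical model of a layer-by-layer pairwise reduction (not Chisel's
// own reduceTree source). Each pass combines neighbours (0,1), (2,3), ...;
// an odd element at the end is carried to the next layer unchanged.
def reduceLayers[T](xs: Seq[T])(op: (T, T) => T): T = {
  require(xs.nonEmpty, "cannot reduce an empty sequence")
  var layer: Seq[T] = xs
  while (layer.length > 1) {
    layer = layer.grouped(2).map { g =>
      if (g.length == 2) op(g(0), g(1)) else g(0)
    }.toVector
  }
  layer.head
}

// A node of the generated expression: its Verilog text and the number of
// 2-input OR gates on the longest path from any input to it.
final case class Node(text: String, depth: Int)

val inputs = (0 until 8).map(i => Node(s"io_in_$i", 0))
val root = reduceLayers(inputs) { (a, b) =>
  Node(s"(${a.text} | ${b.text})", math.max(a.depth, b.depth) + 1)
}

// (((io_in_0 | io_in_1) | (io_in_2 | io_in_3)) | ((io_in_4 | io_in_5) | (io_in_6 | io_in_7)))
println(root.text)
println(root.depth) // 3
```

Carrying an odd element forward, rather than padding with an identity value, keeps the sketch independent of the operator; for 5 inputs it still yields depth 3.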

Please tell us about your environment: Chisel 3.4.3, Scala 2.12.13

What is the use case for changing the behavior? Bitwise reductions are very common. If a function's name implies the reduction algorithm (e.g., reduceTree), the generated hardware should match that algorithm.

Issue Analytics

Created 2 years ago
Comments: 11 (10 by maintainers)

Top GitHub Comments

3 reactions

jackkoenig commented, Nov 19, 2021

> I think it would be better if reduceTree did not exist.

Possibly. Sometimes it is useful for the user to have more control over exactly how the Verilog looks, although at other times that is an antipattern.

> Is there a case where not having a tree is better than having a tree? Maybe reduce() should always produce a tree.

Yes, we have measured in the past that leaving things in a simpler, flat form sometimes results in better synthesis quality of results (QoR) than emitting a tree structure. I’d suggest reading @seldridge’s link to the discussion about mux trees in full, and in particular this comment.

In short, many synthesis tools do a very good job, and trying to do too much in the Verilog can actually hurt QoR.

1 reaction

schoeberl commented, Nov 23, 2021

I would like to keep reduceTree. It is not only for muxes or simple gates: you can also build a tree of more complex circuits (e.g., an arbitration tree with a register at each node). That is probably (almost) impossible for a synthesis tool to infer from a plain reduce.
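When every node holds state, the shape of the reduction becomes part of the circuit's cycle-level behavior, not just its timing. A minimal pure-Scala sketch of that, assuming a register at each node as in the arbitration-tree example, counts for each input how many register stages lie on its path to the root. `node` and `reduceLayers` are hypothetical helpers for this illustration, not Chisel APIs.

```scala
// Hypothetical model: a map from input index to the number of register
// stages on that input's path to the root. Every node (e.g. an arbiter
// followed by a register) adds one stage to every input beneath it.
def node(a: Map[Int, Int], b: Map[Int, Int]): Map[Int, Int] =
  (a ++ b).map { case (i, n) => i -> (n + 1) }

// Layer-by-layer pairwise reduction (illustrative, not Chisel's source):
// combine neighbours, carry an odd element forward, repeat until one is left.
def reduceLayers[T](xs: Seq[T])(op: (T, T) => T): T = {
  require(xs.nonEmpty, "cannot reduce an empty sequence")
  var layer: Seq[T] = xs
  while (layer.length > 1) {
    layer = layer.grouped(2).map { g =>
      if (g.length == 2) op(g(0), g(1)) else g(0)
    }.toVector
  }
  layer.head
}

val leaves = (0 until 8).map(i => Map(i -> 0))
val tree   = reduceLayers(leaves)(node)   // balanced tree of nodes
val linear = leaves.reduceLeft(node(_, _)) // flat left-to-right chain

println((0 until 8).map(tree).mkString(", "))   // 3, 3, 3, 3, 3, 3, 3, 3
println((0 until 8).map(linear).mkString(", ")) // 7, 7, 6, 5, 4, 3, 2, 1
```

In the pairwise tree every input sees the same 3-cycle latency, whereas the linear chain gives inputs 0 and 1 seven stages and input 7 only one. Because those latencies are observable, a synthesis tool, which must preserve cycle-level behavior, could not turn one structure into the other.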