Vec.reduceTree produces suboptimal reduction tree
Type of issue: bug report
Impact: no functional change | increase efficiency in produced verilog
What is the current behavior? Vec[Bool].reduceTree produces suboptimal reductions.
// chisel
import chisel3._

class OrReduce(n: Int) extends Module {
  val io = IO(new Bundle {
    val in  = Input(Vec(n, Bool()))
    val out = Output(Bool())
  })
  // OR all n inputs together using Vec.reduceTree
  io.out := io.in.reduceTree(_ | _)
}
// generated verilog
module OrReduce(
  input   clock,
  input   reset,
  input   io_in_0,
  input   io_in_1,
  input   io_in_2,
  input   io_in_3,
  input   io_in_4,
  input   io_in_5,
  input   io_in_6,
  input   io_in_7,
  output  io_out
);
  assign io_out = io_in_0 | io_in_1 | (io_in_2 | io_in_3) | (io_in_4 | io_in_5 | (io_in_6 | io_in_7)); // @[OrReduce.scala 13:32]
endmodule
This OR reduction has 6 gates of delay.
What is the expected behavior? The produced reduction should be as follows:
assign io_out = ((io_in_0 | io_in_1) | (io_in_2 | io_in_3)) | ((io_in_4 | io_in_5) | (io_in_6 | io_in_7));
This has only 3 gate delays.
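To make the gate-delay figure for the expected form concrete, here is a minimal sketch in plain Scala (no Chisel dependency). It models a reduction as an expression tree and counts the two-input gates on the longest path. The names `Expr`, `Leaf`, `Or`, and `ReductionDepth` are hypothetical and exist only for this illustration; this is not how Chisel implements reduceTree.

```scala
// Hypothetical expression model for illustration only; not part of Chisel.
sealed trait Expr
final case class Leaf(name: String) extends Expr
final case class Or(lhs: Expr, rhs: Expr) extends Expr

object ReductionDepth {
  // Number of two-input gates on the longest input-to-output path.
  def depth(e: Expr): Int = e match {
    case Leaf(_)  => 0
    case Or(l, r) => 1 + math.max(depth(l), depth(r))
  }

  // Fully serial chain: ((a | b) | c) | d ...
  def chain(xs: Seq[Expr]): Expr = xs.reduceLeft((a, b) => Or(a, b))

  // Balanced tree: combine neighbours pairwise, one layer at a time.
  def tree(xs: Seq[Expr]): Expr = {
    require(xs.nonEmpty, "cannot reduce an empty sequence")
    if (xs.size == 1) xs.head
    else tree(xs.grouped(2).map {
      case Seq(a, b) => Or(a, b)
      case Seq(a)    => a // an odd element passes through to the next layer
    }.toList)
  }

  // Leaves named like the ports in the generated Verilog.
  def inputs(n: Int): Seq[Expr] = (0 until n).map(i => Leaf(s"io_in_$i"))
}
```

In this model, `ReductionDepth.depth(ReductionDepth.tree(ReductionDepth.inputs(8)))` evaluates to 3, which matches the expected expression above, while a fully serial chain of the same eight inputs has depth 7.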
Please tell us about your environment: chisel = 3.4.3, scala = 2.12.13
What is the use case for changing the behavior? Bitwise reductions are very common. If a function name implies the algorithm used for reduction (i.e., reduceTree) then the produced hardware should match the named algorithm.
Issue Analytics
- State:
- Created 2 years ago
- Comments:11 (10 by maintainers)
Top GitHub Comments
Possibly, sometimes it is useful for the user to have more control over exactly how the Verilog looks, although sometimes it is an antipattern.
Yes, we have measured in the past that leaving things in a simpler, flat form sometimes results in better synthesis QoR (when compared to emitting a tree structure). I’d suggest reading @seldridge’s link to the discussion about mux trees in full, but in particular I’d suggest reading this comment.
In short, many synthesis tools do a very good job, and trying to do too much in the Verilog can actually hurt the QoR.
I would like to keep reduceTree. It is not only for MUXes or simple gates. You can also have a tree of more complex circuits (e.g., an arbitration tree including a register at each node). This is probably (almost) impossible for a synthesis tool to infer from a plain reduce.
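The arbitration-tree case can be illustrated with a small software model in plain Scala. This is only a sketch under stated assumptions: `Request`, `ArbiterTreeModel`, the `layer` parameter, and the "higher priority wins" rule are all hypothetical, and the `stages` counter merely stands in for a register at each node. It is not Chisel code and not the library's implementation.

```scala
// Hypothetical software model of an arbitration tree; not Chisel code.
// `stages` counts how many modelled register stages a request has passed.
final case class Request(id: Int, priority: Int, stages: Int = 0)

object ArbiterTreeModel {
  // Generic pairwise tree reduction. `node` combines two neighbours;
  // `layer` is applied to an odd element that has no partner in this layer.
  def reduceTree[T](xs: Seq[T])(node: (T, T) => T, layer: T => T): T = {
    require(xs.nonEmpty, "cannot reduce an empty sequence")
    if (xs.size == 1) xs.head
    else reduceTree(xs.grouped(2).map {
      case Seq(a, b) => node(a, b)
      case Seq(a)    => layer(a)
    }.toList)(node, layer)
  }

  // Models one register stage by incrementing the stage counter.
  def register(r: Request): Request = r.copy(stages = r.stages + 1)

  // Assumed arbitration rule: higher priority wins, ties go to the left input.
  def arbitrate(a: Request, b: Request): Request =
    register(if (b.priority > a.priority) b else a)

  // Reduce all requests to a single winner through the registered tree.
  def winner(reqs: Seq[Request]): Request =
    reduceTree(reqs)(arbitrate, register)
}
```

With eight requests, the winner in this model has passed through three modelled register stages, one per tree level. A serial chain built from the same `arbitrate` node would pass through seven, and that structural difference is what the comment argues a synthesis tool cannot recover from a plain reduce.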