Need magic incantation to prevent synthesizer misoptimisation

Started by Aleksandar Kuktin September 22, 2018
Hi all!

I'm having a problem with the synthesis and P&R tools introducing a 
unnecessary gate in a critical path. Consider the following verilog:

reg [31:0]          mem_dataintomem = 32'd0;

always @(posedge CLK)
  begin
    if (mcu_active && (w_we_recv || w_tlb_recv))
      mem_dataintomem <= w_data_recv;
    if (mem_ack && !mcu_active)
      mem_dataintomem <= 0;
  end

Practically, the synthesis correctly implements mem_dataintomem as a 
register whose data input is ungated w_data_recv, and whose clock enable 
and synchronous reset control the contents of the register.

In this setup, for speed considerations, the best thing to do is create a 
single gate signal to the clock enable input of a register, and another 
separate single gate signal to drive the synchronous reset of the same 
register.

But instead, the tools combine the signals, so that there is one single-
gate signal that feeds the synchronous reset, but is also piped into the 
other signal (which therefore becomes a two-gate signal). Sort of like 
this:

assign ctrl_syn_reset = mem_ack && !mcu_active;
assign ctrl_clock_enable = mcu_active &&
                           (w_we_recv || w_tlb_recv) &&
                           !ctrl_syn_reset; // this is superfluous
register_module mem_dataintomem_somebit(
    .CLOCK(CLK),
    .DATAIN(w_data_recv),
    .CLOCK_ENABLE(ctrl_clock_enable),
    .RESET(ctrl_syn_reset), // ignore the drive-high-to-reset quirk
  [...]
    );

My problem is that in this setup ctrl_clock_enable becomes too slow on 
account of the extra gate and timing analysis fails.

What I would like is help figuring out some incantation that would make 
the machine see the light and not chain the signals. OTOH, is there a 
real need for this? It seems obvious to me that only one of CLOCK_ENABLE 
or RESET will ever be asserted, and that you don't have to mix them.

I know for a fact I can sidestep the problem if I remove the clock enable 
and instead gate the input to the register, but that is *very* 
problematic because that area of the die is already tightly packed and it 
can't support the extra logic (that is, all the wires are already taken).

The chip used is Lattice iCE40 HX8K, and the software used is iCEcube2 
version 2017.01.27914 . Synthesis is performed by the bundled Synplify 
Pro L-2016.09.L+ice40, build 077R.

The code is part of a FOSH project and can be inspected on
https://github.com/akuktin/special_snowflake . The code of interest for 
this question is in file cache/cpu_mcu2.v, lines approximately 303 to 
333. The code at the start of the post (without else) is one of the 
experiments I tried, and failed with.
On Saturday, 9/22/2018 9:03 PM, Aleksandar Kuktin wrote:
> Hi all! > > I'm having a problem with the synthesis and P&R tools introducing a > unnecessary gate in a critical path. Consider the following verilog: > > reg [31:0] mem_dataintomem = 32'd0; > > always @(posedge CLK) > begin > if (mcu_active && (w_we_recv || w_tlb_recv)) > mem_dataintomem <= w_data_recv; > if (mem_ack && !mcu_active) > mem_dataintomem <= 0; > end > > Practically, the synthesis correctly implements mem_dataintomem as a > register whose data input is ungated w_data_recv, and whose clock enable > and synchronous reset control the contents of the register. > > In this setup, for speed considerations, the best thing to do is create a > single gate signal to the clock enable input of a register, and another > separate single gate signal to drive the synchronous reset of the same > register. > > But instead, the tools combine the signals, so that there is one single- > gate signal that feeds the synchronous reset, but is also piped into the > other signal (which therefore becomes a two-gate signal). Sort of like > this: > > assign ctrl_syn_reset = mem_ack && !mcu_active; > assign ctrl_clock_enable = mcu_active && > (w_we_recv || w_tlb_recv) && > !ctrl_syn_reset; // this is superfluous > register_module mem_dataintomem_somebit( > .CLOCK(CLK), > .DATAIN(w_data_recv), > .CLOCK_ENABLE(ctrl_clock_enable), > .RESET(ctrl_syn_reset), // ignore the drive-high-to-reset quirk > [...] > ); > > My problem is that in this setup ctrl_clock_enable becomes too slow on > account of the extra gate and timing analysis fails. > > What I would like is help figuring out some incantation that would make > the machine see the light and not chain the signals. OTOH, is there a > real need for this? It seems obvious to me that only one of CLOCK_ENABLE > or RESET will ever be asserted, and that you don't have to mix them. > > I know for a fact I can sidestep the problem if I remove the clock enable > and instead gate the input to the register, but that is *very* > problematic because that area of the die is already tightly packed and it > can't support the extra logic (that is, all the wires are already taken). > > The chip used is Lattice iCE40 HX8K, and the software used is iCEcube2 > version 2017.01.27914 . Synthesis is performed by the bundled Synplify > Pro L-2016.09.L+ice40, build 077R. > > The code is part of a FOSH project and can be inspected on > https://github.com/akuktin/special_snowflake . The code of interest for > this question is in file cache/cpu_mcu2.v, lines approximately 303 to > 333. The code at the start of the post (without else) is one of the > experiments I tried, and failed with. >
If the synthesizer is using a simple template approach to generating the reset and clock control signal, it would probably be best to code your logic the way the template expects it, typically: always @ (posedge clk) if (reset condition) reset actions else if (enable condition) enable actions Logically this is the same as what you've written, but the template fits the expectations of the synthesizer better. -- Gabor
On Sun, 23 Sep 2018 01:03:30 +0000, Aleksandar Kuktin wrote:

> The chip used is Lattice iCE40 HX8K, [...]
Okay, I figured it out. The flip-flop is behaving differently than my expectations. Specifically, I had assumed the synchronous reset is fully independent and will have its effect if it is independently asserted during the clock edge. In actual fact, iCE40's synchronous reset only takes effect if the clock_enable is also asserted. So, to reset the flip-flop, you have to assert BOTH reset and clock_enable. Therefore, the extra wire.