Hi All,
> I *think* you have uncovered a bug in Quartus 4.1 synthesis. I'll confirm
> this with the synthesis team tomorrow.
First of all, I should point out that this is sub-optimal synthesis, NOT a
"bug" -- the design will function, it just uses more logic elements than
necessary. We *may* fix this in a future release of Quartus, but the
solution will not be easy to implement, so don't hold your breath. The value
of such a fix is also rather limited, given the LAB input limitations
explained below and the relative rarity of this combination of functions.
In the meantime, there is a work-around. You can directly instantiate
"stratix_lcell" cells (the WYSIWYG cell for Stratix/Cyclone LEs). Below I give
the code (thanks to a helpful synthesis guy) for a registered
adder/subtractor with oodles of extras. Features:
- Implements A - B or A + B (depending on signal "addnsub")
- Registers are synchronously loadable with "data" when synchronous load
"sload" is asserted
- There is a shared clock "clk", clock enable "ena", synchronous clear
"sclr", and asynchronous clear "aclr"
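For comparison, here is a minimal behavioral sketch of the same feature list, which could serve as a reference model when simulating the WYSIWYG version. It is not from the original post. The module name addsub_ref is hypothetical, and the control priority (aclr over sclr over sload over the add/subtract, with ena gating all synchronous actions) and the active-high aclr are assumptions, not confirmed Stratix/Cyclone LE behavior.

```verilog
// Hypothetical behavioral reference for the registered adder/subtractor.
// ASSUMED priority: aclr > sclr > sload > add/sub; ena gates every
// synchronous action; aclr is active-high. Check against the device docs.
module addsub_ref #(parameter WIDTH = 7) (
    input                  clk,
    input                  aclr,     // asynchronous clear
    input                  ena,      // clock enable
    input                  sclr,     // synchronous clear
    input                  sload,    // synchronous load of "data"
    input                  addnsub,  // ADD=1, SUBTRACT=0
    input      [WIDTH-1:0] a,        // operand A
    input      [WIDTH-1:0] b,        // operand B
    input      [WIDTH-1:0] data,     // value loaded when sload is asserted
    output reg [WIDTH-1:0] out       // registered result
);
    always @(posedge clk or posedge aclr) begin
        if (aclr)
            out <= {WIDTH{1'b0}};       // asynchronous clear wins
        else if (ena) begin
            if (sclr)
                out <= {WIDTH{1'b0}};   // synchronous clear
            else if (sload)
                out <= data;            // synchronous load
            else if (addnsub)
                out <= a + b;           // add
            else
                out <= a - b;           // subtract (wraps modulo 2^WIDTH)
        end
        // ena low: register holds its value
    end
endmodule
```

Written this way, the register controls are plain RTL, which is exactly the form the post says Quartus 4.1 maps sub-optimally; the hand-instantiated cells exist only to get one LE per bit.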
A couple of caveats:
- There are only 26 non-global inputs to each LAB in Cyclone (and 30 in
Stratix). So the fitter will have to split the design over multiple LABs if
you use more than 7 bits in Cyclone, since you need 3 inputs per bit (A, B,
sload data) plus the control signals. Assuming aclr and clk are global and
the other four (addnsub, sload, sclr, ena) are local, that's 4 extra LAB
inputs you need.
- When you stress the number of inputs on a LAB, you run the risk of
reduced routability, resulting in longer run-times, poor performance,
or unroutable designs in the worst case. Try to keep the number of LAB
inputs used to around 22-24.
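The input budget in these two caveats can be written out as a worked calculation, with n the width in bits, 3 non-global inputs per bit and 4 local control signals. The Stratix row and the routability row are simply the same formula applied to the 30-input figure and to 24 (reading "around 22-24" as an upper bound); they are derived here, not separately stated in the post.

```latex
\begin{aligned}
\text{LAB inputs}(n) &= 3n + 4 \\
\text{Cyclone: } 3n + 4 \le 26 &\;\Rightarrow\; n \le 7 \qquad (3 \cdot 7 + 4 = 25) \\
\text{Stratix: } 3n + 4 \le 30 &\;\Rightarrow\; n \le 8 \qquad (3 \cdot 8 + 4 = 28) \\
\text{Routability target: } 3n + 4 \le 24 &\;\Rightarrow\; n \le 6 \qquad (3 \cdot 6 + 4 = 22)
\end{aligned}
```

So the default WIDTH of 7 fits a single Cyclone LAB but sits just above the recommended 22-24 range.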
When Quartus splits the carry chain, it must insert extra logic elements to
end the chain and begin the next. For example, a 10-bit adder/subtractor
with ena, aclr, sclr and sload requires 13 LEs. That's still better than 20
LEs, but not 1:1. Also, the remaining unused LEs in the LAB will not be very
useful, since the LAB inputs are nearly saturated.
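The 10-bit numbers above break down as follows. The 3-LE overhead is just the difference between the quoted 13 LEs and one LE per bit, and the 2 LEs per bit for the non-workaround case is inferred from the 20-LE figure; neither split is stated explicitly in the post.

```latex
\begin{aligned}
\text{Work-around, 10 bits: } & 13 \text{ LEs} = 10 \text{ (one per bit)} + 3 \text{ (chain-break overhead)} \\
\text{Without it, 10 bits: } & 20 \text{ LEs} = 2 \cdot 10 \\
\text{Saving: } & 20 - 13 = 7 \text{ LEs}
\end{aligned}
```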
If you have no sload or a constant sload, you can implement 10 bits/LAB,
since you only need 2n + 4 LAB lines.
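Without sload, the per-bit data input drops out, leaving 2 inputs per bit. As a worked check against the 26-input Cyclone limit: the input budget alone would allow 11 bits, so the 10-bit figure is set by the 10 LEs in a Cyclone LAB rather than by routing, and 10 bits lands at the top of the 22-24 guideline.

```latex
\begin{aligned}
\text{LAB inputs}(n) &= 2n + 4 \\
2n + 4 \le 26 &\;\Rightarrow\; n \le 11 \\
n = 10 \text{ (one LAB of LEs)} &\;\Rightarrow\; 2 \cdot 10 + 4 = 24 \text{ inputs}
\end{aligned}
```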
Hope this helps!
Paul Leventis
Altera Corp.
************************* VERILOG CODE ******************
// Thanks to Gregg Baeckler for code!
module addsub (clk, a, b, addnsub, sload, sclr, aclr, ena, data, out);
  // 7 bits is the widest that fits the 26 non-global inputs of one
  // Cyclone LAB (3 per bit + 4 local controls); wider splits across LABs.
  parameter WIDTH = 7;
  input  [WIDTH-1:0] a;          // Operand A
  input  [WIDTH-1:0] b;          // Operand B (+B or -B based on addnsub)
  input  [WIDTH-1:0] data;       // Data to load upon sload
  input              clk;        // Clock
  input              addnsub;    // ADD=1, SUBTRACT=0
  input              sload;      // Triggers synchronous load of register
  input              sclr;       // Synchronous clear
  input              aclr;       // Asynchronous clear
  input              ena;        // Clock enable
  output [WIDTH-1:0] out;        // Registered result
  wire   [WIDTH-1:0] out;
  wire   [WIDTH-1:0] cout_wires; // Carry chain: cout of cell i feeds cin of cell i+1
  // The first cell's CIN is special since it has no carry-in.
  // Its carry-in will be the addnsub signal.
  stratix_lcell first_cell (
    .dataa(b[0]),
    .datab(a[0]),
    .datac(data[0]),        // sload data
    .sload(sload),
    .sclr(sclr),
    .ena(ena),
    .aclr(aclr),
    .clk(clk),
    .inverta(addnsub),      // add/subtract control
    .regout(out[0]),
    .cout(cout_wires[0])
  );
  defparam first_cell.operation_mode = "arithmetic"; // carry-chain mode
  defparam first_cell.synch_mode     = "on";         // use sclr/sload
  defparam first_cell.sum_lutc_input = "cin";        // sum LUT takes carry-in
  defparam first_cell.lut_mask       = "96b2";       // sum and carry LUT contents
  defparam first_cell.output_mode    = "reg_only";   // only regout is used
  // Fill in the rest of the cells in this loop; each cell's carry-in
  // is the previous cell's carry-out.
  genvar i;
  generate
    for (i = 1; i < WIDTH; i = i + 1)
    begin : ads
      stratix_lcell my_cell (
        .dataa(b[i]),
        .datab(a[i]),
        .datac(data[i]),
        .sload(sload),
        .sclr(sclr),
        .ena(ena),
        .aclr(aclr),
        .clk(clk),
        .cin(cout_wires[i-1]),
        .inverta(addnsub),
        .regout(out[i]),
        .cout(cout_wires[i])
      );
      defparam my_cell.operation_mode = "arithmetic";
      defparam my_cell.synch_mode     = "on";
      defparam my_cell.sum_lutc_input = "cin";
      defparam my_cell.lut_mask       = "96b2";
      defparam my_cell.output_mode    = "reg_only";
    end
  endgenerate
endmodule