Hi, I'm trying to devise an efficient method to generate a ones' complement sum of two 16 bit integers in a single cycle. I'm targeting a Xilinx device and right now I'm using a 32 bit adder. Suppose I want to add x and y : a,b,result : std_logic_vector(15 downto 0); x,y,result_32 : std_logic_vector(31 downto 0); a <= x & y; b <= y & x; result_32 <= a + b; result <= result_32(31 downto 16); Is there a better way to implement this?
Ones' complement addition
A developer seeks an efficient way to implement 16-bit ones' complement addition in a Xilinx FPGA within a single clock cycle. The discussion highlights the dangers of using a simple combinatorial end-around carry, which can lead to unstable logic loops and metastability when operands result in zero.
Participants conclude that while several architectures exist to handle the wrap-around carry, designers must explicitly manage the loop to satisfy timing analysis tools and ensure deterministic results between positive and negative zero representations.
- Directly connecting carry-out to carry-in creates a combinatorial loop that synthesis tools cannot properly analyze and may lead to oscillation.
- Ones' complement addition has two representations for zero (all 0s and all 1s), which can cause metastability in a simple ripple-carry loop.
- Using a 32-bit adder to simulate two 16-bit stages is a reliable way to resolve the carry without combinatorial loops.
- Alternative high-performance designs include using two parallel 16-bit adders (one for A+B and one for A+B+1) with a multiplexer controlled by the first adder's carry-out.
- Historical computer architectures like the CDC 6600 used specific hardware flags or pre-conditioning to stabilize the end-around carry.
Koen Van Renterghem wrote:> Hi, > > I'm trying to devise an efficient method to generate a ones' complement > sum of two 16 bit integers in a single cycle. I'm targeting a Xilinx > device and right now I'm using a 32 bit adder. Suppose I want to add x > and y :You need the end-around carry. It is pretty easy with a carry lookahead adder to wire the carry out back to the carry in. (snip) You should only need a 16 bit adder with carry in/carry out. How about: assign {carry,result} = a+b+carry; The worst case is where the carry propagates 16 bits. (Well, the full width.) It never goes farther than that. -- glen
glen herrmannsfeldt wrote:
> Koen Van Renterghem wrote:
>
>> Hi,
>>
>> I'm trying to devise an efficient method to generate a ones' complement
>> sum of two 16 bit integers in a single cycle. I'm targeting a Xilinx
>> device and right now I'm using a 32 bit adder. Suppose I want to add x
>> and y :
>
> You need the end-around carry. It is pretty easy with a carry
> lookahead adder to wire the carry out back to the carry in.
>
> (snip)
>
> You should only need a 16 bit adder with carry in/carry out.
>
> How about:
>
> assign {carry,result} = a+b+carry;
>
> The worst case is where the carry propagates 16 bits.
> (Well, the full width.) It never goes farther than that.
>
> -- glen
>
I previously tried this approach, but that resulted in synplify and
xilinx complaining about a combinatorial loop created by connecting the
carry-out to the carry-in of the adder. That is why I started using a
dedicated 16 bit adder to calculate the carry. This carry is then
applied to a second adder.
The statement you suggested will save area, but is there a proper way to
constrain it? Right now the combinatorial loop seems to be ignored in
timing analysis :
<snip>
1 logic loop found and disabled.
----------------------------------------------------------------------
! Warning: The following connections close logic loops, and some paths !
! through these connections may not be analyzed. !
! !
! Signal Driver Load !
! -------------------------------- ---------------- ---------------- !
! csum_2x16_temp<1>_cyo LICE_X46Y82.COUT SLICE_X46Y87.BX !
----------------------------------------------------------------------
<\snip>
Koen Van Renterghem wrote:> glen herrmannsfeldt wrote:(snip regarding ones complement adder)> > You need the end-around carry. It is pretty easy with a carry > > lookahead adder to wire the carry out back to the carry in. > > (snip)> > You should only need a 16 bit adder with carry in/carry out.> > assign {carry,result} = a+b+carry;> > The worst case is where the carry propagates 16 bits. > > (Well, the full width.) It never goes farther than that.> I previously tried this approach, but that resulted in synplify and > xilinx complaining about a combinatorial loop created by connecting the > carry-out to the carry-in of the adder. That is why I started using a > dedicated 16 bit adder to calculate the carry. This carry is then > applied to a second adder.The delay will be at most one trip around the carry loop, but the tools don't know that. It might be that there is a way to tell the system the delay. You will have to include the end-around carry through the routing fabric, which will depend on the routing.> The statement you suggested will save area, but is there a proper way to > constrain it? Right now the combinatorial loop seems to be ignored in > timing analysis :It is nice that it figures out that there is a loop, and to ignore it. Unless it can specifically recognize ones complement adders, I don't think there is anything else it could do. That might be one to ask synplify. -- glen
glen herrmannsfeldt wrote:> Koen Van Renterghem wrote: > >> glen herrmannsfeldt wrote: > > (snip regarding ones complement adder) > >> > You need the end-around carry. It is pretty easy with a carry >> > lookahead adder to wire the carry out back to the carry in. (snip) > >> > You should only need a 16 bit adder with carry in/carry out. > >> > assign {carry,result} = a+b+carry; > >> > The worst case is where the carry propagates 16 bits. (Well, the >> > full width.) It never goes farther than that. > >> I previously tried this approach, but that resulted in synplify and >> xilinx complaining about a combinatorial loop created by connecting the >> carry-out to the carry-in of the adder. That is why I started using a >> dedicated 16 bit adder to calculate the carry. This carry is then >> applied to a second adder. > > The delay will be at most one trip around the carry loop, but the tools > don't know that.It looks to me like there is a chance of a pulse running around the carry loop for a multiple times, perhaps even forever. The case of interest is a calculation that should result in an answer of zero. Note that there are two representations of zero in one's complement notation, all '1's and all '0's, often called negative zero and positive zero. If there is a carry, the answer is all '0's, or positive zero. If there is no carry, the answer will be all '1's, or negative zero. Now suppose my adder is half in the state with a carry, and half in the state without a carry. The carry, and not carry will both propagate up the carry chain to the msb and around to the lsb again, chasing each other, and it is a race. If carry is faster than the not carry, then the final result will be all '0's. If not carry is faster than the carry, then the final result will be all '1's. Assuming you wait long enough, of course. If carry propagates at exactly the same speed as the not carry, then the circuit will never produce either of the two correct results. If you check before the end of the race, then you will get an incorrect result, that is something other than negative zero or positive zero. In other words, this circuit has a metastable state when the answer is zero. It may take forever to to produce a correct answer, if it must decide between negative zero and positive zero. -- Phil Hays (Xilinx, but posting for myself)
Phil Hays wrote: (snip regarding carry and ones complement adders)> The case of interest is a calculation that should result in an answer of > zero. Note that there are two representations of zero in one's complement > notation, all '1's and all '0's, often called negative zero and positive > zero.> If there is a carry, the answer is all '0's, or positive zero. If there is > no carry, the answer will be all '1's, or negative zero.If this were really a problem I don't believe Cray would have built ones complement machines when he was trying to build them as fast as possible. As far as I know (never having actually tried to build a ones complement machine) it is more usual to use a subtractor than adder. The reason I always thought it was done was to reduce the negative zero results. Does a subtractor still have this effect? Also, a Xilinx implementation will have the wraparound carry much slower than the others. Is real logic symmetric in propagation time for zero and one? -- glen
glen herrmannsfeldt wrote:> Phil Hays wrote: > > (snip regarding carry and ones complement adders) > >> The case of interest is a calculation that should result in an answer >> of zero. Note that there are two representations of zero in one's >> complement notation, all '1's and all '0's, often called negative zero >> and positive zero. > >> If there is a carry, the answer is all '0's, or positive zero. If there >> is no carry, the answer will be all '1's, or negative zero.> If this were really a problem I don't believe Cray would have built ones > complement machines when he was trying to build them as fast as > possible.There is a reason why ones complement isn't done on recent computers. Even Cray didn't use it in his later machines. For example, a google search found me this: http://docs.cray.com/books/004-2179-001/html-004-2179-001/rvc5mrwh.html There are solutions to this problem. From example, hold the last carry until there is a stable next carry. The carry was faster than the sum, as it was generated by a carry lookahead, and the sum required carry ripple between several bits. From memory cells not refreshed since about 1975 or so, the CDC6600 did something like this. I just wrote some simple programs on one, perhaps someone who knows more about the details will comment.> As far as I know (never having actually tried to build a ones complement > machine) it is more usual to use a subtractor than adder. The reason I > always thought it was done was to reduce the negative zero results. Does > a subtractor still have this effect?Why would a subtractor produce more or less negative zero results?> Also, a Xilinx implementation will have the wraparound carry much slower > than the others.Which doesn't help. It is still a loop.> Is real logic symmetric in propagation time for zero and one?It can be. CMOS logic is inverting. A carry value of "zero" or "one" must be both a physical low and a physical high at different stages of the circuit. While the high to low transition may be faster than the low to high transition, the next stage is the inverse, so some or all of the difference cancels out. -- Phil Hays (Xilinx, but speaking for myself)
Phil Hays wrote about a one's complement adder, which uses end-around carry:> It looks to me like there is a chance of a pulse running around the carry > loop for a multiple times, perhaps even forever.There is no way that can happen if the inputs are stable. For an n-bit one's complement adder, there can't be a carry propogation wider than n bits. In other words, in a 16-bit adder, a carry propogation might start in position 5, but even if it wraps it cannot cause carry propogation past position 4. You can prove this by exhaustion for a short word width, and then by induction for longer word widths. The worst-case add time for an n-bit one's complement ripple carry adder is exactly the same as the worst-case for an n-bit two's complement ripple carry adder. This is why there wasn't a problem with using one's complement in various early computers, including the early IBM mainframes, DEC PDP-1, and CDC 6600.> The case of interest is a calculation that should result in an answer of > zero. Note that there are two representations of zero in one's complement > notation, all '1's and all '0's, often called negative zero and positive > zero. > > If there is a carry, the answer is all '0's, or positive zero. If there is > no carry, the answer will be all '1's, or negative zero.If you're suggesting that there is a case in which the end-around carry won't produce the numerically correct result (assuming that the numerical values of "positive zero" and "negative zero" are equal), you are mistaken.> Now suppose my adder is half in the state with a carry, and half in the > state without a carry. The carry, and not carry will both propagate up the > carry chain to the msb and around to the lsb again, chasing each other, > and it is a race.There is no combination of operands for which that can happen. For the sum to be zero (whether represented as "positive zero" or "negative zero", the operands must be n and -n. But if that is true, then the bit patterns representing the operands are bitwise-complementary, so there will be no carries at all. For example, suppose you add 3 and -3 with a four-bit one's complement adder: 0011 1100 ---- 1111 There is no way for any carry to occur in such a case; the result is "negative zero". Similarly for adding "positive zero" and "negative zero". If you add "positive zero" to itself, there are no carries, and the result is "positive zero". If you add "negative zero" to itself, you get: 1111 1111 ---- 11110 <- before end-around carry ---- 1111 <- after end-around carry Thus the final result is "negative zero".> In other words, this circuit has a metastable state when the answer is > zero.If you still think this is possible, please give an example of operands for which it could occur. Eric
Eric Smith wrote:> Phil Hays wrote about a one's complement adder, which uses end-around > carry: >> It looks to me like there is a chance of a pulse running around the >> carry loop for a multiple times, perhaps even forever. > > There is no way that can happen if the inputs are stable. For an n-bit > one's complement adder, there can't be a carry propogation wider than n > bits. <Snip> please give an example of operands for which it could > occur.Let me give an example. Note that the two inputs can be anything that are inverses of each other. The first two cases are stable. 1111 +0000 +0 Carry in ====== 1111 carry out 0 Note that the carry bits are 0000 or: 1111 +0000 +1 Carry in ====== 0000 carry out 1 Note that the carry bits are 1111 With me so far? To describe an unstable case, I'm going to show only the carry bits. The inputs are the same as above. I'm assuming ripple carry, and that one time step is required for propagation of carry for one bit position. Time 0 Carry 0011 1 Carry 0110 2 Carry 1100 3 Carry 1001 4 Carry 0011 5 Carry 0110 6 Carry 1100 7 Carry 1001 8 Carry 0011 ... The sum would also not be stable. There are multiple ways to avoid the unstable cases. A few that come to mind with little effort: 1) Force the previous carry until the carry output is stable. This has the side effect of making the result (positive or negative zero) depend on the previous computation. (I think that the CDC6600 used this method) 2) If the carry look ahead shows all propagate bits are '1', then generate a carry in = '1' regardless of the carry input or any generate bits. This has the side effect of producing only positive zeros any add unless both operands were negative zeros. 3) Force a carry input of '0' until the carry output is stable. This has the side effect of never producing a positive zero output for any add unless both operands were positive zeros. (Hand computed examples often use this method) 4) Force a carry input of '1' until the carry output is stable. This has the side effect of producing only positive zeros any add unless both operands were negative zeros. -- Phil Hays (Xilinx, but posting for myself)
Phil Hays wrote about a one's complement adder, which uses end-around carry:> It looks to me like there is a chance of a pulse running around the > carry loop for a multiple times, perhaps even forever.I wrote:> There is no way that can happen if the inputs are stable. For an n-bit > one's complement adder, there can't be a carry propogation wider than n > bits. <Snip> please give an example of operands for which it could > occur.Phil Hays wrote:> Let me give an example. Note that the two inputs can be anything that are > inverses of each other. The first two cases are stable. > > 1111 > +0000 > +0 Carry in > ====== > 1111 carry out 0 > > Note that the carry bits are 0000A one's complement adder does not have a "carry-in" input, because there isn't s a simple one's complement "full adder" cell with three inputs. A one's complement adder consists of a chain of binary full adders with the carry-out from the MSB routed to the carry-in of the LSB; no individual piece of that structure can stand on its own as a one's complement adder. In particular, there is no simple way to daisy-chain two one's complement adders into a longer one's complement adder. If you need to add two n-bit one's complement numbers and an unsigned one-bit carry-in, you need a three-input n-bit one's complement adder, which is composed of *two* n-bit two-input one's complement adders: n --------- A ---/---> | 1's | n --------- | comp | ---/------> | 1's | n B ---/---> | adder | | comp | ----/---> result n --------- ----> | adder | | --------- n-1 | zeros -----/-------------------| | carry_in -----/-------------------- 1 The internal structure of each 1's complement adder as shown is an n-bit binary full adder with end-around carry. The logic of the second adder can be simplified slightly due to n-1 of its inputs being hardwired to zero, but it still has to be the full n bits wide. If you don't build it this way, you don't get the correct result. If you do build it this way, there isn't any way for the circuit to be unstable (oscillate), though it is of course possible to get an arithmetic overflow. Eric





