If you mean 4 through 7 in a literal sense, that is scaling by those four 
numbers, 4 requires 2 shifts, 5 2 shifts and an add, 6 2 shifts and an add 
and seven needs 3 shifts and a subtract.

Hul

Tim Wescott <seemywebsite@myfooter.really> wrote:
> On Thu, 01 Sep 2016 18:40:25 -0400, rickman wrote:

> > On 9/1/2016 4:24 PM, Tim Wescott wrote:
> >> On Thu, 01 Sep 2016 13:19:09 -0700, lasselangwadtchristensen wrote:
> >>
> >>> Den torsdag den 1. september 2016 kl. 21.53.33 UTC+2 skrev Tim
> >>> Wescott:
> >>>> There's a method that I know, but can't remember the name.  And now I
> >>>> want to tell someone to Google for it.
> >>>>
> >>>> It basically starts with the notion of multiplying by shift-and-add,
> >>>> but uses the fact that if you shift and then either add or subtract,
> >>>> you can minimize "addition" operations.
> >>>>
> >>>> I.e., 255 = 256 - 1, 244 = 256 - 16 + 4, etc.
> >>>>
> >>>>
> >>> Booth?
> >>>
> >>> -Lasse
> >>
> >> That's it.  Thanks.
> > 
> > That is very familiar from college, but I don't recall the utility.  It
> > would be useful for multiplying by constants, but otherwise how would
> > this be used to advantage?  It would save add/subtract operations, but I
> > can't think of another situation where this would be useful.
> > 
> > If the algorithm is doing an add and shift, the add does not increase
> > the time or the hardware.  If building the full multiplier, an adder is
> > included for each stage, it is either used or not used.  When done in
> > software, the same applies.  It is easier to do the add than to skip
> > over it.

> I asked here and on comp.arch.embedded.  It's for a guy who's doing 
> assembly-language programming on a PIC12xxx -- for that guy, and for a 
> small range of constants (4 through 7), it can save time over a full-
> blown multiplication algorithm.

> -- 

> Tim Wescott
> Wescott Design Services
> http://www.wescottdesign.com

> I'm looking for work -- see my website!

On 9/3/2016 7:44 PM, lasselangwadtchristensen@gmail.com wrote:
> Den s&oslash;ndag den 4. september 2016 kl. 00.10.56 UTC+2 skrev rickman:
>> On 9/3/2016 8:46 AM, lasselangwadtchristensen@gmail.com wrote:
>>> Den l&oslash;rdag den 3. september 2016 kl. 03.45.13 UTC+2 skrev rickman:
>>>> On 9/2/2016 8:08 PM, lasselangwadtchristensen@gmail.com wrote:
>>>>> Den l&oslash;rdag den 3. september 2016 kl. 01.37.56 UTC+2 skrev rickman:
>>>>>> On 9/2/2016 4:01 PM, lasselangwadtchristensen@gmail.com wrote:
>>>>>>> Den fredag den 2. september 2016 kl. 08.39.20 UTC+2 skrev rickman:
>>>>>>>> On 9/1/2016 7:22 PM, lasselangwadtchristensen@gmail.com wrote:
>>>>>>>>> Den fredag den 2. september 2016 kl. 00.40.28 UTC+2 skrev
>>>>>>>>> rickman:
>>>>>>>>>> On 9/1/2016 4:24 PM, Tim Wescott wrote:
>>>>>>>>>>> On Thu, 01 Sep 2016 13:19:09 -0700, lasselangwadtchristensen
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Den torsdag den 1. september 2016 kl. 21.53.33 UTC+2 skrev
>>>>>>>>>>>> Tim Wescott:
>>>>>>>>>>>>> There's a method that I know, but can't remember the
>>>>>>>>>>>>> name. And now I want to tell someone to Google for it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It basically starts with the notion of multiplying by
>>>>>>>>>>>>> shift-and-add, but uses the fact that if you shift and
>>>>>>>>>>>>> then either add or subtract, you can minimize "addition"
>>>>>>>>>>>>> operations.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I.e., 255 = 256 - 1, 244 = 256 - 16 + 4, etc.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>> Booth?
>>>>>>>>>>>>
>>>>>>>>>>>> -Lasse
>>>>>>>>>>>
>>>>>>>>>>> That's it.  Thanks.
>>>>>>>>>>
>>>>>>>>>> That is very familiar from college, but I don't recall the
>>>>>>>>>> utility. It would be useful for multiplying by constants, but
>>>>>>>>>> otherwise how would this be used to advantage?  It would save
>>>>>>>>>> add/subtract operations, but I can't think of another situation
>>>>>>>>>> where this would be useful.
>>>>>>>>>>
>>>>>>>>>> If the algorithm is doing an add and shift, the add does not
>>>>>>>>>> increase the time or the hardware.  If building the full
>>>>>>>>>> multiplier, an adder is included for each stage, it is either
>>>>>>>>>> used or not used.  When done in software, the same applies.  It
>>>>>>>>>> is easier to do the add than to skip over it.
>>>>>>>>>
>>>>>>>>> you only need half the stages so it is half the size* and the
>>>>>>>>> critical path through your adders are only half as long
>>>>>>>>
>>>>>>>> I think you need to look at the algorithm again.  The degenerate
>>>>>>>> case is a multiplier with alternating ones and zeros.  An add or
>>>>>>>> subtract is needed at each 1->0 or 0->1 transition.  Since every
>>>>>>>> bit is a transition you still need an adder/subtractor for every
>>>>>>>> bit.
>>>>>>>>
>>>>>>>> Of course you could add logic to detect these cases and do fewer
>>>>>>>> adder ops, but then that is not Booth's algorithm anymore and is
>>>>>>>> much more complex.  Booth's algorithm looks at pairs of bits in the
>>>>>>>> multiplier, this would require looking at more bits.
>>>>>>>>
>>>>>>>
>>>>>>> you are right, I was thinking of "modified Booth" it looks at 3 bits
>>>>>>> at a time,
>>>>>>>
>>>>>>> http://www.ellab.physics.upatras.gr/~bakalis/Eudoxus/mbm8.gif
>>>>>>>
>>>>>>>
>>>>>>>> If you are thinking in terms of constant multiplication then again,
>>>>>>>> this is a modified method that combines Booth's with straight
>>>>>>>> adds.
>>>>>>>>
>>>>>>>>
>>>>>>>>> * you need a few multiplexors to choose between x1 and x2,
>>>>>>>>> subtract is invert and carry in
>>>>>>>>
>>>>>>>> Multiplexers are not low cost in any sense in many technologies,
>>>>>>>> but it doesn't matter.  Booth's algorithm doesn't use
>>>>>>>> multiplexers.
>>>>>>>
>>>>>>> may not be low cost, but compared to a full adder?
>>>>>>
>>>>>> In an FPGA the unit of size is the Logic Cell (LC), a LUT and a FF.
>>>>>> Because there is extra carry logic in the LC one bit of an adder is the
>>>>>> same logic as one bit of a multiplexer.  The only disadvantage of an
>>>>>> adder is the carry propagation time, but they don't cascade, properly
>>>>>> designed you end up with one ripple cascade through one adder and then
>>>>>> the other logic delays as the adders all cascade in parallel.
>>>>>
>>>>> sure most fpgas have fast carry chains or build in multipliers so hand
>>>>> coding "fancy" multiplier structures might not come out ahead
>>>>>
>>>>>>
>>>>>>
>>>>>>> and since the inputs come from the multiplicand and the multiplier
>>>>>>> not from other intermediate results it shouldn't be in the critical
>>>>>>> path
>>>>>>
>>>>>> ???
>>>>>>
>>>>>> I don't see how you use multiplexers instead of adders.  If the
>>>>>> multiplier changes from one calculation to the next you need adders in
>>>>>> every position.  If the multiplier is fixed the combinations of sums is
>>>>>> fixed and no multiplexers are needed.
>>>>>
>>>>> with a regular multiplier you have to go through N layers of adders
>>>>> with a modified Booth you only have to go through N/2 adders
>>>>
>>>> and N/2 multiplexers which have the same delay... PLUS the selectable
>>>> inverter which may or may not be combined with the adder.  What's the
>>>> point?
>>>
>>> the multiplexers get their input from the multiplicant not the output
>>> of the previous adder
>>
>> Look at your diagram again.  The multiplexer is muxing the output of the
>> previous stage or the output of the stage prior to that if the previous
>> stage is skipped.  The multiplicand is the other input to the adder and
>> the control signal is derived from the multiplier which means there is
>> an added level of delay into the first stage with then has to ripple
>> through the entire structure.  The multiplexers *must* be in the path of
>> the additions.
>
> the multiplexers can be moved to multiplicand, adding zero is the same as skipping
>
> http://www.ellab.physics.upatras.gr/~bakalis/Eudoxus/mbm8.gif

This diagram is not totally clear, but it does not appear to be 
implementable in an FPGA using the carry chain.  Without carry chain the 
adds get much slower and/or use twice as much space.  The decoders and 
multiplexers do result in a larger design in most FPGAs.

It might be possible to use the carry chain across the adds.  This would 
be difficult to code in a way that is implemented with carry chains. 
Without the carry chains, the full adder bits require running through an 
extra LUT both for the input and the output in the devices I have looked 
at.

If this multiplier were to be pipelined, it would be exceedingly large 
using 2N FFs for every stage.  In that case the multiplier would also 
have to be pipelined resulting in 3N FFs per stage.


>>>> A simple adder has N stages of delay in an FPGA, same as the
>>>> much more complicated modified Booth's adder.  In an ASIC there may be
>>>> some advantage.  In software, I expect the much more complicated control
>>>> will make the modified Booth's algorithm the slowest of the three.
>>>
>>> agreed
>>>
>>>> People so often forget that multiplexers are not trivial logic.
>>>
>>> I think the main purpose of modified booth is speed not size
>>
>> I'm not sure how much faster a multiplexer is compared to the adder.
>> Even if you bypass a bunch of adders, you pass through an equivalent
>> number of muxes.
>
> I'm still talking modified Booth, it halfs the number of layers

Regardless, this is a fairly pointless exercise for FPGAs since most of 
them offer hard multipliers which run much faster than multipliers in 
the fabric typically.  I think this would only be useful in an ASIC.

-- 

Rick C

On 2016-09-01 4:24 PM, Tim Wescott wrote:
> On Thu, 01 Sep 2016 13:19:09 -0700, lasselangwadtchristensen wrote:
>
>> Den torsdag den 1. september 2016 kl. 21.53.33 UTC+2 skrev Tim
>> Wescott:
>>> There's a method that I know, but can't remember the name.  And
>>> now I want to tell someone to Google for it.
>>>
>>> It basically starts with the notion of multiplying by
>>> shift-and-add, but uses the fact that if you shift and then
>>> either add or subtract, you can minimize "addition" operations.
>>>
>>> I.e., 255 = 256 - 1, 244 = 256 - 16 + 4, etc.
>>>
>>>
>> Booth?
>>
>> -Lasse
>
> That's it.  Thanks.
>

Bit late but it sounds like Horner decomposition. Yep that guy..

I couldn't find a quick reference on line.  The notes in our compilers
for inline constant multiplies is implemented with Horners polynomial
decomposition. This requires no temporary space and only uses shifts and
adds.

I remember writing the code on a weekend on a trip where I was getting
caught up on a bunch of potentially interesting papers and it was
pouring rain. I decided this would be just as much fun anyway.

Old enough the paper is in a banker box close enough to the center that
it hasn't become nesting material for the field mice.

w..

Den s=C3=B8ndag den 4. september 2016 kl. 00.10.56 UTC+2 skrev rickman:
> On 9/3/2016 8:46 AM, lasselangwadtchristensen@gmail.com wrote:
> > Den l=C3=B8rdag den 3. september 2016 kl. 03.45.13 UTC+2 skrev rickman:
> >> On 9/2/2016 8:08 PM, lasselangwadtchristensen@gmail.com wrote:
> >>> Den l=C3=B8rdag den 3. september 2016 kl. 01.37.56 UTC+2 skrev rickma=
n:
> >>>> On 9/2/2016 4:01 PM, lasselangwadtchristensen@gmail.com wrote:
> >>>>> Den fredag den 2. september 2016 kl. 08.39.20 UTC+2 skrev rickman:
> >>>>>> On 9/1/2016 7:22 PM, lasselangwadtchristensen@gmail.com wrote:
> >>>>>>> Den fredag den 2. september 2016 kl. 00.40.28 UTC+2 skrev
> >>>>>>> rickman:
> >>>>>>>> On 9/1/2016 4:24 PM, Tim Wescott wrote:
> >>>>>>>>> On Thu, 01 Sep 2016 13:19:09 -0700, lasselangwadtchristensen
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Den torsdag den 1. september 2016 kl. 21.53.33 UTC+2 skrev
> >>>>>>>>>> Tim Wescott:
> >>>>>>>>>>> There's a method that I know, but can't remember the
> >>>>>>>>>>> name. And now I want to tell someone to Google for it.
> >>>>>>>>>>>
> >>>>>>>>>>> It basically starts with the notion of multiplying by
> >>>>>>>>>>> shift-and-add, but uses the fact that if you shift and
> >>>>>>>>>>> then either add or subtract, you can minimize "addition"
> >>>>>>>>>>> operations.
> >>>>>>>>>>>
> >>>>>>>>>>> I.e., 255 =3D 256 - 1, 244 =3D 256 - 16 + 4, etc.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>> Booth?
> >>>>>>>>>>
> >>>>>>>>>> -Lasse
> >>>>>>>>>
> >>>>>>>>> That's it.  Thanks.
> >>>>>>>>
> >>>>>>>> That is very familiar from college, but I don't recall the
> >>>>>>>> utility. It would be useful for multiplying by constants, but
> >>>>>>>> otherwise how would this be used to advantage?  It would save
> >>>>>>>> add/subtract operations, but I can't think of another situation
> >>>>>>>> where this would be useful.
> >>>>>>>>
> >>>>>>>> If the algorithm is doing an add and shift, the add does not
> >>>>>>>> increase the time or the hardware.  If building the full
> >>>>>>>> multiplier, an adder is included for each stage, it is either
> >>>>>>>> used or not used.  When done in software, the same applies.  It
> >>>>>>>> is easier to do the add than to skip over it.
> >>>>>>>
> >>>>>>> you only need half the stages so it is half the size* and the
> >>>>>>> critical path through your adders are only half as long
> >>>>>>
> >>>>>> I think you need to look at the algorithm again.  The degenerate
> >>>>>> case is a multiplier with alternating ones and zeros.  An add or
> >>>>>> subtract is needed at each 1->0 or 0->1 transition.  Since every
> >>>>>> bit is a transition you still need an adder/subtractor for every
> >>>>>> bit.
> >>>>>>
> >>>>>> Of course you could add logic to detect these cases and do fewer
> >>>>>> adder ops, but then that is not Booth's algorithm anymore and is
> >>>>>> much more complex.  Booth's algorithm looks at pairs of bits in th=
e
> >>>>>> multiplier, this would require looking at more bits.
> >>>>>>
> >>>>>
> >>>>> you are right, I was thinking of "modified Booth" it looks at 3 bit=
s
> >>>>> at a time,
> >>>>>
> >>>>> http://www.ellab.physics.upatras.gr/~bakalis/Eudoxus/mbm8.gif
> >>>>>
> >>>>>
> >>>>>> If you are thinking in terms of constant multiplication then again=
,
> >>>>>> this is a modified method that combines Booth's with straight
> >>>>>> adds.
> >>>>>>
> >>>>>>
> >>>>>>> * you need a few multiplexors to choose between x1 and x2,
> >>>>>>> subtract is invert and carry in
> >>>>>>
> >>>>>> Multiplexers are not low cost in any sense in many technologies,
> >>>>>> but it doesn't matter.  Booth's algorithm doesn't use
> >>>>>> multiplexers.
> >>>>>
> >>>>> may not be low cost, but compared to a full adder?
> >>>>
> >>>> In an FPGA the unit of size is the Logic Cell (LC), a LUT and a FF.
> >>>> Because there is extra carry logic in the LC one bit of an adder is =
the
> >>>> same logic as one bit of a multiplexer.  The only disadvantage of an
> >>>> adder is the carry propagation time, but they don't cascade, properl=
y
> >>>> designed you end up with one ripple cascade through one adder and th=
en
> >>>> the other logic delays as the adders all cascade in parallel.
> >>>
> >>> sure most fpgas have fast carry chains or build in multipliers so han=
d
> >>> coding "fancy" multiplier structures might not come out ahead
> >>>
> >>>>
> >>>>
> >>>>> and since the inputs come from the multiplicand and the multiplier
> >>>>> not from other intermediate results it shouldn't be in the critical
> >>>>> path
> >>>>
> >>>> ???
> >>>>
> >>>> I don't see how you use multiplexers instead of adders.  If the
> >>>> multiplier changes from one calculation to the next you need adders =
in
> >>>> every position.  If the multiplier is fixed the combinations of sums=
 is
> >>>> fixed and no multiplexers are needed.
> >>>
> >>> with a regular multiplier you have to go through N layers of adders
> >>> with a modified Booth you only have to go through N/2 adders
> >>
> >> and N/2 multiplexers which have the same delay... PLUS the selectable
> >> inverter which may or may not be combined with the adder.  What's the
> >> point?
> >
> > the multiplexers get their input from the multiplicant not the output
> > of the previous adder
>=20
> Look at your diagram again.  The multiplexer is muxing the output of the=
=20
> previous stage or the output of the stage prior to that if the previous=
=20
> stage is skipped.  The multiplicand is the other input to the adder and=
=20
> the control signal is derived from the multiplier which means there is=20
> an added level of delay into the first stage with then has to ripple=20
> through the entire structure.  The multiplexers *must* be in the path of=
=20
> the additions.

the multiplexers can be moved to multiplicand, adding zero is the same as s=
kipping

http://www.ellab.physics.upatras.gr/~bakalis/Eudoxus/mbm8.gif

>=20
> >> A simple adder has N stages of delay in an FPGA, same as the
> >> much more complicated modified Booth's adder.  In an ASIC there may be
> >> some advantage.  In software, I expect the much more complicated contr=
ol
> >> will make the modified Booth's algorithm the slowest of the three.
> >
> > agreed
> >
> >> People so often forget that multiplexers are not trivial logic.
> >
> > I think the main purpose of modified booth is speed not size
>=20
> I'm not sure how much faster a multiplexer is compared to the adder.=20
> Even if you bypass a bunch of adders, you pass through an equivalent=20
> number of muxes. =20

I'm still talking modified Booth, it halfs the number of layers
=20
-Lasse

On 9/3/2016 8:46 AM, lasselangwadtchristensen@gmail.com wrote:
> Den l&oslash;rdag den 3. september 2016 kl. 03.45.13 UTC+2 skrev rickman:
>> On 9/2/2016 8:08 PM, lasselangwadtchristensen@gmail.com wrote:
>>> Den l&oslash;rdag den 3. september 2016 kl. 01.37.56 UTC+2 skrev rickman:
>>>> On 9/2/2016 4:01 PM, lasselangwadtchristensen@gmail.com wrote:
>>>>> Den fredag den 2. september 2016 kl. 08.39.20 UTC+2 skrev rickman:
>>>>>> On 9/1/2016 7:22 PM, lasselangwadtchristensen@gmail.com wrote:
>>>>>>> Den fredag den 2. september 2016 kl. 00.40.28 UTC+2 skrev
>>>>>>> rickman:
>>>>>>>> On 9/1/2016 4:24 PM, Tim Wescott wrote:
>>>>>>>>> On Thu, 01 Sep 2016 13:19:09 -0700, lasselangwadtchristensen
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Den torsdag den 1. september 2016 kl. 21.53.33 UTC+2 skrev
>>>>>>>>>> Tim Wescott:
>>>>>>>>>>> There's a method that I know, but can't remember the
>>>>>>>>>>> name. And now I want to tell someone to Google for it.
>>>>>>>>>>>
>>>>>>>>>>> It basically starts with the notion of multiplying by
>>>>>>>>>>> shift-and-add, but uses the fact that if you shift and
>>>>>>>>>>> then either add or subtract, you can minimize "addition"
>>>>>>>>>>> operations.
>>>>>>>>>>>
>>>>>>>>>>> I.e., 255 = 256 - 1, 244 = 256 - 16 + 4, etc.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> Booth?
>>>>>>>>>>
>>>>>>>>>> -Lasse
>>>>>>>>>
>>>>>>>>> That's it.  Thanks.
>>>>>>>>
>>>>>>>> That is very familiar from college, but I don't recall the
>>>>>>>> utility. It would be useful for multiplying by constants, but
>>>>>>>> otherwise how would this be used to advantage?  It would save
>>>>>>>> add/subtract operations, but I can't think of another situation
>>>>>>>> where this would be useful.
>>>>>>>>
>>>>>>>> If the algorithm is doing an add and shift, the add does not
>>>>>>>> increase the time or the hardware.  If building the full
>>>>>>>> multiplier, an adder is included for each stage, it is either
>>>>>>>> used or not used.  When done in software, the same applies.  It
>>>>>>>> is easier to do the add than to skip over it.
>>>>>>>
>>>>>>> you only need half the stages so it is half the size* and the
>>>>>>> critical path through your adders are only half as long
>>>>>>
>>>>>> I think you need to look at the algorithm again.  The degenerate
>>>>>> case is a multiplier with alternating ones and zeros.  An add or
>>>>>> subtract is needed at each 1->0 or 0->1 transition.  Since every
>>>>>> bit is a transition you still need an adder/subtractor for every
>>>>>> bit.
>>>>>>
>>>>>> Of course you could add logic to detect these cases and do fewer
>>>>>> adder ops, but then that is not Booth's algorithm anymore and is
>>>>>> much more complex.  Booth's algorithm looks at pairs of bits in the
>>>>>> multiplier, this would require looking at more bits.
>>>>>>
>>>>>
>>>>> you are right, I was thinking of "modified Booth" it looks at 3 bits
>>>>> at a time,
>>>>>
>>>>> http://www.ellab.physics.upatras.gr/~bakalis/Eudoxus/mbm8.gif
>>>>>
>>>>>
>>>>>> If you are thinking in terms of constant multiplication then again,
>>>>>> this is a modified method that combines Booth's with straight
>>>>>> adds.
>>>>>>
>>>>>>
>>>>>>> * you need a few multiplexors to choose between x1 and x2,
>>>>>>> subtract is invert and carry in
>>>>>>
>>>>>> Multiplexers are not low cost in any sense in many technologies,
>>>>>> but it doesn't matter.  Booth's algorithm doesn't use
>>>>>> multiplexers.
>>>>>
>>>>> may not be low cost, but compared to a full adder?
>>>>
>>>> In an FPGA the unit of size is the Logic Cell (LC), a LUT and a FF.
>>>> Because there is extra carry logic in the LC one bit of an adder is the
>>>> same logic as one bit of a multiplexer.  The only disadvantage of an
>>>> adder is the carry propagation time, but they don't cascade, properly
>>>> designed you end up with one ripple cascade through one adder and then
>>>> the other logic delays as the adders all cascade in parallel.
>>>
>>> sure most fpgas have fast carry chains or build in multipliers so hand
>>> coding "fancy" multiplier structures might not come out ahead
>>>
>>>>
>>>>
>>>>> and since the inputs come from the multiplicand and the multiplier
>>>>> not from other intermediate results it shouldn't be in the critical
>>>>> path
>>>>
>>>> ???
>>>>
>>>> I don't see how you use multiplexers instead of adders.  If the
>>>> multiplier changes from one calculation to the next you need adders in
>>>> every position.  If the multiplier is fixed the combinations of sums is
>>>> fixed and no multiplexers are needed.
>>>
>>> with a regular multiplier you have to go through N layers of adders
>>> with a modified Booth you only have to go through N/2 adders
>>
>> and N/2 multiplexers which have the same delay... PLUS the selectable
>> inverter which may or may not be combined with the adder.  What's the
>> point?
>
> the multiplexers get their input from the multiplicant not the output
> of the previous adder

Look at your diagram again.  The multiplexer is muxing the output of the 
previous stage or the output of the stage prior to that if the previous 
stage is skipped.  The multiplicand is the other input to the adder and 
the control signal is derived from the multiplier which means there is 
an added level of delay into the first stage with then has to ripple 
through the entire structure.  The multiplexers *must* be in the path of 
the additions.



  Multiplicand                                 Multiplier
       |                                            |
       +--------------------+                       |
       |                    |                       |
       |                +-------+                   |
       |                |       |     Control       |
       +----------+     | +/-/0 |-------------------+
       |          |     |       |                   |
       |          |     +-------+                   |
       |          |         |                       |
       |          |         +------+                |
       |          |         |      |                |
       |      +-------+ +-------+  |   +--------+   |
       |       \       V       /   |   |        |   |
       |        \         +/- /----)---| Decode |---+
       |         \    ADD    /     |   |        |   |
       |          \_________/      |   +--------+   |
       |               |           |                |
       |               |         +-+                |
       |               |         |                  |
       |            +---------------+  +--------+   |
       |             \             /   |        |   |
       +----------+   \    MUX    /----| Decode |---+
       |          |    \         /     |        |   |
       |          |     +-------+      +--------+   |
       |          |         |                       |
       |          |         +------+                |
       |          |         |      |                |
       |      +-------+ +-------+  |   +--------+   |
       |       \       V       /   |   |        |   |
       |        \         +/- /----)---| Decode |---+
       |         \    ADD    /     |   |        |   |
       |          \_________/      |   +--------+   |
       |               |           |                |
       |               |         +-+                |
       |               |         |                  |
       |            +---------------+  +--------+   |
       |             \             /   |        |   |
       +----------+   \    MUX    /----| Decode |---+
       |          |    \         /     |        |   |
       |          |     +-------+      +--------+   |
       |          |         |                       |



>> A simple adder has N stages of delay in an FPGA, same as the
>> much more complicated modified Booth's adder.  In an ASIC there may be
>> some advantage.  In software, I expect the much more complicated control
>> will make the modified Booth's algorithm the slowest of the three.
>
> agreed
>
>> People so often forget that multiplexers are not trivial logic.
>
> I think the main purpose of modified booth is speed not size

I'm not sure how much faster a multiplexer is compared to the adder. 
Even if you bypass a bunch of adders, you pass through an equivalent 
number of muxes.  In an FPGA this is equivalent in speed.  In an ASIC 
you may see some speed advantage.

-- 

Rick C

Den l&oslash;rdag den 3. september 2016 kl. 03.45.13 UTC+2 skrev rickman:
> On 9/2/2016 8:08 PM, lasselangwadtchristensen@gmail.com wrote:
> > Den l&oslash;rdag den 3. september 2016 kl. 01.37.56 UTC+2 skrev rickman:
> >> On 9/2/2016 4:01 PM, lasselangwadtchristensen@gmail.com wrote:
> >>> Den fredag den 2. september 2016 kl. 08.39.20 UTC+2 skrev rickman:
> >>>> On 9/1/2016 7:22 PM, lasselangwadtchristensen@gmail.com wrote:
> >>>>> Den fredag den 2. september 2016 kl. 00.40.28 UTC+2 skrev
> >>>>> rickman:
> >>>>>> On 9/1/2016 4:24 PM, Tim Wescott wrote:
> >>>>>>> On Thu, 01 Sep 2016 13:19:09 -0700, lasselangwadtchristensen
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Den torsdag den 1. september 2016 kl. 21.53.33 UTC+2 skrev
> >>>>>>>> Tim Wescott:
> >>>>>>>>> There's a method that I know, but can't remember the
> >>>>>>>>> name. And now I want to tell someone to Google for it.
> >>>>>>>>>
> >>>>>>>>> It basically starts with the notion of multiplying by
> >>>>>>>>> shift-and-add, but uses the fact that if you shift and
> >>>>>>>>> then either add or subtract, you can minimize "addition"
> >>>>>>>>> operations.
> >>>>>>>>>
> >>>>>>>>> I.e., 255 = 256 - 1, 244 = 256 - 16 + 4, etc.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>> Booth?
> >>>>>>>>
> >>>>>>>> -Lasse
> >>>>>>>
> >>>>>>> That's it.  Thanks.
> >>>>>>
> >>>>>> That is very familiar from college, but I don't recall the
> >>>>>> utility. It would be useful for multiplying by constants, but
> >>>>>> otherwise how would this be used to advantage?  It would save
> >>>>>> add/subtract operations, but I can't think of another situation
> >>>>>> where this would be useful.
> >>>>>>
> >>>>>> If the algorithm is doing an add and shift, the add does not
> >>>>>> increase the time or the hardware.  If building the full
> >>>>>> multiplier, an adder is included for each stage, it is either
> >>>>>> used or not used.  When done in software, the same applies.  It
> >>>>>> is easier to do the add than to skip over it.
> >>>>>
> >>>>> you only need half the stages so it is half the size* and the
> >>>>> critical path through your adders are only half as long
> >>>>
> >>>> I think you need to look at the algorithm again.  The degenerate
> >>>> case is a multiplier with alternating ones and zeros.  An add or
> >>>> subtract is needed at each 1->0 or 0->1 transition.  Since every
> >>>> bit is a transition you still need an adder/subtractor for every
> >>>> bit.
> >>>>
> >>>> Of course you could add logic to detect these cases and do fewer
> >>>> adder ops, but then that is not Booth's algorithm anymore and is
> >>>> much more complex.  Booth's algorithm looks at pairs of bits in the
> >>>> multiplier, this would require looking at more bits.
> >>>>
> >>>
> >>> you are right, I was thinking of "modified Booth" it looks at 3 bits
> >>> at a time,
> >>>
> >>> http://www.ellab.physics.upatras.gr/~bakalis/Eudoxus/mbm8.gif
> >>>
> >>>
> >>>> If you are thinking in terms of constant multiplication then again,
> >>>> this is a modified method that combines Booth's with straight
> >>>> adds.
> >>>>
> >>>>
> >>>>> * you need a few multiplexors to choose between x1 and x2,
> >>>>> subtract is invert and carry in
> >>>>
> >>>> Multiplexers are not low cost in any sense in many technologies,
> >>>> but it doesn't matter.  Booth's algorithm doesn't use
> >>>> multiplexers.
> >>>
> >>> may not be low cost, but compared to a full adder?
> >>
> >> In an FPGA the unit of size is the Logic Cell (LC), a LUT and a FF.
> >> Because there is extra carry logic in the LC one bit of an adder is the
> >> same logic as one bit of a multiplexer.  The only disadvantage of an
> >> adder is the carry propagation time, but they don't cascade, properly
> >> designed you end up with one ripple cascade through one adder and then
> >> the other logic delays as the adders all cascade in parallel.
> >
> > sure most fpgas have fast carry chains or build in multipliers so hand
> > coding "fancy" multiplier structures might not come out ahead
> >
> >>
> >>
> >>> and since the inputs come from the multiplicand and the multiplier
> >>> not from other intermediate results it shouldn't be in the critical
> >>> path
> >>
> >> ???
> >>
> >> I don't see how you use multiplexers instead of adders.  If the
> >> multiplier changes from one calculation to the next you need adders in
> >> every position.  If the multiplier is fixed the combinations of sums is
> >> fixed and no multiplexers are needed.
> >
> > with a regular multiplier you have to go through N layers of adders
> > with a modified Booth you only have to go through N/2 adders
> 
> and N/2 multiplexers which have the same delay... PLUS the selectable 
> inverter which may or may not be combined with the adder.  What's the 
> point?  

the multiplexers get their input from the multiplicant not the output 
of the previous adder

> A simple adder has N stages of delay in an FPGA, same as the 
> much more complicated modified Booth's adder.  In an ASIC there may be 
> some advantage.  In software, I expect the much more complicated control 
> will make the modified Booth's algorithm the slowest of the three.

agreed 
 
> People so often forget that multiplexers are not trivial logic.

I think the main purpose of modified booth is speed not size 

-Lasse

On 9/2/2016 8:08 PM, lasselangwadtchristensen@gmail.com wrote:
> Den l&oslash;rdag den 3. september 2016 kl. 01.37.56 UTC+2 skrev rickman:
>> On 9/2/2016 4:01 PM, lasselangwadtchristensen@gmail.com wrote:
>>> Den fredag den 2. september 2016 kl. 08.39.20 UTC+2 skrev rickman:
>>>> On 9/1/2016 7:22 PM, lasselangwadtchristensen@gmail.com wrote:
>>>>> Den fredag den 2. september 2016 kl. 00.40.28 UTC+2 skrev
>>>>> rickman:
>>>>>> On 9/1/2016 4:24 PM, Tim Wescott wrote:
>>>>>>> On Thu, 01 Sep 2016 13:19:09 -0700, lasselangwadtchristensen
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Den torsdag den 1. september 2016 kl. 21.53.33 UTC+2 skrev
>>>>>>>> Tim Wescott:
>>>>>>>>> There's a method that I know, but can't remember the
>>>>>>>>> name. And now I want to tell someone to Google for it.
>>>>>>>>>
>>>>>>>>> It basically starts with the notion of multiplying by
>>>>>>>>> shift-and-add, but uses the fact that if you shift and
>>>>>>>>> then either add or subtract, you can minimize "addition"
>>>>>>>>> operations.
>>>>>>>>>
>>>>>>>>> I.e., 255 = 256 - 1, 244 = 256 - 16 + 4, etc.
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Booth?
>>>>>>>>
>>>>>>>> -Lasse
>>>>>>>
>>>>>>> That's it.  Thanks.
>>>>>>
>>>>>> That is very familiar from college, but I don't recall the
>>>>>> utility. It would be useful for multiplying by constants, but
>>>>>> otherwise how would this be used to advantage?  It would save
>>>>>> add/subtract operations, but I can't think of another situation
>>>>>> where this would be useful.
>>>>>>
>>>>>> If the algorithm is doing an add and shift, the add does not
>>>>>> increase the time or the hardware.  If building the full
>>>>>> multiplier, an adder is included for each stage, it is either
>>>>>> used or not used.  When done in software, the same applies.  It
>>>>>> is easier to do the add than to skip over it.
>>>>>
>>>>> you only need half the stages so it is half the size* and the
>>>>> critical path through your adders are only half as long
>>>>
>>>> I think you need to look at the algorithm again.  The degenerate
>>>> case is a multiplier with alternating ones and zeros.  An add or
>>>> subtract is needed at each 1->0 or 0->1 transition.  Since every
>>>> bit is a transition you still need an adder/subtractor for every
>>>> bit.
>>>>
>>>> Of course you could add logic to detect these cases and do fewer
>>>> adder ops, but then that is not Booth's algorithm anymore and is
>>>> much more complex.  Booth's algorithm looks at pairs of bits in the
>>>> multiplier, this would require looking at more bits.
>>>>
>>>
>>> you are right, I was thinking of "modified Booth" it looks at 3 bits
>>> at a time,
>>>
>>> http://www.ellab.physics.upatras.gr/~bakalis/Eudoxus/mbm8.gif
>>>
>>>
>>>> If you are thinking in terms of constant multiplication then again,
>>>> this is a modified method that combines Booth's with straight
>>>> adds.
>>>>
>>>>
>>>>> * you need a few multiplexors to choose between x1 and x2,
>>>>> subtract is invert and carry in
>>>>
>>>> Multiplexers are not low cost in any sense in many technologies,
>>>> but it doesn't matter.  Booth's algorithm doesn't use
>>>> multiplexers.
>>>
>>> may not be low cost, but compared to a full adder?
>>
>> In an FPGA the unit of size is the Logic Cell (LC), a LUT and a FF.
>> Because there is extra carry logic in the LC one bit of an adder is the
>> same logic as one bit of a multiplexer.  The only disadvantage of an
>> adder is the carry propagation time, but they don't cascade, properly
>> designed you end up with one ripple cascade through one adder and then
>> the other logic delays as the adders all cascade in parallel.
>
> sure most fpgas have fast carry chains or build in multipliers so hand
> coding "fancy" multiplier structures might not come out ahead
>
>>
>>
>>> and since the inputs come from the multiplicand and the multiplier
>>> not from other intermediate results it shouldn't be in the critical
>>> path
>>
>> ???
>>
>> I don't see how you use multiplexers instead of adders.  If the
>> multiplier changes from one calculation to the next you need adders in
>> every position.  If the multiplier is fixed the combinations of sums is
>> fixed and no multiplexers are needed.
>
> with a regular multiplier you have to go through N layers of adders
> with a modified Booth you only have to go through N/2 adders

and N/2 multiplexers which have the same delay... PLUS the selectable 
inverter which may or may not be combined with the adder.  What's the 
point?  A simple adder has N stages of delay in an FPGA, same as the 
much more complicated modified Booth's adder.  In an ASIC there may be 
some advantage.  In software, I expect the much more complicated control 
will make the modified Booth's algorithm the slowest of the three.

People so often forget that multiplexers are not trivial logic.

-- 

Rick C

Den l=C3=B8rdag den 3. september 2016 kl. 01.37.56 UTC+2 skrev rickman:
> On 9/2/2016 4:01 PM, lasselangwadtchristensen@gmail.com wrote:
> > Den fredag den 2. september 2016 kl. 08.39.20 UTC+2 skrev rickman:
> >> On 9/1/2016 7:22 PM, lasselangwadtchristensen@gmail.com wrote:
> >>> Den fredag den 2. september 2016 kl. 00.40.28 UTC+2 skrev
> >>> rickman:
> >>>> On 9/1/2016 4:24 PM, Tim Wescott wrote:
> >>>>> On Thu, 01 Sep 2016 13:19:09 -0700, lasselangwadtchristensen
> >>>>> wrote:
> >>>>>
> >>>>>> Den torsdag den 1. september 2016 kl. 21.53.33 UTC+2 skrev
> >>>>>> Tim Wescott:
> >>>>>>> There's a method that I know, but can't remember the
> >>>>>>> name. And now I want to tell someone to Google for it.
> >>>>>>>
> >>>>>>> It basically starts with the notion of multiplying by
> >>>>>>> shift-and-add, but uses the fact that if you shift and
> >>>>>>> then either add or subtract, you can minimize "addition"
> >>>>>>> operations.
> >>>>>>>
> >>>>>>> I.e., 255 =3D 256 - 1, 244 =3D 256 - 16 + 4, etc.
> >>>>>>>
> >>>>>>>
> >>>>>> Booth?
> >>>>>>
> >>>>>> -Lasse
> >>>>>
> >>>>> That's it.  Thanks.
> >>>>
> >>>> That is very familiar from college, but I don't recall the
> >>>> utility. It would be useful for multiplying by constants, but
> >>>> otherwise how would this be used to advantage?  It would save
> >>>> add/subtract operations, but I can't think of another situation
> >>>> where this would be useful.
> >>>>
> >>>> If the algorithm is doing an add and shift, the add does not
> >>>> increase the time or the hardware.  If building the full
> >>>> multiplier, an adder is included for each stage, it is either
> >>>> used or not used.  When done in software, the same applies.  It
> >>>> is easier to do the add than to skip over it.
> >>>
> >>> you only need half the stages so it is half the size* and the
> >>> critical path through your adders are only half as long
> >>
> >> I think you need to look at the algorithm again.  The degenerate
> >> case is a multiplier with alternating ones and zeros.  An add or
> >> subtract is needed at each 1->0 or 0->1 transition.  Since every
> >> bit is a transition you still need an adder/subtractor for every
> >> bit.
> >>
> >> Of course you could add logic to detect these cases and do fewer
> >> adder ops, but then that is not Booth's algorithm anymore and is
> >> much more complex.  Booth's algorithm looks at pairs of bits in the
> >> multiplier, this would require looking at more bits.
> >>
> >
> > you are right, I was thinking of "modified Booth" it looks at 3 bits
> > at a time,
> >
> > http://www.ellab.physics.upatras.gr/~bakalis/Eudoxus/mbm8.gif
> >
> >
> >> If you are thinking in terms of constant multiplication then again,
> >> this is a modified method that combines Booth's with straight
> >> adds.
> >>
> >>
> >>> * you need a few multiplexors to choose between x1 and x2,
> >>> subtract is invert and carry in
> >>
> >> Multiplexers are not low cost in any sense in many technologies,
> >> but it doesn't matter.  Booth's algorithm doesn't use
> >> multiplexers.
> >
> > may not be low cost, but compared to a full adder?
>=20
> In an FPGA the unit of size is the Logic Cell (LC), a LUT and a FF.=20
> Because there is extra carry logic in the LC one bit of an adder is the=
=20
> same logic as one bit of a multiplexer.  The only disadvantage of an=20
> adder is the carry propagation time, but they don't cascade, properly=20
> designed you end up with one ripple cascade through one adder and then=20
> the other logic delays as the adders all cascade in parallel.

sure most fpgas have fast carry chains or build in multipliers so hand=20
coding "fancy" multiplier structures might not come out ahead
=20
>=20
>=20
> > and since the inputs come from the multiplicand and the multiplier
> > not from other intermediate results it shouldn't be in the critical
> > path
>=20
> ???
>=20
> I don't see how you use multiplexers instead of adders.  If the=20
> multiplier changes from one calculation to the next you need adders in=20
> every position.  If the multiplier is fixed the combinations of sums is=
=20
> fixed and no multiplexers are needed.

with a regular multiplier you have to go through N layers of adders=20
with a modified Booth you only have to go through N/2 adders=20

-Lasse

On 9/2/2016 4:01 PM, lasselangwadtchristensen@gmail.com wrote:
> Den fredag den 2. september 2016 kl. 08.39.20 UTC+2 skrev rickman:
>> On 9/1/2016 7:22 PM, lasselangwadtchristensen@gmail.com wrote:
>>> Den fredag den 2. september 2016 kl. 00.40.28 UTC+2 skrev
>>> rickman:
>>>> On 9/1/2016 4:24 PM, Tim Wescott wrote:
>>>>> On Thu, 01 Sep 2016 13:19:09 -0700, lasselangwadtchristensen
>>>>> wrote:
>>>>>
>>>>>> Den torsdag den 1. september 2016 kl. 21.53.33 UTC+2 skrev
>>>>>> Tim Wescott:
>>>>>>> There's a method that I know, but can't remember the
>>>>>>> name. And now I want to tell someone to Google for it.
>>>>>>>
>>>>>>> It basically starts with the notion of multiplying by
>>>>>>> shift-and-add, but uses the fact that if you shift and
>>>>>>> then either add or subtract, you can minimize "addition"
>>>>>>> operations.
>>>>>>>
>>>>>>> I.e., 255 = 256 - 1, 244 = 256 - 16 + 4, etc.
>>>>>>>
>>>>>>>
>>>>>> Booth?
>>>>>>
>>>>>> -Lasse
>>>>>
>>>>> That's it.  Thanks.
>>>>
>>>> That is very familiar from college, but I don't recall the
>>>> utility. It would be useful for multiplying by constants, but
>>>> otherwise how would this be used to advantage?  It would save
>>>> add/subtract operations, but I can't think of another situation
>>>> where this would be useful.
>>>>
>>>> If the algorithm is doing an add and shift, the add does not
>>>> increase the time or the hardware.  If building the full
>>>> multiplier, an adder is included for each stage, it is either
>>>> used or not used.  When done in software, the same applies.  It
>>>> is easier to do the add than to skip over it.
>>>
>>> you only need half the stages so it is half the size* and the
>>> critical path through your adders are only half as long
>>
>> I think you need to look at the algorithm again.  The degenerate
>> case is a multiplier with alternating ones and zeros.  An add or
>> subtract is needed at each 1->0 or 0->1 transition.  Since every
>> bit is a transition you still need an adder/subtractor for every
>> bit.
>>
>> Of course you could add logic to detect these cases and do fewer
>> adder ops, but then that is not Booth's algorithm anymore and is
>> much more complex.  Booth's algorithm looks at pairs of bits in the
>> multiplier, this would require looking at more bits.
>>
>
> you are right, I was thinking of "modified Booth" it looks at 3 bits
> at a time,
>
> http://www.ellab.physics.upatras.gr/~bakalis/Eudoxus/mbm8.gif
>
>
>> If you are thinking in terms of constant multiplication then again,
>> this is a modified method that combines Booth's with straight
>> adds.
>>
>>
>>> * you need a few multiplexors to choose between x1 and x2,
>>> subtract is invert and carry in
>>
>> Multiplexers are not low cost in any sense in many technologies,
>> but it doesn't matter.  Booth's algorithm doesn't use
>> multiplexers.
>
> may not be low cost, but compared to a full adder?

In an FPGA the unit of size is the Logic Cell (LC), a LUT and a FF. 
Because there is extra carry logic in the LC one bit of an adder is the 
same logic as one bit of a multiplexer.  The only disadvantage of an 
adder is the carry propagation time, but they don't cascade, properly 
designed you end up with one ripple cascade through one adder and then 
the other logic delays as the adders all cascade in parallel.


> and since the inputs come from the multiplicand and the multiplier
> not from other intermediate results it shouldn't be in the critical
> path

???

I don't see how you use multiplexers instead of adders.  If the 
multiplier changes from one calculation to the next you need adders in 
every position.  If the multiplier is fixed the combinations of sums is 
fixed and no multiplexers are needed.

-- 

Rick C

Den fredag den 2. september 2016 kl. 08.39.20 UTC+2 skrev rickman:
> On 9/1/2016 7:22 PM, lasselangwadtchristensen@gmail.com wrote:
> > Den fredag den 2. september 2016 kl. 00.40.28 UTC+2 skrev rickman:
> >> On 9/1/2016 4:24 PM, Tim Wescott wrote:
> >>> On Thu, 01 Sep 2016 13:19:09 -0700, lasselangwadtchristensen
> >>> wrote:
> >>>
> >>>> Den torsdag den 1. september 2016 kl. 21.53.33 UTC+2 skrev Tim
> >>>> Wescott:
> >>>>> There's a method that I know, but can't remember the name.
> >>>>> And now I want to tell someone to Google for it.
> >>>>>
> >>>>> It basically starts with the notion of multiplying by
> >>>>> shift-and-add, but uses the fact that if you shift and then
> >>>>> either add or subtract, you can minimize "addition"
> >>>>> operations.
> >>>>>
> >>>>> I.e., 255 = 256 - 1, 244 = 256 - 16 + 4, etc.
> >>>>>
> >>>>>
> >>>> Booth?
> >>>>
> >>>> -Lasse
> >>>
> >>> That's it.  Thanks.
> >>
> >> That is very familiar from college, but I don't recall the utility.
> >> It would be useful for multiplying by constants, but otherwise how
> >> would this be used to advantage?  It would save add/subtract
> >> operations, but I can't think of another situation where this would
> >> be useful.
> >>
> >> If the algorithm is doing an add and shift, the add does not
> >> increase the time or the hardware.  If building the full
> >> multiplier, an adder is included for each stage, it is either used
> >> or not used.  When done in software, the same applies.  It is
> >> easier to do the add than to skip over it.
> >
> > you only need half the stages so it is half the size* and the
> > critical path through your adders are only half as long
> 
> I think you need to look at the algorithm again.  The degenerate case is 
> a multiplier with alternating ones and zeros.  An add or subtract is 
> needed at each 1->0 or 0->1 transition.  Since every bit is a transition 
> you still need an adder/subtractor for every bit.
> 
> Of course you could add logic to detect these cases and do fewer adder 
> ops, but then that is not Booth's algorithm anymore and is much more 
> complex.  Booth's algorithm looks at pairs of bits in the multiplier, 
> this would require looking at more bits.
> 

you are right, I was thinking of "modified Booth" it looks at 3 bits at 
a time, 

http://www.ellab.physics.upatras.gr/~bakalis/Eudoxus/mbm8.gif


> If you are thinking in terms of constant multiplication then again, this 
> is a modified method that combines Booth's with straight adds.
> 
> 
> > * you need a few multiplexors to choose between x1 and x2, subtract
> > is invert and carry in
> 
> Multiplexers are not low cost in any sense in many technologies, but it 
> doesn't matter.  Booth's algorithm doesn't use multiplexers.

may not be low cost, but compared to a full adder?

and since the inputs come from the multiplicand and the multiplier not from other intermediate results it shouldn't be in the critical path

-Lasse