# Using DSP Units

Started by Rick C. on November 22, 2020
```I'm working with the Gowin GW1N devices and need to do some serious math. By
serious, I mean a number of calculations, not that they have to be fast.
In fact, I have pretty much all the time in the world, relatively speaking.
The cycle time for performing all the calculations is 5 ms with a 33 MHz
clock, so about 165,000 cycles.

What I'm not up to speed on is just how to use, or even infer, such logic.
Certainly they can be instantiated, which I might do, but the docs are
pretty poor. For each configuration, the user guide shows a set of equations
it can implement, a block diagram with various control signals and data
paths, and then an interface prototype of, I suppose, the inferred object.
The equations are very easy to understand:
DOUT = A * B ± C
DOUT = ∑(A * B)
DOUT = A * B + CASI

The full capability is more complex, but the copy and paste has too many
things to fix up to bother with. The point is that they don't make it clear
how the controls work, or even what can be controlled in real time versus
what needs to be configured. I guess I'll have to write some code and
experiment with the synthesis. I can also try writing to their support for
some answers; it's a person rather than a black hole at a web site, so I
usually get an adequate answer.

I just wondered how this is done with other brands of devices.

--

Rick C.

```
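As a starting point for the "write some code and experiment with the synthesis" approach, here is a behavioral VHDL sketch of the ∑(A * B) form. It is a registered multiply-accumulate written as plain `numeric_std` arithmetic, with no vendor primitive.

Several choices are hypothetical and do not come from the Gowin documentation:

- the entity name `mac_sketch`;
- the `clr`/`en` controls;
- the 18-bit operand width and the 48-bit accumulator width.

Whether the Gowin synthesizer packs the multiplier and the running sum into one DSP block is exactly what the synthesis report would need to confirm.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical multiply-accumulate, DOUT = sum of A*B, written for inference.
entity mac_sketch is
  generic (
    W   : positive := 18;   -- operand width (assumption)
    ACC : positive := 48    -- accumulator width (assumption, must be >= 2*W)
  );
  port (
    clk  : in  std_logic;
    clr  : in  std_logic;   -- with en = '1': start a new sum with this product
    en   : in  std_logic;   -- '1': accept a new A, B pair this clock
    a, b : in  signed(W-1 downto 0);
    dout : out signed(ACC-1 downto 0)
  );
end entity;

architecture rtl of mac_sketch is
  signal acc_r : signed(ACC-1 downto 0) := (others => '0');
begin
  process (clk)
    variable prod : signed(2*W-1 downto 0);
  begin
    if rising_edge(clk) then
      if en = '1' then
        prod := a * b;                          -- full-width product
        if clr = '1' then
          acc_r <= resize(prod, ACC);           -- DOUT = A*B
        else
          acc_r <= acc_r + resize(prod, ACC);   -- DOUT = DOUT + A*B
        end if;
      end if;
    end if;
  end process;

  dout <= acc_r;
end architecture;
```

Asserting `clr` on the first pair of a sum avoids a separate reset cycle between sums.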
```On 22/11/2020 04:49, gnuarm.del...@gmail.com wrote:
> I just wondered how this is done with other brands of devices.
>
By instantiation (in my case with Lattice or Altera), mainly because the
DSP is a very limited resource and I needed full control over how it was
shared. On my one big Vivado project, with a chip that had far more
resources, I think I let the tools do a bit of inferring but mostly used
blocks from the Xilinx IP collection.

MK
```
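The figures from the original post are a 5 ms calculation window and a 33 MHz clock. A quick budget with those figures suggests why sharing one scarce DSP block could be workable here:

$$
N_{\text{cycles}} = t_{\text{window}} \cdot f_{\text{clk}} = \left(5 \times 10^{-3}\ \text{s}\right)\left(33 \times 10^{6}\ \text{s}^{-1}\right) = 165{,}000
$$

Assume a single time-shared multiplier accepts one new operand pair per clock. It could then in principle issue up to about 165,000 multiplies per window, minus whatever cycles pipeline fill and operand sequencing consume.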
```On Sunday, 11/22/2020 2:12 AM, Michael Kellett wrote:
> By instantiation (in my case with Lattice or Altera), mainly because the
> DSP is a very limited resource and I needed full control over how it was
> shared. On my one big Vivado project, with a chip that had far more
> resources, I think I let the tools do a bit of inferring but mostly used
> blocks from the Xilinx IP collection.
>
> MK

In the Xilinx tools, it's generally easy to infer your logic and let
the tools figure out how to place it in DSPs. I used to try to place
pipeline registers where they were needed based on the DSP architecture,
but soon found that if you just place a lot of pipeline stages at the
end, the tools will push the registers into the required places. If you
were using third-party tools like Synplify Pro, I would expect the same
behavior regardless of the target FPGA. I'm not familiar with Gowin, so
I couldn't tell you what to expect from their tools. In any case, it's
easy enough to write code for inference and see what the tools do with it.

--
Gabor

```
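This VHDL sketch illustrates the "put extra pipeline stages at the end" style described above. The product is registered once and then passed through additional plain registers. A retiming-capable tool may move those registers into the DSP block's internal register slots.

The entity name, the `STAGES` generic and the widths are hypothetical. Whether Gowin's tools do this kind of register pushing is not established in this thread.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical pipelined multiplier: one product register plus STAGES-1
-- trailing registers that a retiming tool may relocate.
entity mult_pipe_sketch is
  generic (
    W      : positive := 18;  -- operand width (assumption)
    STAGES : positive := 3    -- total register stages after the multiply
  );
  port (
    clk  : in  std_logic;
    a, b : in  signed(W-1 downto 0);
    p    : out signed(2*W-1 downto 0)
  );
end entity;

architecture rtl of mult_pipe_sketch is
  type pipe_t is array (1 to STAGES) of signed(2*W-1 downto 0);
  signal pipe : pipe_t := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      pipe(1) <= a * b;              -- product registered once
      for i in 2 to STAGES loop
        pipe(i) <= pipe(i-1);        -- extra delay stages at the end
      end loop;
    end if;
  end process;

  p <= pipe(STAGES);                 -- latency is STAGES clocks
end architecture;
```

The arithmetic is unchanged by `STAGES`; only the latency grows. That latency is the price of giving the tool registers to place.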
```On Sunday, November 22, 2020 at 2:13:03 AM UTC-5, Michael Kellett wrote:
> By instantiation (in my case with Lattice or Altera), mainly because the
> DSP is a very limited resource and I needed full control over how it was
> shared. On my one big Vivado project, with a chip that had far more
> resources, I think I let the tools do a bit of inferring but mostly used
> blocks from the Xilinx IP collection.

I'd be willing to use inference if I could actually understand just what is
in the DSP blocks. The adder can do addition or subtraction, but I can't
tell whether that is configurable at run time. There are various
multiplexers, and a variety of inputs to a rather large mux for the large
accumulator; it's unclear how to control that one. I guess I'll just have
to ask whether there is more documentation.

--

Rick C.
```
```On Sunday, November 22, 2020 at 11:45:26 AM UTC-5, Gabor wrote:
> In the Xilinx tools, it's generally easy to infer your logic and let
> the tools figure out how to place it in DSPs. I used to try to place
> pipeline registers where they were needed based on the DSP architecture,
> but soon found that if you just place a lot of pipeline stages at the
> end, the tools will push the registers into the required places. If you
> were using third-party tools like Synplify Pro, I would expect the same
> behavior regardless of the target FPGA. I'm not familiar with Gowin, so
> I couldn't tell you what to expect from their tools. In any case, it's
> easy enough to write code for inference and see what the tools do with it.

Thanks for the reply. I tried inference, and it seems to be using a
separate DSP unit for the multiply and for the add. Maybe I need to combine
the two into a single assignment.

--

Rick C.
```
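One way to try the "single assignment" idea is to write the multiply and the add as one registered expression, DOUT = A * B + C. The alternative is to register the product in one statement and add C in another.

This VHDL sketch uses hypothetical names and widths. The operands are 18 bits, and C and the result are 48 bits; the C width must be at least twice the operand width. It is not confirmed that this style makes Gowin's synthesis use a single DSP unit, so comparing the resource report for both styles would be the real test.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical multiply-add written as a single registered assignment.
entity muladd_sketch is
  generic (
    W  : positive := 18;  -- operand width (assumption)
    CW : positive := 48   -- C / result width (assumption, must be >= 2*W)
  );
  port (
    clk  : in  std_logic;
    a, b : in  signed(W-1 downto 0);
    c    : in  signed(CW-1 downto 0);
    dout : out signed(CW-1 downto 0)
  );
end entity;

architecture rtl of muladd_sketch is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- DOUT = A*B + C in one statement: no separately registered product
      -- sits between the multiplier and the adder.
      dout <= resize(a * b, CW) + c;
    end if;
  end process;
end architecture;
```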
```On 11/22/20 5:04 PM, gnuarm.del...@gmail.com wrote:
> I'd be willing to use inference if I could actually understand just what is in the DSP blocks. The adder can do addition or subtraction, but I can't tell whether that is configurable at run time. There are various multiplexers, and a variety of inputs to a rather large mux for the large accumulator; it's unclear how to control that one. I guess I'll just have to ask whether there is more documentation.
>
>

The IP tool lets you select a number of different operations, chosen by
an input value. As I remember, both the input adder and the output adder
are dynamically configurable for adding or subtracting, but I would have
to double-check that (at least for the part I was using).
```
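To illustrate operations selected by an input value, here is a behavioral VHDL sketch. A 2-bit `op` input chooses, on every clock, between four operations:

- A * B;
- A * B + C;
- A * B - C;
- accumulating A * B.

The encoding of `op`, the entity name and the widths are invented for illustration and do not correspond to any vendor's IP. The synthesis results would show whether a given tool maps the run-time add/subtract choice onto the DSP's own controls or builds it from fabric logic.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical DSP-style datapath with a run-time operation select.
entity dsp_opsel_sketch is
  generic (
    W  : positive := 18;  -- operand width (assumption)
    CW : positive := 48   -- C / result width (assumption, must be >= 2*W)
  );
  port (
    clk  : in  std_logic;
    op   : in  std_logic_vector(1 downto 0);  -- hypothetical encoding
    a, b : in  signed(W-1 downto 0);
    c    : in  signed(CW-1 downto 0);
    dout : out signed(CW-1 downto 0)
  );
end entity;

architecture rtl of dsp_opsel_sketch is
  signal acc : signed(CW-1 downto 0) := (others => '0');
begin
  process (clk)
    variable prod : signed(CW-1 downto 0);
  begin
    if rising_edge(clk) then
      prod := resize(a * b, CW);
      case op is
        when "00"   => acc <= prod;         -- A*B
        when "01"   => acc <= prod + c;     -- A*B + C
        when "10"   => acc <= prod - c;     -- A*B - C
        when others => acc <= acc + prod;   -- accumulate: previous + A*B
      end case;
    end if;
  end process;

  dout <= acc;
end architecture;
```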
```On Sunday, November 22, 2020 at 5:47:50 PM UTC-5, Richard Damon wrote:
> The IP tool lets you select a number of different operations, chosen by
> an input value. As I remember, both the input adder and the output adder
> are dynamically configurable for adding or subtracting, but I would have
> to double-check that (at least for the part I was using).

Yeah, I can't find where much of anything is configurable in the
application; rather, there are parameters (generics) that establish
connectivity and function. But the docs don't really explain what they do,
only the number of bits they occupy.

GENERIC (
  AREG           : bit     := '0';
  BREG           : bit     := '0';
  ASIGN_REG      : bit     := '0';
  BSIGN_REG      : bit     := '0';
  OUT_REG        : bit     := '0';
  ALUD_MODE      : integer := 0;
  ALU_RESET_MODE : string  := "SYNC"
);

As far as I can tell, the operations are fixed and there's no way to
selectively add or subtract in real time. The ALU info seems to show
real-time A and B sign inputs, but no real indication of what they do. The
combined multiply-ALU functions don't show the sign inputs, but do show
A*B±C in one of the equations. Are the multiply and ALU functions in
separate DSP blocks, or are both contained in every DSP block?

--

Rick C.
```
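The post above does not say what these generics do. A common convention in FPGA DSP primitives is that a bit generic such as `AREG`, `BREG` or `OUT_REG` inserts a register on that path when set to '1' and bypasses it when set to '0'. That convention is an assumption here, not something taken from the Gowin documentation.

The following hypothetical behavioral model (`optreg_mult_sketch`) shows that interpretation for a plain multiplier. Each setting changes latency but not the arithmetic result. It is not the Gowin primitive and does not model the sign, ALU mode or reset generics.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical model: each bit generic is ASSUMED to mean
-- '1' = register this path, '0' = pass it straight through.
entity optreg_mult_sketch is
  generic (
    AREG    : bit      := '0';
    BREG    : bit      := '0';
    OUT_REG : bit      := '0';
    W       : positive := 18   -- operand width (assumption)
  );
  port (
    clk  : in  std_logic;
    a, b : in  signed(W-1 downto 0);
    dout : out signed(2*W-1 downto 0)
  );
end entity;

architecture rtl of optreg_mult_sketch is
  signal a_q, b_q, a_s, b_s : signed(W-1 downto 0)   := (others => '0');
  signal p, p_q             : signed(2*W-1 downto 0) := (others => '0');
begin
  -- Candidate registers; synthesis trims any that a generic bypasses.
  process (clk)
  begin
    if rising_edge(clk) then
      a_q <= a;
      b_q <= b;
      p_q <= p;
    end if;
  end process;

  -- Static path selection fixed at elaboration by the generics.
  a_s  <= a_q when AREG = '1' else a;
  b_s  <= b_q when BREG = '1' else b;
  p    <= a_s * b_s;
  dout <= p_q when OUT_REG = '1' else p;
end architecture;
```

Under this interpretation, these generics are configuration, not run-time controls. That would be consistent with not finding any real-time add/subtract select among them.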
```On 11/22/20 6:43 PM, gnuarm.del...@gmail.com wrote:
> Yeah, I can't find where much of anything is configurable in the application; rather, there are parameters (generics) that establish connectivity and function. But the docs don't really explain what they do, only the number of bits they occupy.
>
> GENERIC (
>   AREG:bit:='0';
>   BREG:bit:='0';
>   ASIGN_REG:bit:='0';
>   BSIGN_REG:bit:='0';
>   OUT_REG:bit:='0';
>   ALUD_MODE:integer:=0;
>   ALU_RESET_MODE:string:="SYNC"
>   );
>
> As far as I can tell, the operations are fixed and there's no way to selectively add or subtract in real time. The ALU info seems to show real-time A and B sign inputs, but no real indication of what they do. The combined multiply-ALU functions don't show the sign inputs, but do show A*B±C in one of the equations. Are the multiply and ALU functions in separate DSP blocks, or are both contained in every DSP block?
>

I tend to use the IP integrator to make a configured version of the DSP
block, and that integrator lets you make a list of operations. I would
need to try it to see whether it allows both an add and a subtract selected
by the 'operation' input to the module.

I haven't figured out the incantations to generate some of these blocks
directly without using the IP integrator, and I haven't found the
documentation for that.

My understanding is that the whole block is one module, and at least on
the part I am using, they come in pairs that can be coupled to make a
bigger block, with a faster path for passing a partial sum from one block
to another.
```
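The paired-block arrangement described in the last post matches the DOUT = A * B + CASI equation from the first post: one block's result arrives as the cascade input of the next. This behavioral VHDL sketch models a two-block chain in ordinary RTL.

The names, the widths and the one-cycle alignment registers on the second block's operands are assumptions. Whether a tool routes `casi` over a dedicated cascade path rather than general routing depends on the device and synthesizer.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical two-block chain: block 0 forms A0*B0, block 1 computes
-- DOUT = A1*B1 + CASI using block 0's result as its cascade input.
entity cascade_pair_sketch is
  generic (
    W  : positive := 18;  -- operand width (assumption)
    CW : positive := 48   -- partial-sum width (assumption, must be >= 2*W)
  );
  port (
    clk            : in  std_logic;
    a0, b0, a1, b1 : in  signed(W-1 downto 0);
    dout           : out signed(CW-1 downto 0)
  );
end entity;

architecture rtl of cascade_pair_sketch is
  signal casi       : signed(CW-1 downto 0) := (others => '0');  -- partial sum
  signal a1_q, b1_q : signed(W-1 downto 0)  := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Block 0: register A0*B0 as the partial sum passed down the chain.
      casi <= resize(a0 * b0, CW);
      -- Delay block 1's operands one clock so they line up with casi.
      a1_q <= a1;
      b1_q <= b1;
      -- Block 1: DOUT = A1*B1 + CASI.
      dout <= resize(a1_q * b1_q, CW) + casi;
    end if;
  end process;
end architecture;
```

With operands held steady, the sum of both products appears after two clocks. Longer chains would add one such stage per block.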