comp.arch.fpga | Downsizing Verilog synthesization.

Hi guys:

I'm prototyping an application using a Xilinx Spartan-3 development
board.  I'm using this particular development kit because it is suited
to the large amount of I/O I need.

I'm new to FPGA, so I have written the code in Verilog using almost
exclusively a high-level, behavioural style.  The program works, but
synthesizes using 99% of the available slices.  So if I try to change
or improve the code, it often synthesizes to over 100% and kicks out
an error.

I need to condense what I've got to give me some space to work with.

The application is basically a large number of high-speed pulse
inputs.  I count them all independently and average several readings
over time for each to produce a 21-bit number.  Each of these 21-bit
vectors (there are almost 100) is sent to a central processing module
that evaluates and compares them using simple arithmetic.  Based on
these comparisons, another set of vectors is sent on to a couple of
modules that arrange them into a special synchronous serial output.
That's all it does.

Are there any standard tips or general guidelines that you might offer
to condense my synthesis?  I have found, for example, that making the
vectors smaller doesn't really change the overall slice count, yet
commenting out a single line of the processing code can change it
drastically.

Any ideas or comments would be greatly appreciated.

Don

Reply by Mike Treseler ● August 6, 2008

eromlignod wrote:

> The application is basically a large number of high-speed pulse
> inputs.  I count them all independently and average several readings
> over time for each to produce a 21-bit number.  Each of these 21-bit
> vectors (there are almost 100) is sent to a central processing module
> that evaluates and compares them using simple arithmetic.  Based on
> these comparisons, another set of vectors is sent on to a couple of
> modules that arrange them into a special synchronous serial output.

Since the answer is shifted out in serial,
maybe it could be constructed a bit at a time
to save resources.

> Are there any standard tips or general guidelines that you might offer
> to condense my synthesis? 

A basic trade is time for gates.
A serial CRC is slower, but requires fewer resources
than the parallel version, for example.

    -- Mike Treseler
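
To make the time-for-gates trade concrete, here is a minimal sketch of a bit-serial CRC. It assumes the CRC-16-CCITT polynomial (x^16 + x^12 + x^5 + 1) with data fed most significant bit first; the module name, port names and preset value are illustrative and not taken from any design in this thread. It processes one data bit per clock using a 16-bit register and a handful of XORs, whereas a parallel version spends a wider XOR tree to absorb a whole byte or word per clock.

```verilog
// Bit-serial CRC-16-CCITT sketch (polynomial 0x1021, MSB first).
// Names and the preset value are assumptions for illustration.
module crc16_serial (
    input  wire        clk,
    input  wire        init,    // synchronous preset of the CRC register
    input  wire        enable,  // high when din holds a valid data bit
    input  wire        din,     // serial data, most significant bit first
    output reg  [15:0] crc
);
    // Feedback bit: register MSB XOR incoming data bit.
    wire fb = crc[15] ^ din;

    always @(posedge clk) begin
        if (init)
            crc <= 16'hFFFF;    // a commonly used preset value
        else if (enable)
            // Shift left by one; XOR in the polynomial when feedback is 1.
            crc <= {crc[14:0], 1'b0} ^ (fb ? 16'h1021 : 16'h0000);
    end
endmodule
```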

Reply by John_H ● August 6, 2008

eromlignod wrote:
<snip>
> 
> Are there any standard tips or general guidelines that you might offer
> to condense my synthesis?  I have found, for example, that making the
> vectors smaller doesn't really change the overall slice count, yet
> commenting out a single line of the processing code can change it
> drastically.
> 
> Any ideas or comments would be greatly appreciated.
> 
> Don

Time multiplexing can often help significantly.

If you have 100 21-bit counters, the 2100 registers and associated 
muxing can take a lot of space.  If you're not running near the limit of 
the part, you could increase the clock rate and share some counters in 
distributed memory.  If things are really slow, you can go to BlockRAMs, 
eliminating the redundancy and reducing read mux logic significantly.

If you can't increase the processing frequency, you could still count 
the LSbits and cycle through the counters, adding and clearing the 
LSbits to a BlockRAM worth of counter values.  To cycle through 255 
32-bit counters, you'd need 8-bit counters for each signal and a 
read-add-write cycle (using the dual-port mode) for each entry in your 
list.  You end up only using half the BlockRAM for this extreme number 
of counters.

It's more housekeeping but you use a fraction of the count resources.
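
A rough sketch of the second scheme (small per-input counters folded into totals held in memory) might look like the following. Several things here are assumptions rather than anything stated in the thread: the pulses are taken to be already synchronized to clk and one clock wide per event, each entry is serviced in two clocks (read, then write) instead of the single-cycle dual-port access described above, and all names and parameter values are hypothetical. With two clocks per entry, an input is serviced every 2*N clocks, so 2*N must stay below 2**LW or a small counter can overflow. Whether the tools place the total array in BlockRAM depends on the synthesizer and on how the read ports are coded.

```verilog
// Time-multiplexed counting sketch: N small counters, totals kept in a memory.
// All names and parameter values are assumptions for illustration.
module shared_counters #(
    parameter N  = 96,   // number of pulse inputs
    parameter LW = 8,    // width of each small per-input counter
    parameter TW = 32,   // width of each total kept in memory
    parameter AW = 7     // address width, 2**AW >= N
) (
    input  wire          clk,
    input  wire [N-1:0]  pulse,    // assumed synchronous, one clk wide per event
    input  wire [AW-1:0] rd_addr,  // read port for the downstream logic
    output reg  [TW-1:0] rd_data
);
    reg [LW-1:0] small [0:N-1];         // cheap per-input LS-bit counters
    reg [TW-1:0] total [0:(1<<AW)-1];   // accumulated counts
    reg [AW-1:0] idx;                   // entry currently being serviced
    reg          phase;                 // 0 = read, 1 = write back
    reg [TW-1:0] t_q;                   // total read in phase 0
    reg [LW-1:0] s_q;                   // small count captured in phase 0
    integer i;

    initial begin
        idx   = 0;
        phase = 0;
        for (i = 0; i < N; i = i + 1)         small[i] = 0;
        for (i = 0; i < (1<<AW); i = i + 1)   total[i] = 0;
    end

    always @(posedge clk) begin
        // Every small counter follows its own input on every clock.
        for (i = 0; i < N; i = i + 1)
            small[i] <= small[i] + pulse[i];

        if (phase == 1'b0) begin
            t_q <= total[idx];
            s_q <= small[idx];
            // Clear the serviced counter but keep a pulse arriving this cycle.
            // This later non-blocking assignment overrides the one in the loop.
            small[idx] <= pulse[idx];
        end else begin
            total[idx] <= t_q + s_q;
            idx <= (idx == N-1) ? {AW{1'b0}} : idx + 1'b1;
        end
        phase <= ~phase;

        rd_data <= total[rd_addr];
    end
endmodule
```

The point of the arrangement is that only LW bits per input live in flip-flops; the remaining bits of every count sit in memory and share one adder.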

Reply by John McCaskill ● August 6, 2008

On Aug 6, 8:40 am, eromlignod <eromlig...@aol.com> wrote:
<snip>

Since you state that you run out of slices, I know that your design is
larger than the FPGA can hold, but I would still point out that
slice utilization is a pessimistic view of how much of the FPGA you
are using: by default the mapping stage spreads the logic out instead
of packing it as tightly as possible.  Register and LUT
utilization is an optimistic measure of how much of the FPGA you have
left.  You need to watch all of them to get a good idea of how full
your design really is.

You mention both a high speed pulse counting section that counts and
averages over time, and then a processing section that sounds like it
is slower. How much slower is it?  If you can share resources over
time in this section you could save resources.

You can look in the reports to see how many adders, etc. the tools
inferred from your code.  Your goal is to reduce that number to the
minimum required to perform the comparisons.  You have a range of
options that depend on your constraints.  At one end of the spectrum,
just find any redundant calculations and rearrange your code to share
those calculations. At the other end, you could use a soft processor
such as a PicoBlaze to do the calculations in software.
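
As one possible shape for sharing a calculation over time, the sketch below steps through the channels one per clock and reuses a single comparator against reference values held in a small ROM. It is only an illustration under assumptions: the actual comparisons in the design are not described in the thread, value is assumed to be supplied for the channel shown on chan in the same cycle, and the module name, port names and the file name limits.hex are hypothetical.

```verilog
// One comparator shared across N channels, one channel per clock.
// Names, widths and the init file are assumptions for illustration.
module shared_compare #(
    parameter N  = 96,   // number of channels
    parameter W  = 21,   // width of each averaged value
    parameter AW = 7     // address width, 2**AW >= N
) (
    input  wire          clk,
    input  wire [W-1:0]  value,  // value for channel `chan`, assumed valid this cycle
    output reg  [AW-1:0] chan,   // channel currently being compared
    output reg  [N-1:0]  above   // one result bit per channel
);
    // Reference values, one per channel, held as a ROM instead of parameters.
    reg [W-1:0] limit [0:(1<<AW)-1];

    initial begin
        chan  = 0;
        above = 0;
        $readmemh("limits.hex", limit);  // hypothetical file of reference values
    end

    always @(posedge clk) begin
        above[chan] <= (value > limit[chan]);           // the single shared comparator
        chan <= (chan == N-1) ? {AW{1'b0}} : chan + 1'b1;
    end
endmodule
```

This only works if the processing can tolerate each result being refreshed once every N clocks, which is the question being asked about how much slower that section is.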

Regards,

John McCaskill
www.FasterTechnology.com

Reply by eromlignod ● August 6, 2008

On Aug 6, 8:56 am, Mike Treseler <mtrese...@gmail.com> wrote:
> eromlignod wrote:
> > The application is basically a large number of high-speed pulse
> > inputs.  I count them all independently and average several readings
> > over time for each to produce a 21-bit number.  Each of these 21-bit
> > vectors (there are almost 100) is sent to a central processing module
> > that evaluates and compares them using simple arithmetic.  Based on
> > these comparisons, another set of vectors is sent on to a couple of
> > modules that arrange them into a special synchronous serial output.
>
> Since the answer is shifted out in serial,
> maybe it could be constructed a bit at a time
> to save resources.
>
> > Are there any standard tips or general guidelines that you might offer
> > to condense my synthesis?
>
> A basic trade is time for gates.
> A serial crc is slower, but requires less resources
> than the parallel version, for example.
>
>     -- Mike Treseler

Mike:

I'm intrigued by your answer, but don't fully understand what you
propose.  You say that I should construct my serial signal a bit at a
time, but how else can I?

My last serial generating module has a big 256-bit vector input that it
translates to a serial output that repeats the 256 bits over and
over.  The code is basically something like this:

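input [255:0] invector;   // 256 bits to serialize
output reg serout;        // must be a reg: it is assigned in the always block
reg [7:0] x;              // bit index, wraps from 255 back to 0

always @(negedge shiftclock)
   begin
      x = x + 1;               // advance to the next bit
      serout = invector[x];    // 256:1 mux selects the current bit
   end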

I'll bet there's a better way.

Don

Reply by eromlignod ● August 6, 2008

On Aug 6, 9:21 am, John McCaskill <jhmccask...@gmail.com> wrote:
<snip>

What sorts of operations are the biggest gate-hogs?

I have a lot of comparison "if" operations, counters, and non-blocking
assignments to convert lots of inputs into usable arrays.  The
averagers each divide by 32 and I have another single divider toward
the end that divides by 256.  Other than that, I'm not doing anything
very fancy.  I have no multipliers (though I might like to add one),
no "for" loops, etc.

I do have a series of hard-coded standard values that I use for
comparison.  They are in the form of parameters that are fed to each
of the input counter modules when they are instantiated in the top
module.  I suppose these could be EPROM memories, but I haven't
figured out yet how to use the memory provided on the development
board.

Don

Reply by Gabor ● August 6, 2008

On Aug 6, 10:43 am, eromlignod <eromlig...@aol.com> wrote:
<snip>

What tools are you using for synthesis?  If ISE / XST (WebPACK or
Foundation from Xilinx), which version?

Things like divide by power of two should take no resources whatever
(i.e. shift operators are basically wires).  However a synthesis tool
may look at the division operator and think you need a divider, which
will take a lot of logic.
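
For instance, an averager over 32 readings can take the upper bits of the running sum instead of using the division operator. The following is a minimal sketch that assumes a simple block average (sum 32 samples, output, start over), which may not match how the averagers in the design actually work; the module and port names are hypothetical.

```verilog
// Block average of 32 samples; the divide by 32 is a bit slice, not a divider.
// Names and the block-average behaviour are assumptions for illustration.
module avg32 #(
    parameter W = 21                 // sample width
) (
    input  wire         clk,
    input  wire         sample_valid,
    input  wire [W-1:0] sample,
    output reg  [W-1:0] average,
    output reg          average_valid
);
    reg  [W+4:0] sum;                // 5 extra bits hold the sum of 32 samples
    reg  [4:0]   n;                  // samples accumulated so far
    wire [W+4:0] next_sum = sum + sample;

    initial begin
        sum = 0;
        n = 0;
        average = 0;
        average_valid = 0;
    end

    always @(posedge clk) begin
        average_valid <= 1'b0;
        if (sample_valid) begin
            if (n == 5'd31) begin
                average       <= next_sum[W+4:5];  // same as next_sum / 32: just wires
                average_valid <= 1'b1;
                sum           <= 0;
                n             <= 0;
            end else begin
                sum <= next_sum;
                n   <= n + 1'b1;
            end
        end
    end
endmodule
```

Writing the slice explicitly removes any dependence on whether the tool recognizes a constant power-of-two divisor.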

Also since you seem to be register-heavy, see where you can
use serial shift registers or memory instead of loose flip-flops.
In Spartan-3 you get 16 stages of serial shift register or 16
bits of distributed RAM from a single LUT site.  Coding shift
registers without a reset term allows the synthesizer to place
them in these structures instead of flip-flops (which come
one to a LUT site).
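
A minimal sketch of a shift register coded that way is shown below. There is no reset term and only the last stage is read, which is the pattern that lets a synthesizer choose a LUT-based shift register; whether it actually does so depends on the tool and its settings. The module and port names are illustrative.

```verilog
// 16-stage, 1-bit delay line with no reset, written so it can map to a
// LUT-based shift register. Names are assumptions for illustration.
module delay16 (
    input  wire clk,
    input  wire ce,   // clock enable
    input  wire d,
    output wire q
);
    reg [15:0] sr;

    always @(posedge clk)
        if (ce)
            sr <= {sr[14:0], d};   // shift in one bit per enabled clock

    assign q = sr[15];             // only the final stage is observed
endmodule
```

Adding a reset, or reading the intermediate stages in parallel, generally forces the stages back into individual flip-flops.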

Did you look at your map report or "design summary"?  In
the latest version of ISE the design summary can show you
where your largest resource allocations come from.

Regards,
Gabor

Reply by eromlignod ● August 6, 2008

On Aug 6, 10:49 am, Gabor <ga...@alacron.com> wrote:
<snip>


Interesting.

Thanks Gabor!  This may be very useful.  I have a large number of
8-bit vectors in my design.  I have about 220 of them passing from one
module to another.  They each begin as an "output reg [7:0]" in one
module and are all assigned to an array in the other module like this:

reg [7:0] array [219:0];
...
y[0] <= array[0];
y[1] <= array[1];
y[2] <= array[3];
...etc.

Is this bad form?

Don

Reply by eromlignod ● August 6, 2008

On Aug 6, 11:24 am, eromlignod <eromlig...@aol.com> wrote:
<snip>
> reg [7:0] array [219:0];
> ...
> y[0] <= array[0];
> y[1] <= array[1];
> y[2] <= array[3];
> ...etc.
>
> Is this bad form?

Oops.  I meant for that code to be:

input [7:0] y0;
input [7:0] y1;
...
reg [7:0] array [219:0];
...
array[0] <= y0;
array[1] <= y1;
...etc.


Don
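
One common way to avoid 220 separate ports and 220 hand-written assignments is to pass the vectors as a single flattened bus and unpack it in a loop. The sketch below assumes Verilog-2001 (which has no array ports, hence the flattening) and uses hypothetical names; the sel/q read port is there only so the example has an output. Note that a registered copy like this costs 8 flip-flops per vector, so if the values are already registered in the source module, slicing the bus directly with wires avoids that duplication.

```verilog
// Unpacking a flattened bus into an array with a loop.
// Names and the read port are assumptions for illustration.
module collect #(
    parameter N = 220                 // number of 8-bit vectors
) (
    input  wire           clk,
    input  wire [8*N-1:0] y_flat,     // {y219, ..., y1, y0} concatenated by the sender
    input  wire [7:0]     sel,        // which entry to read back
    output wire [7:0]     q
);
    reg [7:0] array [N-1:0];
    integer i;

    always @(posedge clk)
        for (i = 0; i < N; i = i + 1)
            array[i] <= y_flat[8*i +: 8];   // entry i is bits [8*i+7 : 8*i]

    assign q = array[sel];
endmodule
```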

Reply by Mike Treseler ● August 6, 2008

eromlignod wrote:

> Mike:
> 
> I'm intrigued by your answer, but don't fully understand what you
> propose.  You say that I should construct my serial signal a bit at a
> time, but how else can I?

I meant to suggest arranging some sort of pipeline
to work on the math while you are shifting the answer out.

        -- Mike Treseler
