FPGARelated.com
Forums

Cyclone V decimation

Started by Piotr Wyderski February 23, 2019
Hi,

the input signal is 14-bit signed at 750 ksps. I would like to decimate it
by a modest factor of ~3000. What would be the best way of doing it on a
Cyclone V, resource-wise? My usual approach would be a cascade of CIC
decimators followed by a FIR corrector, but since the DSP blocks are
there, it doesn't feel like the "right" (albeit correct) approach. I'm
new to the V family and lack the proper intuition, so could someone
more versed suggest a good direction?

In fact, there will be 12 such channels, all running in sync,
so maybe considerable resource sharing can be achieved?

	Best regards, Piotr
On Saturday, February 23, 2019 at 2:32:04 AM UTC-5, Piotr Wyderski wrote:
> Hi,
>
> the input signal is 14 bits signed@750ksps. I would like to decimate it
> by a modest factor of ~3000. What would be the best way of doing it on a
> Cyclone V, resource-wise? My usual approach would be a cascade of CIC
> decimators followed by a FIR corrector, but since there are the DSP
> blocks, I don't feel it to be the "right" (albeit correct) approach. I'm
> new to the V family and lack the proper intuitions, so could someone
> more versed
> suggest me a good direction?
>
> In fact, there will be 12 such channels, all going in sync,
> so maybe a considerable resouce sharing can be achieved?
>
> Best regards, Piotr

To determine the "right" approach, you need to define "right" in some engineering terms. So what aspects of the design and implementation are important to your goals?

Rick C.
gnuarm.deletethisbit@gmail.com wrote:

> To determine the "right" approach, you need to define "right" in some engineering terms. So what aspects of the design and implementation are important to your goals?

Minimisation of resource usage, or in other words, a decimation technique that maps best onto the underlying primitives. I believe those 200+ DSP (multiply-accumulate) blocks are good for something...

	Best regards, Piotr
On Saturday, February 23, 2019 at 6:17:28 PM UTC+2, Piotr Wyderski wrote:
> gnuarm.deletethisbit@gmail.com wrote:
>
> > To determine the "right" approach, you need to define "right" in some engineering terms. So what aspects of the design and implementation are important to your goals?
>
> Minimisation of resource usage, or in other words, a decimation
> technique that maps best onto the underlying primitives. I believe
> those 200+ DSP (multiply-accumulate) blocks are good for something...
>
> Best regards, Piotr

If all you want is minimization of resource usage, then just do CIC. Something else makes sense only if you want a very flat pass band and a very sharp transition between pass band and stop band.

The problem with using a generic FIR for decimation is not computation, which for your requirements would be minimal, but storage, both for the coefficients and for the delay line. Decimation by 3000 would need something like 15K coefficients for a good filter shape, or twice as many for a very good shape. Coefficient storage could be cut in half thanks to the filter's symmetry, but I am not aware of a similar trick for the delay line. So, overall, you will need just 1 DSP block, but 40 to 80 M10K blocks.

Of course, you can always trade storage against simplicity by building your decimation chain as a cascade, probably sizing each stage so that its delay line fits in 1 M10K block. Then the whole chain will take 3 stages and only 6 M10K blocks, and the filter shape could still be excellent. Or maybe even 2 M10K blocks, if you are ready to complicate the control machine a little more by placing all delay lines in a common M10K and doing the same for the coefficients. But is it worth the increased complexity? I am not sure.

And then there is a variant in the middle: a cascade of 2 stages instead of 3. Then each delay line and each set of FIR taps will fit in an M10K, but the two delay lines together wouldn't fit in one. So, with a bit of control acrobatics, you could fit the whole cascade in 3 M10K blocks.

Still, do it only if you care about the shape of the filter; don't do it for resources alone.
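
The storage figures above can be sanity-checked with simple arithmetic. The sketch below is only a rough estimate for a single channel: the 18-bit coefficient width is an assumption (chosen to suit an 18-bit multiplier input), an M10K is counted as 10,240 raw bits, and the block's allowed width/depth modes are ignored, so a real design would likely need somewhat more. The function names are hypothetical.

```python
import math

M10K_BITS = 10 * 1024   # raw capacity of one M10K block, ignoring width modes
SAMPLE_BITS = 14        # input word width from the original post
COEFF_BITS = 18         # ASSUMPTION: coefficient width, not stated in the thread
FS_IN = 750_000         # input sample rate, samples per second
RATE = 3000             # decimation factor

def m10k_blocks(taps, symmetric=True):
    """Rough M10K count for one channel of a single-stage decimating FIR."""
    # A symmetric (linear-phase) filter only needs half of its coefficients stored.
    coeff_words = math.ceil(taps / 2) if symmetric else taps
    coeff_blocks = math.ceil(coeff_words * COEFF_BITS / M10K_BITS)
    # The delay line must hold one input sample per tap.
    delay_blocks = math.ceil(taps * SAMPLE_BITS / M10K_BITS)
    return coeff_blocks + delay_blocks

def macs_per_second(taps):
    """Multiply-accumulates per second when only the kept outputs are computed."""
    return taps * FS_IN // RATE

for taps in (15_000, 30_000):
    print(taps, "taps:", m10k_blocks(taps), "M10K,", macs_per_second(taps), "MAC/s")
```

Under these assumptions the estimate comes out at 35 and 69 blocks, the same ballpark as the 40 to 80 quoted above, while the arithmetic load stays at a few million multiply-accumulates per second per channel, which is why a single DSP block is enough.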
already5chosen@yahoo.com wrote:

> If all you want is minimization of resource usage then just do CIC.
> Something else makes sense only if you want very flat pass band and
> very sharp transition between pass band and stop band.

There is very little to no energy in the upper part of the band. The high ADC speed is there for other reasons. Therefore, a CIC will be more than enough, at least in the first stages of the cascade. I don't know yet if it would be sufficient for the final stage, but this is a detail that can be tweaked in a later phase.

So I have a licensing type of question: can I instantiate DSP blocks in Quartus Lite? I know the DSP Builder is an extra paid tool, but I don't need it -- a purely Verilog instantiation would be sufficient. This block appears to have a decent accumulator, so it could relieve the ALMs otherwise needed by the register-hungry CIC.

Thank you!

	Best regards, Piotr
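
To put a number on "register-hungry", here is a minimal bit-exact model of a CIC decimator in Python. It is a sketch under stated assumptions, not anyone's actual design: the stage count of 3 and the differential delay of 1 are illustrative choices, and the register width follows Hogenauer's bound (input width plus N·log2(R·M), rounded up). All names are hypothetical.

```python
import math

def cic_register_bits(in_bits, rate, stages, diff_delay=1):
    """Word width that keeps the CIC output free of overflow errors."""
    return in_bits + math.ceil(stages * math.log2(rate * diff_delay))

def cic_decimate(samples, rate, stages, in_bits=14):
    """N integrators at the input rate, N combs at the output rate.

    All registers wrap around modulo 2**width, as hardware registers do;
    the result is still exact because the final width is large enough.
    """
    width = cic_register_bits(in_bits, rate, stages)
    mask = (1 << width) - 1
    integ = [0] * stages        # integrator registers
    comb_prev = [0] * stages    # comb delay registers
    out = []
    for n, x in enumerate(samples):
        acc = x & mask          # two's-complement view of the input
        for i in range(stages):
            integ[i] = (integ[i] + acc) & mask
            acc = integ[i]
        if (n + 1) % rate == 0:             # keep one sample in every `rate`
            for i in range(stages):
                diff = (acc - comb_prev[i]) & mask
                comb_prev[i] = acc
                acc = diff
            if acc >= 1 << (width - 1):     # reinterpret as a signed value
                acc -= 1 << width
            out.append(acc)
    return out

print(cic_register_bits(14, 3000, 3))   # width of every register in the chain
```

For 14-bit input, R = 3000 and 3 stages this gives 49-bit registers, and the DC gain is R^N = 3000^3, so a constant input c settles to c·3000^3 at the output. Splitting the decimation into several shorter CIC stages, as in the cascade already mentioned, shortens the registers of each stage.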
First of all, since your sample rates are pretty low, I'd see if it's possible to use a DSP chip instead of an FPGA.  Everything is easier in software.

Everything depends on your specs, which you have not stated.  Namely:  what is the attenuation of the stopband, and what is the slope between the passband and the stopband?  You say there is not much in the upper frequencies, so this makes it sound like your filtering requirements are very low.  If there is nothing much at all up there, you don't even need to filter.  Just decimate.  Take every nth sample.

The point of the CIC is to reduce the need for multipliers, but you have plenty of multipliers and low sample rates.  The CIC has big sidelobes.  It might be better to do a cascade of FIRs each with low numbers of taps.
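
The size of those sidelobes can be read off the standard CIC magnitude response, |sin(πfR) / (R·sin(πf))|^N with f expressed as a fraction of the input sample rate. The sketch below assumes a differential delay of 1; the function name and the chosen evaluation points are illustrative only.

```python
import math

def cic_gain_db(f, rate, stages):
    """CIC magnitude relative to DC, in dB; f is a fraction of the input rate."""
    den = rate * math.sin(math.pi * f)
    if den == 0.0:
        return 0.0                          # DC: unity gain after normalisation
    mag = abs(math.sin(math.pi * f * rate) / den)
    if mag == 0.0:
        return float("-inf")                # exact null
    return 20.0 * stages * math.log10(mag)

R = 3000
for stages in (1, 3, 5):
    sidelobe = cic_gain_db(1.5 / R, R, stages)     # middle of the first sidelobe
    near_null = cic_gain_db(0.99 / R, R, stages)   # just below the first null
    print(stages, round(sidelobe, 1), round(near_null, 1))
```

Each stage contributes roughly 13 dB of attenuation at the first sidelobe, so the sidelobe level is set by the number of stages rather than by the decimation ratio. The nulls sit at multiples of the output sample rate, which is exactly where energy would otherwise alias down to DC.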
On Saturday, February 23, 2019 at 11:17:28 AM UTC-5, Piotr Wyderski wrote:
> gnuarm.deletethisbit@gmail.com wrote:
>
> > To determine the "right" approach, you need to define "right" in some engineering terms. So what aspects of the design and implementation are important to your goals?
>
> Minimisation of resource usage, or in other words, a decimation
> technique that maps best onto the underlying primitives. I believe
> those 200+ DSP (multiply-accumulate) blocks are good for something...
>
> Best regards, Piotr

Is that your only criterion? Along with the 200+ DSP blocks I would expect the chip has many thousands of LUTs and FFs. Why focus on DSP block usage?

I don't see a problem with using the CIC decimators if they otherwise work the way you want. A CIC filter has sharp nulls at particular points but doesn't do much elsewhere, while being very logic and energy efficient. CICs are typically finished by a relatively short FIR, so the aggregate delay is not so large. Doing it all in a single filter would create a much longer delay, no? Other than the power usage of a large decimating FIR filter, I can't think of other trade-offs.

Rick C.
gnuarm.deletethisbit@gmail.com wrote:

> Is that your only criterion?

Well, basically, yes, it is the only degree of freedom. In other words: I can design any filtering structure that satisfies my requirements from the signal processing point of view, but not all structures are equally welcomed by the FPGA, let alone an FPGA with DSP slices. Hence my question.

I've already done it with a multistage CIC alone, but the hardware was much simpler and the CIC approach was the only viable one.

> Along with the 200+ DSP blocks I would expect the chip has many thousands of LUTs and FFs. Why focus on DSP block usage?

One reason is to learn them; the other is the ability to use a smaller chip. A DSP block is composed of two multipliers and an accumulator. The accumulator is what a CIC needs. There will be plenty of other functions occupying those FFs.

	Best regards, Piotr
On Sunday, February 24, 2019 at 1:23:21 AM UTC-5, Piotr Wyderski wrote:
> gnuarm.deletethisbit@gmail.com wrote:
>
> > Is that your only criterion?
>
> Well, basiclly, yes, it is the only degree of freedom. In other words:
> I can design any filtering structure that satisfies my requirements from
> the signal processing point of view, but not all structures are equally
> welcome by the FPGA, let alone an FPGA with DSP slices. Hence my question.
>
> I've already done it with a multistage CIC alone, but the hardware
> was much simpler and CIC approach was the only viable one.
>
> > Along with the 200+ DSP blocks I would expect the chip has many
> > thousands of LUTs and FFs. Why focus on DSP block usage?
>
> One reason is to learn them, other is the ability to use a smaller chip.
> A DSP block is composed of two multipliers and an accumulator. The
> accumulator is what a CIC needs. There will be plenty of other functions
> occupying that FFs.

You haven't given us much to go on. As some have pointed out, you can do the decimation in multiple stages and use smaller FIR filters at each point, or use one ginormous FIR filter. In both cases a polyphase organization will reduce the number of calculations needed. Or you can use the CIC filter as a front end.

I don't know any of the details, so I have no way of calculating the resource usage. I think it is pretty obvious what the trade-offs are. Squeeze here and this toothpaste comes out there. Squeeze there and other toothpaste comes out somewhere else. To know where to squeeze and how hard, the numbers are important.

Rick C.
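
The saving from a polyphase organization can be illustrated with a small pure-Python sketch. It compares the naive "filter everything, then throw samples away" approach with a polyphase form that computes only the outputs that are kept. The function names and the tiny test filter are made up for illustration.

```python
def fir_then_downsample(x, h, m):
    """Reference: compute every filter output, then keep every m-th one."""
    y = []
    for n in range(len(x)):
        acc = 0
        for k, c in enumerate(h):
            if n - k >= 0:
                acc += c * x[n - k]
        y.append(acc)
    return y[::m]

def polyphase_decimate(x, h, m):
    """Compute only the kept outputs.

    Branch p holds taps h[p], h[p+m], h[p+2m], ... and only ever sees
    the input samples x[j - p - q*m], so each branch runs at the output rate.
    """
    branches = [h[p::m] for p in range(m)]
    out = []
    for j in range(0, len(x), m):       # j = input index of each kept output
        acc = 0
        for p, taps in enumerate(branches):
            for q, c in enumerate(taps):
                idx = j - p - q * m
                if idx >= 0:
                    acc += c * x[idx]
        out.append(acc)
    return out

x = list(range(1, 41))
h = [1, 2, 3, 4, 3, 2, 1]
print(polyphase_decimate(x, h, 4) == fir_then_downsample(x, h, 4))
```

Both functions return the same samples, but the polyphase form performs about len(h) multiplies per output sample instead of len(h) per input sample, which is a factor of m fewer overall.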
already5chosen@yahoo.com wrote:

> If all you want is minimization of resource usage then just do CIC.

As an afterthought: given the number of channels, their relatively slow speed and the requirement of lockstep processing, perhaps a bit-serial CIC would be a good idea? Other parts of the design can benefit greatly from massive application of this approach, and it would be a powerful cerebral decalcifier. I think it is worth doing even if only to learn that it makes no sense.

Thank you all for your help!

	Best regards, Piotr
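
As a rough illustration of the bit-serial idea, the Python sketch below models a serial adder: one full adder plus one carry flip-flop, processing one bit per clock, LSB first. A bit-serial integrator is this adder with its output fed back through a shift register. The 49-bit word is an assumption taken from a 3-stage, R = 3000 CIC with 14-bit input (14 + ceil(3·log2 3000)); the function names are hypothetical.

```python
def bit_serial_add(a, b, width):
    """Add two width-bit two's-complement words, one bit per clock, LSB first."""
    carry = 0                       # the single carry flip-flop
    result = 0
    for i in range(width):          # each iteration models one clock cycle
        abit = (a >> i) & 1
        bbit = (b >> i) & 1
        sum_bit = abit ^ bbit ^ carry
        carry = (abit & bbit) | (carry & (abit ^ bbit))
        result |= sum_bit << i
    return result                   # final carry is dropped: wrap-around sum

def min_bit_clock_hz(width, sample_rate_hz):
    """Lowest serial clock that finishes one word per input sample."""
    return width * sample_rate_hz

WIDTH = 49                          # ASSUMPTION: see the lead-in
print(bit_serial_add(123456789, 987654321, WIDTH))
print(min_bit_clock_hz(WIDTH, 750_000))
```

With these assumed numbers each serial unit needs a bit clock of at least 49 × 750 kHz = 36.75 MHz, which leaves room to time-share units between channels if the fabric runs faster than that.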