Reply by Piotr Wyderski March 1, 2019
gnuarm.deletethisbit@gmail.com wrote:

> When I have looked at performing bit serial calculations I've found it to not be a large savings of logic and often using more FFs.
You are right: several initial attempts indicate that the savings are minor if I apply time multiplexing carefully. It was a refreshing experience, though, so no time wasted.

The large decimation factor implies the final bandwidth is narrow, so even a very modest 4-stage, decimate-by-4 CIC filter has about 100dB of attenuation in the +/-20kHz bands around the frequencies that alias onto DC. There will be considerable aliasing above that, but I'm going to filter it out later anyway, so why bother. The subsequent filters will work at a much lower data rate, so I can bump up their order or even change their topology to something other than a CIC.

Lesson learned: narrow-band CIC attenuation doesn't depend considerably on the filter order. Obvious when you think about it, but for some reason it wasn't.

OK, I have my answer, thank you all for your contributions!

    Best regards, Piotr
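The attenuation figure above can be checked numerically. The following is a minimal sketch, assuming the textbook CIC magnitude response |sin(pi*R*f/fs) / (R*sin(pi*f/fs))|^N with a differential delay of 1; the function names and the `half_bw_hz` parameter are illustrative and not taken from the thread.

```python
import math

def cic_response_db(f_hz, fs_hz, order, ratio):
    """Magnitude response in dB (0 dB at DC) of an `order`-stage CIC
    decimator with decimation `ratio` and differential delay 1."""
    x = math.pi * f_hz / fs_hz
    den = ratio * math.sin(x)
    if abs(den) < 1e-15:
        return 0.0  # limit value at DC
    mag = abs(math.sin(ratio * x) / den)
    if mag == 0.0:
        return -math.inf  # exact null
    return order * 20.0 * math.log10(mag)

def worst_alias_db(fs_hz, order, ratio, half_bw_hz, steps=200):
    """Least attenuation (largest dB value) found inside the bands of
    +/- half_bw_hz around every frequency that folds onto DC."""
    fold = fs_hz / ratio  # output sample rate; aliases sit at its multiples
    worst = -math.inf
    for k in range(1, ratio // 2 + 1):
        for i in range(steps + 1):
            f = k * fold - half_bw_hz + 2.0 * half_bw_hz * i / steps
            if 0.0 < f <= fs_hz / 2.0:
                worst = max(worst, cic_response_db(f, fs_hz, order, ratio))
    return worst
```

Calling `worst_alias_db(750e3, 4, 4, half_bw_hz)` with a few candidate bandwidths shows how quickly the alias rejection changes as the protected band widens or narrows.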
Reply by February 25, 2019
On Monday, February 25, 2019 at 22:38:02 UTC+1, Benjamin Couillard wrote:
> On Saturday, February 23, 2019 at 02:32:04 UTC-5, Piotr Wyderski wrote:
> > Hi,
> >
> > the input signal is 14 bits signed@750ksps. I would like to decimate it
> > by a modest factor of ~3000. What would be the best way of doing it on a
> > Cyclone V, resource-wise? My usual approach would be a cascade of CIC
> > decimators followed by a FIR corrector, but since there are the DSP
> > blocks, I don't feel it to be the "right" (albeit correct) approach. I'm
> > new to the V family and lack the proper intuitions, so could someone
> > more versed suggest me a good direction?
> >
> > In fact, there will be 12 such channels, all going in sync,
> > so maybe a considerable resource sharing can be achieved?
> >
> > Best regards, Piotr
>
> You could also use halfband FIR filters; they are really efficient. Again, I really recommend Rick Lyons' DSP book: it is a really good book and not too mathy. Basically, a 16-tap halfband filter will only use 4 multipliers instead of 16.
>
> Assuming you decimate by 2048, i.e. 2^11, you would need about 44 multipliers. Furthermore, you can time-multiplex and reuse the multipliers, so you could probably get by with one hardware multiplier per stage, for a total of 11 multipliers.
With each stage running at half the rate of the previous one, it should be possible to stagger the calculations so that the total workload is only (slightly less than) twice that of the first stage:

1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1....
-2---2---2---2---2---2---2---2---....
---3-------3-------3-------3-----....
-------4---------------4---------....
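The staggered schedule in the diagram follows a simple rule: the stage that runs on a given tick is one more than the number of trailing 1 bits in the tick counter. A minimal sketch under that reading of the diagram (the function names are made up for illustration):

```python
def stage_for_tick(tick, num_stages):
    """Return the 1-based stage scheduled on this tick, or None if the
    slot is idle because the stage index exceeds num_stages."""
    trailing_ones = 0
    while tick & 1:
        trailing_ones += 1
        tick >>= 1
    stage = trailing_ones + 1
    return stage if stage <= num_stages else None

def schedule_rows(num_stages, ticks):
    """Render one text row per stage, in the style of the diagram above.
    Columns only line up for single-digit stage numbers."""
    rows = []
    for stage in range(1, num_stages + 1):
        rows.append("".join(
            str(stage) if stage_for_tick(t, num_stages) == stage else "-"
            for t in range(ticks)))
    return rows

if __name__ == "__main__":
    for row in schedule_rows(4, 33):
        print(row)
```

Because every tick maps to exactly one stage, a single shared arithmetic unit never sees two stages competing for the same slot, and stage k is served once every 2^k ticks.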
Reply by Benjamin Couillard February 25, 2019
On Saturday, February 23, 2019 at 02:32:04 UTC-5, Piotr Wyderski wrote:
> Hi,
>
> the input signal is 14 bits signed@750ksps. I would like to decimate it
> by a modest factor of ~3000. What would be the best way of doing it on a
> Cyclone V, resource-wise? My usual approach would be a cascade of CIC
> decimators followed by a FIR corrector, but since there are the DSP
> blocks, I don't feel it to be the "right" (albeit correct) approach. I'm
> new to the V family and lack the proper intuitions, so could someone
> more versed suggest me a good direction?
>
> In fact, there will be 12 such channels, all going in sync,
> so maybe a considerable resource sharing can be achieved?
>
> Best regards, Piotr
You could also use halfband FIR filters; they are really efficient. Again, I really recommend Rick Lyons' DSP book: it is a really good book and not too mathy. Basically, a 16-tap halfband filter will only use 4 multipliers instead of 16.

Assuming you decimate by 2048, i.e. 2^11, you would need about 44 multipliers. Furthermore, you can time-multiplex and reuse the multipliers, so you could probably get by with one hardware multiplier per stage, for a total of 11 multipliers.
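The multiplier budget above can be turned into a multiply rate. This is a rough sketch, assuming each halfband stage costs 4 multiplies per output sample (the figure quoted for a 16-tap halfband) and taking the 750 ksps input rate and 12 channels from the original question; the function name and parameters are hypothetical.

```python
def halfband_cascade_load(fs_hz, decimation, mults_per_output=4, channels=1):
    """Return (stages, per-stage multiplies per second, total multiplies
    per second) for a cascade of decimate-by-2 halfband stages."""
    stages = decimation.bit_length() - 1
    if decimation < 2 or (1 << stages) != decimation:
        raise ValueError("decimation must be a power of two >= 2")
    rate = float(fs_hz)
    per_stage = []
    for _ in range(stages):
        rate /= 2.0  # each stage halves the sample rate
        per_stage.append(mults_per_output * rate * channels)
    return stages, per_stage, sum(per_stage)

if __name__ == "__main__":
    stages, per_stage, total = halfband_cascade_load(750e3, 2048, channels=12)
    print(stages, "stages,", total, "multiplies per second in total")
```

Since every stage does half the work of the one before it, the total stays below twice the first stage's load, which is why heavy time-multiplexing of a few hardware multipliers looks plausible at these sample rates.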
Reply by Rob Gaddi February 25, 2019
On 2/22/19 11:31 PM, Piotr Wyderski wrote:
> Hi,
>
> the input signal is 14 bits signed@750ksps. I would like to decimate it
> by a modest factor of ~3000. What would be the best way of doing it on a
> Cyclone V, resource-wise? My usual approach would be a cascade of CIC
> decimators followed by a FIR corrector, but since there are the DSP
> blocks, I don't feel it to be the "right" (albeit correct) approach. I'm
> new to the V family and lack the proper intuitions, so could someone
> more versed suggest me a good direction?
>
> In fact, there will be 12 such channels, all going in sync,
> so maybe a considerable resource sharing can be achieved?
>
>     Best regards, Piotr
This may be a better question over at comp.dsp.

That said, and given what you've said in other responses, your best answer may be to use a polyphase decimating FIR filter. In effect, you'd use a 12000 tap FIR filter, but only 4 taps of it at a time.

Understanding Digital Signal Processing (Lyons, 2011) has a good enough treatment of the subject for a general-purpose DSP book. Multirate Digital Signal Processing (Crochiere and Rabiner, 1983) has an excellent and extremely rigorous treatment of the subject, but it is out of print and a far more specialized book.

--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order. See above to fix.
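The "only 4 taps at a time" remark can be illustrated with a small software model. This is a sketch of one way to organize a decimating FIR so that each input sample updates only taps/M running output sums; it is not a description of any particular FPGA core, and the function names are invented for the example.

```python
def polyphase_decimate(x, h, m):
    """Decimate-by-m FIR that computes only the kept outputs
    y[k] = sum_j h[j] * x[k*m - j]. Each input sample is multiplied by
    at most ceil(len(h) / m) coefficients."""
    taps = len(h)
    n_out = (len(x) - 1) // m + 1 if x else 0
    y = [0] * n_out
    macs = 0
    for n, sample in enumerate(x):
        k = -(-n // m)  # first output index whose window contains x[n]
        while k < n_out and k * m - n < taps:
            y[k] += h[k * m - n] * sample
            macs += 1
            k += 1
    return y, macs

def filter_then_downsample(x, h, m):
    """Reference: full-rate FIR followed by keeping every m-th sample."""
    full = [sum(h[j] * x[n - j] for j in range(len(h)) if n - j >= 0)
            for n in range(len(x))]
    return full[::m]
```

With 12000 taps and a decimation factor of 3000, the inner loop would run 4 times per input sample, which matches the figure in the post; the reference function does the same arithmetic at the full input rate and then throws most of it away.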
Reply by February 25, 2019
On Monday, February 25, 2019 at 2:36:33 AM UTC-5, Piotr Wyderski wrote:
> already5chosen@yahoo.com wrote:
>
> > If all you want is minimization of resource usage then just do CIC.
>
> As an afterthought: given the number of channels, their relative slow
> speed and the requirement of lockstep processing, perhaps a bit-serial
> CIC would be a good idea?
>
> Other parts of the design can benefit greatly from massive application
> of this approach and it would be a powerful cerebral decalcifier. I think
> it is worth doing even if just to learn it makes no sense.
>
> Thank you all for your help!
When I have looked at performing bit-serial calculations, I've found it not to be a large savings of logic, and it often uses more FFs. If you use some form of RAM, either distributed or block, the FF savings can be good. I suppose the Xilinx LUT shift registers come in handy for this. I think they are still the only ones doing that.

I suppose once you get your head wrapped around the bit-serial thing, it can be easy to do. It can make it a bit harder to extend the precision at each stage, since that means the bit count changes and so does the timing.

Rick C.
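For readers new to the idea, here is a minimal software model of a bit-serial adder: one full adder plus a carry flip-flop, fed one bit per clock, LSB first. It only illustrates the principle and the way the word width sets the cycle count; it makes no claim about actual LUT or FF usage on any device.

```python
def bit_serial_add(a, b, width):
    """Add two `width`-bit words one bit per 'clock', LSB first.
    Returns the sum modulo 2**width, as a non-negative integer."""
    carry = 0   # models the single carry flip-flop
    result = 0  # models the output shift register
    for i in range(width):  # one loop iteration per clock cycle
        bit_a = (a >> i) & 1
        bit_b = (b >> i) & 1
        sum_bit = bit_a ^ bit_b ^ carry
        carry = (bit_a & bit_b) | (carry & (bit_a ^ bit_b))
        result |= sum_bit << i
    return result
```

A word of `width` bits takes `width` clocks to process, so widening the word at a later stage also lengthens that stage's time slot, which is the timing complication mentioned above.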
Reply by Piotr Wyderski February 25, 2019
already5chosen@yahoo.com wrote:

> If all you want is minimization of resource usage then just do CIC.
As an afterthought: given the number of channels, their relatively slow speed and the requirement of lockstep processing, perhaps a bit-serial CIC would be a good idea?

Other parts of the design can benefit greatly from massive application of this approach, and it would be a powerful cerebral decalcifier. I think it is worth doing even if just to learn that it makes no sense.

Thank you all for your help!

    Best regards, Piotr
Reply by February 24, 2019
On Sunday, February 24, 2019 at 1:23:21 AM UTC-5, Piotr Wyderski wrote:
> gnuarm.deletethisbit@gmail.com wrote:
>
> > Is that your only criterion?
>
> Well, basically, yes, it is the only degree of freedom. In other words:
> I can design any filtering structure that satisfies my requirements from
> the signal processing point of view, but not all structures are equally
> welcome by the FPGA, let alone an FPGA with DSP slices. Hence my question.
>
> I've already done it with a multistage CIC alone, but the hardware
> was much simpler and CIC approach was the only viable one.
>
> > Along with the 200+ DSP blocks I would expect the chip has many
> > thousands of LUTs and FFs. Why focus on DSP block usage?
>
> One reason is to learn them, other is the ability to use a smaller chip.
> A DSP block is composed of two multipliers and an accumulator. The
> accumulator is what a CIC needs. There will be plenty of other functions
> occupying those FFs.
You haven't given us much to go on. As some have pointed out, you can do the decimation in multiple stages and use smaller FIR filters at each point, or use one ginormous FIR filter. In both cases a polyphase organization will reduce the number of calculations needed. Or you can use the CIC filter as a front end.

I don't know any of the details, so I have no way of calculating the resource usage. I think it is pretty obvious what the trade-offs are. Squeeze here and this toothpaste comes out there. Squeeze there and other toothpaste comes out somewhere else. To know where to squeeze and how hard, the numbers are important.

Rick C.
Reply by Piotr Wyderski February 24, 2019
gnuarm.deletethisbit@gmail.com wrote:

> Is that your only criterion?
Well, basically, yes, it is the only degree of freedom. In other words: I can design any filtering structure that satisfies my requirements from the signal processing point of view, but not all structures are equally welcome by the FPGA, let alone an FPGA with DSP slices. Hence my question.

I've already done it with a multistage CIC alone, but the hardware was much simpler and the CIC approach was the only viable one.

> Along with the 200+ DSP blocks I would expect the chip has many thousands of LUTs and FFs. Why focus on DSP block usage?

One reason is to learn them, the other is the ability to use a smaller chip. A DSP block is composed of two multipliers and an accumulator. The accumulator is what a CIC needs. There will be plenty of other functions occupying those FFs.

    Best regards, Piotr
Reply by February 24, 2019
On Saturday, February 23, 2019 at 11:17:28 AM UTC-5, Piotr Wyderski wrote:
> gnuarm.deletethisbit@gmail.com wrote:
>
> > To determine the "right" approach, you need to define "right" in some
> > engineering terms. So what aspects of the design and implementation
> > are important to your goals?
>
> Minimisation of resource usage, or in other words, a decimation
> technique that maps best onto the underlying primitives. I believe
> those 200+ DSP (multiply-accumulate) blocks are good for something...
>
> Best regards, Piotr
Is that your only criterion? Along with the 200+ DSP blocks I would expect the chip has many thousands of LUTs and FFs. Why focus on DSP block usage?

I don't see a problem with using the CIC decimators if they otherwise work the way you want. A CIC filter has sharp nulls at particular points but doesn't do so much elsewhere, while being very logic and energy efficient. It is typically finished by a relatively short FIR, so the aggregate delay is not so large. Doing it all in a single filter would create a much longer delay, no? Other than the power usage of a large decimating FIR filter, I can't think of other trade-offs.

Rick C.
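To make the "logic efficient" point concrete, here is a minimal bit-accurate sketch of a CIC decimator: N wrapping integrators at the input rate, a downsampler, then N combs at the output rate, with no multipliers anywhere. It assumes a differential delay of 1 and the usual register width of input bits plus N*ceil(log2(R)); the function name and defaults are illustrative only.

```python
def cic_decimate(samples, order, ratio, in_bits=14):
    """Model an `order`-stage CIC decimator on signed integer samples.
    All registers wrap modulo 2**width, as hardware adders would."""
    width = in_bits + order * (ratio - 1).bit_length()
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    integrators = [0] * order
    comb_delay = [0] * order
    out = []
    for n, x in enumerate(samples):
        acc = x
        for i in range(order):  # integrators run at the input rate
            integrators[i] = (integrators[i] + acc) & mask
            acc = integrators[i]
        if n % ratio == ratio - 1:  # keep one sample in `ratio`
            v = acc
            for i in range(order):  # combs run at the output rate
                v, comb_delay[i] = (v - comb_delay[i]) & mask, v
            out.append(v - (1 << width) if v & sign else v)
    return out
```

The output carries a DC gain of ratio**order, so a constant input of 100 through a 4-stage, decimate-by-4 filter settles at 100 * 4**4; that gain is normally removed by a shift or absorbed into the following FIR stage.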
Reply by Kevin Neilson February 23, 2019
First of all, since your sample rates are pretty low, I'd see if it's possible to use a DSP chip instead of an FPGA.  Everything is easier in software.

Everything depends on your specs, which you have not stated.  Namely:  what is the attenuation of the stopband, and what is the slope between the passband and the stopband?  You say there is not much in the upper frequencies, so this makes it sound like your filtering requirements are very low.  If there is nothing much at all up there, you don't even need to filter.  Just decimate.  Take every nth sample.

The point of the CIC is to reduce the need for multipliers, but you have plenty of multipliers and low sample rates.  The CIC has big sidelobes.  It might be better to do a cascade of FIRs each with low numbers of taps.