Is it possible to implement a serial-in, parallel-out shift register from Xilinx distributed RAM? Any guidance is appreciated.
shift register with distributed ram
Started by ●March 24, 2007
Reply by ●March 24, 2007
On Mar 24, 1:26 am, "CMOS" <manu...@millenniumit.com> wrote:
> is it possible to implement a serial in , parellel out shift register
> from xilinx distributed ram? any guidance is appreciated.

I remember seeing an app note on the Xilinx web site dealing with using the JTAG port to initialize BRAMs that showed how to get two bits per LUT plus one more from the FF. I think it was an app note by Ken Chapman showing how to initialize the BRAM in a PicoBlaze. A quick search of the Xilinx web site mentions a program called JTAG_loader that I think uses this technique.

I will leave the rest of the searching up to you, unless someone with near-perfect recall just happens to remember the app note I am talking about.

Regards,
John McCaskill
www.fastertechnology.com
Reply by ●March 24, 2007
On Mar 24, 12:26 am, "CMOS" <manu...@millenniumit.com> wrote:
> is it possible to implement a serial in , parellel out shift register
> from xilinx distributed ram? any guidance is appreciated.

The LUT-based distributed RAM in Xilinx FPGAs can be used as a shift register, called SRL16 or SRL32, with a length (depth) that is dynamically adjustable by the address inputs. But since the LUT has only one output (2 in Virtex-5), you cannot use a LUT as a serial-to-parallel converter.

Peter Alfke, Xilinx Applications
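To make the single-output limitation concrete, here is a minimal behavioural sketch in Python of an addressable 16-stage shift register. It is only a software model, not Xilinx code; the class name `SRL16` and its methods are names chosen for this illustration, and it assumes the usual convention that address A selects the bit shifted in A+1 clocks ago.

```python
class SRL16:
    """Behavioural model of a 16-stage addressable shift register (illustrative only)."""

    def __init__(self):
        # bits[0] is the most recently shifted-in bit, bits[15] the oldest.
        self.bits = [0] * 16

    def clock(self, d, ce=1):
        """Rising clock edge: shift d in when the clock enable is high."""
        if ce:
            self.bits = [d & 1] + self.bits[:15]

    def q(self, addr):
        """Selectable output: the bit that was shifted in (addr + 1) clocks ago."""
        return self.bits[addr & 0xF]

    def q15(self):
        """Fixed last-stage output, the one used for cascading."""
        return self.bits[15]


srl = SRL16()
pattern = [1, 0, 1, 1, 0, 0, 1, 0]
for bit in pattern:
    srl.clock(bit)

# Only one selectable tap is visible at a time; seeing every stage
# would mean stepping the address, which is not a parallel output.
newest = srl.q(0)   # last bit shifted in
oldest = srl.q(7)   # first bit of the pattern
```

All sixteen stored bits exist inside the model, but the only ways to observe them are `q(addr)` and `q15()`, which is the point being made about the LUT's outputs.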
Reply by ●March 24, 2007
CMOS wrote:
> is it possible to implement a serial in , parellel out shift register
> from xilinx distributed ram? any guidance is appreciated.

Think of how a serial-in, parallel-out shift register is put together. There is a series of shift elements that shift the data in, with a broadside dump of all the shift registers into output holding registers. If you implement an n-bit serial-in, parallel-out shift register where the most recently shifted-in bit is present on the output, you'll need 2n registers. If you want the top n bits of an m-bit shift register where the most recently shifted bit is m-n bits from the parallel-out data, you can use 2(n-1) independent registers and int((m-n+15)/16) shift registers, where the last distributed-memory shift register also uses the embedded register for output.

While the tools would not synthesize the stages, you could instantiate an SRLC16E element with an output mux address of 0 to accompany the output registers to get an n-bit serial-in, parallel-out shift register in n LUTs with n embedded output registers. But since registers are plentiful in the Xilinx series (heck, Lattice even tossed out 25% of the registers in their low-cost family in recognition of this fact), it's probably much better to use the registers (implemented with direct inputs) and leave the LUTs for other combinatorial logic.

The register-only approach also eliminates the problems with whether to reset or not, since the distributed RAM shift registers 1) cannot take a reset themselves and 2) make the use of a reset for the embedded output register almost impossible.

- John_H
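The register-only structure described above (n shift registers plus n output holding registers, 2n in total) can be sketched as a small Python behavioural model. The class name `SipoRegister` and the `load` strobe are assumptions made for this illustration; they are not taken from any Xilinx primitive.

```python
class SipoRegister:
    """Serial-in, parallel-out register: n shift flops plus n holding flops."""

    def __init__(self, n):
        self.n = n
        self.shift = [0] * n   # shift[0] holds the most recently shifted-in bit
        self.hold = [0] * n    # broadside output holding registers

    @property
    def register_count(self):
        """Flip-flops used by this structure: 2n."""
        return 2 * self.n

    def clock(self, d, load=0):
        """One rising clock edge; both banks sample their pre-edge inputs."""
        if load:
            # Broadside dump: the holding bank captures the whole shift bank.
            self.hold = list(self.shift)
        self.shift = [d & 1] + self.shift[:-1]


sipo = SipoRegister(8)
serial_bits = [1, 0, 1, 1, 0, 0, 1, 0]
for bit in serial_bits:
    sipo.clock(bit)
sipo.clock(0, load=1)      # capture the eight bits in parallel
parallel_word = sipo.hold  # newest bit first
```

The holding bank only changes when `load` is asserted, so downstream logic sees a stable word while the next one is being shifted in.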
Reply by ●March 24, 2007
On Mar 24, 9:57 am, "Peter Alfke" <a...@sbcglobal.net> wrote:
> The LUT-based distributed RAM in Xilinx FPGAs can be used as a shift
> register, called SRL16 or SRL32, with a length (depth) that is
> dynamically adjustable by the address inputs. But since the LUT has
> only one output (2 in Virtex-5) you cannot use a LUT as serial-to-
> parallel converter.
> Peter Alfke, Xilinx Applications

The SRL16s have two outputs that can be used for serial-to-parallel shifters: the Q15 for cascading, and the selectable output. I finally found the article that describes how to use an SRL plus a FF to build a 6-bit-per-slice serial-to-parallel shifter. It is by Kris Chaplin, not Ken Chapman like I had been thinking.

The article is a TechXclusives piece, "Reconfiguring Block RAMs", at:
http://www.xilinx.com/xlnx/xweb/xil_tx_display.jsp?iLanguageID=1&category=&sGlobalNavPick=&sSecondaryNavPick=&multPartNum=2&sTechX_ID=krs_blockRAM
or:
http://tinyurl.com/2fyll8

Regards,
John McCaskill
www.fastertechnology.com
Reply by ●March 25, 2007
"John McCaskill" <junkmail@fastertechnology.com> wrote in message news:1174756909.000771.39330@e1g2000hsg.googlegroups.com...
> The SLR16s have two outputs that can be used for serial to parallel
> shifters, the Q15 for cascading, and the selectable output. I finally
> found the article that describes how to use an SRL plus a FF to build
> a 6 bit per slice serial to parallel shifter. It is by Kris Chaplin,
> not Ken Chapman like I had been thinking.

Slightly in another direction... is there a trick to setting up the cascades on the SRL16s to maintain a consistent delay? We strung 8 in a row to get an adjustable 1-bit delay line. It works, but there's a bunch of extra muxes, etc. to get the delay consistent (3 clocks plus whatever tap I pull as output).

I'm actually the systems guy and not the VHDL coder (and communication of requirements is always tricky when you're doing something new), but I'm always interested in the mechanics to see if it can be done better (smaller) while still meeting requirements. Especially since I'd really like a 1024-tap delay but I ran out of space (I need tens of these, plus other DSP goodies). Suggestions on other mechanisms to use are also welcome.

Dr. Marty Ryba
martin (dot) ryba (at) verizon (dot) net (man, I hate sp*m)
Reply by ●March 26, 2007
Marty Ryba wrote:
> Slightly in another direction...is there a trick to setting up the cascades
> on the SRL16s to maintain a consistent delay? We strung 8 in a row to get an
> adjustable 1-bit delay line. It works, but there's a bunch of extra muxes,
> etc. to get the delay consistent (3 clocks plus whatever tap I pull as
> output). I'm actually the systems guy and not the VHDL coder (and
> communication of requirements is always tricky when you're doing something
> new), but I'm always interested in the mechanics to see if it can be done
> better (smaller) while still meeting requirements. Especially since I'd
> really like a 1024 tap delay but I ran out of space (I need tens of these,
> plus other DSP goodies). Suggestions on other mechanisms to use are also
> welcome.

Shift registers are clocked. Clocked elements don't have routing consistency issues, they have routing maximum issues. I'd suggest using some Xilinx routing for combinatorial delays in an *extremely* well controlled situation, inverting consecutive stages of a multi-tap delay to reduce pulse width distortion. But a 1024 element delay line?! It sounds like you need a nice, clocked delay. SRLs in series shouldn't have delay issues.

Is it that you're taking the output from a very long clocked shift register? If so, just clock the muxed outputs to get all the SRLs to show up at the output pin at a predictable time.

Often the conceptual problem with unclocked delay lines is figuring out how to get a consistent input path or a consistent output path; the trouble is, both are needed.

What is your desired range and resolution? Acceptable jitter?
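The suggestion to clock the muxed outputs can be illustrated with a small Python behavioural model. This is a sketch under stated assumptions, not anyone's actual design: eight 16-stage shift registers are chained through their last stage, a mux picks one stage's selectable tap, and a single output register follows the mux. The class name `CascadedSrlDelay` and the way the delay value is split into a stage number and a tap address are choices made for this example.

```python
class CascadedSrlDelay:
    """Chain of 16-stage shift registers with a registered, muxed tap output."""

    def __init__(self, num_srls=8):
        self.stages = [[0] * 16 for _ in range(num_srls)]
        self.out = 0  # output register that follows the tap mux

    def clock(self, d, delay):
        """One rising clock edge; returns the output register's new value."""
        stage, addr = divmod(delay, 16)
        if not 0 <= stage < len(self.stages):
            raise ValueError("delay out of range for this chain")
        # The output register samples the muxed tap as it was before the edge.
        self.out = self.stages[stage][addr]
        # Every shift register takes the previous one's last stage (pre-edge).
        carry = d & 1
        for bits in self.stages:
            next_carry = bits[15]
            bits.insert(0, carry)
            bits.pop()
            carry = next_carry
        return self.out


def run(delay, samples, num_srls=8):
    """Feed samples through a fresh chain at a fixed delay setting."""
    line = CascadedSrlDelay(num_srls)
    return [line.clock(d, delay) for d in samples]


samples = [(t * t + t // 3) % 2 for t in range(200)]
outputs_short = run(5, samples)    # tap inside the first shift register
outputs_long = run(100, samples)   # tap inside the seventh shift register
```

In this model the output after a clock edge equals the input shifted in `delay + 1` edges earlier, regardless of which shift register the tap falls in, so the extra latency is one fixed register rather than something that depends on the tap position.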
Reply by ●March 26, 2007
Unclocked delay lines are really not stable over temperature and voltage, although there are "servo" tricks to stabilize them (as done in the IDELAY and ODELAY Virtex I/O functions).

For a very long clocked delay line, it might make sense to use a dual-ported BlockRAM.
"Waste is often only in the eyes of the beholder..."

Peter Alfke, Xilinx

On Mar 25, 8:35 pm, John_H <newsgr...@johnhandwork.com> wrote:
> Shift registers are clocked. Clocked elements don't have routing
> consistency issues, they have routing maximum issues. I'd suggest using
> some Xilinx routing for combinatorial delays in an *extremely* well
> controlled situation, inverting consecutive stages of a multi-tap delay
> to reduce pulse width distortion. But a 1024 element delay line?! It
> sounds like you need a nice, clocked delay. SRLs in series shouldn't
> have delay issues.
>
> Is it that you're taking the output from a very long clocked shift
> register? If so, just clock the muxed outputs to get all the SRLs to
> show up at the output pin at a predictable time.
>
> Often the conceptual problem with unclocked delay lines is figuring out
> how to get a consistent input path or a consistent output path; the
> trouble is, both are needed.
>
> What is your desired range and resolution? Acceptable jitter?
Reply by ●March 27, 2007
Thanks for the suggestions. Just to clarify, it is a clocked delay that I want; everything is on a clock, which functions as a sample counter as well. Say there is a bitstream, and I want two copies of it: one "prompt" copy and one "delayed" copy, with the delay being a variable number of samples in some kind of buffer. This is easy with RAM in a GPP (pointer arithmetic), but a GPP is not fast enough for pipelined processing (~30 Mbps on each of 10 or more bit streams). During routine processing, the delay is fixed and I want the pipeline to shift by one on each clock. Now, on subsequent processing cycles, *maintaining the state of the pipeline*, I may want to tap a different delay point.

Now, since I want more than 16 taps of delay, I see two approaches (let N be the total delay):

1) The Q15's are connected for cascading, the output of the (N/16)+1 SRL is set to the remainder of the delay, and a mux selects the output pin of the (N/16)+1 SRL to use as output of the block. Based on the delay value, the timing of the appearance of the correct sample depends on how many SRLs it traverses before it exits. So, delay needs to be inserted to make it constant. I'm likely glossing over details I don't quite understand since I didn't code it myself, and most of this design was done 2 years ago. This is how I believe it's implemented right now.

2) The output of each SRL is connected to the next one's input. The first (N/16) of the SRLs' addresses would be set to 15 (max delay), the "middle" one would have mod(N,16), and the others would all be set to zero. The block's output is connected to the output of the last SRL. This I think would give a consistent delay of about N+(# of SRLs). The problem is that it would enforce a minimum delay that I would likely have to insert into my "prompt" channel to balance things back out.

3) Other ideas??
For instance, I actually would prefer to be able to create tens of "fingers" of delay without needing separate parallel pipelines, but maybe by having them cascade into each other. Any app notes out there that I haven't dug up yet? I'm revisiting this since we're restarting the program and have the opportunity to revamp parts of the design.

Marty Ryba
semi-mad scientist
proud member of the Luxuriant Flowing Hair Club for Scientists (no kidding!)

"Peter Alfke" <alfke@sbcglobal.net> wrote in message news:1174881568.648560.274310@n59g2000hsh.googlegroups.com...
> Unclocked delay lines are really not stable over temperature and
> voltage, although there are "servo' tricks to stabilize them (as done
> in the IDELAY and ODELAY Virtex I/O functions)
> For a very long clocked delay line, it might make sense to use a dual-
> ported BlockRAM.
> "Waste is often only in the eyes of the beholder..."
> Peter Alfke, Xilinx
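The address settings in approach 2 can be worked through numerically. The Python sketch below assumes that a 16-stage shift register with address A delays its input by A+1 clocks, and that the selectable output of each one feeds the next one's input as described. The function names `option2_settings` and `chain_delay` are made up for this illustration.

```python
def option2_settings(n, num_srls):
    """Per-SRL addresses for approach 2: full stages at 15, one remainder, rest at 0."""
    full, rem = divmod(n, 16)
    if n < 0 or full >= num_srls:
        raise ValueError("n does not fit in this many SRLs")
    return [15] * full + [rem] + [0] * (num_srls - full - 1)


def chain_delay(addrs):
    """Total clocks through the chain, assuming address A costs A + 1 clocks."""
    return sum(a + 1 for a in addrs)


addrs = option2_settings(37, 8)          # [15, 15, 5, 0, 0, 0, 0, 0]
total = chain_delay(addrs)               # 16 + 16 + 6 + 5 = 43
minimum = chain_delay(option2_settings(0, 8))   # every SRL at address 0
```

Under these assumptions the total is N plus one clock for each shift register that is not set to 15, so the overhead shrinks as N grows, and the floor with all addresses at zero is one clock per shift register. That floor is the minimum delay that would have to be matched in the "prompt" channel.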
Reply by ●March 27, 2007
On Mar 26, 8:09 pm, "Marty Ryba" <martin.ryba.nos...@verizon.net> wrote:
> Thanks for the suggestions. Just to clarify, it is a clocked delay that I
> want; everything is on a clock, which functions as a sample counter as well.
> Say there is a bitstream, and I want two copies of it: one "prompt" copy and
> one "delayed" copy with the delay being a variable number of samples in some
> kind of buffer. This is easy with RAM in a GPP (pointer arithmetic), but a
> GPP is not fast enough for pipelined processing (~30 Mbps on each of 10 or
> more bit streams).

Marty, forget the GPP and just wrap two counters around a BlockRAM. That can run ten times faster than you need. Maybe you can then do some time-division multiplexing to save on BlockRAMs... just an idea.

Peter Alfke, from home
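The "two counters around a BlockRAM" idea can be sketched as a circular buffer. The Python model below is only an illustration under assumptions: the class name `RamDelayLine` is invented, the read address is derived as the write counter minus the delay (equivalent to a second counter running at a fixed offset), and the read data is returned in the same call, whereas a real synchronous block RAM would typically present it one clock later.

```python
class RamDelayLine:
    """Variable delay line built from a RAM plus write and read addresses."""

    def __init__(self, depth=1024):
        self.depth = depth
        self.ram = [0] * depth
        self.wr = 0  # write counter, advances by one every clock

    def clock(self, d, delay):
        """One clock: read the sample written `delay` clocks ago, then write d."""
        if not 1 <= delay <= self.depth:
            raise ValueError("delay must be between 1 and depth")
        rd = (self.wr - delay) % self.depth  # read address trails the write address
        q = self.ram[rd]                     # read happens before this clock's write
        self.ram[self.wr] = d & 1
        self.wr = (self.wr + 1) % self.depth
        return q


samples = [(t * t + t // 3) % 2 for t in range(300)]
line = RamDelayLine(depth=1024)

# Changing the delay only moves the read address; the stored history is kept.
first = [line.clock(d, 10) for d in samples[:50]]
second = [line.clock(d, 20) for d in samples[50:]]
```

Because the stored samples are untouched when the delay changes, this matches the requirement of tapping a different delay point while maintaining the state of the pipeline, and one RAM of depth 1024 covers the 1024-tap range that was out of reach with shift registers.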





