Hi,

In a design I'm working on I have a machine that produces 128 bits of data. This data is destined for a 16-bit DAC running 8x faster. Knowing that I can produce the 128 bits at up to a 120 MHz rate, I wanted to generate the 8x 16-bit stream as fast as I can (and I'm using a 500 MHz DAC).

The 128-to-16-bit conversion is all done in a component I called front8x, which counts and selects the 16-bit slice out of the 128-bit data as below (the clock output being the one used to run the rest of the circuit).

In practice I found it to run up to 300 MHz. Should I expect this to be the upper limit of what I can do? Is there any clever design of this frontend that would allow higher speed?

(Note: the phase of the clock out to the DAC is set by another PLL, so I can set the DAC to sample at the middle of the eye pattern. No issues there.)

I would very much like to read some comments, please.

Thanks.

Luis C.

(Device: Cyclone III, FBGA, fastest speed grade, all outputs using LVDS.)

--
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.std_logic_arith.ALL;

ENTITY front8x IS
  PORT (
    clkin:  IN  STD_LOGIC;                        -- master frontend clock
    sync:   IN  STD_LOGIC;                        -- Sync frontend
    clkout: OUT STD_LOGIC;                        -- 1/8 main clock out
    datain: IN  STD_LOGIC_VECTOR (127 downto 0);  -- system data bus
    dacout: OUT STD_LOGIC_VECTOR (15 downto 0)    -- DAC data bus
  );
END front8x;

ARCHITECTURE regmux8 OF front8x IS
  SIGNAL dacreg:  STD_LOGIC_VECTOR ( 15 DOWNTO 0 );
  SIGNAL datareg: STD_LOGIC_VECTOR ( 127 DOWNTO 0 );
  SIGNAL cntr:    INTEGER RANGE 0 TO 7;
BEGIN
  dacout <= dacreg;

  --------------------------
  -- Main 8:1 cycle
  -- clockout rise at count=4
  -- big data is fetched at count = last = 7
  main: PROCESS(clkin,sync)
  BEGIN
    IF (clkin='1' AND clkin'EVENT) THEN
      IF (sync='0') THEN
        cntr <= 0;
      ELSE
        -- wrap explicitly so simulation does not overflow the 0 TO 7 range
        cntr <= (cntr + 1) MOD 8;
      END IF;

      case cntr is
        when 0 => dacreg <= datareg(127 downto 112);
        when 1 => dacreg <= datareg(111 downto 96);
        when 2 => dacreg <= datareg(95 downto 80);
        when 3 => dacreg <= datareg(79 downto 64);
        when 4 => dacreg <= datareg(63 downto 48);
        when 5 => dacreg <= datareg(47 downto 32);
        when 6 => dacreg <= datareg(31 downto 16);
        when others =>
          dacreg  <= datareg(15 downto 0);
          datareg <= datain;
      END CASE;

      IF (cntr > 4) THEN
        clkout <= '1';
      ELSE
        clkout <= '0';
      END IF;
    END IF;
  END PROCESS main;
END regmux8;
> Is there any clever design of this frontend to allow higher speed ?

As you're stepping through the 128 bits to extract the 16-bit output, I would have implemented it as a big shift register and taken off the top/bottom bits.

Then all the tools have to worry about is a single register-to-register delay rather than the big mux required to 'select' the correct 16 bits.

Nial.
On 6/14/2010 1:45 PM, LC wrote:
>
> Should I expect that this would be the right up limit I could do it ?
> Is there any clever design of this frontend to allow higher speed ?
>

Does XAPP265 give you any architectural hints that you can use in your Altera part?

HTH., Syms.
Symon wrote:
> On 6/14/2010 1:45 PM, LC wrote:
>>
>> Should I expect that this would be the right up limit I could do it ?
>> Is there any clever design of this frontend to allow higher speed ?
>>
> Does XAPP265 give you any architectural hints that you can use in your
> Altera part?
> HTH., Syms.

Thanks, Symon.

Indeed, reading it suggested some variations that I'll try.

Thanks.

Luis C.
Nial Stewart wrote:
>
> As you're stepping through the 128 bits to extract the 16 bit output
> I would have implemented it as a big shift register and take off the
> top/bottom bits.
>
> Then all the tools have to worry about is a single register to register delay
> rather than big mux required to 'select' the correct 16 bits.
>
> Nial.
>

Nial,

OK, good idea. Thanks, will try.

Not sure what you mean by top/bottom, but I presume it is something like having one long 128-bit, 1-bit-wide SR with outputs at 0, 16, 32, etc., while the parallel 128-bit word is bit-reordered such that each shift produces the next 16-bit word to come out. If I'm missing something, let me know.

Thanks.

Luis C.
> Ok, good idea. Tks, will try.
> Not sure what you mean by top/bottom but I presume it is something like
> having the long 1 bit 128bit SR with outputs at 0, 16, 32 etc while the parallel 128bit word has
> bit reordering such as each shift would produce the next 16bit word to come out. If I'm missing
> something let me know.

Something like this, with a load value....

signal shift_reg  : std_logic_vector(127 downto 0);
signal output     : std_logic_vector(15 downto 0);
signal load_value : std_logic_vector(127 downto 0);

:
:
:

process(clk,rst)
begin
  if(rst = '1') then
    shift_reg <= (others => '0');
    output    <= (others => '0');
  elsif(rising_edge(clk)) then
    if(load = '1') then
      shift_reg <= load_value;
    else
      -- move every 16-bit chunk up by one position
      shift_reg(127 downto 112) <= shift_reg(111 downto 96);
      shift_reg(111 downto 96)  <= shift_reg(95 downto 80);
      shift_reg(95 downto 80)   <= shift_reg(79 downto 64);
      shift_reg(79 downto 64)   <= shift_reg(63 downto 48);
      shift_reg(63 downto 48)   <= shift_reg(47 downto 32);
      shift_reg(47 downto 32)   <= shift_reg(31 downto 16);
      shift_reg(31 downto 16)   <= shift_reg(15 downto 0);
    end if;

    -- the top chunk is always the next word for the DAC
    output <= shift_reg(127 downto 112);
  end if;
end process;

Nial
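A minimal sketch of how the chunked shift register could be packaged as a replacement for front8x, assuming the same port list as the original post. The names front8x_sr, shift8, phase and ZERO16 are hypothetical. The one-hot phase ring (instead of the original integer counter) and the clkout phase are illustrative assumptions, not something anyone in the thread specified or measured.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical variant of front8x built around a 16-bit-chunk shift register.
entity front8x_sr is
  port (
    clkin  : in  std_logic;                        -- fast (DAC-rate) clock
    sync   : in  std_logic;                        -- active-low synchronous restart, as in front8x
    clkout : out std_logic;                        -- clkin / 8 for the 128-bit source
    datain : in  std_logic_vector(127 downto 0);   -- system data bus
    dacout : out std_logic_vector(15 downto 0)     -- DAC data bus
  );
end entity front8x_sr;

architecture shift8 of front8x_sr is
  constant ZERO16  : std_logic_vector(15 downto 0)  := (others => '0');
  signal shift_reg : std_logic_vector(127 downto 0) := (others => '0');
  -- One-hot ring: exactly one bit is set, so "load" is a single flip-flop output.
  signal phase     : std_logic_vector(7 downto 0)   := "00000001";
begin
  -- The DAC word is taken straight from the top chunk of the register.
  dacout <= shift_reg(127 downto 112);

  main : process (clkin)
  begin
    if rising_edge(clkin) then
      -- Advance (or restart) the phase ring.
      if sync = '0' then
        phase <= "00000001";
      else
        phase <= phase(6 downto 0) & phase(7);
      end if;

      -- Reload on the last phase, otherwise shift up by one 16-bit word.
      if phase(7) = '1' then
        shift_reg <= datain;
      else
        shift_reg <= shift_reg(111 downto 0) & ZERO16;
      end if;

      -- Registered divide-by-8 clock, high for four of the eight phases
      -- (assumed duty cycle and phase; adjust to suit the upstream logic).
      clkout <= phase(4) or phase(5) or phase(6) or phase(7);
    end if;
  end process main;
end architecture shift8;
```

Every data flip-flop here sees only a 2:1 choice between datain and its lower neighbour, which is the single register-to-register delay described above. Whether that improves fmax on a given device has to be confirmed with the timing analyser.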
Nial Stewart wrote:
>> Ok, good idea. Tks, will try.
>> Not sure what you mean by top/bottom but I presume it is something like
>> having the long 1 bit 128bit SR with outputs at 0, 16, 32 etc while the parallel 128bit word has
>> bit reordering such as each shift would produce the next 16bit word to come out. If I'm missing
>> something let me know.
>
> Something like this, with a load value....
>
> signal shift_reg  : std_logic_vector(127 downto 0);
> signal output     : std_logic_vector(15 downto 0);
> signal load_value : std_logic_vector(127 downto 0);
>
> :
> :
> :
>
> process(clk,rst)
> begin
>   if(rst = '1') then
>     shift_reg <= (others => '0');
>     output    <= (others => '0');
>   elsif(rising_edge(clk)) then
>     if(load = '1') then
>       shift_reg <= load_value;
>     else
>       shift_reg(127 downto 112) <= shift_reg(111 downto 96);
>       shift_reg(111 downto 96)  <= shift_reg(95 downto 80);
>       shift_reg(95 downto 80)   <= shift_reg(79 downto 64);
>       shift_reg(79 downto 64)   <= shift_reg(63 downto 48);
>       shift_reg(63 downto 48)   <= shift_reg(47 downto 32);
>       shift_reg(47 downto 32)   <= shift_reg(31 downto 16);
>       shift_reg(31 downto 16)   <= shift_reg(15 downto 0);
>     end if;
>
>     output <= shift_reg(127 downto 112);
>   end if;
> end process;
>
> Nial
>

Thanks for the clarification.

Yes, now I've tested both: the 1-bit-wide 128-bit SR with bit reordering on both sides (which, being just a shuffling of the bit order, should not consume precious time), and the SR in 16-bit chunks approach you suggested.

Both gave identical results (as expected... they are, after all, not too different if we think of the data path delay). Both were indeed a bit faster than my previous counter/mux approach.

Now I'm closer to 400 MHz... I believe that is what I can do with this technology.

Again, thanks.

Luis C.
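For comparison, here is one way the bit-reordered single-bit shift register could look. This is an interpretation, not Luis's actual code: it assumes sixteen taps spaced 8 bits apart (one 8-bit lane per DAC bit) rather than the "0, 16, 32" spacing mentioned earlier, since a 16-bit output needs sixteen taps. The names front8x_bitsr and reordered are hypothetical, as is the load strobe, which is assumed to be high for one clkin cycle in every eight.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical sketch: one 128-bit register shifting by a single bit per clock.
entity front8x_bitsr is
  port (
    clkin  : in  std_logic;                        -- fast (DAC-rate) clock
    load   : in  std_logic;                        -- assumed: high one cycle in every eight
    datain : in  std_logic_vector(127 downto 0);   -- word 0 in bits 127..112, word 7 in bits 15..0
    dacout : out std_logic_vector(15 downto 0)
  );
end entity front8x_bitsr;

architecture reordered of front8x_bitsr is
  signal sr : std_logic_vector(127 downto 0) := (others => '0');
begin
  -- Output side: DAC bit k is the top bit of lane k (sr bits 8*k+7 downto 8*k).
  taps : for k in 0 to 15 generate
    dacout(k) <= sr(8*k + 7);
  end generate taps;

  main : process (clkin)
  begin
    if rising_edge(clkin) then
      if load = '1' then
        -- Input side: bit k of word w goes w places below the tap of lane k,
        -- so it reaches the tap after exactly w single-bit shifts.
        for w in 0 to 7 loop
          for k in 0 to 15 loop
            sr(8*k + 7 - w) <= datain(112 - 16*w + k);
          end loop;
        end loop;
      else
        sr <= sr(126 downto 0) & '0';
      end if;
    end if;
  end process main;
end architecture reordered;
```

The reordering on both sides is pure wiring, so each flip-flop still sees the same 2:1 choice between a load bit and its neighbour as in the 16-bit-chunk version. That is consistent with the two approaches giving the same speed.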
On 6/15/2010 12:57 PM, LC wrote:
> Symon wrote:
>> On 6/14/2010 1:45 PM, LC wrote:
>>>
>>> Should I expect that this would be the right up limit I could do it ?
>>> Is there any clever design of this frontend to allow higher speed ?
>>>
>> Does XAPP265 give you any architectural hints that you can use in your
>> Altera part?
>> HTH., Syms.
>
> Tks, Symon,
> Indeed there are some variations induced by this reading that I'll try.
> Thanks.
>
> Luis C.

Hi Luis,

You might want to pay particular attention to the DDR registers in the IOBs. I expect your Altera part has the same features, but I dunno for sure. The registers mean that your internal logic can run at half the speed of the external signals. Which is nice.

HTH, Syms.
On Jun 16, 9:31 am, Symon <symon_bre...@hotmail.com> wrote:
> On 6/15/2010 12:57 PM, LC wrote:
>
> > Symon wrote:
> >> On 6/14/2010 1:45 PM, LC wrote:
>
> >>> Should I expect that this would be the right up limit I could do it ?
> >>> Is there any clever design of this frontend to allow higher speed ?
>
> >> Does XAPP265 give you any architectural hints that you can use in your
> >> Altera part?
> >> HTH., Syms.
>
> > Tks, Symon,
> > Indeed there are some variations induced by this reading that I'll try.
> > Thanks.
>
> > Luis C.
>
> Hi Luis,
> You might want to pay particular attention to the DDR registers in the
> IOBs. I expect your Altera part has the same features, but I dunno for
> sure. The registers mean that your internal logic can run at half the
> speed of the external signals. Which is nice.
> HTH, Syms.

That's what I would suggest. By using the DDR registers, the data stream can be split into odd/even words with parallel paths. Then each stream would only need to run at half the rate on the I/O pins. Since you already have the 500 MHz clock you can just divide that by two to generate two enables, one for the odd and one for the even data streams.

I've never used the DDR registers. You probably want to look closely at the example code that Altera provides.

Rick
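A rough sketch of the internal half of the odd/even split, under stated assumptions. Instead of two enables derived from the 500 MHz clock, it assumes a separate half-rate clock (clk_half, hypothetical) and presents two 16-bit words per cycle. The names front8x_ddr, pairs, word_h, word_l, phase and ZERO32 are also hypothetical. The DDR output registers themselves are not shown: on Altera parts the ALTDDIO_OUT megafunction is the usual route, and its ports, and which input goes out on which clock edge, should be taken from Altera's documentation.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical sketch: the shift register moves two words per clock, so the
-- fabric runs at half the DAC rate and the I/O cell's DDR registers interleave.
entity front8x_ddr is
  port (
    clk_half : in  std_logic;                        -- assumed: half the DAC sample rate
    sync     : in  std_logic;                        -- active-low synchronous restart
    datain   : in  std_logic_vector(127 downto 0);   -- word 0 in bits 127..112
    word_h   : out std_logic_vector(15 downto 0);    -- first word of each pair (words 0, 2, 4, 6)
    word_l   : out std_logic_vector(15 downto 0)     -- second word of each pair (words 1, 3, 5, 7)
  );
end entity front8x_ddr;

architecture pairs of front8x_ddr is
  constant ZERO32  : std_logic_vector(31 downto 0)  := (others => '0');
  signal shift_reg : std_logic_vector(127 downto 0) := (others => '0');
  signal phase     : std_logic_vector(3 downto 0)   := "0001";  -- one-hot, four phases per 128-bit word
begin
  -- Both outputs come straight from flip-flops and would feed the DDR output cell.
  word_h <= shift_reg(127 downto 112);
  word_l <= shift_reg(111 downto 96);

  main : process (clk_half)
  begin
    if rising_edge(clk_half) then
      -- Advance (or restart) the phase ring.
      if sync = '0' then
        phase <= "0001";
      else
        phase <= phase(2 downto 0) & phase(3);
      end if;

      -- Reload on the last phase, otherwise shift up by one word pair (32 bits).
      if phase(3) = '1' then
        shift_reg <= datain;
      else
        shift_reg <= shift_reg(95 downto 0) & ZERO32;
      end if;
    end if;
  end process main;
end architecture pairs;
```

With a 500 MHz DAC this internal logic would only need to close timing at 250 MHz, which is below both the 300 MHz and the roughly 400 MHz figures reported earlier in the thread. The I/O timing of the DDR cell itself still has to be checked separately.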
rickman wrote:
> On Jun 16, 9:31 am, Symon <symon_bre...@hotmail.com> wrote:
>> On 6/15/2010 12:57 PM, LC wrote:
>>
>>> Symon wrote:
>>>> On 6/14/2010 1:45 PM, LC wrote:
>>>>> Should I expect that this would be the right up limit I could do it ?
>>>>> Is there any clever design of this frontend to allow higher speed ?
>>>> Does XAPP265 give you any architectural hints that you can use in your
>>>> Altera part?
>>>> HTH., Syms.
>>> Tks, Symon,
>>> Indeed there are some variations induced by this reading that I'll try.
>>> Thanks.
>>> Luis C.
>> Hi Luis,
>> You might want to pay particular attention to the DDR registers in the
>> IOBs. I expect your Altera part has the same features, but I dunno for
>> sure. The registers mean that your internal logic can run at half the
>> speed of the external signals. Which is nice.
>> HTH, Syms.
>
> That's what I would suggest. By using the DDR registers, the data
> stream can be split into odd/even words with parallel paths. Then
> each stream would only need to run at half the rate on the I/O pins.
> Since you already have the 500 MHz clock you can just divide that by
> two to generate two enables, one for the odd and one for the even data
> streams. I've never used the DDR registers. You probably want to
> look closely at the example code that Altera provides.
>
> Rick

Many thanks, folks. Very good tips.

Thanks,

Luis C.