Hi all!

I have a project that uses an Altera Stratix II 2S180 as an ASIC prototype. Because the ASIC has many interfaces, it has many clocks, and some of them are not routed to the FPGA's dedicated clock pins. For example, the PCI clock is routed to a normal I/O pin.

Because the FPGA and the board are expensive, the boss does not want to make a new board. After reading through the 2S180 datasheet and thinking about it a lot, I found this is a very hard problem because:

1) The global buffer tree's delay is very long, about 5 ns.
2) From pad to core, a normal I/O has about 1 ns of delay.
3) I can't use a PLL to compensate for the I/O delay or the global buffer delay, since a PLL's input must be a clock input pin or a global buffer.
4) Inserting an LCELL into the datapath of the input signals will make my Tco bad.

How can I deal with this? Is anyone from Altera here?
CLK input DOES NOT use clk pin ( Altera Stratix II)
Started by ●November 20, 2005
Reply by ●November 20, 2005
It sounds like you have just hit the classic ASIC-to-FPGA conversion problem of too many clocks. We have done a lot of this kind of work, and generally it is best to plan FPGA use into the IP from the start to make the conversion path easy.

One obvious thing to try is to reduce the number of clocks. ASIC designs often use gated clocks because that makes for smaller logic than local clock-enabled flip-flops. This often creates designs with large numbers of clocks, which does not sit well with most FPGA fabrics. Xilinx do have some tool support for locally routed clocks to cover this situation, but I am not sure whether Altera can offer this facility as yet. Consider whether you can alter your IP to use clock enables instead of generated gated clocks.

Alternatively, if your board has multiple FPGAs, look at partitioning to minimise the number of clocks or to improve the distribution against the FPGA resources available. A multiple-FPGA platform is often superior to a single large FPGA platform for ASIC prototyping.

John Adair
Enterpoint Ltd. - Home of Broaddown1. The ASIC Prototyping Platform.
http://www.enterpoint.co.uk
Reply by ●November 20, 2005
Thank you for your reply! But the board was built before I joined the company, and the boss does not want to make a new board. The ASIC has many clocks simply because it has many interfaces, not because of gated clocks.
Reply by ●November 20, 2005
huangjie wrote:
> Because the ASIC has too many interface therefor too many clk and some of
> the clk does not route to fpga's dedicated clk pin ,for eg, pci clk does route to an
> normal I/O pin.

How fast are the clocks that are not on the dedicated clock pins? If they are slow enough, you can sample them with a faster clock to generate an enable signal on the edge you want, and run your internal logic on the faster clock using that enable. The code would be different for your FPGA vs. your ASIC though:

FPGA:
  process (fastclk)
  begin
    if RISING_EDGE(fastclk) then
      if enable = '1' then
        ...

ASIC:
  process (pinclk)
  begin
    if RISING_EDGE(pinclk) then
      ...

(It's for situations like this that I wish VHDL had a pre-processor like C)

It might be tricky at PCI speeds, but if this is a prototyping system, you may be able to slow down your PCI clock.

Regards,
John
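The enable generation described above can be sketched roughly as follows. This is an illustrative sketch only, not code from the thread: the entity name clk_to_enable and the internal signal sync are hypothetical, and it assumes fastclk runs at more than twice the frequency of pinclk so that every high and low phase of pinclk is sampled at least once.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper (names are assumptions, not from the thread):
-- turns rising edges of a slow pin clock into one-cycle enables on fastclk.
entity clk_to_enable is
  port (
    fastclk : in  std_logic;  -- fast internal clock on a global network
    pinclk  : in  std_logic;  -- slow external clock arriving on a normal I/O pin
    enable  : out std_logic   -- high for one fastclk cycle per pinclk rising edge
  );
end entity clk_to_enable;

architecture rtl of clk_to_enable is
  -- sync(0) and sync(1) form a two-stage synchroniser;
  -- sync(2) holds the previous synchronised sample.
  signal sync : std_logic_vector(2 downto 0) := (others => '0');
begin
  process (fastclk)
  begin
    if rising_edge(fastclk) then
      sync <= sync(1 downto 0) & pinclk;  -- shift in the newest sample
    end if;
  end process;

  -- Rising edge seen: previous sample '0', current synchronised sample '1'.
  enable <= '1' when sync(2 downto 1) = "01" else '0';
end architecture rtl;
```

The enable arrives two to three fastclk cycles after the real pinclk edge, so this approach suits internal logic better than I/O paths whose timing is specified relative to pinclk.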
Reply by ●November 20, 2005
Unfortunately, the clocks are not slow enough: for example, one is at 125 MHz and PCI is at 33 MHz. Since they are interfaces to other devices, they can't be slowed down.
Reply by ●November 21, 2005
So clock everything at 125 MHz and use clock enables. Then use FIFOs or the infamous double latch to transfer between the 33 MHz and 125 MHz clock domains.

Simon
Reply by ●November 21, 2005
Thanks for your suggestion! But first, how do I use "the infamous double latch"? Second, my ASIC does not have only one 125 MHz clock; it has 5 more, and all of them are inputs from external chips with no frequency or phase relationship to each other.
Reply by ●November 22, 2005
A/ Forget the ASIC. Design the FPGA, then work out how to translate that into an ASIC. The two are so totally different that if you try to design for both you will ultimately fail.

B/ The double latch:

  clk_transfer : process (rst, clk) is
  begin
    if (rst = reset_active_c) then
      tmp1     <= (others => '0');
      data_out <= (others => '0');
    elsif rising_edge(clk) then
      tmp1     <= data_in;  -- first register: may go metastable
      data_out <= tmp1;     -- second register: resynchronised output
    end if;
  end process clk_transfer;

data_out = data_in after a little delay. No doubt there will be debate about whether there should be a tmp2. I actually have standard blocks called meta_data and meta_clk which get called. meta_data is for data signals, i.e. static lines. meta_clk converts the incoming signal to an edge which is phase aligned to meta_data. The above is similar to these two routines, but I can't guarantee it is identical as they are at work and I haven't touched the blocks in a number of years. (So I don't remember what's inside, just that they work.)

C/ See meta_clk. I have an E1 card. It has a 32.768 MHz, 2.048 MHz (E1 ref), 1.544 MHz (T1 ref), 16.384 MHz, 4 x 2.048 MHz TX clocks and 4 x 2.048 MHz RX clocks. Only the 32.768 MHz and the two references are related; all the rest are independent. So who said you need lots of clock lines? Everything is "meta_clk" or "meta_data" up to the 16.384 MHz, which is the bus timing. The 32.768 MHz is used as a stable system reference along with the E1 and T1 references. The 32 MHz is also used to calculate the accuracy of the 4 E1 ports with a simple long-duration counter. The counter is accurate to 1 ppm but the reference is good to 25 ppm. Room temperature showed about 5-10 ppm clock speed error :-)

So, provided your "reference" is faster than your actual clocks, there are no problems. Just treat all clocks as edge generators, which translate into clock enables.

Simon
Reply by ●November 22, 2005
I understand your idea now, and I see why yours works but mine can't: your slow clocks are slow, and mine are very fast. How can I deal with 125 MHz clocks as if they were 2 MHz? How fast would my "reference" have to be for 125 MHz? Perhaps I can use a group of phase-shifted clocks to generate the clock enable signals. Thank you again!
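As a rough worked bound for the question above (a sketch, assuming the enable is derived by sampling the external clock with a single-phase reference and that the external clock has a 50% duty cycle): every high phase and every low phase must be caught by at least one reference edge, so the reference period has to be shorter than half the external clock period.

```latex
\[
T_{\mathrm{ref}} < \frac{T_{\mathrm{clk}}}{2}
\quad\Longrightarrow\quad
f_{\mathrm{ref}} > 2 f_{\mathrm{clk}} = 2 \times 125\ \mathrm{MHz} = 250\ \mathrm{MHz}
\]
```

By comparison, the E1 example samples 2.048 MHz clocks with a 32.768 MHz reference, a ratio of 16. At the bare minimum ratio the enable position is uncertain by up to one reference period, which is a large fraction of an 8 ns clock period, so in practice the margin would need to be well above the bound.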
Reply by ●November 23, 2005
There are several possible solutions.

1. Stratix II clocks don't have to come from dedicated clock inputs to reach the global clock networks. The dedicated clock inputs can reach the global clock networks without using any regular routing, so they result in less clock delay to your registers, which is useful if you need a fast Tco to another chip. However, any I/O can reach the dedicated global clock networks by using regular routing to get to the global network drive point. A clock constructed this way will have extra delay to reach each register, but the skew within the clock domain will still be fine. This happens automatically when you compile in Quartus II; there is no need to do anything. If you have 16 or fewer clocks, you are done. 33 MHz PCI has a loose enough Tco that you should comfortably meet it even with the larger clock delay that results from not using a dedicated clock pin.

2. By default, Quartus II promotes non-PLL clocks only to the "chip-wide global networks". There are 16 of these. If you have more than 16 clocks in your design, you probably want to use the 32 regional (1/4 chip) global networks as well. You can tell Quartus II to put a clock on a regional network by using the assignment editor to make a "global signal = regional clock" assignment to the clock signal. Since regional clocks can only reach 1/4 of the chip, you should make these assignments carefully: ensure that all fanouts of the clock can be placed in the quadrant of the chip near the I/O driving the clock. Generally you should use up all 16 chip-wide global clocks first, and then use the regional clocks for the lower fanout clocks, or for clocks that need faster Tco on registers driving output I/Os (regional clocks have lower delay). If you have a clock that fits in 1/2 the chip, but not in 1/4 of the chip, use "global signal = dual regional clock" to combine two regional clock networks into one 1/2 chip-wide network for that clock signal. This burns two of your 32 regional clocks though.

3. You can use locally routed clocks. Such clocks use general routing, and have higher skew than the dedicated (chip-wide global or regional) clock networks. However, they have low delay if the clock fanout is low, and hence can be good for Tco to an output I/O. To minimize the skew on such networks, you should make the assignment "maximum clock arrival skew = 0" to the clock signal. This tells the fitter to optimize this signal for low skew. The skew we achieve is generally quite reasonable on such clocks (~300 - 600 ps, with higher fanout clocks near the upper end of the range), but it still isn't as good as that of a global clock. Hence I'd recommend the global clock approaches (#1 and #2) first. If you need more than 48 clocks (a lot!), use this technique to make low-skew locally routed clocks for the lowest fanout clocks.

4. You could redesign your circuit to use fewer clocks, as other posters have suggested, but I suspect from your description that that is not necessary, and Stratix II in fact has plenty of clocks for what you need.

Regards,

Vaughn Betz
Altera
[v b e t z (at) altera.com]
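For reference, the assignment editor settings named in points 2 and 3 would typically end up as lines in the project's .qsf settings file. The fragment below is a sketch only: the signal names pci_clk, rx_clk_a and low_fanout_clk are hypothetical, and the exact assignment names and value strings are assumptions that should be checked against what the Assignment Editor writes for your Quartus II version.

```tcl
# Hypothetical signal names; assignment names and value strings are assumed
# and should be confirmed in the Quartus II Assignment Editor.

# Point 2: place a clock on a regional (1/4 chip) network
set_instance_assignment -name GLOBAL_SIGNAL "REGIONAL CLOCK" -to pci_clk

# Point 2: combine two regional networks into one 1/2 chip network
set_instance_assignment -name GLOBAL_SIGNAL "DUAL-REGIONAL CLOCK" -to rx_clk_a

# Point 3: ask the fitter to minimise skew on a locally routed clock
set_instance_assignment -name MAX_CLOCK_ARRIVAL_SKEW "0 ns" -to low_fanout_clk
```

Making the assignments through the Assignment Editor, as the post describes, avoids having to get these strings right by hand.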





