FPGARelated.com
Forums

CLK input DOES NOT use clk pin (Altera Stratix II)

Started by huangjie November 20, 2005
Hi All!

I have a project that uses an Altera Stratix II 2S180 as an ASIC prototype.
Because the ASIC has many interfaces, it has many clocks, and some of them
are not routed to the FPGA's dedicated clock pins. For example, the PCI
clock is routed to a normal I/O pin.

Because the FPGA and the board are expensive, the boss does not want to make
a new board.
After reading through the 2S180 datasheet and thinking it over a lot, I found
this is a very hard problem because:
 1)  The global buffer tree's delay is very long, about 5 ns.
 2)  From pad to core, a normal I/O has about 1 ns of delay.
 3)  I can't use a PLL to compensate for the I/O delay or the global buffer
     delay, since a PLL's input must be a clock input pin or a global buffer.
 4)  Inserting LCELLs into the datapath of the input signals will make my
     Tco bad.

How can I deal with this? Is anyone from Altera here?

It sounds like you just hit the classic ASIC-to-FPGA conversion problem of
too many clocks. We have done a lot of this kind of work, and generally it is
best to plan FPGA use into the IP from the start to make the conversion
path easy.

One obvious thing to try is to reduce the number of clocks. ASIC designs
often use gated clocks because that makes for smaller logic than flip-flops
with local clock enables. This often creates designs with large numbers of
clocks, which does not sit well with most FPGA fabrics. Xilinx do have some
tool support for locally routed clocks to cover this situation, but I am not
sure whether Altera offers this facility yet.

Consider whether you can alter your IP to use clock enables instead of
generated gated clocks. Alternatively, if your board has multiple FPGAs, look
at partitioning to minimise the number of clocks per device, or to better
match the clock distribution to the FPGA resources available. A multiple-FPGA
platform is often superior to a single large FPGA for ASIC prototyping.

John Adair
Enterpoint Ltd. - Home of Broaddown1. The ASIC Prototyping Platform.
http://www.enterpoint.co.uk

Thank you for your reply!
But the board was built before I joined the company, and the boss does not
want to make a new board.

The ASIC has so many clocks because it has so many interfaces, not because
of gated clocks.

huangjie wrote:
> Because the ASIC has many interfaces, it has many clocks, and some of
> them are not routed to the FPGA's dedicated clock pins. For example,
> the PCI clock is routed to a normal I/O pin.
How fast are the clocks that are not on the dedicated clock pins? If they
are slow enough, you can sample them with a faster clock to generate an
enable signal on the edge you want, and run your internal logic on the
faster clock using that enable. The code would be different for your FPGA
vs. your ASIC though:

FPGA:

process (fastclk)
begin
   if RISING_EDGE(fastclk) then
      if enable = '1' then
         ...

ASIC:

process (pinclk)
begin
   if RISING_EDGE(pinclk) then
      ...

(It's for situations like this that I wish VHDL had a pre-processor like C)

It might be tricky at PCI speeds, but if this is a prototyping system, you
may be able to slow down your PCI clock.

Regards,
John
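The reply above does not show where the enable comes from. One common way to generate it is sketched below, under stated assumptions: the entity name clk_to_enable and the signal name sync are hypothetical, and fastclk is assumed to run at more than twice the frequency of pinclk (with margin for duty cycle) so that every high and low phase is sampled at least once. The enable also lags the real pin edge by a few fastclk cycles, which matters for interfaces with tight I/O timing.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper: converts rising edges of a slow clock that arrives
-- on a normal I/O pin into single-cycle enables in the fastclk domain.
entity clk_to_enable is
   port (
      fastclk : in  std_logic;  -- internal clock, assumed > 2x pinclk
      pinclk  : in  std_logic;  -- slow clock sampled as ordinary data
      enable  : out std_logic   -- high for one fastclk cycle per rising edge
   );
end entity clk_to_enable;

architecture rtl of clk_to_enable is
   -- sync(0) and sync(1) form a two-flop synchroniser;
   -- sync(2) holds the previous synchronised sample.
   signal sync : std_logic_vector(2 downto 0) := (others => '0');
begin
   sample : process (fastclk) is
   begin
      if rising_edge(fastclk) then
         sync <= sync(1 downto 0) & pinclk;  -- shift in the newest sample
      end if;
   end process sample;

   -- Rising edge seen: newer sample is '1' while the older one is '0'.
   enable <= '1' when sync(1) = '1' and sync(2) = '0' else '0';
end architecture rtl;
```

The enable output would then drive the "if enable = '1'" test in the FPGA version of the process, while the ASIC version keeps clocking directly from pinclk.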
Unfortunately, the clocks are not slow enough: e.g., one is at 125 MHz and
PCI is at 33 MHz.
Since they interface to other devices, they can't be slowed down.

So clock everything at 125 MHz and use clock enables.  Then use FIFOs or
the infamous double latch to transfer between the 33 MHz and 125 MHz clock
domains.

Simon

Thanks for your suggestion!
But first, how do I use "the infamous double latch"?
Second, my ASIC does not have only one 125 MHz clock; it has 5 more, and
all of them are inputs from external chips with no frequency or phase
relationship to each other.

A/  Forget the ASIC.. Design the FPGA.. then work out how to translate that
into an ASIC.   The two are so totally different that if you try to design
for both you will ultimately fail.

B/ The double latch.....

clk_transfer : process (rst, clk) is
begin
   if (rst = reset_active_c) then
      tmp1     <= (others => '0');
      data_out <= (others => '0');
   elsif rising_edge(clk) then
      tmp1     <= data_in;  -- first stage: may go metastable
      data_out <= tmp1;     -- second stage: gives tmp1 a clock period to settle
   end if;
end process clk_transfer;

data_out = data_in after a little delay.
No doubt there will be debate to see if there should be a tmp2.  I actually
have a standard block called meta_data and meta_clk which get called..
meta_data is for data signals.. i.e. static lines.  meta_clk converts the
incoming signal to an edge which is phase aligned to meta_data.
The above is similar to these two routines.. but I can't guarantee it is
identical as they are at work and I haven't touched the blocks in a number
of years.  (So I don't remember what's inside.. just that they work).
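Since the actual meta_data block is not shown, here is only a rough sketch of what such a synchroniser might look like, with the "should there be a tmp2" question turned into a generic. The names bit_sync, stages_g and chain are hypothetical and not taken from the blocks mentioned above. It handles a single bit; a multi-bit bus whose bits change together generally needs a FIFO or a handshake instead, because each bit may resolve on a different clock edge.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical single-bit synchroniser with a configurable number of stages.
entity bit_sync is
   generic (
      stages_g : integer range 2 to 8 := 2  -- 2 = tmp1 + data_out, 3 adds a "tmp2"
   );
   port (
      clk      : in  std_logic;  -- destination-domain clock
      data_in  : in  std_logic;  -- asynchronous input
      data_out : out std_logic   -- input re-timed to clk
   );
end entity bit_sync;

architecture rtl of bit_sync is
   -- chain(0) samples the asynchronous input; later stages let it settle.
   signal chain : std_logic_vector(stages_g - 1 downto 0) := (others => '0');
begin
   shift : process (clk) is
   begin
      if rising_edge(clk) then
         chain <= chain(stages_g - 2 downto 0) & data_in;
      end if;
   end process shift;

   data_out <= chain(stages_g - 1);  -- last flop in the chain
end architecture rtl;
```

More stages trade extra latency for a lower chance that a metastable value reaches the downstream logic.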

C/  See meta clock... I have an E1 card.. it has a 32.768 MHz, 2.048 MHz (E1
ref), 1.544 MHz (T1 ref), 16.384 MHz, 4 x 2.048 MHz TX clocks and 4 x 2.048
MHz RX clocks.  Only the 32.768 MHz and the two references are related...
all the rest are independent... So who said you need lots of clock lines?

Everything is "meta_clk" or "meta_data" up to the 16.384 MHz which is the
bus timing.  The 32.768 MHz is used as a stable system reference along with
the E1 & T1 references.   Also the 32 MHz is used to calculate the accuracy
of the 4 E1 ports by a simple long duration counter.  The counter is
accurate to 1 ppm but the reference is good to 25 ppm.  Room temperature
showed about 5-10ppm clock speed error :-)

So ... provided your "reference" is faster than your actual clocks, there
are no problems... just treat all clocks as edge generators, which translate
into clock enables.

Simon

I understand your idea now, and I see why your approach works but mine
can't: your slow clocks really are slow, and mine are very fast.
How can I treat 125 MHz clocks the way you treat 2 MHz ones? How fast would
my "reference" have to be for 125 MHz?
Perhaps I can use a group of phase-shifted clocks to generate the clock
enable signals.
Thank you again!

There are several possible solutions.

1.  Stratix II clocks don't have to come from dedicated clock inputs to 
reach the global clock networks.  The dedicated clock inputs can reach the 
global clock networks without using any regular routing, so they result in 
less clock delay to your registers, and that is useful if you need a fast 
Tco to another chip.  However, any I/O can reach dedicated global clock 
networks by using regular routing to get to the global network drive point. 
A clock constructed this way will have extra delay to reach each register, 
but the skew within the clock domain will still be fine.

This will happen automatically when you compile in Quartus II -- no need to 
do anything.

If you have 16 or fewer clocks, you are done.  33 MHz PCI has a loose enough 
Tco that you should comfortably meet it even with the larger clock delay 
that results from not using a dedicated clock pin.


2.  Quartus II only promotes non-PLL clocks to "chip-wide global networks" 
by default.  There are 16 of these.  If you have more than 16 clocks in your 
design, you probably want to use the 32 regional (1/4 chip) global networks 
as well. You can tell Quartus II to put a clock on a regional network by 
using the assignment editor to make a

"global signal = regional clock"

assignment to the clock signal.  Since regional clocks can only reach 1/4 of 
the chip, you should make these assignments carefully -- ensure that all 
fanouts of the clock can be placed in the quadrant of the chip near the I/O 
driving the clock.  Generally you should use up all 16 chip-wide global 
clocks first, and then use the regional clocks for the lower fanout clocks, 
or clocks that need faster Tco on registers driving output I/Os (regional 
clocks have lower delay).

If you have a clock that fits in 1/2 the chip, but not in 1/4 of the chip, 
use "global signal = dual regional clock" to combine two regional clock 
networks into one 1/2 chip-wide network for that clock signal.  This burns 
two of your 32 regional clocks though.


3.  You can use locally routed clocks.  Such clocks use general routing, and 
have higher skew than the dedicated (chip-wide global or regional) clock 
networks.  However, they have low delay if the clock fanout is low, and 
hence can be good for Tco to an output I/O.  To minimize the skew on such 
networks, you should make the assignment:

"maximum clock arrival skew = 0"

to the clock signal.  This will tell the fitter to optimize this signal for 
low-skew.  The skew we achieve is generally quite reasonable on such clocks 
(~300 - 600 ps, with higher fanout clocks near the upper end of the range), 
but it still isn't as good as that of a global clock.  Hence I'd recommend 
the global clock approaches (#1 and #2) first.  If you need more than 48 
clocks (a lot!) use this technique to make low-skew locally routed clocks 
for the lowest fanout clocks.


4.  You could redesign your circuit to use fewer clocks, as other posters 
have suggested, but I suspect from your description that that is not 
necessary, and Stratix II in fact has plenty of clocks for what you need.

Regards,

Vaughn Betz
Altera
[v b e t z (at) altera.com]