Reply by KJ September 29, 2008
On Sep 29, 1:11 pm, n...@puntnl.niks (Nico Coesel) wrote:
> "KJ" <kkjenni...@sbcglobal.net> wrote:
>
> >"Nico Coesel" <n...@puntnl.niks> wrote in message
> >news:48dfe4ee.168426254@news.planet.nl...
> >> jhal...@TheWorld.com (Joseph H Allen) wrote:
>
> >> An easier way without the extra jitter is to use an
> >> output flipflop (aka DDR flipflop) which can be clocked using 2
> >> clocks. The first clock sets the output, the second clock (inverted
> >> first clock) resets the output. And presto, you'll have a clock output
> >> which is (within pin-to-pin skew) perfectly synchronous to the other
> >> outputs.
>
> >Hold time requirements (like the .4ns in the OP) will be impossible to
> >guarantee with this method though.
>
> A slightly longer PCB track will do that for you.
Slightly? The OP eyeballed the existing PCB traces at ~1 inch. At roughly 6 inches of trace per ns, adding 0.4 ns of delay (the hold time requirement of the SRAM) would take about 2.5 inches of extra trace. What you suggest means adding roughly 2.5x the existing trace length to each and every address/control signal on something that is running at ~200 MHz...not the sort of thing one would design into a board...at least not intentionally.

KJ
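For anyone who wants to check the arithmetic in the post above, here is a rough Python sketch. It assumes the rule of thumb of about 6 inches of trace per ns that is quoted elsewhere in this thread; the real figure depends on the board stackup, and the function name is made up for illustration.

```python
# Rough check of how much extra trace a given delay would need.
# Assumption: ~6 inches of trace per ns (rule of thumb from this thread);
# the real propagation delay depends on the PCB stackup.
INCHES_PER_NS = 6.0


def extra_trace_inches(delay_ns, inches_per_ns=INCHES_PER_NS):
    """Trace length (inches) that adds roughly delay_ns of propagation delay."""
    return delay_ns * inches_per_ns


hold_ns = 0.4          # SSRAM input hold time from the original post
existing_inches = 1.0  # trace length as eyeballed by the OP

added = extra_trace_inches(hold_ns)
print(f"extra trace for {hold_ns} ns: about {added:.1f} in")
print(f"that is about {added / existing_inches:.1f}x the existing trace")
```

With these inputs the result lands close to the ~2.5 inch figure in the post.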
Reply by Nico Coesel September 29, 2008
"KJ" <kkjennings@sbcglobal.net> wrote:

>"Nico Coesel" <nico@puntnl.niks> wrote in message
>news:48dfe4ee.168426254@news.planet.nl...
>> jhallen@TheWorld.com (Joseph H Allen) wrote:
>>
>> An easier way without the extra jitter is to use an
>> output flipflop (aka DDR flipflop) which can be clocked using 2
>> clocks. The first clock sets the output, the second clock (inverted
>> first clock) resets the output. And presto, you'll have a clock output
>> which is (within pin-to-pin skew) perfectly synchronous to the other
>> outputs.
>>
>
>Hold time requirements (like the .4ns in the OP) will be impossible to
>guarantee with this method though.
A slightly longer PCB track will do that for you. But that won't work on a pre-made board because you can't alter the PCB. Still, if the OP uses an off-the-shelf board, the designer should have thought about this sort of thing... Perhaps the OP could ask them.

--
Programming in Almere? E-mail nico@nctdevpuntnl (punt=.)
Reply by KJ September 28, 2008
"Nico Coesel" <nico@puntnl.niks> wrote in message 
news:48dfe4ee.168426254@news.planet.nl...
> jhallen@TheWorld.com (Joseph H Allen) wrote:
>
> An easier way without the extra jitter is to use an
> output flipflop (aka DDR flipflop) which can be clocked using 2
> clocks. The first clock sets the output, the second clock (inverted
> first clock) resets the output. And presto, you'll have a clock output
> which is (within pin-to-pin skew) perfectly synchronous to the other
> outputs.
>
Hold time requirements (like the 0.4 ns in the OP) will be impossible to guarantee with this method though.

KJ
Reply by Nico Coesel September 28, 2008
jhallen@TheWorld.com (Joseph H Allen) wrote:

>There are a number of ways to do this. Here is one way:
>
>What you usually want is for the clock at the pin of the SRAM chip to rise
>at the same time as the FPGA internal global clock driving the output pads.
>
>A way to do this is to route the clock from the second PLL to two output
>pins (right next to each other). One pin goes to the SSRAM. The other pin
>goes to the feedback input of the PLL. The trace length for this feedback
>line should be the same as the one which goes to the RAM. You have to use a
>dedicated PLL feedback input pin for this to work. This way is nice because
>you can usually leave the PLL phase shift setting at 0.
This is quite cumbersome, and the PLL will add extra jitter (clock uncertainty). An easier way without the extra jitter is to use an output flipflop (aka DDR flipflop) which can be clocked using 2 clocks. The first clock sets the output, the second clock (the inverted first clock) resets the output. And presto, you'll have a clock output which is (within pin-to-pin skew) perfectly synchronous to the other outputs.

--
Programming in Almere? E-mail nico@nctdevpuntnl (punt=.)
Reply by KJ September 28, 2008
On Sep 28, 1:52 am, Tommy Thorn <tommy.th...@gmail.com> wrote:
> KJ, thanks for a detailed reply. It makes perfect sense, unfortunately
> as detailed below I'm not sure it's all feasible for me to do this
> analytically, but I feel more comfortable playing with phase shifting
> of the clock.
>
Well, the bottom line of the calculations will still be a phase shift of the clock, so you're already poking around at what the correct solution will be.
> > Keep in mind though that there can easily be different delays from
> > these two PLL outputs to the different destinations.
>
> Oops. I had assumed they were phase locked and that the skew between
> them would be small enough to ignore. Using just a single PLL doesn't
> seem like it would help (?) as the on-die clock might be offset from
> the clock at the pin.
>
Some pins (actually one) will be better choices than others since they are intended to be used as a PLL output pin. Using other pins is not necessarily a problem, just that you'll get more skew and have somewhat less control of it. But as you've discovered, there must be some skew between the two otherwise your adjusting of the phase wouldn't have made any difference (and it did).
> > You'll need to know...
> > - What is an achievable skew between the clock at the internal flops
> > and the clock as it is leaving your controller.
>
> Ok, but how do I find that? Browsing the Cyclone II data sheet didn't
> reveal anything useful (unless I missed it).
>
Peruse the timing reports of your design. There should be something saying what the delay is to the output pin that goes to the SRAM. Here it can get a bit muddy depending on whether you're using the 'Classic' timing analyzer or 'TimeQuest', but what you're trying to determine should be in the timing reports. What you do with those min/max delay numbers is use them to set a timing constraint that the design will now always have to meet, and also use that constraint to figure out what the phase shift needs to be on the internal clock so that everything hangs together.

It may sound kind of backwards using the result of a run to figure out what the constraints need to be, but it's not really. Ideally you would like there to be 0 ns skew at the clock output pin, but if the software can't deliver that then what? At the end of the day, the absolute value of the delay doesn't matter since that gets accommodated by changing the PLL phase delay; what hurts the most is the difference between the min and the max of that delay (again, available from the timing reports). The spread between min and max is something that can't be designed around and is something that will be tighter if the 'pll_out' pin is used. I'm not sure if the board you're using did this, but it's easy enough to check...nothing you can do about it, but might be worth knowing.
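A small Python sketch of the bookkeeping described above, for readers following along. The min/max numbers are placeholders, not values from any real timing report, and the function name is invented for illustration. It only shows that the midpoint of the clock-out delay can be absorbed by a PLL phase shift, while the min/max spread is left over as uncertainty. The sign convention (negative means advancing the output clock) is an assumption and may differ from what the PLL tool uses.

```python
def clock_out_compensation(delay_min_ns, delay_max_ns, period_ns):
    """Return (phase_shift_deg, spread_ns) for a clock output delay range.

    phase_shift_deg: shift that would centre the clock at the pin on the
                     internal clock edge (negative = advance the output clock).
    spread_ns:       min-to-max uncertainty that no phase setting removes.
    """
    if delay_max_ns < delay_min_ns:
        raise ValueError("delay_max_ns must be >= delay_min_ns")
    midpoint_ns = (delay_min_ns + delay_max_ns) / 2.0
    phase_shift_deg = -360.0 * midpoint_ns / period_ns
    spread_ns = delay_max_ns - delay_min_ns
    return phase_shift_deg, spread_ns


# Placeholder numbers (assumed, not from a real report): 1.5..2.5 ns delay
# to the clock pin, 5 ns period (200 MHz).
phase, spread = clock_out_compensation(1.5, 2.5, 5.0)
print(f"phase shift: {phase:.1f} deg, leftover uncertainty: {spread:.2f} ns")
```

Whatever the midpoint turns out to be, the spread is the part that eats into the setup and hold budget.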
> > - Net lengths and capacitive loading of the signals on the PCBA that
> > go between the two devices.
>
> Ouch. While that makes perfect sense, I simply don't think I have
> enough information (or experience) to do that. Remember, this is an
> off-the-shelf development kit and the provided examples do not even
> use constraints. I do have the SSRAM datasheet which lists the Cin as
> 6 pF and Cout as 8 pF. Eyeballing the traces, they look roughly an
> inch long (the SSRAM sits very close to the FPGA).
>
That's good. Trace delay is roughly 1 ns per 6 inches, so the round trip delay from FPGA to SRAM and back (as you would have during an SRAM read) would be about 1/3 ns, which is pretty small (~3% of a 10 ns clock cycle, or about 7% of the 5 ns cycle at 200 MHz).
> > What's important here is really
> > differences between the various SRAM inputs and the clock. From that
> > you calculate an additional delay.
>
> Again, eyeballing it, the clock trace looks near identical to most of
> the data.
>
That's good too.
> > From all of that you should come out with a sketch that shows where
> > things need to be valid in order for the system to read and write
> > properly. Adjust the nominal phase of the clock leaving the device
> > (or equivalently the FPGA internal clock) so that the clock occurs
> > (both min and max time) at a point where everything is stable. Keep
> > in mind that as you shove the clock one way to improve setup time at
> > the SRAM, you're most likely stealing that from the setup time at the
> > FPGA when it is reading data back.
>
> Exactly. Which is why I was wary of the advice I found in multiple
> places: phase shift the SSRAM clock by 180 degrees.
>
180 degrees means you're inverting the clock. While that's an easy technique, giving up half of a 10 ns clock period will likely end up in still failing timing. It's best to figure out based on the Tco of the outputs, the skew of the clocks and the setup and hold times just where the clock can be placed. If it happens to be that an inverted clock would work, OK. If not, your analysis will show just where it can be placed. Again, ideally you would like the FPGA and the SRAM to both receive the clock at exactly the same time, that will give you the most margin.
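To put numbers on the 180-degree case, here is a hedged Python sketch of the read path only. Everything except the 10 ns period and the 2.2 ns FPGA setup figure (reused from the TSU_REQUIREMENT in the original post as a stand-in) is a placeholder: the SRAM clock-to-output time and the one-way trace delay are assumed values, and the model assumes the FPGA captures read data on the very next internal clock edge.

```python
def phase_to_ns(phase_deg, period_ns):
    """Convert a PLL phase shift in degrees to a time offset in ns."""
    return period_ns * phase_deg / 360.0


def read_setup_margin_ns(phase_deg, period_ns, sram_tco_ns, trace_ns, fpga_tsu_ns):
    """Setup margin at the FPGA when the SRAM clock lags the FPGA clock.

    The SRAM launches data phase_to_ns() after the FPGA edge; the FPGA is
    assumed to capture it on its next edge, one period after that edge.
    """
    launch = phase_to_ns(phase_deg, period_ns)
    arrival = launch + sram_tco_ns + trace_ns
    return period_ns - arrival - fpga_tsu_ns


PERIOD_NS = 10.0     # the 10 ns period used in the post above
SRAM_TCO_NS = 3.1    # assumed placeholder, check the SSRAM datasheet
TRACE_NS = 1.0 / 6   # ~1 inch at ~6 inches per ns
FPGA_TSU_NS = 2.2    # TSU_REQUIREMENT from the original post

for phase in (0, 10, 90, 180):
    margin = read_setup_margin_ns(phase, PERIOD_NS, SRAM_TCO_NS, TRACE_NS, FPGA_TSU_NS)
    print(f"{phase:3d} deg -> read setup margin {margin:+.2f} ns")
```

With these placeholder values the read setup margin is positive at 0 degrees and goes negative at 180 degrees, which is the kind of failure the post warns about. Real numbers from the datasheet and the timing reports may move the crossover point.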
> Thankfully this board appears to be well designed. I can already hit
> 170 MHz with my simple solution, but I'd like to push it to the limit
> of the SRAM (200 MHz).
>
Pushing to the limit usually just means that some extra analysis work is needed in order to make the design solid. That's all that is going on here.
> > Since it appears from your constraint that you're using Quartus, you
> > might want to put in numbers that are representative of the correct
> > capacitive loading as well as checking that the I/O drive strengths
> > are appropriate and not just the defaults (unless you've already done
> > this).
>
> Ah, yes, I should do that. The pin capacitive loading I gave above.
> I'm not sure how much I should estimate for the short trace. The DC
> ELECTRICAL CHARACTERISTICS states this:
>
> Output HIGH Voltage min 2.4 V (test cond I_OH = -4 mA)
> Output LOW Voltage max 0.4 V (test cond I_OL = -8 mA)
> Input HIGH Voltage min 2.0 V
> Input LOW Voltage max 0.8 V
> Input Leakage [-5 uA; 5 uA]
> Output Leakage [-5 uA; 5 uA]
>
> but doesn't explicitly mention an IO standard. I assume that LVTTL
> (3.3 V) is a fine choice (the default?). Or is LVCMOS a better choice.
>
> I guess I have no idea of how to pick a suitable drive strength.
>
LVTTL is the voltage standard (LVCMOS will be essentially the same). There is another setting for drive strength, look for something measured in mA. Since it sounds like the PCBA design has no obvious problems, set the drive strength to the max (likely 24mA). Kevin Jennings
Reply by Tommy Thorn September 28, 2008
KJ, thanks for a detailed reply. It makes perfect sense, unfortunately
as detailed below I'm not sure it's all feasible for me to do this
analytically, but I feel more comfortable playing with phase shifting
of the clock.

On Sep 26, 12:14 pm, KJ <kkjenni...@sbcglobal.net> wrote:
> On Sep 26, 2:34 pm, Tommy Thorn <tommy.th...@gmail.com> wrote:
>
> > First, all outputs are fully registered (and constrained to guarantee
> > they stay registered). The main logic is clocked by a PLL. A second
> > but identically configured output on this PLL drives the SSRAM.
>
> Keep in mind though that there can easily be different delays from
> these two PLL outputs to the different destinations.
Oops. I had assumed they were phase locked and that the skew between them would be small enough to ignore. Using just a single PLL doesn't seem like it would help (?) as the on-die clock might be offset from the clock at the pin.
> You'll need to know...
> - What is an achievable skew between the clock at the internal flops
> and the clock as it is leaving your controller.
Ok, but how do I find that? Browsing the Cyclone II data sheet didn't reveal anything useful (unless I missed it).
> In some sense it
> doesn't matter too much what that actual skew is, but you need to know
> what it is so that you can then add a timing constraint so that this
> delay is always met, or flagged as a timing error for you. For the
> sake of an example, let's just say that there is a skew of 1 ns
> between the internal clock and the clock at the output of the FPGA.
> Keep in mind that this skew will have both a minimum and a maximum so
> the skew is really a range between those two extremes.
>
> - Net lengths and capacitive loading of the signals on the PCBA that
> go between the two devices.
Ouch. While that makes perfect sense, I simply don't think I have enough information (or experience) to do that. Remember, this is an off-the-shelf development kit and the provided examples do not even use constraints. I do have the SSRAM datasheet which lists the Cin as 6 pF and Cout as 8 pF. Eyeballing the traces, they look roughly an inch long (the SSRAM sits very close to the FPGA).
> What's important here is really
> differences between the various SRAM inputs and the clock. From that
> you calculate an additional delay.
Again, eyeballing it, the clock trace looks near identical to most of the data.
> Practically speaking, you most
> likely have roughly equal net lengths and loading on all of the
> signals and this is not going to be a concern, but you should at least
> be aware of this as well. Different parts may have different
> capacitive loading so if you want to get nit picky this delay will
> also be a range with a min and a max but that range will typically be
> much smaller than the uncertainty with the FPGA.
>
> From the FPGA clock skew min/max add on the additional delay for
> length/loading differences and now you have a known window of clock
> uncertainty. Now get a piece of paper and sketch out some waveforms
> showing the min/max switching times of the control signals (i.e.
> address, oe, write, data_in, data_out) as well as the setup/hold time
> of both the SRAM and the FPGA (for the data coming back in). Somebody
> was advertising a free timing waveform tool out here a few months back
> (I don't remember the name), that may help but it's not that difficult
> to paper sketch it either.
>
> From all of that you should come out with a sketch that shows where
> things need to be valid in order for the system to read and write
> properly. Adjust the nominal phase of the clock leaving the device
> (or equivalently the FPGA internal clock) so that the clock occurs
> (both min and max time) at a point where everything is stable. Keep
> in mind that as you shove the clock one way to improve setup time at
> the SRAM, you're most likely stealing that from the setup time at the
> FPGA when it is reading data back.
Exactly. Which is why I was wary of the advice I found in multiple places: phase shift the SSRAM clock by 180 degrees.
> There can also be other concerns like if the nets are long you'll get
> ringing which distorts the waveforms which basically means that you'll
> need to wait longer for things to stabilize which cuts into the
> allowable timing. At 100 MHz, just 1 ns is 10% of the clock cycle
> budget. Whether or not that's an issue you'll need to
> determine with a scope.
Thankfully this board appears to be well designed. I can already hit 170 MHz with my simple solution, but I'd like to push it to the limit of the SRAM (200 MHz).
> > Phase-shift the SSRAM clock?
>
> Yes
>
> > Specify it as a timing constraint?
>
> Yes
>
> > Do timing constraints influence the output buffer or are they purely for
> > checking?
>
> For the most part it's just checking, although it can affect place and
> route as well. I haven't seen a case where it affected the output
> buffer itself (i.e. kicked up or down the drive strength) in order to
> meet a constraint. This is most likely because drive strength
> considerations have a much larger impact than just timing. It does go
> the other way though, as you fiddle with drive strength the software
> should take this into account when it does the timing analysis.
>
> Since it appears from your constraint that you're using Quartus, you
> might want to put in numbers that are representative of the correct
> capacitive loading as well as checking that the I/O drive strengths
> are appropriate and not just the defaults (unless you've already done
> this).
Ah, yes, I should do that. The pin capacitive loading I gave above. I'm not sure how much I should estimate for the short trace. The DC ELECTRICAL CHARACTERISTICS states this:

Output HIGH Voltage min 2.4 V (test cond I_OH = -4 mA)
Output LOW Voltage max 0.4 V (test cond I_OL = -8 mA)
Input HIGH Voltage min 2.0 V
Input LOW Voltage max 0.8 V
Input Leakage [-5 uA; 5 uA]
Output Leakage [-5 uA; 5 uA]

but doesn't explicitly mention an IO standard. I assume that LVTTL (3.3 V) is a fine choice (the default?). Or is LVCMOS a better choice?

I guess I have no idea of how to pick a suitable drive strength.

Thanks again
Tommy
Reply by Tommy Thorn September 27, 2008
On Sep 26, 12:37 pm, jhal...@TheWorld.com (Joseph H Allen) wrote:
> You have to use a
> dedicated PLL feedback input pin for this to work. This way is nice
> because you can usually leave the PLL phase shift setting at 0.
Thanks. This is a nice solution and I've used it on other projects. Unfortunately, I didn't design Terasic's DE2-70 dev kit and it doesn't have a feedback clock trace, so this isn't an option here. Tommy
Reply by Joseph H Allen September 26, 2008
There are a number of ways to do this.  Here is one way:

What you usually want is for the clock at the pin of the SRAM chip to rise
at the same time as the FPGA internal global clock driving the output pads.

A way to do this is to route the clock from the second PLL to two output
pins (right next to each other).  One pin goes to the SSRAM.  The other pin
goes to the feedback input of the PLL.  The trace length for this feedback
line should be the same as the one which goes to the RAM.  You have to use a
dedicated PLL feedback input pin for this to work.  This way is nice because
you can usually leave the PLL phase shift setting at 0.

You have to use an "enhanced" PLL with external feedback pin and set it to
use external feedback pin mode.

In article <11ffca10-bec7-4cb7-bed4-458a3a8743dd@v13g2000pro.googlegroups.com>,
Tommy Thorn  <tommy.thorn@gmail.com> wrote:
>I wrote a little controller + tester app for the SSRAM on Terasic's
>DE2-70 which is rated for 200 MHz. I have gotten it working @ 170 MHz,
>but I'm a little uneasy about the SSRAM clock. Being a non-EE I
>suspect I'm missing something fundamental here.
>
>First, all outputs are fully registered (and constrained to guarantee
>they stay registered). The main logic is clocked by a PLL. A second
>but identically configured output on this PLL drives the SSRAM. The
>SSRAM datasheet lists the setup and hold times for the inputs as 1.4
>ns / 0.4 ns respectively. What is the correct way to achieve this?
>Phase-shift the SSRAM clock? Specify it as a timing constraint? Do
>timing constraints influence the output buffer or are they purely for
>checking?
>
>Keeping the clock in phase with the main clock led to errors showing
>up (once beyond 100 MHz), but shifting it a few degrees made it
>perfectly stable again. However, what is the appropriate engineering
>approach to this source synchronous problem?
>
>Any help would be much appreciated.
>
>Thanks,
>Tommy
>
>
>FWIW, these are the constraints I'm currently using:
>
># timing constraints for SSRAM
>set_instance_assignment -name FAST_OUTPUT_REGISTER ON -to oSRAM*
>set_instance_assignment -name FAST_OUTPUT_REGISTER ON -to SRAM_*
>set_instance_assignment -name FAST_OUTPUT_ENABLE_REGISTER ON -to SRAM_*
>set_instance_assignment -name TCO_REQUIREMENT "3 ns" -to oSRAM*
>set_instance_assignment -name TCO_REQUIREMENT "3 ns" -to SRAM*
>set_instance_assignment -name TSU_REQUIREMENT "2.2 ns" -to SRAM*
>
># other default timings
>set_global_assignment -name TSU_REQUIREMENT "5 ns"
>set_global_assignment -name TCO_REQUIREMENT "10 ns"
-- /* jhallen@world.std.com AB1GO */ /* Joseph H. Allen */ int a[1817];main(z,p,q,r){for(p=80;q+p-80;p-=2*a[p])for(z=9;z--;)q=3&(r=time(0) +r*57)/7,q=q?q-1?q-2?1-p%79?-1:0:p%79-77?1:0:p<1659?79:0:p>158?-79:0,q?!a[p+q*2 ]?a[p+=a[p+=q]=q]=q:0:0;for(;q++-1817;)printf(q%79?"%c":"%c\n"," #"[!a[q-1]]);}
Reply by KJ September 26, 2008
On Sep 26, 2:34 pm, Tommy Thorn <tommy.th...@gmail.com> wrote:
> I wrote a little controller + tester app for the SSRAM on Terasic's
> DE2-70 which is rated for 200 MHz. I have gotten it working @ 170 MHz,
> but I'm a little uneasy about the SSRAM clock. Being a non-EE I
> suspect I'm missing something fundamental here.
>
> First, all outputs are fully registered (and constrained to guarantee
> they stay registered). The main logic is clocked by a PLL. A second
> but identically configured output on this PLL drives the SSRAM.
Keep in mind though that there can easily be different delays from these two PLL outputs to the different destinations.
> The
> SSRAM datasheet lists the setup and hold times for the inputs as 1.4
> ns / 0.4 ns respectively. What is the correct way to achieve this?
You'll need to know...

- What is an achievable skew between the clock at the internal flops and the clock as it is leaving your controller. In some sense it doesn't matter too much what that actual skew is, but you need to know what it is so that you can then add a timing constraint so that this delay is always met, or flagged as a timing error for you. For the sake of an example, let's just say that there is a skew of 1 ns between the internal clock and the clock at the output of the FPGA. Keep in mind that this skew will have both a minimum and a maximum so the skew is really a range between those two extremes.

- Net lengths and capacitive loading of the signals on the PCBA that go between the two devices. What's important here is really differences between the various SRAM inputs and the clock. From that you calculate an additional delay. Practically speaking, you most likely have roughly equal net lengths and loading on all of the signals and this is not going to be a concern, but you should at least be aware of this as well. Different parts may have different capacitive loading so if you want to get nit picky this delay will also be a range with a min and a max, but that range will typically be much smaller than the uncertainty with the FPGA.

From the FPGA clock skew min/max, add on the additional delay for length/loading differences and now you have a known window of clock uncertainty. Now get a piece of paper and sketch out some waveforms showing the min/max switching times of the control signals (i.e. address, oe, write, data_in, data_out) as well as the setup/hold time of both the SRAM and the FPGA (for the data coming back in). Somebody was advertising a free timing waveform tool out here a few months back (I don't remember the name); that may help, but it's not that difficult to paper sketch it either.

From all of that you should come out with a sketch that shows where things need to be valid in order for the system to read and write properly. Adjust the nominal phase of the clock leaving the device (or equivalently the FPGA internal clock) so that the clock occurs (both min and max time) at a point where everything is stable. Keep in mind that as you shove the clock one way to improve setup time at the SRAM, you're most likely stealing that from the setup time at the FPGA when it is reading data back.

There can also be other concerns: if the nets are long you'll get ringing which distorts the waveforms, which basically means that you'll need to wait longer for things to stabilize, which cuts into the allowable timing. At 100 MHz, just 1 ns is 10% of the clock cycle budget. Whether or not that's an issue you'll need to determine with a scope.
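The paper sketch described above can also be approximated in a few lines of Python. This is a simplified sketch of the write path only, not a replacement for the timing analyzer: the 1.4 ns / 0.4 ns setup and hold figures come from the original post, while the FPGA clock-to-output range and the clock arrival window are made-up placeholders, and the function name is invented for illustration.

```python
def write_margins_ns(period_ns, tco_min_ns, tco_max_ns,
                     clk_min_ns, clk_max_ns, tsu_ns, th_ns):
    """Worst-case setup/hold margins at the SRAM inputs.

    tco_min/max: when FPGA outputs change after the internal clock edge.
    clk_min/max: when the SRAM clock edge arrives relative to that same
                 internal edge (skew plus any PLL phase shift).
    Data launched at t=0 is assumed to be captured by the SRAM edge that
    arrives one period later.
    """
    # Setup: latest data change versus the earliest capturing edge.
    setup = (period_ns + clk_min_ns) - tco_max_ns - tsu_ns
    # Hold: earliest data change versus the latest edge capturing old data.
    hold = tco_min_ns - clk_max_ns - th_ns
    return setup, hold


# 200 MHz target; tsu/th are the SSRAM datasheet figures from the thread.
# The tco range and the clock window are assumed placeholders.
setup, hold = write_margins_ns(period_ns=5.0, tco_min_ns=1.0, tco_max_ns=3.0,
                               clk_min_ns=0.0, clk_max_ns=0.5,
                               tsu_ns=1.4, th_ns=0.4)
print(f"setup margin {setup:+.2f} ns, hold margin {hold:+.2f} ns")
print("OK" if setup >= 0 and hold >= 0 else "timing violated")
```

Shifting the clock window later raises the setup margin and eats into the hold margin, which is the trade-off the sketch on paper is meant to expose. The read path back into the FPGA needs the same treatment with its own numbers.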
> Phase-shift the SSRAM clock?
Yes
> Specify it as a timing constraint?
Yes
> Do timing constraints influence the output buffer or are they purely for > checking?
For the most part it's just checking, although it can affect place and route as well. I haven't seen a case where it affected the output buffer itself (i.e. kicked up or down the drive strength) in order to meet a constraint. This is most likely because drive strength considerations have a much larger impact than just timing. It does go the other way though, as you fiddle with drive strength the software should take this into account when it does the timing analysis.

Since it appears from your constraint that you're using Quartus, you might want to put in numbers that are representative of the correct capacitive loading as well as checking that the I/O drive strengths are appropriate and not just the defaults (unless you've already done this).

KJ
Reply by Tommy Thorn September 26, 2008
I wrote a little controller + tester app for the SSRAM on Terasic's
DE2-70 which is rated for 200 MHz. I have gotten it working @ 170 MHz,
but I'm a little uneasy about the SSRAM clock. Being a non-EE I
suspect I'm missing something fundamental here.

First, all outputs are fully registered (and constrained to guarantee
they stay registered). The main logic is clocked by a PLL. A second
but identically configured output on this PLL drives the SSRAM. The
SSRAM datasheet lists the setup and hold times for the inputs as 1.4
ns / 0.4 ns respectively. What is the correct way to achieve this?
Phase-shift the SSRAM clock? Specify it as a timing constraint? Do
timing constraints influence the output buffer or are they purely for
checking?

Keeping the clock in phase with the main clock led to errors showing
up (once beyond 100 MHz), but shifting it a few degrees made it
perfectly stable again. However, what is the appropriate engineering
approach to this source synchronous problem?

Any help would be much appreciated.

Thanks,
Tommy


FWIW, these are the constraints I'm currently using:

# timing constraints for SSRAM
set_instance_assignment -name FAST_OUTPUT_REGISTER ON -to oSRAM*
set_instance_assignment -name FAST_OUTPUT_REGISTER ON -to SRAM_*
set_instance_assignment -name FAST_OUTPUT_ENABLE_REGISTER ON -to SRAM_*
set_instance_assignment -name TCO_REQUIREMENT "3 ns" -to oSRAM*
set_instance_assignment -name TCO_REQUIREMENT "3 ns" -to SRAM*
set_instance_assignment -name TSU_REQUIREMENT "2.2 ns" -to SRAM*

# other default timings
set_global_assignment -name TSU_REQUIREMENT "5 ns"
set_global_assignment -name TCO_REQUIREMENT "10 ns"