comp.arch.fpga | sram

Reply by rickman ● August 6, 2017

KJ wrote on 8/6/2017 1:33 PM:
> On Sunday, August 6, 2017 at 12:40:25 PM UTC-4, rickman wrote:
>> KJ wrote on 8/6/2017 8:01 AM:
>>> It's even easier than that to synchronously control a standard async SRAM.  Simply connect WE to the clock and hold OE active all the time except for cycles where you want to write something new into the SRAM.
>>
>> That would depend a *lot* on the details of the setup and hold times for the
>> async SRAM, no?  You can do what you want with data for much of the clock
>> cycle, but the address has to meet setup and hold for the entire WE time.
>> That's typically more than half a clock cycle and makes it hard to use it on
>> every clock cycle.
>>
> Address (and data) setup and hold times are easily met.  As a first order approximation, the setup time will be T/2-Tco(max).  The address hold time will be Tco(min).
>
> What is your source for statement "That's typically more than half a clock cycle"?  The ancient Cypress CY62256N lists both of these requirements (Tsa and Tha) as 0 ns [1].

I'm talking about the time the address must remain stable.  Your 
calculations above show it is at a minimum T/2.

When running with fast SRAM it can be very hard to get this to work 
properly.  The devil is in the details of the chips.

> The technique works.  You get single cycle read or write on 100% of the clock cycles, timing is met, period...and it worked 20+ years ago on product I designed [2].

Great!  You were able to use it on one device at an unknown speed.  What was 
the clock period?

Did you supply the WE from the external clock (same as to the FPGA) or a 
copy of the clock from inside the FPGA?  In the case of the former the total 
delay through the chip of the signals can be a significant part of the setup 
margin.  If the latter it is hard to control the routing delays.

-- 

Rick C

Reply by ● August 6, 2017

On Sunday, August 6, 2017 at 19:40:40 UTC+2, KJ wrote:
> On Sunday, August 6, 2017 at 1:30:46 PM UTC-4, lasselangwad...@gmail.com wrote:
> > On Sunday, August 6, 2017 at 18:40:25 UTC+2, rickman wrote:
> >
> > and just using the clock gives you the headache of trying to control routing
> > delays on data vs. WE
> >
> > using the dual edge output flipflop makes it all much more controllable
>
> Not true.  There is nothing special that needs to be done to "control routing delays on data vs. WE".  Do you have any basis for that statement?

to get your clock out to your WE pin you first have to get off the clock network and out to an IO, how are you going to guarantee that delay is the same as the data going from an output flop to an io?

>
> Using the method I described is absolutely the same as connecting up two 74X374 flip flops, nothing more, nothing less.  How is that a 'headache'?
>

with a string of 374 you also have to make sure the delay on the clock is
controlled with regards to the data

Reply by KJ ● August 6, 2017

On Sunday, August 6, 2017 at 2:08:09 PM UTC-4, lasselangwad...@gmail.com wrote:
> On Sunday, August 6, 2017 at 19:40:40 UTC+2, KJ wrote:
> > Not true.  There is nothing special that needs to be done to "control routing delays on data vs. WE".  Do you have any basis for that statement?
>
> to get your clock out to your WE pin you first have to get off the clock network and out to an IO, how are you going to guarantee that delay is the same as the data going from an output flop to an io?
>

One does not need to "guarantee that delay is the same as the data going from an output flop to an io" in order to get a working design as you stated.  Instead, one can design it such that the clock edge at the flip flops that generate the address/data/control signals is simultaneous, within some applicable design tolerance, with the clock signal (aka WE) arriving at the SRAM.

In fact, since there are tolerances, if you design it such that the nominal data delay matches the nominal clock delay as you suggest, you are essentially crossing your fingers hoping that you don't run across a 'fast' data path and a 'slow' clock path over the full PVT range...either that or you are lengthening the data path on the PCBA to guarantee that it never beats the clock.  Yes, you can do that to get a guaranteed working design, but that would seem to be more of the 'headache' that you mentioned than my approach of just routing them on the shortest path as one would probably normally do anyway.
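
The corner-case argument above can be sketched numerically. This is only an illustration: the function name, the delays, and the tolerances are hypothetical and not taken from any datasheet or from the thread. It compares the fastest possible data path against the slowest possible clock (WE) path.

```python
# Worst-case skew between a data path and a clock (WE) path, in ns.
# Each path is (nominal_delay, tolerance); all values are hypothetical.

def worst_case_lead(data, clock):
    """Smallest amount by which the clock edge arrives before the data changes.

    A negative result means the data can beat the clock at some PVT corner.
    """
    data_nom, data_tol = data
    clk_nom, clk_tol = clock
    fastest_data = data_nom - data_tol
    slowest_clock = clk_nom + clk_tol
    return fastest_data - slowest_clock

# Nominal delays matched: the fast-data / slow-clock corner fails.
matched = worst_case_lead(data=(5.0, 1.0), clock=(5.0, 1.0))
# Data path deliberately lengthened: it never beats the clock.
padded = worst_case_lead(data=(8.0, 1.0), clock=(5.0, 1.0))
print(f"matched: {matched:+.1f} ns, padded: {padded:+.1f} ns")
```

With these made-up numbers the matched case comes out negative and the padded case positive, which is the trade-off described in the paragraph above.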

> >
> > Using the method I described is absolutely the same as connecting up two 74X374 flip flops, nothing more, nothing less.  How is that a 'headache'?
> >
>
> with a string of 374 you also have to make sure the delay on the clock is
> controlled with regards to the data

No.  The delay of data relative to clock in a string of flip flops is not important at all if every flip flop receives the same rising edge.  Getting multiple receivers to receive the same clock signal (to within some tolerance) is something that a designer does have control over.  Relying on the control of skew between two or more signals, not so much.

This simultaneous receipt of the clock signal is essentially what goes on inside every FPGA.  You can send any FF output to the input of any other FF on the device because they design the clock network to produce this simultaneous action.  It's not because they added data routing delays.

Kevin Jennings

Reply by KJ ● August 6, 2017

On Sunday, August 6, 2017 at 2:07:35 PM UTC-4, rickman wrote:
> KJ wrote on 8/6/2017 1:33 PM:
> > What is your source for statement "That's typically more than half a clock cycle"?  The ancient Cypress CY62256N lists both of these requirements (Tsa and Tha) as 0 ns [1].
>
> I'm talking about the time the address must remain stable.  Your
> calculations above show it is at a minimum T/2.
>

My calculation is T/2-Tco(max).  As long as Tco(max) <= T/2 then the design will work with anything compatible with the Cypress part that I previously referenced that requires 0 setup time.  Tco(max) being less than one half of a clock cycle is not much of a hurdle.  The SRAM access time will typically be greater.
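
A minimal sketch of that first-order budget follows. It assumes WE falls at T/2 and rises at T, that the address is launched on the rising clock edge, and that clock skew between the FPGA and the SRAM is ignored. The function name and the example numbers are hypothetical. Only the formulas T/2-Tco(max) and Tco(min) and the 0 ns Tsa/Tha requirements come from the discussion.

```python
# First-order write-timing budget with WE tied to the clock (all times in ns).
# Assumes WE falls at T/2 and rises at T; skew between FPGA and SRAM is ignored.

def write_margins(period, tco_min, tco_max, tsa=0.0, tha=0.0):
    """Return (setup_margin, hold_margin) of the address relative to WE."""
    setup_available = period / 2 - tco_max  # address valid before WE falls
    hold_available = tco_min                # address held after WE rises
    return setup_available - tsa, hold_available - tha

# Hypothetical numbers: 20 ns clock, FPGA clock-to-out between 2 ns and 7 ns,
# and an SRAM that requires 0 ns address setup and hold.
setup_margin, hold_margin = write_margins(period=20.0, tco_min=2.0, tco_max=7.0)
print(f"setup margin: {setup_margin:.1f} ns, hold margin: {hold_margin:.1f} ns")
```

A negative setup margin appears as soon as Tco(max) exceeds T/2, which is the condition stated above.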

> When running with fast SRAM it can be very hard to get this to work
> properly.

Speaking for myself I can say that no it was not hard at all, it worked right at the start.  I'm not sure where you see the difficulty.

>  The devil is in the details of the chips.
>
And I've provided the details.  More so than you.

> Great!
Thanks!

> You were able to use it on one device at an unknown speed.
You're making assumptions here that are incorrect.

> What was the clock period?
I dunno, that was 20+ years ago but it was using the fastest available CMOS SRAMs of the mid to late 1990s.  But the clock speed is not relevant, the technique is still valid.  The biggest limiting factor is going to be the read/write speed of the async SRAM.

Kevin Jennings

Reply by Richard Damon ● August 6, 2017

On 8/6/17 3:42 PM, KJ wrote:
> On Sunday, August 6, 2017 at 2:07:35 PM UTC-4, rickman wrote:
>> KJ wrote on 8/6/2017 1:33 PM:
>>> What is your source for statement "That's typically more than half a clock cycle"?  The ancient Cypress CY62256N lists both of these requirements (Tsa and Tha) as 0 ns [1].
>>
>> I'm talking about the time the address must remain stable.  Your
>> calculations above show it is at a minimum T/2.
>>
> 
> My calculation is T/2-Tco(max).  As long as Tco(max) <= T/2 then the design will work with anything compatible with the Cypress part that I previously referenced that requires 0 setup time.  Tco(max) being less than one half of a clock cycle is not much of a hurdle.  The SRAM access time will typically be greater.
> 
>> When running with fast SRAM it can be very hard to get this to work
>> properly.
> 
> Speaking for myself I can say that no it was not hard at all, it worked right at the start.  I'm not sure where you see the difficulty.
> 
>>   The devil is in the details of the chips.
>>
> And I've provided the details.  More so than you.
> 
>> Great!
> Thanks!
> 
>> You were able to use it on one device at an unknown speed.
> You're making assumptions here that are incorrect.
>    
>> What was the clock period?
> I dunno, that was 20+ years ago but it was using the fastest available CMOS SRAMs of the mid to late 1990s.  But the clock speed is not relevant, the technique is still valid.  The biggest limiting factor is going to be the read/write speed of the async SRAM.
> 
> Kevin Jennings
> 

I think, if I understand what you are proposing, one big issue is you 
seem to be assuming that the clock that you are using as WE starts 
external to the FPGA (or at least comes out and goes back in) so that 
you know the clock rises before the data on the address bus can change. 
 From my experience, in very many cases, this is NOT true for an FPGA 
design, but some slower clock comes in, and the highest speed clocks are 
generated by PLLs in the FPGA.

A second really big issue is how do you do a read cycle if the write 
comes ungated from the clock. The best I can figure is you are assuming 
you can get a read done in 1/2 a clock cycle and just rewrite the data. 
In most such rams WE overrides OE, and the Selects kill both read and 
write. Unless you had a part with both a WE and WS (where WE could 
disable the WS, but did not itself need to have the required setup/hold 
to address) I can't see how you do reads with the clock anywhere close 
to cycle time, and having a WOM (Write only Memory) isn't that useful here.
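
The read-cycle problem can be illustrated with a small behavioural model. It assumes the truth table described above for typical current parts (WE low takes priority over OE, and a deselected chip neither reads nor writes). The function and the active-low pin names are illustrative only, and a real datasheet should be checked.

```python
# Behavioural sketch of a typical modern async-SRAM truth table.
# Pins are active low (1 = inactive). Assumption: WE low overrides OE,
# and CE high deselects the part.

def sram_mode(ce_n, we_n, oe_n):
    if ce_n:
        return "deselected"
    if not we_n:
        return "write"  # OE is a don't-care while WE is low
    return "read" if not oe_n else "outputs disabled"

# WE tied directly to the clock: WE is low for the low half of every cycle.
for clk in (1, 0):
    for oe_n in (0, 1):
        mode = sram_mode(ce_n=0, we_n=clk, oe_n=oe_n)
        print(f"clk={clk} oe_n={oe_n} -> {mode}")
```

Under this assumed truth table, the low half of every clock cycle is a write no matter what OE does, so a read can only occupy the high half.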

Reply by ● August 6, 2017

On Sunday, August 6, 2017 at 21:26:23 UTC+2, KJ wrote:
> On Sunday, August 6, 2017 at 2:08:09 PM UTC-4, lasselangwad...@gmail.com wrote:
> > On Sunday, August 6, 2017 at 19:40:40 UTC+2, KJ wrote:
> > > Not true.  There is nothing special that needs to be done to "control routing delays on data vs. WE".  Do you have any basis for that statement?
> >
> > to get your clock out to your WE pin you first have to get off the clock network and out to an IO, how are you going to guarantee that delay is the same as the data going from an output flop to an io?
> >
>
> One does not need to "guarantee that delay is the same as the data going from an output flop to an io" in order to get a working design as you stated.  Instead, one can design it such that the clock edge at the flip flops that generate the address/data/control signals is simultaneous, within some applicable design tolerance, with the clock signal (aka WE) arriving at the SRAM.


how are you going to control the delay from output ff to io vs. clock getting off the clock tree to io?

>
> In fact, since there are tolerances, if you design it such that the nominal data delay matches the nominal clock delay as you suggest, you are essentially crossing your fingers hoping that you don't run across a 'fast' data path and a 'slow' clock path over the full PVT range...either that or you are lengthening the data path on the PCBA to guarantee that it never beats the clock.  Yes, you can do that to get a guaranteed working design, but that would seem to be more of the 'headache' that you mentioned than my approach of just routing them on the shortest path as one would probably normally do anyway.


using a DDR output, data and WE all have the same path to io and should thus track over PVT

using the clock directly is pretty much guaranteed to add more delay than the clock to out on the output ffs

>
> > >
> > > Using the method I described is absolutely the same as connecting up two 74X374 flip flops, nothing more, nothing less.  How is that a 'headache'?
> > >
> >
> > with a string of 374 you also have to make sure the delay on the clock is
> > controlled with regards to the data
>
> No.  The delay of data relative to clock in a string of flip flops is not important at all if every flip flop receives the same rising edge.  Getting multiple receivers to receive the same clock signal (to within some tolerance) is something that a designer does have control over.  Relying on the control of skew between two or more signals, not so much.
>
> This simultaneous receipt of the clock signal is essentially what goes on inside every FPGA.  You can send any FF output to the input of any other FF on the device because they design the clock network to produce this simultaneous action.  It's not because they added data routing delays.
>

FF out to FF in is safe by design, once you mix in clock used as "data" you add an unknown delay

Reply by KJ ● August 6, 2017

On Sunday, August 6, 2017 at 4:09:28 PM UTC-4, Richard Damon wrote:
> On 8/6/17 3:42 PM, KJ wrote:
> I think, if I understand what you are proposing, one big issue is you
> seem to be assuming that the clock that you are using as WE starts
> external to the FPGA (or at least comes out and goes back in) so that
> you know the clock rises before the data on the address bus can change.
> From my experience, in very many cases, this is NOT true for an FPGA
> design, but some slower clock comes in, and the highest speed clocks are
> generated by PLLs in the FPGA.
>

No, that was not my assumption.  The clocking situation is no different from how one synchronizes the internal clock and the external clock in SDRAM or DDR.  Even before there were DDR parts and DDR flops in FPGAs, there were single clock SDRAMs and they had FPGA controllers.  Clock synchronization between the FPGA and the SDRAM is required there as well and would use the same control technique.

> In most such rams WE overrides OE, and the Selects kill both read and
> write.

Well, looking around now that does seem to be the case today, which sort of makes me wonder which SRAM I was using back then.  At that time, CE and OE enabled the I/O drivers independent of WE.  Writing to memory was sometimes (depending on the part) inhibited if OE was active.  I don't believe I relied on any bus-hold circuit or any sort of other trickery like that.  I will say that the design did work and was in production for several years without issue but, in any case, my solution does not seem applicable today.  Interesting, good catch.

Kevin Jennings

Reply by KJ ● August 6, 2017

On Sunday, August 6, 2017 at 4:09:31 PM UTC-4, lasselangwad...@gmail.com wrote:
> 
> how are you going to control the delay from output ff to io vs. clock getting off the clock tree to io?
> 
By using the phase control of the PLL to adjust the clock leaving the chip relative to the clock internal to the chip.  That can be done in a way to guarantee operation.
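
As a rough sketch of what such a phase adjustment amounts to, the snippet below converts a desired shift of the forwarded clock into a phase setting in degrees. The function name and the example delay are hypothetical. How the phase is actually programmed depends on the particular FPGA's PLL and is not shown.

```python
# Convert a desired shift of the forwarded clock into a PLL phase setting.
# The target delay is a hypothetical number, not one taken from the thread.

def phase_shift_degrees(delay_ns, freq_mhz):
    """Phase shift in degrees (0 to 360) equivalent to `delay_ns` at `freq_mhz`."""
    period_ns = 1000.0 / freq_mhz
    return (delay_ns / period_ns) * 360.0 % 360.0

# Delaying a 100 MHz output clock by 2.5 ns is a quarter-period shift.
print(phase_shift_degrees(2.5, 100.0))
# Advancing it by 2.5 ns is the same as delaying it by three quarters of a period.
print(phase_shift_degrees(-2.5, 100.0))
```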

> using a DDR output data and WE all have the same path to io and should 
> thus track over PVT

'Should' is an important word there...but practically speaking I agree that there is probably 'slim' chances of failure.

> FF out to FF in is safe by design, once you mix in clock used as "data"
> you add an unknown delay

Mixing in clock as data was not what I was doing.  In any case, based on my reply to Richard Damon's post, my approach, while it worked back in the day, wouldn't work now.

Kevin

Reply by rickman ● August 6, 2017

KJ wrote on 8/6/2017 3:42 PM:
> On Sunday, August 6, 2017 at 2:07:35 PM UTC-4, rickman wrote:
>> KJ wrote on 8/6/2017 1:33 PM:
>>> What is your source for statement "That's typically more than half a clock cycle"?  The ancient Cypress CY62256N lists both of these requirements (Tsa and Tha) as 0 ns [1].
>>
>> I'm talking about the time the address must remain stable.  Your
>> calculations above show it is at a minimum T/2.
>>
>
> My calculation is T/2-Tco(max).  As long as Tco(max) <= T/2 then the design will work with anything compatible with the Cypress part that I previously referenced that requires 0 setup time.  Tco(max) being less than one half of a clock cycle is not much of a hurdle.  The SRAM access time will typically be greater.
>
>> When running with fast SRAM it can be very hard to get this to work
>> properly.
>
> Speaking for myself I can say that no it was not hard at all, it worked right at the start.  I'm not sure where you see the difficulty.
>
>>  The devil is in the details of the chips.
>>
> And I've provided the details.  More so than you.
>
>> Great!
> Thanks!
>
>> You were able to use it on one device at an unknown speed.
> You're making assumptions here that are incorrect.
>
>> What was the clock period?
> I dunno, that was 20+ years ago but it was using the fastest available CMOS SRAMs of the mid to late 1990s.  But the clock speed is not relevant, the technique is still valid.  The biggest limiting factor is going to be the read/write speed of the async SRAM.

Hmmm, looking at a current data sheet I don't see where you can gate the 
write cycle with OE.  WE, the byte enables and CE, but not OE.

-- 

Rick C

Reply by ● August 8, 2017

KJ wrote:
>
> It's even easier than that to synchronously control a standard async SRAM.  
> Simply connect WE to the clock and hold OE active all the time except 
> for cycles where you want to write something new into the SRAM.
>
As has been explained to you in detail by several other posters, your method is not 'easier' with modern FPGAs and SRAMs.

The simplest way to get a high speed clock {gated or not} off the chip, coincident with other registered I/O signals, is to use the dual-edge IOB flip-flops as I suggested. 

The DDR technique I mentioned would run synchronous single-cycle read or write cycles at 50 MHz on a Spartan-3 Starter kit with an (IIRC) 10 ns SRAM, 66 MHz if using a duty-cycle-skewed clock to meet the WE pulse width requirements.
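
The pulse-width arithmetic behind those two clock rates can be sketched as follows. The 8 ns minimum WE pulse width is an assumed, illustrative figure for a 10 ns-class SRAM and is not taken from the thread or from a specific datasheet. The function names are hypothetical as well.

```python
# WE low-pulse width produced by a forwarded clock, in ns.
# T_PWE_MIN is an assumed datasheet minimum, not a value from the thread.

T_PWE_MIN = 8.0

def we_low_width(freq_mhz, low_fraction=0.5):
    """Width of the WE low pulse when WE is low for `low_fraction` of the period."""
    period_ns = 1000.0 / freq_mhz
    return period_ns * low_fraction

def min_low_fraction(freq_mhz, t_pwe_min=T_PWE_MIN):
    """Smallest low fraction of the period that still meets the minimum pulse width."""
    return t_pwe_min * freq_mhz / 1000.0

for freq in (50.0, 66.0):
    width = we_low_width(freq)
    needed = min_low_fraction(freq)
    print(f"{freq:.0f} MHz: 50% duty gives {width:.2f} ns, needs >= {needed:.1%} low")
```

With the assumed 8 ns minimum, a 50% duty cycle is sufficient at 50 MHz but slightly too short at 66 MHz, which is consistent with needing a duty-cycle-skewed clock at the higher rate.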

Another advantage of the 'forwarding' method is that one can use the internal FPGA clock resources for clock multiply/divides etc. without needing to also manage the board-level low-skew clock distribution needed by your method.

-Brian
