All,

Latest update on atmospheric upsets: http://tinyurl.com/c9y5l

Virtex 4 memory cells are almost twice as hard to upset as Virtex II. We promised to reduce our susceptibility to atmospheric upsets, and we are fulfilling that promise.

Not all semi companies have made this choice: it is hard to do, and it increases area. I know of work being done at Intel and Cypress to improve, but nowhere else. It is highly unlikely that competing 90nm FPGA companies have done anything at all (except get a lot worse).

The ASIC vendors (ASSP, hardened solutions, etc.) also have not made this choice (as it would really blow up their area a lot). Thus, 90nm ASIC technology has a typical SRAM FIT rate of 5,000 FIT/Mb (from neutron data error rate specifications for a typical 90nm SRAM ASIC cell), as compared to our less than 250 FIT/Mb.

The ASIC DFF's, logic, etc. are also a fantastic neutron detector: the resulting hardness of the Virtex 4 is on par with, and better than a full custom 90nm ASIC doing the same task!

Unfortunately, no data is available on ASICs, as the vendors just don't know. To test, one would have to place the part in a neutron beam while it is running, which is rather hard to do with a complete system ... Caveat Emptor!

Virtex 4, on the other hand, combines this with built-in ECC for the BRAM and built-in FRAME_ECC for the configuration, which allows you to select whatever level of system hardness to soft errors is desired.

Austin
40% less SEU's! in V4: another good reason to choose Xilinx
Started by ●May 6, 2005
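To put the two FIT/Mb figures from the opening post in perspective, here is a minimal sketch of the arithmetic. It uses the standard definition of FIT (failures per 10^9 device-hours). The 10 Mb memory size is a hypothetical value chosen only for illustration, and 250 FIT/Mb is treated as an upper bound because the post says "less than 250".

```python
# FIT = "failures in time": expected failures per 1e9 device-hours.
HOURS_PER_FIT_UNIT = 1e9
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def upsets_per_year(fit_per_mb: float, megabits: float) -> float:
    """Expected soft errors per year for a memory of the given size."""
    return fit_per_mb * megabits * HOURS_PER_YEAR / HOURS_PER_FIT_UNIT


def mean_years_between_upsets(fit_per_mb: float, megabits: float) -> float:
    """Mean time between upsets, in years."""
    return 1.0 / upsets_per_year(fit_per_mb, megabits)


MEMORY_MB = 10.0  # hypothetical memory size, not from the thread

asic_years = mean_years_between_upsets(5000, MEMORY_MB)  # figure quoted for 90nm ASIC SRAM
v4_years = mean_years_between_upsets(250, MEMORY_MB)     # upper bound quoted for Virtex 4

print(f"ASIC SRAM: one upset every {asic_years:.1f} years")
print(f"Virtex 4:  one upset every {v4_years:.1f} years or longer")
```

Whatever memory size is assumed, the ratio between the two results is 5000/250 = 20.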
Reply by ●May 6, 2005
Hi Austin, I'm really happy for you. Are there any V4s without the money-eating ECC stuff for us terrestrials? Ben
Reply by ●May 6, 2005
Nice try! ECC at the 64-bit parallel level eats only 8 extra bits, and our BlockRAMs had those traditional parity bits all the time. No extra storage cost. Just some clever partitioning... "The best things in life are (almost) free" Peter Alfke
Reply by ●May 6, 2005
Hi Peter, I learned about SEU that you can design redundant (three times the logic if you can convince your compiler not to remove redundant logic). This will keep the user logic safe. But is there a way to keep configuration safe since this changes logic and routing? Regards, Thomas
Reply by ●May 6, 2005
And,

The frame_ecc is 12 bits per 1312, or less than 1% overhead.

Austin

Peter Alfke wrote:
> Nice try!
> ECC at the 64-bit parallel level eats only 8 extra bits, and our
> BlockRAMs had those traditional parity bits all the time. No extra
> storage cost. Just some clever partitioning...
> "The best things in life are (almost) free"
> Peter Alfke
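The two overhead figures quoted in this thread (8 check bits per 64 data bits, and 12 bits per 1312-bit frame) match what a textbook single-error-correct, double-error-detect (SECDED) Hamming code would need. The sketch below shows that arithmetic. It assumes the 1312 figure includes the 12 check bits, which the thread does not state, and it says nothing about which code the silicon actually uses.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits for a generic extended Hamming (SECDED) code.

    A Hamming code needs the smallest r with 2**r >= data_bits + r + 1;
    one extra overall parity bit adds double-error detection.
    """
    r = 1
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1


bram_check = secded_check_bits(64)           # 64-bit BlockRAM word
frame_check = secded_check_bits(1312 - 12)   # assumption: 1312 counts the check bits

bram_overhead = bram_check / 64              # extra bits relative to the data bits
frame_overhead = frame_check / 1312          # check bits as a share of the frame

print(bram_check, f"{bram_overhead:.1%}")
print(frame_check, f"{frame_overhead:.2%}")
```

Under these assumptions the function returns 8 and 12, and 12/1312 comes to about 0.9%, consistent with "less than 1% overhead".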
Reply by ●May 6, 2005
Thomas,

Yes. The Xilinx TMR (XTMR) tool automatically converts the design, as designed and placed, into a full TMR design, taking advantage of our structure so that no single config bit can upset the function.

FRAME_ECC allows a design to do redundancy in time (RIT): calculate what you need, then check whether an error has occurred. If not, go on. If an error has occurred, fix the error, step back, and recalculate. Repeat.

Between XTMR, which allows you to choose only those critical areas that need triplication for redundancy in space (RIS), and FRAME_ECC, which enables redundancy in time, an arbitrarily safe system can be implemented. For example:

Simplest - do nothing. With an effective system FIT rate of 20 FIT/Mb of config memory, this may be so far down in the noise that it isn't an issue.

Next step - when the FRAME_ECC indicates an error, reconfigure the chip. This creates some unavailability, but it keeps any errors from propagating any further. Or back up, and recalculate the result after flipping the bit back (RIT).

Little better - when an error is detected, correct it. Since only 1 in 10 to 1 in 80 flips actually hits something that matters (real data from real customers), there is roughly a 1% to 10% chance that a flip could ever cause an error. And since you fix it in less than 200 ms (for the largest part), the probability that something critical changed in that 200 ms, and that it mattered, is even tinier (like maybe a one in a thousand chance). And, if you add RIT to this, it is even more bulletproof.

Even better - since this is a system that requires a hot spare (at this point, we are talking about 99.9995% available systems where the hard fail rate kills you first), you detect a soft error and switch to the redundant unit immediately while you fix the bit and do a system recheck.

Best - triplicate critical elements AND have a hot standby that can be switched to in case of soft error detect.

All of the above are enabled in V4 -- it is up to you to set your FIT rate goals, and then fulfill them. Can't do that with the competition -- they just don't have all the options we do. For example, a complete reconfig takes them down, but we can reconfig while still operating, and fix the flipped bit back.

Austin

Thomas Rudloff wrote:
> Hi Peter,
>
> I learned about SEU that you can design redundant (three times the logic
> if you can convince your compiler not to remove redundant logic). This
> will keep the user logic safe. But is there a way to keep configuration
> safe since this changes logic and routing?
>
> Regards,
> Thomas
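The two ideas in the post above, redundancy in space (vote across three copies) and redundancy in time (compute, check, repair, recompute), can be sketched in software terms as follows. This is only an illustration of the concepts: `majority3`, `run_with_rit`, and `FakeErrorFlag` are hypothetical names, and the error flag is a stand-in, not an interface to FRAME_ECC or the XTMR tool.

```python
from typing import Callable


def majority3(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote, the core of a TMR voter."""
    return (a & b) | (a & c) | (b & c)


def run_with_rit(compute: Callable[[], int],
                 error_detected: Callable[[], bool],
                 repair: Callable[[], None],
                 max_retries: int = 3) -> int:
    """Redundancy in time: accept a result only if no error was flagged."""
    for _ in range(max_retries + 1):
        result = compute()
        if not error_detected():
            return result      # clean run, go on
        repair()               # fix the error, then step back and recalculate
    raise RuntimeError("error persisted after retries")


class FakeErrorFlag:
    """Hypothetical stand-in for a hardware error-detect flag."""

    def __init__(self, pending_upsets: int) -> None:
        self.pending = pending_upsets

    def detected(self) -> bool:
        return self.pending > 0

    def repair(self) -> None:
        self.pending -= 1


flag = FakeErrorFlag(pending_upsets=1)
result = run_with_rit(lambda: 6 * 7, flag.detected, flag.repair)

# One of the three copies has a flipped bit; the vote masks it.
voted = majority3(0b1010, 0b1010, 0b0010)
```

The voter masks a fault in any single copy without stopping, while the retry loop trades a little latency for not needing three copies of the logic.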
Reply by ●May 6, 2005
Austin Lesea wrote:
> The ASIC DFF's, logic, etc. are also a fantastic neutron detector: the
> resulting hardness of the Virtex 4 is on par with, and better than a
> full custom 90nm ASIC doing the same task!

BTW, is it possible to order a special, rad-hard version of a modern medium-complexity FPGA chip, say, comparable with Cyclone 1C3? Would it mean a complete redesign of the chip internals or is it relatively simple?

Best regards
Piotr Wyderski
Reply by ●May 6, 2005
Piotr,

Very observant question.

For atmospheric upsets, it is a relatively easy process to change all memory cells to SERT or DICE single-upset-hardened cells, with an increase in area as you go from 6T cells to 12T and 16T cells. In the ASMBL columnar architecture, this is actually trivial to do. But who will pay for this? Without the ASMBL architecture, it requires a complete relayout.

If there are ways to design that result in the desired system FIT rate, one must compare the costs of the extra logic with the costs of hardening the design (hard IP vs. soft IP).

I believe the answer is a judicious combination of both: make the basic FIT rate better, and also provide some degree of hardening without incurring too much cost.

Austin

Piotr Wyderski wrote:
> Austin Lesea wrote:
>
>> The ASIC DFF's, logic, etc. are also a fantastic neutron detector:
>> the resulting hardness of the Virtex 4 is on par with, and better than
>> a full custom 90nm ASIC doing the same task!
>
> BTW, is it possible to order a special, rad-hard version of
> a modern medium-complexity FPGA chip, say, comparable
> with Cyclone 1C3? Would it mean a complete redesign of
> the chip internals or is it relatively simple?
>
> Best regards
> Piotr Wyderski
Reply by ●May 7, 2005
Hi Peter Alfke,

> Nice try!
> ECC at the 64-bit parallel level eats only 8 extra bits, and our
> BlockRAMs had those traditional parity bits all the time. No extra
> storage cost. Just some clever partitioning...

There are additional bit lanes in Altera devices too.

So what does this add then? Did you add optional hard ECC generation/detection blocks to these 9th/18th bits? Or does the user have to code this him/herself?

If it's an optional hard macro we're looking at 2 configurable muxes and an ECC generator on the input side, and 2 configurable muxes and an ECC checker on the output side for every set of 9 bits.

Also, do the V4s run continuous config sanity checks like Altera's devices?

Best regards,
Ben
Reply by ●May 7, 2005
Ben,

See below,

Austin

Ben Twijnstra wrote:
> Hi Peter Alfke,
>
>> Nice try!
>> ECC at the 64-bit parallel level eats only 8 extra bits, and our
>> BlockRAMs had those traditional parity bits all the time. No extra
>> storage cost. Just some clever partitioning...
>
> There are additional bit lanes in Altera devices too.

To do what?

> So what does this add then? Did you add optional hard ECC
> generation/detection blocks to these 9th/18th bits? Or does the user have
> to code this him/herself?

We have hard ECC, a 72/64 code, that can be instantiated to provide single-bit error correction and double-bit error detection with no soft IP required.

> If it's an optional hard macro we're looking at 2 configurable muxes and an
> ECC generator on the input side, and 2 configurable muxes and an ECC
> checker on the output side for every set of 9 bits.
>
> Also, do the V4s run continuous config sanity checks like Altera's devices?

We allow the customer to decide what they want to do: they can do just a check, or a check and correct, or nothing at all. They pay the least possible because we only harden what we need to enable this feature, not the whole thing.

What A offers is an "oh no!" bit: if it is set, you have no recourse but to reconfigure and start over. That is all A allows the customer to know, nothing more.

The same IP also allows the customer to flip bits so that they can see what effect NSEUs would have without having to go to a neutron beam (which is very expensive and time consuming).
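For readers unfamiliar with what a 72/64 code with single-bit correction and double-bit detection does, here is a minimal software sketch using a generic extended Hamming code. The bit layout (parity at power-of-two positions, overall parity at position 0) is a textbook choice assumed for illustration; the thread does not describe the layout of the hard ECC block, and the function names are hypothetical.

```python
PARITY_POSITIONS = [1, 2, 4, 8, 16, 32, 64]
# Positions 1..71 that are not parity positions carry the 64 data bits.
DATA_POSITIONS = [p for p in range(1, 72) if p not in PARITY_POSITIONS]


def encode_secded(data: int) -> list:
    """Encode a 64-bit integer into a 72-bit codeword (list of 0/1)."""
    word = [0] * 72
    for i, pos in enumerate(DATA_POSITIONS):
        word[pos] = (data >> i) & 1
    for p in PARITY_POSITIONS:
        # Even parity over every position whose index has bit p set.
        word[p] = sum(word[i] for i in range(1, 72) if i & p) % 2
    word[0] = sum(word) % 2  # overall parity makes the whole word even
    return word


def decode_secded(word: list):
    """Return (data, status); data is None if the word is uncorrectable."""
    word = list(word)
    syndrome = 0
    for pos in range(1, 72):
        if word[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid word
    odd_parity = sum(word) % 2 == 1
    if odd_parity:
        if syndrome >= 72:
            return None, "uncorrectable"
        word[syndrome] ^= 1  # single error; syndrome 0 means bit 0 itself
        status = "corrected"
    elif syndrome != 0:
        return None, "double error detected"
    else:
        status = "ok"
    data = sum(word[pos] << i for i, pos in enumerate(DATA_POSITIONS))
    return data, status


data = 0x0123456789ABCDEF
word = encode_secded(data)

one_flip = list(word)
one_flip[37] ^= 1            # inject a single upset
two_flips = list(word)
two_flips[5] ^= 1            # inject two upsets
two_flips[60] ^= 1

print(decode_secded(word))
print(decode_secded(one_flip))
print(decode_secded(two_flips))
```

Flipping bits in a stored codeword, as done here, is also a software analogue of the fault-injection use mentioned at the end of the post: it shows how the decoder reacts without any real upset taking place.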





