Ray Andraka wrote:
> It appears to me that perhaps you are assuming the yield is high, more than 50% anyway. What happens to your assumption if the yield is more like, say, 10-20%? It seems to me that the lower the yield, the more attractive EasyPath becomes, especially if, as Austin indicated, most yield fallout is only one defect.

Actually, the concept of managed defects becomes even stronger economically with lower yields, as the ratio of valuable recovered defect product to discards gets higher. Instead of discarding 80-90% of the product, you add that yield to your revenue. Only untestable dies (those with power rail shorts, failed JTAG interfaces, failed configuration paths, etc.) are discarded. And even some of those can be recovered with additional design-for-failure strategies.

> Maybe if you have enough spare cycles in your RC system, you can do that in the background and hope you don't hit a defect in operational builds before the defect map for the system is completed.

Every system I've worked with has spare cycles ... doing testing/scrubbing in the idle loop is always a possibility.

> The current tools make this even harder since the user has little control over what routing resources are used (there's directed route, but it is tedious to use and is a largely manual effort), and even less control over what routing resources can't be used. Granted, this is a tools issue more than anything else, but the fact remains that with the current state of the tools, I don't see this as feasible right now. Yeah, I know, this supports your contention that the tools should be open.

Yep.

> Look at it from Xilinx's point of view though. What is in it for them?

As I've posted elsewhere in this thread, increasing their revenue by 10-20x in a few years, and their long term market share substantially. Or, just staying in business.

> More software that would need development and testing, more user support, devices with defects out on the market that could wind up in the hands of people thinking they have zero defect devices,

or, as in the disk drive market ... just assume every device has defects and design for it.

> not to mention their increased testing and administration cost to even partially map the defects, or even determine to what degree the part fails.

Designing for defect detection and management changes the entire view of their process and ATE .... as I posted just a few minutes ago, why isn't this integrated on wafer, instead of continuing to do it with ATE as has been done for several decades?

> I can see where the cost of doing it could exceed the potential benefit. If it were profitable for them to do it, I'm sure they would be pursuing it. In any event, it is a business decision on their part; one they have every right to make.

I can see where the cost of ATE could make the difference between being in business, or not. Designing for test, in a very different wafer-oriented way, I see as critical.

> Anyway, it still seems to me that the amount of extra work to manage parts with defects would cost more than the cost savings in the part prices, not just for Xilinx, but also for the end user.

For some applications sure .... for system level RC applications it's completely trivial (and necessary) in the grand scheme of things. A very, very minor amount of software. Consider that a new system could be brought up using triple redundancy and run live in that reduced capacity configuration for its first few hundred/thousand hours, then back off to single redundancy for checking, and after the part is well qualified run only with background idle loop testing and scrubbing. Using the racked wafer strategy, it is very viable to hold wafers in test for 72 hours or more before they are cut and packaged. Then the same strategy is extended to the package after it's brought up in a system.

Designing for test, good test, coupled with defect management should only increase yields, lower costs, and benefit both the mfg and customer long term.
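The bring-up sequence in this post (triple redundancy first, then a single checking copy, then background testing only) depends on comparing redundant results. A minimal Python sketch of the first stage follows: a bitwise two-of-three majority vote that also reports which copy disagreed. The function name `vote3` and the use of plain integers as result words are illustrative assumptions, not anything taken from the thread or from vendor tools.

```python
def vote3(a: int, b: int, c: int) -> tuple[int, list[int]]:
    """Bitwise 2-of-3 majority vote over three redundant result words.

    Returns the voted word and the indices (0, 1, 2) of any copies that
    differ from it; those indices are what a defect log would record.
    """
    # A bit is set in the result when at least two of the inputs set it.
    voted = (a & b) | (a & c) | (b & c)
    suspects = [i for i, word in enumerate((a, b, c)) if word != voted]
    return voted, suspects


if __name__ == "__main__":
    # Hypothetical case: copy 1 has one flipped bit. The vote masks the
    # error and flags the copy that produced it.
    print(vote3(0b1010, 0b1110, 0b1010))
```

A copy that keeps appearing in `suspects` is a candidate for targeted testing in the idle loop; how that testing is done is outside this sketch.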
Re: Disk/LCD defect tolerant models for FPGA sales
Started by ●March 20, 2006
Reply by ●March 20, 2006
Ray Andraka wrote:
> Which reconfigurable FPGAs would those be with the non-volatile bitstreams? I'm not aware of any.

What are XC18V04's? Magic ROMs? What are the platform flash parts? Magic ROMs? They are CERTAINLY non-volatile every time I've checked.

In fact, non-volatile includes disks, optical, and just about any other medium that doesn't go poof when you turn the power off.

> and now this assertion that all the parts have non-volatile storage sure makes it sound like you don't have the hands on experience with FPGAs you'd like us to believe you have.

Ok Wizard God of FPGAs ... just how do you configure your FPGAs without having some form of non-volatile storage handy? Whatever the configuration bitstream source is, if it is reprogrammable (i.e. ignore 17xx PROMs), you can store the defect list.

UNDERSTAND?

Now, the insults are NOT -- I REPEAT NOT -- being civil.

> What are you doing different in the RC design then?

With RC there is an operating system, complete with disk based filesystem. The intent is to do fast (VERY FAST) place and route on the fly.

> From my perspective, the only ways to be able to tolerate changes in the PAR solution and still make timing are to either be leaving a considerable amount of excess performance margin (ie, not running the parts at the high performance/high density corner), or spending an inordinate amount of time looking for a suitable PAR solution for each defect map, regardless of how coarse the map might be.

You are finally getting warm. Several times in this forum I discussed what I call "clock binning", where the FPGA accel board has several fixed clocks arranged as integer powers. The dynamic runtime linker (very fast place and route) places, routes, and assigns the next slowest clock that matches the code block just linked. The concept is to use the fastest available clock at which the code block meets timing, NOT to change the clocks to fix the code.

> From your previous posts regarding open tools and use of HLLs, I suspect it is more on the leaving lots of performance on the table side of things.

Certainly ... it may not be hardware optimized to the picosecond. Some will be, but that is a different problem. Shall we discuss every project you have done in 12 years as though it was the SAME problem with identical requirements? I think not. So why do you do that for me?

> In my own experience, the advantage offered by FPGAs is rapidly eroded when you don't take advantage of the available performance.

The performance gains are measured against single threaded CPUs with serial memory systems. The performance gains come from high degrees of parallelism in the FPGA. Giving up a little of the best case performance is NOT a problem. AND if it was, for a large dedicated application, then by all means, use traditional PAR and fit the best case clock to the code body.

> If you are leaving enough margin in the design so that it is tolerant to fortuitous routing changes to work around unique defects, then I sincerely doubt you are going to run into the runaway thermal problems you were concerned with.

This is a completely different problem set than that particular question was addressing. That problem case was about hand packed serial-parallel MACs doing Red-Black ordered simulations with kernel sizes between 80-200 LUTs, tiled in tight, running at best case clock rate. 97% active logic. VERY high transition rates. About the only thing worse would be purposefully toggling everything.

A COMPLETELY DIFFERENT PROBLEM is compiling arbitrary C code and executing it with a compile, link, and go strategy. An example is a student iteratively testing a piece of code in an edit, compile and run sequence. In that case, getting the netlist bound to a reasonable set of LUTs quickly and running the test is much more important than extracting the last bit of performance from it.

Like it or not .... that is what we mean by using the FPGA to EXECUTE netlists. We are not designing highly optimized hardware. The FPGA is simply a CPU -- a very parallel CPU.

> Show me that my intuition is wrong.

First, you have taken and merged several different concepts, as though they were somehow the same problem .... from various posting topics over the last several months.

Surely we can distort anything you might want to present by taking your posts out of context and arguing them in the worst possible combination against you.

Let's try -- ONE topic, one discussion.

Seems that you have made up your mind. As you have been openly insulting and mocking ... have a good day. When you are really interested, maybe we can have a respectful discussion. You are pretty clueless today.
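The "clock binning" idea in this post can be sketched in a few lines of Python: given the critical path delay reported for a freshly linked block, pick the fastest of the board's fixed clocks whose period still covers that delay. The function name `pick_clock_mhz` and the particular bin frequencies are hypothetical; the post says only that the clocks are fixed and "arranged as integer powers", which is read here as power-of-two multiples of a base frequency.

```python
from typing import Optional


def pick_clock_mhz(critical_path_ns: float, clocks_mhz: list[float]) -> Optional[float]:
    """Return the fastest fixed clock whose period covers the critical path.

    Returns None when even the slowest bin is too fast for the block.
    """
    # Period in ns is 1000 / f(MHz); a clock is usable if the period is
    # at least as long as the critical path delay.
    usable = [f for f in clocks_mhz if 1000.0 / f >= critical_path_ns]
    return max(usable) if usable else None


# Hypothetical board: fixed clocks at power-of-two multiples of 12.5 MHz.
BINS_MHZ = [12.5 * 2**k for k in range(5)]  # 12.5, 25, 50, 100, 200

if __name__ == "__main__":
    for path_ns in (4.0, 7.3, 12.0, 95.0):
        print(path_ns, "ns ->", pick_clock_mhz(path_ns, BINS_MHZ), "MHz")
```

With bins spaced by powers of two, a block that just misses one bin runs at half that rate, which is the performance cost Ray raises in his reply.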
Reply by ●March 20, 2006
Jim Granville wrote:
> Ray Andraka wrote:
> > fpga_toys@yahoo.com wrote:
> >> The parts all have non-volatile storage for configuration.
>
> I think John was meaning store the info in the ConfigFlashMemory. Thus the read-erase-replace steps. .. but, you STILL have to get this info into the FIRST design somehow....

Thanks Jim ... that is EXACTLY what I did say. It doesn't matter if the configuration storage is on an 18V04, platform flash card, or a disk drive.
Reply by ●March 21, 2006
John, last time I checked, FPGAs did not get delivered from Xilinx with the config PROM. Sure, you can store a defect map on the config PROM, or on your disk drive, or battery backed SRAM or whatever, but the point is that the defect map has to get into your system somehow. Earlier in this thread you were asking/begging Xilinx to provide the defect map, even if just to one of 16 quadrants for each non-zero-defect part delivered. That leads to the administration nightmare I was talking about.

In the absence of a defect map provided by Xilinx (which you were lobbying hard for a few days ago), the only other option is for the end user to run a large set of test configurations on each device while in system to map the defects. Writing that set of test configurations requires either knowledge of the device at a level of detail that is not available publicly, or getting hold of the Xilinx test configurations and expanding on them to obtain fault isolation. I'm not sure you realize the number of routing permutations that need to be run just to get fault coverage of all the routing, switchboxes, LUTs, etc. in the device, much less achieve fault isolation. Your posts regarding that seem to support this observation.

> With RC there is an operating system, complete with disk based filesystem. The intent is to do fast (VERY FAST) place and route on the fly.

Now see, that is the fly in the ointment. The piece that is missing is the "very fast place and route". There is and has been a lot of research into improving place and route, but the fact of the matter is that making the FPGA compete favorably against a microprocessor is going to require a time to completion that is orders of magnitude faster than what we have now, without giving up much in the way of performance. Sure, I can slow a clock down (by bin steps or using a programmable clock) to match the clock to the timing analysis for the current design, but that doesn't help you much for many real-world problems where you have a set time to complete the task. (Yes, I know that many RC apps are not explicitly time constrained, but they do have to finish enough ahead of other approaches to make them economically justifiable.)

Remember also that the RC FPGA starts out with a sizable handicap against a microprocessor: the time to load a configuration, plus, if the configuration is generated on the fly, the time to perform place and route. Once that hurdle is crossed, you still need enough of a performance boost over the microprocessor to amortize that set-up cost over the processing interval to come out ahead. Obviously, you gain from the parallelism in the FPGA, but if you don't also mind the performance angle, it is quite easy to wind up with designs that can only be clocked at a few tens of MHz, and that often use up so much area that you don't have room for enough parallelism to make up for the much lower clock rate.

So that puts the dynamically configured RC in a box: problems that aren't repetitive and complex enough to overcome the PAR and configuration times are better done on a microprocessor, and problems that take long enough to make the PAR time insignificant may be better served by a more optimized design than what has been discussed. And we're talking not only about PAR results, but also about architecturally optimizing the design to get the highest clock rates and density.

In my experience, FPGAs can do roughly 100x the performance of similar generation microprocessors, give or take an order of magnitude depending on the exact application and provided the FPGA design is done well. It is very easy to lose the advantage by sub-optimal design.
If I had a dollar for every time I've gotten remarks that 100x performance is not possible, or that so and so did an FPGA design expecting only 10x and it turned out slower than a microprocessor because it wouldn't meet timing etc., I'd be retired.

I guess I owe you an apology for merging your separate projects. I was under the impression (and, glancing back over your posts, can still interpret it this way) that these different topics were all addressing facets of the same RC project. I assumed (apparently erroneously) that this was all towards the same RC system. I also apologize for the insults; I didn't mean to insult or mock you. Rather, I was trying to point out that, taking all your posts together, I thought you were trying to hit all the corners of the design space at once, and at the same time do it on the cheap with defect ridden parts. I am still not convinced you aren't trying to hit everything at once....you know that old good, fast, cheap, pick any two thing. Rereading my post, I see that I let my tone get out of hand, and for that I ask your forgiveness.

In any event, truly dynamic RC remains a tough nut to crack because of the PAR and configuration time issues. By adding the desire to use defect ridden parts, you are only making an already tough job much harder. I respectfully suggest you try first to get the system together using perfect FPGAs, as I believe you will find you already have an enormous task in front of you between the HLL to gates, the need for fast PAR, partitioning the problem over multiple FPGAs and between FPGAs and software, making a usable user interface and libraries etc., without exponentially compounding the problem by throwing defect tolerance into the mix. Baby steps are necessary to get through something as complex as this.
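Ray's set-up cost argument in this post can be written as a break-even condition. If a job takes t_cpu seconds on the microprocessor and the FPGA runs it `speedup` times faster after paying place-and-route and configuration time, the FPGA comes out ahead only when t_par + t_cfg + t_cpu / speedup < t_cpu. The Python sketch below solves that inequality for t_cpu. The function name and the example timings are hypothetical, chosen only to show the shape of the trade-off.

```python
def break_even_cpu_seconds(par_s: float, config_s: float, speedup: float) -> float:
    """Smallest CPU run time above which the FPGA path finishes first.

    Derived from: par_s + config_s + t_cpu / speedup < t_cpu
    Returns infinity when the FPGA is no faster than the CPU.
    """
    if speedup <= 1.0:
        return float("inf")
    return (par_s + config_s) / (1.0 - 1.0 / speedup)


if __name__ == "__main__":
    # Made-up numbers: 60 s of place and route, 0.5 s to load the bitstream.
    for speedup in (2.0, 10.0, 100.0):
        t = break_even_cpu_seconds(60.0, 0.5, speedup)
        print(f"speedup {speedup:>5}: job must exceed {t:.1f} s of CPU time")
```

Note how weakly the threshold depends on the speedup once it is large: the set-up time dominates, which is why the post puts so much weight on PAR and configuration time.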
Reply by ●March 21, 2006
Ray Andraka wrote:
<snip>
> In my experience, FPGAs can do roughly 100x the performance of similar generation microprocessors, give or take an order of magnitude depending on the exact application and provided the FPGA design is done well. It is very easy to lose the advantage by sub-optimal design. If I had a dollar for every time I've gotten remarks that 100x performance is not possible, or that so and so did an FPGA design expecting only 10x and it turned out slower than a microprocessor because it wouldn't meet timing etc, I'd be retired.

How does an FPGA compare with something like the Cell processor? I'd have thought that for reconfig computing, something like an array of CELLS, with FPGA bridge fabric, would be a more productive target for RC. FPGAs are great at distributed fabric, but not that good at memory bandwidth, especially at bandwidth/$. DSP tasks can target FPGAs OK, because the datasets are relatively small. Wasn't it Seymour Cray who found that IO and memory bandwidths were the key, not the raw CPU grunt?

-jg
Reply by ●March 21, 2006
John Bass fpga_toys@yahoo.com wrote:
> Phil Hays wrote:
>> Why test at die level at all? Economics. Packaging costs money.
>> Why test at package level at all? Full testing at wafer sort isn't realistic, and die damage during packaging happens.
>
> And for some, such a damned if you do, and damned if you don't is a perfectly good excuse to do nothing. Life isn't perfect. Finding solutions I find more valuable than finding restrictions and excuses.

If you don't understand the problem, you are not very likely to come up with a solution.

>> Something quite like this was tried. Some very good reasons not to do it were found, the hard way. "Human beings, who are almost unique in having the ability to learn from the experience of others, are also remarkable for their apparent disinclination to do so." (Douglas Adams)
>
> One of the most remarkable forms of success, is the difficult challenges offered from failures.

I'm sure Douglas Adams would agree. But you wouldn't like it.

>> Power supply measurement requires an ammeter per power supply per die, or some way to switch an ammeter between measurement points, like relays. I'd love to hear your plan.
>
> It always comes down to V=IR

Ever figure out what current a wafer full of die would draw? Now for the fun part. How to get all that current to all the die without too much voltage drop? Oh, and what if one die is in latchup?

> and there are plenty of designs/products that do current sensing well, even if an external reference standard is required. Maybe one of the ATE functions is to calibrate on die standards, and pass that to the rack manager.

I thought you were not going to use an ATE.

>> Some things can't be implemented on wafers. Disk drives, relays, precision resistors, ...
>
> None of which are needed on die for self testing.

As long as test coverage is way less than 50%, sure.

--
Phil Hays
Reply by ●March 21, 2006
Take two ... bad google day ...

Phil Hays wrote:
> If you don't understand the problem, you are not very likely to come up with a solution.

Quite true.

>>> Something quite like this was tried. Some very good reasons not to do it were found, the hard way. "Human beings, who are almost unique in having the ability to learn from the experience of others, are also remarkable for their apparent disinclination to do so." (Douglas Adams)
>>
>> One of the most remarkable forms of success, is the difficult challenges offered from failures.
>
> I'm sure Douglas Adams would agree. But you wouldn't like it.

You probably will not like the contradiction that it poses either:

a) Experienced team A works diligently, ending in a heroic failure
b) Team B offers regular help, which is turned down
c) after the failure is complete, Team B completes the project quickly.

Should Team B have accepted the failure as hard fact that the problem had no viable solution, and also failed by failing to try? (I.e. learning from Team A's failure.)

In 30 years of being self employed I've made about 20% of my income from taking over failed projects with low bid, no risk, flat fee proposals to management ... no delivery, no payment. All I have at risk is my time and my reputation to always succeed on those projects. Several of those projects were taken from experienced teams to whom I had offered friendly help on a regular basis, and was turned down. Others I took after one or more other companies failed to deliver what the client needed, often with sharp advice that I would be doomed to repeat the cycle.

> Ever figure out what current a wafer full of die would draw? Now for the fun part. How to get all that current to all the die without too much voltage drop? Oh, and what if one die is in latchup?

yep ... and did you notice the part of the proposal about using on wafer power control for each die?

> I thought you were not going to use an ATE.

Did you notice the part of the proposal about using ATE for screening dangerous defects, like shorted power nets?

>>> Some things can't be implemented on wafers. Disk drives, relays, precision resistors, ...
>>
>> None of which are needed on die for self testing.
>
> As long as test coverage is way less than 50%, sure.

You have already given up if you think that. The explicit idea behind defect management is functional isolation by designing for 100% test coverage at some level of detail. Either a route, FF, LUT, buffer, or other resource fails testing, or it is presumed operational, to be screened if necessary by using redundant logic in the system level design initially.

I suspect that this will be an iterative process of incremental refinement over a long period of time, maybe at first only saving 40-60% of the reject yield, and possibly progressing to nearly all. I suspect one of the most important parts of the process will be design refinements to prevent/isolate the failure impacts on future designs, increasing both the primary and secondary yields in the long term.

One interesting form of "success" includes not reaching the entire objective, but leaving a carefully documented road map of the challenges, assumptions, and proposed solutions along the way so that those that follow have a better defined path to chip away at.

Now, I don't know how much of Xilinx's yield is scrap today, or would be scrap at the end of 6 months, a year or two years. I do suspect the number will steadily decrease using design for defect management strategies.

I do know the "cost" to Xilinx to sell scrap die and packaged product is pretty low, if it comes with a long term partnership to provide engineering input to increase yields for both zero defect and managed defect segments. The long term promise of such a program is for each to act in good faith to increase revenues for both partners as the process matures.

I believe that I can create products which are defect aware using the largest Xilinx parts, which presumably also have the largest percentage of rejects. I'm willing to invest the engineering into developing a recovery process, if Xilinx is willing to provide scrap material, and include in that partnership an agreement to share data and design suggestions to improve yields. As the recovery process becomes profitable, there are certainly incentives on both parties' part to share that windfall. That's a pretty low risk deal for Xilinx if they are crushing scrap in die and packaged form today.
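The screening rule stated in this post (a resource either fails testing or is presumed operational) amounts to keeping a set of known-bad resources and excluding them when candidates are chosen for placement. A minimal Python sketch under that reading follows. The `Resource` naming scheme (a kind plus x/y coordinates) and the `DefectMap` class are hypothetical illustrations; they do not describe any Xilinx data format or tool interface.

```python
from typing import NamedTuple


class Resource(NamedTuple):
    """One testable element; the kind labels and coordinates are illustrative."""
    kind: str  # e.g. "LUT", "FF", "ROUTE", "BUF"
    x: int
    y: int


class DefectMap:
    """Set of resources that failed testing; everything else is presumed good."""

    def __init__(self) -> None:
        self._bad: set[Resource] = set()

    def record_failure(self, res: Resource) -> None:
        self._bad.add(res)

    def is_usable(self, res: Resource) -> bool:
        return res not in self._bad

    def usable(self, candidates: list[Resource]) -> list[Resource]:
        """Filter a candidate list, keeping the original order."""
        return [r for r in candidates if self.is_usable(r)]


if __name__ == "__main__":
    dm = DefectMap()
    dm.record_failure(Resource("LUT", 3, 7))
    sites = [Resource("LUT", 3, 6), Resource("LUT", 3, 7), Resource("LUT", 3, 8)]
    print(dm.usable(sites))  # the failed LUT at (3, 7) is skipped
```

The hard part that Ray raises earlier in the thread, generating the test configurations that populate such a map, is not addressed by this sketch.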
Reply by ●March 21, 2006
fpga_toys@yahoo.com wrote:
> I'm willing to invest the engineering into developing a recovery process, if Xilinx is willing to provide scrap material, and include in that partnership an agreement to share data and design suggestions to improve yields. As the recovery process becomes profitable, there are certainly incentives on both parties' part to share that windfall. That's a pretty low risk deal for Xilinx if they are crushing scrap in die and packaged form today.

I'm willing to consider the same for other FPGA vendors as well.
Reply by ●March 21, 2006
fpga_toys@yahoo.com wrote:
> fpga_toys@yahoo.com wrote:
>> I'm willing to invest the engineering into developing a recovery process, if Xilinx is willing to provide scrap material, and include in that partnership an agreement to share data and design suggestions to improve yields. As the recovery process becomes profitable, there are certainly incentives on both parties' part to share that windfall. That's a pretty low risk deal for Xilinx if they are crushing scrap in die and packaged form today.
>
> I'm willing to consider the same for other FPGA vendors as well.

Sounds simple - become an EasyPath customer! You can make your own bit streams (IIRC, two are allowed?), and thus get die that are in a 'possibly faulty, but partially proven' bin, and expand from there.....

-jg
Reply by ●March 21, 2006





