FPGARelated.com
Forums

Lattice Announces EOL for XP and EC/P Product Lines

Started by rickman July 30, 2013
On 8/26/2013 7:43 PM, jg wrote:
> On Saturday, August 24, 2013 1:06:37 PM UTC+12, rickman wrote: >> >> >> I suppose the markets are different enough that FPGAs just can't be >> produced economically in as wide a range of packaging. I think that is >> what Austin Lesea used to say, that it just costs too much to provide >> the parts in a lot of packages. Their real market is at the high end. >> Like the big CPU makers, it is all about raising the ASP, Average >> Selling Price. > > I was meaning more at the lower end - eg where Lattice can offer parts in QFN32, then take a large jump to QFP100 0.5mm. > QFN32 have only 21 io, so you easily exceed that, but there is a very large gap between the QFN32 and TQFP100. > They claim to be chasing the lower cost markets with these parts, but seem rather blinkered in doing so. > Altera offer well-priced parts in gull wing (MAX V), but only in a 0.4mm pitch.
I think you mean Lattice offers a part in the QFN32. I only found the XO2-256. A selection guide that isn't even a year old says they offer an iCE40 part in this package, but it doesn't show up in the data sheet from 2013. I guess the product line is still a little schizo. They haven't finished cleaning house and any part or package could be next.
>> As long as we are wishing for stuff, I'd really love to see a smallish >> MCU mated to a smallish FPGA. > > If you push that 'smallish', the Cypress PSoC series have uC+logic.
That's a pretty hard "push". I've looked at them but I don't get a warm fuzzy from a company that makes everything an uphill climb when they seem to think they are making it easy. I've been looking at the PSoC parts since they were new. At that time support was so crude that they had a weekly conference call, and if you joined in you got a 1 on 1 training session. That progressed through a long development aimed at making their parts push-button, and I am pretty sure that won't even come close to working for my needs. I need a small FPGA, maybe 1000 LUTs, to provide the high speed portion of the design. I don't even need a "real" processor, I bet I could live a rich full life (in this design anyway) with an 8051. In fact, that is an option, to add an MCU for most of the I/O and processing, then use something like the XO2-256 in a QFN32 to do the high speed stuff. I'm just not sure I can fit the design in 256 LUTs. Maybe the QFN32 is small enough I can use two? 5x5 mm!
> The newest PSoC4 seems to have solved some of the sticker shock, but I think they crippled the logic to achieve that. > Seems there is no free lunch. > > Cypress do, however, grasp the package issue, and offer QFN40 (0.5mm), as well as SSOP28 (0.65mm) and TQFP44 (0.8mm).
Yeah, while the FPGA guys are rather phobic of issuing a lot of package combinations, the MCU folks have tons of them. They have a much tougher problem with all the combos of RAM, Flash, I/O count, clock speed, ... I can see why the FPGA people haven't embraced the idea of combining MCU with FPGA; it just doesn't fit their culture. -- Rick
On 8/28/2013 2:58 PM, already5chosen@yahoo.com wrote:
> On Wednesday, August 28, 2013 11:51:36 AM UTC+3, rickman wrote: >> On 8/25/2013 12:44 PM, already5chosen@yahoo.com wrote: >> >>> >> >>> I just measured Altera Nios2e on Stratix3 - 379 ALMs + 2 M9K blocks (out of 18K memory bits only 2K bits used). It's hard to translate exactly into old-fashioned LUTs, but I'd say - around 700. >> >>> Per clock Nios2e is pretty slow, but it clocks rather high and it is a 32-bit CPU - very easy to program in C. >> >> >> >> I can't say I fully understand the ALM, but I think it functions as a >> lot more than just a pair of 4 input LUTs. It will do that without any >> issue. But it will do a lot more and I expect this is used to great >> advantage in a CPU. I'd say the ALM is equivalent to between 3 and 4 >> LUT4s depending on the design. I guess it is hard to compare between >> different device types. >> > > No, ALM is close to two 4-input LUTs. May be, a bit more when implementing complex tightly-coupled logic with high internal complexity to fanout ratio. May be, a bit less, when implementing simple things with lots of registers and high fanout. > > For sake of the argument, I compiled Nios2e for Cyclone4, which has more old-fashioned architecture - 676 LCs + 2 M9Ks. > I also dug out my real-world design from many years ago that embeds Nios2e into Cyclone2. It is even smaller at 565 LCs + 2 M4Ks. > > >> >> >> >> >>> Reimplementing Nios2 in minimal number of LUTs, e.g. trading memory for fabric, could be an interesting exercise, well suitable for coding competition. But, probably, illegal :( >> >> Yes, there are always lots of tradeoffs to be considered. >> > > My point is - if you don't need performance and can use embedded memories then you can design useful 32-bit RISC CPU which would be non-trivially smaller than 600 LCs. > Nios2e core that I took as example is small, but hardly minimalistic. It implements full Nios2 architecture including several parts that you probably don't need. In particular: > - everything related to interrupts and exceptions > - support for big program address space > - ability to run execute programs from any memories others than on-chip SRAM
If the size of the NIOS2 is as small as you say, then that only leaves two issues with using the NIOS2 in my FPGA designs. The first is that I don't need 32 bit data paths in addition to the large memory address bus. I assume this means the instructions are not so compact using more on chip memory than desired. But the really big issue with using the NIOS2 is not technical, Altera won't let you use it on anything that isn't an Altera part. So in reality this is a non-starter no matter how good the NIOS2 is technically. -- Rick
> > I think you mean Lattice offers a part in the QFN32. I only found the > XO2-256. A selection guide that isn't even a year old says they offer > an iCE40 part in this package, but it doesn't show up in the data sheet > from 2013. I guess the product line is still a little schizo. They > haven't finished cleaning house and any part or package could be next.
The part code for this is ICE40LP384-SG32. It shows on price lists, but still 0 in the stock column. Mouser says 100 due on 9/30/2013
> In fact, that is an option, to add an MCU for > most of the I/O and processing, then use something like the XO2-256 in a > QFN32 to do the high speed stuff. I'm just not sure I can fit the > design in 256 LUTs. Maybe the QFN32 is small enough I can use two? 5x5mm!
Try it and see. I found the XO2-256 seems to pack full quite well, and the tools are OK to use, so you can find out quite quickly. I did a series of capture counters in the XO2-256, and once it worked, I increased the width to fill more of the part. IIRC it got into the 90%+ range with no surprises. I've been meaning to compare the ICE40LP384 with the XO2-256; as the iCE40 cell is more primitive, it may not actually fit more. -jg
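For illustration, a minimal VHDL sketch (not jg's actual code) of a capture counter along those lines - the WIDTH generic is what you would grow until the part fills up:

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity capture_counter is
      generic ( WIDTH : natural := 16 );  -- grow this to fill the part
      port ( clk     : in  std_logic;
             rst     : in  std_logic;
             capture : in  std_logic;     -- capture strobe
             value   : out std_logic_vector(WIDTH-1 downto 0) );
    end entity capture_counter;

    architecture rtl of capture_counter is
      signal cnt : unsigned(WIDTH-1 downto 0) := (others => '0');
    begin
      process (clk)
      begin
        if rising_edge(clk) then
          if rst = '1' then
            cnt <= (others => '0');
          else
            cnt <= cnt + 1;                     -- free-running count
            if capture = '1' then
              value <= std_logic_vector(cnt);   -- latch count on the strobe
            end if;
          end if;
        end if;
      end process;
    end architecture rtl;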
On Thursday, August 29, 2013 11:23:08 PM UTC+3, rickman wrote:
> On 8/28/2013 2:58 PM, already5chosen@yahoo.com wrote: > > On Wednesday, August 28, 2013 11:51:36 AM UTC+3, rickman wrote: > >> On 8/25/2013 12:44 PM, already5chosen@yahoo.com wrote: > >>> I just measured Altera Nios2e on Stratix3 - 379 ALMs + 2 M9K blocks (out of 18K memory bits only 2K bits used). It's hard to translate exactly into old-fashioned LUTs, but I'd say - around 700. > >>> Per clock Nios2e is pretty slow, but it clocks rather high and it is a 32-bit CPU - very easy to program in C. > >> I can't say I fully understand the ALM, but I think it functions as a lot more than just a pair of 4 input LUTs. It will do that without any issue. But it will do a lot more and I expect this is used to great advantage in a CPU. I'd say the ALM is equivalent to between 3 and 4 LUT4s depending on the design. I guess it is hard to compare between different device types. > > No, ALM is close to two 4-input LUTs. May be, a bit more when implementing complex tightly-coupled logic with high internal complexity to fanout ratio. May be, a bit less, when implementing simple things with lots of registers and high fanout. > > For sake of the argument, I compiled Nios2e for Cyclone4, which has more old-fashioned architecture - 676 LCs + 2 M9Ks. > > I also dug out my real-world design from many years ago that embeds Nios2e into Cyclone2. It is even smaller at 565 LCs + 2 M4Ks. > >>> Reimplementing Nios2 in minimal number of LUTs, e.g. trading memory for fabric, could be an interesting exercise, well suitable for coding competition. But, probably, illegal :( > >> Yes, there are always lots of tradeoffs to be considered. > > My point is - if you don't need performance and can use embedded memories then you can design useful 32-bit RISC CPU which would be non-trivially smaller than 600 LCs. > > Nios2e core that I took as example is small, but hardly minimalistic. It implements full Nios2 architecture including several parts that you probably don't need. In particular: > > - everything related to interrupts and exceptions > > - support for big program address space > > - ability to run execute programs from any memories others than on-chip SRAM > If the size of the NIOS2 is as small as you say,
Nios2e is small. And slow. Nios2s and Nios2f aren't small.
> then that only leaves two issues with using the NIOS2 in my FPGA designs. The first is that I don't need 32 bit data paths in addition to the large memory address bus. I assume this means the instructions are not so compact using more on chip memory than desired.
Yes, Nios2 code density is poor. About the same as MIPS32, maybe just a little bit better. Similar to PPC. Measurably worse than "old" ARM. More than 1.5x worse than Thumb2.
> But the really big issue with using the NIOS2 is not technical, Altera won't let you use it on anything that isn't an Altera part. So in reality this is a non-starter no matter how good the NIOS2 is technically.
I don't understand why. If you code in C, then porting the non-hardware-specific parts of your code from Nios2 to any other little-endian 32-bit processor with octet-addressable memory will take very little time. Much, much less than porting the hardware-specific parts of the code from, say, one ARM Cortex SoC or MCU to another ARM Cortex SoC or MCU. If you thought about it in advance, then even porting to a big-endian 32-bitter is a non-issue.

After all, we are talking about a few KLOCs, at worst a few tens of KLOCs. Unless you code in asm, the CPU-related part of the porting sounds like an absolute non-issue, especially if you use gcc on both of your targets.

Or maybe you wanted to say that Nios2 is unsuitable if your original design is not based on an Altera FPGA? That, of course, is true. But then again, why would you *want* to use Nios2 outside of the Altera realm? Other vendors have their own 32-bit soft core solutions. I didn't try them, but I would think that in most respects they are similar to Nios2. Or, as in the case of Microsemi, they have a licensing agreement with ARM which makes the Cortex-M1 affordable for low-volume products.

In any case, unless the volumes are HUGE, "roll your own soft core" does not sound to me like the right use of developer's time. The only justification for it that I can see is personal enjoyment.
On 8/29/2013 6:27 PM, already5chosen@yahoo.com wrote:
> On Thursday, August 29, 2013 11:23:08 PM UTC+3, rickman wrote: >> On 8/28/2013 2:58 PM, already5chosen@yahoo.com wrote: >> >>> On Wednesday, August 28, 2013 11:51:36 AM UTC+3, rickman wrote: >> >>>> On 8/25/2013 12:44 PM, already5chosen@yahoo.com wrote: >> >>>> >> >>>>> >> >>>> >> >>>>> I just measured Altera Nios2e on Stratix3 - 379 ALMs + 2 M9K blocks (out of 18K memory bits only 2K bits used). It's hard to translate exactly into old-fashioned LUTs, but I'd say - around 700. >> >>>> >> >>>>> Per clock Nios2e is pretty slow, but it clocks rather high and it is a 32-bit CPU - very easy to program in C. >> >>>> >> >>>> >> >>>> >> >>>> I can't say I fully understand the ALM, but I think it functions as a >> >>>> lot more than just a pair of 4 input LUTs. It will do that without any >> >>>> issue. But it will do a lot more and I expect this is used to great >> >>>> advantage in a CPU. I'd say the ALM is equivalent to between 3 and 4 >> >>>> LUT4s depending on the design. I guess it is hard to compare between >> >>>> different device types. >> >>>> >> >>> >> >>> No, ALM is close to two 4-input LUTs. May be, a bit more when implementing complex tightly-coupled logic with high internal complexity to fanout ratio. May be, a bit less, when implementing simple things with lots of registers and high fanout. >> >>> >> >>> For sake of the argument, I compiled Nios2e for Cyclone4, which has more old-fashioned architecture - 676 LCs + 2 M9Ks. >> >>> I also dug out my real-world design from many years ago that embeds Nios2e into Cyclone2. It is even smaller at 565 LCs + 2 M4Ks. >> >>> >> >>> >> >>>> >> >>>> >> >>>> >> >>>> >> >>>>> Reimplementing Nios2 in minimal number of LUTs, e.g. trading memory for fabric, could be an interesting exercise, well suitable for coding competition. But, probably, illegal :( >> >>>> >> >>>> Yes, there are always lots of tradeoffs to be considered. >> >>>> >> >>> >> >>> My point is - if you don't need performance and can use embedded memories then you can design useful 32-bit RISC CPU which would be non-trivially smaller than 600 LCs. >> >>> Nios2e core that I took as example is small, but hardly minimalistic. It implements full Nios2 architecture including several parts that you probably don't need. In particular: >> >>> - everything related to interrupts and exceptions >> >>> - support for big program address space >> >>> - ability to run execute programs from any memories others than on-chip SRAM >> >> >> >> If the size of the NIOS2 is as small as you say, > > Nios2e is small. And slow. Nios2s and Nios2f aren't small.
Slow is a relative term. I expect NIOS is designed for the instruction set rather than for the implementation. From your description the s and f versions burn logic to get speed while the e version is the minimum hardware that can do the job. This is not my idea of how to make an embedded core. I would take the approach of designing a CPU which uses minimal resources as part of its architecture and uses an instruction set that is adequate and efficient rather than being optimized for a language. I am accustomed to writing assembly language code and even micro code for bit slice processors.
>> then that only leaves >> two issues with using the NIOS2 in my FPGA designs. The first is that I >> don't need 32 bit data paths in addition to the large memory address >> bus. I assume this means the instructions are not so compact using more >> on chip memory than desired. > > Yes, Nios2 code density is poor. About the same as MIPS32, may be, just a little bit better. Similar to PPC. Measurably worse than "old" ARM. More than 1.5x worse than Thumb2.
I can tell by the terms you use that you are thinking in terms of C programming and larger code bases than what I typically do. In particular the code for this job would be not far removed from the hardware and in fact would need to be written to work very efficiently with the hardware to meet the hard, real time constraints involved. This is not your typical C program.
>> But the really big issue with using the NIOS2 is not technical, Altera >> won't let you use it on anything that isn't an Altera part. So in >> reality this is a non-starter no matter how good the NIOS2 is technically. >> > > I don't understand why. > If you code in C then porting non-hardware-specific parts of your code from Nios2 to any other little-endian 32-bit processor with octet-addressable memory will take very little time. Much much less than porting hardware-specific parts of code from, say, one ARM-Cortex SoC or MCU to another ARM-Cortex SoC or MCU. > If you thought about it in advance, then even porting to big-endian 32-bitter is a non-issue,
Yes, you are thinking along very different lines than I am. The idea is not to port the code, but to port the processor. Then there is virtually no work involved other than recompiling the HDL.
> After all, we are talking about few KLOCs, at worst, few tens KLOCs. Unless you code in asm, the CPU-related part of porting sounds as absolute non-issue. Esp. if you use gcc on both of your target.
Probably not even a single KLOC, lol. All I am doing is replacing some hardware functions with software. Use the ALU and data paths of the CPU to replace the logic and data paths of dedicated hardware. Not tons of work, but the timing is important. So once it is written, working and, more importantly, verified, I want to never have to touch the code again, just as if it were hardware (well, gateware). So the processor would need to be ported to whatever device this is implemented in.
> Or, may be, you wanted to say that Nios2 is unsuitable if your original design not based on Altera FPGA? That's, of course, is true. > But, then again, why would you *want* to use Nios2 outside of Altera realm? Other vendors have their own 32-bit soft core solutions. I didn't try them, but would think that in most aspects their solutions are similar to Nios2. Or, as in case of Microsemi, they have licensing agreement with ARM which make Cortex-M1 affordable for low volume products. > > In any case, unless the volumes are HUGE, "roll your own soft core" does not sound to me as a right use of developer's time. The only justification for it that I can see about is personal enjoyment.
A CPU design can be as hard or as easy as you want. If you must have C support there is the ZPU, which was designed explicitly for that, but I don't think it is a good match for deterministic real time apps. I have worked on a couple of versions of a stack-based processor design which is reasonably efficient. I have some new ideas for something a bit more novel. We'll see what happens. This is all due to the EOL from Lattice; we have until November to get a last-time buy in, and a new design won't be needed until those parts are used up. So I've likely got a year or so. -- Rick
On 8/29/2013 4:37 PM, jg wrote:
>> >> I think you mean Lattice offers a part in the QFN32. I only found the >> XO2-256. A selection guide that isn't even a year old says they offer >> an iCE40 part in this package, but it doesn't show up in the data sheet >> from 2013. I guess the product line is still a little schizo. They >> haven't finished cleaning house and any part or package could be next. > > The part code for this is ICE40LP384-SG32 > Showing on price lists, but still 0 in the stock column. > Mouser says 100 due on 9/30/2013
Just goes to show, you have to keep up on the data sheets. They just released a new one last week, 8/22/2013. This one includes the 32 pin QFN. Still, it is the poor stepchild of the family with no memory at all other than the FFs. Actually, I looked back through my history of data sheets and I must have had a brain cramp, they all show the QFN32.

I have been looking at these parts for some time and I never realized they don't include distributed RAM using the LUTs. This part was not designed by Lattice, so I guess this may still be covered by patent. Lattice has a license on many Xilinx-owned patents because they bought the Orca line from Lucent, who had gotten all sorts of licensing from Xilinx in a weak moment. Not that this has hurt Xilinx much, but it is so out of character for them. I'll never understand why they licensed their products to Lucent. Maybe some huge customer required a second source for the 3000 and 4000 series. Or maybe it was just a huge wad of cash Lucent waved under their noses. Likely we'll never know.

The point is I'm not nearly as enamored with the iCE40 parts as I was a year ago. They dropped the 600 LUT member of the family and replaced it with this 384 LUT member. At the same time they raised the quiescent current spec for the 1k part from 40 uA to 100 uA. The entire iCE65 product line was dropped (which had even lower static current). They just can't seem to pick a direction and stick with it.
>> In fact, that is an option, to add an MCU for >> most of the I/O and processing, then use something like the XO2-256 in a >> QFN32 to do the high speed stuff. I'm just not sure I can fit the >> design in 256 LUTs. Maybe the QFN32 is small enough I can use two? 5x5mm! > > Try it and see. I found the XO2-256 seems to pack full quite well, and the tools are ok to use, so you can find out quite quickly. > I did a series of capture counters in XO2-256, and once it worked, I increased the the width to fill more of the part. > IIRC it got into the 90%+ with now surprises. > > I've been meaning to compare the ICE40LP384 with the XO2-256, as the iCE40 cell is more primitive, it may not fit more.
"Try it" is not so simple. The existing design is all logic. To "try it" requires repeating the design with a dichotomy of slow functions in software, fast functions in hardware and interfaces which will allow it all to function as a whole. It's not a huge project, but some of the functions (like a buffer size controlled FLL) might be a bit tricky to get right in software and may need to remain in gateware. Without block RAM this is hard. The beauty of doing it all in the FPGA is that the entire design can be run in one VHDL simulation. If the processor were integrated into the FPGA, then we are back to a single simulation, schweet! I'll more than likely go with one of the BGA packages, possibly the BGA256 because of the large ball spacing. This gives fairly relaxed design rules to the PCB. That then opens up the possibilities to a wide range of very capable parts. We'll see... -- Rick
rickman wrote:

> I have been looking at these parts for some time and I never > realized they don't include distributed RAM using the LUTs.
Also of note, the ICE40 Block RAM's two ports consist of one read-only port, and one write-only port; vs. the two independent read+write ports of many other FPGA families.
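In HDL terms that one-write-port / one-read-port structure is the usual simple-dual-port inference template. A minimal sketch (entity and generic names are illustrative; whether a given tool actually maps it onto the iCE40's 4 kbit EBR depends on the synthesis flow):

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity sdp_ram is
      generic ( AW : natural := 8;     -- 256 x 16 = one 4 kbit block
                DW : natural := 16 );
      port ( clk   : in  std_logic;
             we    : in  std_logic;
             waddr : in  std_logic_vector(AW-1 downto 0);  -- write-only port
             wdata : in  std_logic_vector(DW-1 downto 0);
             raddr : in  std_logic_vector(AW-1 downto 0);  -- read-only port
             rdata : out std_logic_vector(DW-1 downto 0) );
    end entity sdp_ram;

    architecture rtl of sdp_ram is
      type ram_t is array (0 to 2**AW - 1) of std_logic_vector(DW-1 downto 0);
      signal ram : ram_t;
    begin
      process (clk)
      begin
        if rising_edge(clk) then
          if we = '1' then
            ram(to_integer(unsigned(waddr))) <= wdata;  -- port A: write only
          end if;
          rdata <= ram(to_integer(unsigned(raddr)));    -- port B: read only
        end if;
      end process;
    end architecture rtl;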
> Lattice has a license on many Xilinx owned patents because > they bought the Orca line from Lucent who had gotten all > sorts of licensing from Xilinx in a weak moment.
<snip>
> I'll never understand why they licensed their products to Lucent.
I'd reckon AT&T/Lucent had a large semiconductor patent portfolio with which to apply strategic "leverage" for a favorable cross-licensing agreement.
> If the processor were integrated into the FPGA, then we > are back to a single simulation, schweet!
As a yardstick, a system build for my homebrew RISC, including 4 Kbyte BRAM, UART and I/O, fits snugly into one of the 1280 LUT4 XO2 devices:

: Number of logic LUT4s: 890
: Number of distributed RAM: 66 (132 LUT4s)
: Number of ripple logic: 110 (220 LUT4s)
: Number of shift registers: 0
: Total number of LUT4s: 1242
:
: Number of block RAMs: 4 out of 7 (57%)

The core proper (32 bit datapath, 16 bit instructions) is currently ~800 LUT4 in its default configuration. [ I miss TBUF's when working on processor datapaths. ]

I don't have the XO2 design checked in, but the similar XP2 version is in the following code repository, under trunk/hdl/systems/evb_lattice_xp2_brevia :

http://code.google.com/p/yard-1/

The above is still very much a work-in-progress, but far enough along to use for small assembly projects ( note that interrupts are currently broken ).

-Brian
On 9/2/2013 9:56 PM, Brian Davis wrote:
> rickman wrote: > >> I have been looking at these parts for some time and I never >> realized they don't include distributed RAM using the LUTs. > > Also of note, the ICE40 Block RAM's two ports consist of > one read-only port, and one write-only port; vs. the two > independent read+write ports of many other FPGA families.
The iCE family of products has a number of shortcomings compared to the larger parts sold elsewhere, but for a reason: the iCE lines are very, very low power. You can't do that if you have a lot of "fat" in the hardware. So they cut to the bone. This is not the only area where the parts come up a little short. The question is how much does it matter? For a long time I've heard how brand X or A or whatever is better because of this feature or that feature. So the iCE line has few of these fancy features; how well do designs work in them?
>> Lattice has a license on many Xilinx owned patents because >> they bought the Orca line from Lucent who had gotten all >> sorts of licensing from Xilinx in a weak moment. > <snip> >> I'll never understand why they licensed their products to Lucent. > > I'd reckon AT&T/Lucent had a large semiconductor patent > portfolio with which to apply strategic "leverage" for a > favorable cross-licensing agreement.
Possible, but I don't think so. Any number of folks could have had semiconductor patents and no one else got anything like this. I would speculate that Xilinx needed a second source for some huge customer or maybe they were at a critical point in the company's growth and just needed a bunch of cash (as opposed to cache). Who knows?
>> If the processor were integrated into the FPGA, then we >> are back to a single simulation, schweet! > > As a yardstick, a system build for my homebrew RISC, > including 4 Kbyte BRAM, UART and I/O, fits snugly into > one of the 1280 LUT4 XO2 devices: > > : Number of logic LUT4s: 890 > : Number of distributed RAM: 66 (132 LUT4s) > : Number of ripple logic: 110 (220 LUT4s) > : Number of shift registers: 0 > : Total number of LUT4s: 1242 > : > : Number of block RAMs: 4 out of 7 (57%) > > The core proper (32 bit datapath, 16 bit instructions) > is currently ~800 LUT4 in its' default configuration. > [ I miss TBUF's when working on processor datapaths.] > > I don't have the XO2 design checked in, but the similar > XP2 version is in the following code repository, under > trunk/hdl/systems/evb_lattice_xp2_brevia : > > http://code.google.com/p/yard-1/ > > The above is still very much a work-in-progress, but > far enough along to use for small assembly projects > ( note that interrupts are currently broken ).
The trick to datapaths in CPU designs is to minimize the number of inputs onto a "bus" which is implemented as multiplexers. Minimizing inputs gains speed and minimizes logic. When possible, put the on-chip RAM to good use by folding the muxes into it. I got sidetracked on my last iteration of a CPU design which was going to use a block RAM as the "register file" and stack in one. Since then I've read about some other designs which use similar ideas, although not identical. Why did you roll your own RISC design when each FPGA maker has their own? The Lattice version is even open source. -- Rick
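A minimal sketch of that register-file-in-RAM idea (names and widths are illustrative, not from any posted design): the operand select becomes a RAM address, so the fabric never builds a wide per-bit mux:

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity regfile_ram is
      port ( clk   : in  std_logic;
             we    : in  std_logic;
             wsel  : in  std_logic_vector(3 downto 0);   -- 16 registers
             wdata : in  std_logic_vector(15 downto 0);
             rsel  : in  std_logic_vector(3 downto 0);   -- operand select is a
             rdata : out std_logic_vector(15 downto 0) );-- RAM address, not a 16:1 mux
    end entity regfile_ram;

    architecture rtl of regfile_ram is
      type rf_t is array (0 to 15) of std_logic_vector(15 downto 0);
      signal rf : rf_t;
    begin
      process (clk)
      begin
        if rising_edge(clk) then
          if we = '1' then
            rf(to_integer(unsigned(wsel))) <= wdata;
          end if;
          rdata <= rf(to_integer(unsigned(rsel)));  -- registered read port
          -- (a second read port is usually obtained by duplicating the RAM)
        end if;
      end process;
    end architecture rtl;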
rickman wrote:
> >>> I'll never understand why they licensed their products to Lucent. >> >> I'd reckon AT&T/Lucent had a large semiconductor patent >> portfolio with which to apply strategic "leverage" for a >> favorable cross-licensing agreement. > > Possible, but I don't think so. Any number of folks could > have had semiconductor patents and no one else got anything > like this. I would speculate that Xilinx needed a second source >
There was definitely a second source in the XC3000 days, first from MMI (bought by AMD), later AT&T; but I don't remember there being anyone second-sourcing the XC4000.

IIRC, as Xilinx introduced the XC4000, AT&T went their own way with the ORCA, with similar features (distributed RAM, carry chains), but using the Neocad software. My speculation is that at this juncture, AT&T leveraged rights to the Xilinx FPGA patents.

Back in 1995, the AT&T press release responding to the Neocad acquisition was re-posted here:

https://groups.google.com/forum/message/raw?msg=comp.arch.fpga/Oa92_X3iDao/w63G0Z4dlCcJ

and stated:

" When AT&T Microelectronics decided not to second source
" the Xilinx 4000 family of FPGAs, we accelerated the
" introduction of the ORCA family.

-----------------
> The trick to datapaths in CPU designs is to minimize > the number of inputs onto a "bus" which is implemented > as multiplexers.
Yes, that's why I miss the TBUF's :) In the XC4000/Virtex days, the same 32 bit core fit into 300-400 LUT4's, and a good number of TBUF's. The growth to ~800 LUT4 is split between the TBUF replacement muxes and new instruction set features.
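Roughly, the LUT replacement for a TBUF bus is a one-hot AND-OR merge. A sketch with made-up signal names (not from the YARD sources):

    library ieee;
    use ieee.std_logic_1164.all;

    entity onehot_bus is
      port ( en_a, en_b, en_c    : in  std_logic;  -- one-hot "output enables"
             src_a, src_b, src_c : in  std_logic_vector(15 downto 0);
             dbus                : out std_logic_vector(15 downto 0) );
    end entity onehot_bus;

    architecture rtl of onehot_bus is
      -- replicate the enable across the bus width, like a TBUF output enable
      function gate(s : std_logic_vector; en : std_logic) return std_logic_vector is
        variable r : std_logic_vector(s'range);
      begin
        for i in s'range loop
          r(i) := s(i) and en;
        end loop;
        return r;
      end function gate;
    begin
      -- the shared bus that TBUF's once drove becomes a wide OR of gated sources
      dbus <= gate(src_a, en_a) or gate(src_b, en_b) or gate(src_c, en_c);
    end architecture rtl;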
> Why did you roll your own RISC design when each FPGA > maker has their own?
When the YARD core blinked its first LED in 1999, there wasn't much in the way of free vendor RISC IP. Being a perpetually-unfinished spare-time project, I never got the loose ends tidied up enough to make the sources available until recently.
> > The Lattice version is even open source. >
At the initial announcement, yes; but when I looked a couple years ago, the Lattice Mico source files had been lawyered up with a "Lattice Devices Only" clause, see the comments on this thread: http://latticeblogs.typepad.com/frontier/2006/08/open_source.html -Brian
On 9/3/2013 6:27 PM, Brian Davis wrote:
> rickman wrote: >> >>>> I'll never understand why they licensed their products to Lucent. >>> >>> I'd reckon AT&T/Lucent had a large semiconductor patent >>> portfolio with which to apply strategic "leverage" for a >>> favorable cross-licensing agreement. >> >> Possible, but I don't think so. Any number of folks could >> have had semiconductor patents and no one else got anything >> like this. I would speculate that Xilinx needed a second source >> > > There was definitely a second source in the XC3000 days, > first from MMI (bought by AMD), later AT&T; but I don't > remember there being anyone second sourcing the XC4000 > > IIRC, as Xilinx introduced the XC4000, AT&T went their > own way in the ORCA, with similar features (distributed RAM, > carry chains), but using the Neocad software. > > My speculation is that at this juncture, AT&T leveraged > rights to the Xilinx FPGA patents. > > Back in 1995, the AT&T press release responding to the > Neocad acquisition was re-posted here: > > https://groups.google.com/forum/message/raw?msg=comp.arch.fpga/Oa92_X3iDao/w63G0Z4dlCcJ > > and stated: > " > " When AT&T Microelectronics decided not to second source > " the Xilinx 4000 family of FPGAs, we accelerated the > " introduction of the ORCA family. > "
Yes, that is what we are discussing. Why did *Xilinx* give out the family jewels to Lucent? We know it happened, the question is *why*?
> ----------------- > >> The trick to datapaths in CPU designs is to minimize >> the number of inputs onto a "bus" which is implemented >> as multiplexers. > > Yes, that's why I miss the TBUF's :) > > In the XC4000/Virtex days, the same 32 bit core fit into > 300-400 LUT4's, and a good number of TBUF's. > > The growth to ~800 LUT4 is split between the TBUF > replacement muxes and new instruction set features.
My understanding is that TBUFs may have been a good idea when LUT delays were 5 ns and routing was another 5 to 10 ns between LUTs, but as they made the devices more dense and faster they found the TBUFs just didn't scale in the same way; in fact the speed got worse! The capacitance being driven didn't go down much, and the TBUFs needed to scale down, which meant they had less drive. So they would actually have gotten slower. No, they are gone because TBUFs just aren't your friend when you want to make a dense, fast chip.
>> Why did you roll your own RISC design when each FPGA >> maker has their own? > > When the YARD core blinked it's first LED in 1999, > there wasn't much in the way of free vendor RISC IP. > > Being a perpetually-unfinished spare-time project, > I never got enough loose ends tidied up enough to > make the sources available until recently.
Ok, that makes sense. I rolled my first CPU around 2002 and, like you, it may have been used, but still is not finished.
>> The Lattice version is even open source. >> > At the initial announcement, yes; but when I looked > a couple years ago, the Lattice Mico source files > had been lawyered up with a "Lattice Devices Only" > clause, see the comments on this thread: > > http://latticeblogs.typepad.com/frontier/2006/08/open_source.html
Oh, that is a horse of a different color. So the Lattice CPU designs are out! No big loss. The 8-bitter doesn't have a C compiler (not that I care) and good CPU designs are a dime a dozen... I guess, depending on your definition of "good". -- Rick