FPGARelated.com
Forums

Lattice Announces EOL for XP and EC/P Product Lines

Started by rickman July 30, 2013
On 8/26/2013 7:43 PM, jg wrote:
> On Saturday, August 24, 2013 1:06:37 PM UTC+12, rickman wrote: >> >> >> I suppose the markets are different enough that FPGAs just can't be >> produced economically in as wide a range of packaging. I think that is >> what Austin Lesea used to say, that it just costs too much to provide >> the parts in a lot of packages. Their real market is at the high end. >> Like the big CPU makers, it is all about raising the ASP, Average >> Selling Price. > > I was meaning more at the lower end - eg where Lattice can offer parts in QFN32, then take a large jump to QFP100 0.5mm. > QFN32 have only 21 io, so you easily exceed that, but there is a very large gap between the QFN32 and TQFP100. > They claim to be chasing the lower cost markets with these parts, but seem rather blinkered in doing so. > Altera offer well-priced parts in gull wing (MAX V), but only in a 0.4mm pitch.
I think you mean Lattice offers a part in the QFN32. I only found the XO2-256. A selection guide that isn't even a year old says they offer an iCE40 part in this package, but it doesn't show up in the data sheet from 2013. I guess the product line is still a little schizo. They haven't finished cleaning house and any part or package could be next.
>> As long as we are wishing for stuff, I'd really love to see a smallish >> MCU mated to a smallish FPGA. > > If you push that 'smallish', the Cypress PSoC series have uC+logic.
That's a pretty hard "push". I've looked at them but I don't get a warm fuzzy from a company that makes everything an uphill climb when they seem to think they are making it easy. I've been looking at the PSoC parts since they were new. At that time support was so crude that they had a weekly conference call, and if you joined in you got a 1 on 1 training session. That progressed through a long development aimed at making their parts push-button, and I am pretty sure that won't even come close to working for my needs. I need a small FPGA, maybe 1000 LUTs, to provide the high speed portion of the design. I don't even need a "real" processor, I bet I could live a rich full life (in this design anyway) with an 8051. In fact, that is an option, to add an MCU for most of the I/O and processing, then use something like the XO2-256 in a QFN32 to do the high speed stuff. I'm just not sure I can fit the design in 256 LUTs. Maybe the QFN32 is small enough I can use two? 5x5 mm!
> The newest PSoC4 seems to have solved some of the sticker shock, but I think they crippled the logic to achieve that. > Seems there is no free lunch. > > Cypress do, however, grasp the package issue, and offer QFN40 (0.5mm), as well as SSOP28 (0.65mm) and TQFP44 (0.8mm).
Yeah, while the FPGA guys are rather phobic of issuing a lot of package combinations, the MCU folks have tons of them. They have a much tougher problem with all the combos of RAM, Flash, I/O count, clock speed, ... I can see why the FPGA people haven't embraced the idea of combining MCU with FPGA; it just doesn't fit their culture. -- Rick
On 8/28/2013 2:58 PM, already5chosen@yahoo.com wrote:
> On Wednesday, August 28, 2013 11:51:36 AM UTC+3, rickman wrote: >> On 8/25/2013 12:44 PM, already5chosen@yahoo.com wrote: >> >>> >> >>> I just measured Altera Nios2e on Stratix3 - 379 ALMs + 2 M9K blocks (out of 18K memory bits only 2K bits used). It's hard to translate exactly into old-fashioned LUTs, but I'd say - around 700. >> >>> Per clock Nios2e is pretty slow, but it clocks rather high and it is a 32-bit CPU - very easy to program in C. >> >> >> >> I can't say I fully understand the ALM, but I think it functions as a >> lot more than just a pair of 4 input LUTs. It will do that without any >> issue. But it will do a lot more and I expect this is used to great >> advantage in a CPU. I'd say the ALM is equivalent to between 3 and 4 >> LUT4s depending on the design. I guess it is hard to compare between >> different device types. >> > > No, ALM is close to two 4-input LUTs. May be, a bit more when implementing complex tightly-coupled logic with high internal complexity to fanout ratio. May be, a bit less, when implementing simple things with lots of registers and high fanout. > > For sake of the argument, I compiled Nios2e for Cyclone4, which has more old-fashioned architecture - 676 LCs + 2 M9Ks. > I also dug out my real-world design from many years ago that embeds Nios2e into Cyclone2. It is even smaller at 565 LCs + 2 M4Ks. > > >> >> >> >> >>> Reimplementing Nios2 in minimal number of LUTs, e.g. trading memory for fabric, could be an interesting exercise, well suitable for coding competition. But, probably, illegal :( >> >> Yes, there are always lots of tradeoffs to be considered. >> > > My point is - if you don't need performance and can use embedded memories then you can design useful 32-bit RISC CPU which would be non-trivially smaller than 600 LCs. > Nios2e core that I took as example is small, but hardly minimalistic. It implements full Nios2 architecture including several parts that you probably don't need. In particular: > - everything related to interrupts and exceptions > - support for big program address space > - ability to run execute programs from any memories others than on-chip SRAM
If the size of the NIOS2 is as small as you say, then that only leaves two issues with using the NIOS2 in my FPGA designs. The first is that I don't need 32 bit data paths in addition to the large memory address bus. I assume this means the instructions are not so compact using more on chip memory than desired. But the really big issue with using the NIOS2 is not technical, Altera won't let you use it on anything that isn't an Altera part. So in reality this is a non-starter no matter how good the NIOS2 is technically. -- Rick
> > I think you mean Lattice offers a part in the QFN32. I only found the > XO2-256. A selection guide that isn't even a year old says they offer > an iCE40 part in this package, but it doesn't show up in the data sheet > from 2013. I guess the product line is still a little schizo. They > haven't finished cleaning house and any part or package could be next.
The part code for this is ICE40LP384-SG32. It shows on price lists, but still 0 in the stock column. Mouser says 100 due on 9/30/2013
> In fact, that is an option, to add an MCU for > most of the I/O and processing, then use something like the XO2-256 in a > QFN32 to do the high speed stuff. I'm just not sure I can fit the > design in 256 LUTs. Maybe the QFN32 is small enough I can use two? 5x5mm!
Try it and see. I found the XO2-256 seems to pack full quite well, and the tools are OK to use, so you can find out quite quickly. I did a series of capture counters in the XO2-256, and once it worked, I increased the width to fill more of the part. IIRC it got into the 90%+ range with no surprises. I've been meaning to compare the ICE40LP384 with the XO2-256; as the iCE40 cell is more primitive, it may not actually fit more. -jg
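For illustration, a minimal VHDL sketch (not jg's actual code) of a capture counter along those lines - the WIDTH generic is what you would grow until the part fills up:

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity capture_counter is
      generic ( WIDTH : natural := 16 );  -- grow this to fill the part
      port ( clk     : in  std_logic;
             rst     : in  std_logic;
             capture : in  std_logic;     -- capture strobe
             value   : out std_logic_vector(WIDTH-1 downto 0) );
    end entity capture_counter;

    architecture rtl of capture_counter is
      signal cnt : unsigned(WIDTH-1 downto 0) := (others => '0');
    begin
      process (clk)
      begin
        if rising_edge(clk) then
          if rst = '1' then
            cnt <= (others => '0');
          else
            cnt <= cnt + 1;                     -- free-running count
            if capture = '1' then
              value <= std_logic_vector(cnt);   -- latch count on the strobe
            end if;
          end if;
        end if;
      end process;
    end architecture rtl;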
On Thursday, August 29, 2013 11:23:08 PM UTC+3, rickman wrote:
> On 8/28/2013 2:58 PM, already5chosen@yahoo.com wrote: > > On Wednesday, August 28, 2013 11:51:36 AM UTC+3, rickman wrote: > >> On 8/25/2013 12:44 PM, already5chosen@yahoo.com wrote: > >>> I just measured Altera Nios2e on Stratix3 - 379 ALMs + 2 M9K blocks (out of 18K memory bits only 2K bits used). It's hard to translate exactly into old-fashioned LUTs, but I'd say - around 700. > >>> Per clock Nios2e is pretty slow, but it clocks rather high and it is a 32-bit CPU - very easy to program in C. > >> I can't say I fully understand the ALM, but I think it functions as a lot more than just a pair of 4 input LUTs. It will do that without any issue. But it will do a lot more and I expect this is used to great advantage in a CPU. I'd say the ALM is equivalent to between 3 and 4 LUT4s depending on the design. I guess it is hard to compare between different device types. > > No, ALM is close to two 4-input LUTs. May be, a bit more when implementing complex tightly-coupled logic with high internal complexity to fanout ratio. May be, a bit less, when implementing simple things with lots of registers and high fanout. > > For sake of the argument, I compiled Nios2e for Cyclone4, which has more old-fashioned architecture - 676 LCs + 2 M9Ks. > > I also dug out my real-world design from many years ago that embeds Nios2e into Cyclone2. It is even smaller at 565 LCs + 2 M4Ks. > >>> Reimplementing Nios2 in minimal number of LUTs, e.g. trading memory for fabric, could be an interesting exercise, well suitable for coding competition. But, probably, illegal :( > >> Yes, there are always lots of tradeoffs to be considered. > > My point is - if you don't need performance and can use embedded memories then you can design useful 32-bit RISC CPU which would be non-trivially smaller than 600 LCs. > > Nios2e core that I took as example is small, but hardly minimalistic. It implements full Nios2 architecture including several parts that you probably don't need. In particular: > > - everything related to interrupts and exceptions > > - support for big program address space > > - ability to run execute programs from any memories others than on-chip SRAM > If the size of the NIOS2 is as small as you say,
Nios2e is small. And slow. Nios2s and Nios2f aren't small.
> then that only leaves two issues with using the NIOS2 in my FPGA designs. The first is that I don't need 32 bit data paths in addition to the large memory address bus. I assume this means the instructions are not so compact using more on chip memory than desired.
Yes, Nios2 code density is poor. About the same as MIPS32, maybe just a little bit better. Similar to PPC. Measurably worse than "old" ARM. More than 1.5x worse than Thumb2.
> But the really big issue with using the NIOS2 is not technical, Altera won't let you use it on anything that isn't an Altera part. So in reality this is a non-starter no matter how good the NIOS2 is technically.
I don't understand why. If you code in C, then porting the non-hardware-specific parts of your code from Nios2 to any other little-endian 32-bit processor with octet-addressable memory will take very little time. Much, much less than porting the hardware-specific parts of the code from, say, one ARM Cortex SoC or MCU to another ARM Cortex SoC or MCU. If you thought about it in advance, then even porting to a big-endian 32-bitter is a non-issue.

After all, we are talking about a few KLOCs, at worst a few tens of KLOCs. Unless you code in asm, the CPU-related part of the porting sounds like an absolute non-issue, especially if you use gcc on both of your targets.

Or maybe you wanted to say that Nios2 is unsuitable if your original design is not based on an Altera FPGA? That, of course, is true. But then again, why would you *want* to use Nios2 outside of the Altera realm? Other vendors have their own 32-bit soft core solutions. I didn't try them, but I would think that in most respects they are similar to Nios2. Or, as in the case of Microsemi, they have a licensing agreement with ARM which makes the Cortex-M1 affordable for low-volume products.

In any case, unless the volumes are HUGE, "roll your own soft core" does not sound to me like the right use of developer's time. The only justification for it that I can see is personal enjoyment.
On 8/29/2013 6:27 PM, already5chosen@yahoo.com wrote:
> On Thursday, August 29, 2013 11:23:08 PM UTC+3, rickman wrote: >> On 8/28/2013 2:58 PM, already5chosen@yahoo.com wrote: >> >>> On Wednesday, August 28, 2013 11:51:36 AM UTC+3, rickman wrote: >> >>>> On 8/25/2013 12:44 PM, already5chosen@yahoo.com wrote: >> >>>> >> >>>>> >> >>>> >> >>>>> I just measured Altera Nios2e on Stratix3 - 379 ALMs + 2 M9K blocks (out of 18K memory bits only 2K bits used). It's hard to translate exactly into old-fashioned LUTs, but I'd say - around 700. >> >>>> >> >>>>> Per clock Nios2e is pretty slow, but it clocks rather high and it is a 32-bit CPU - very easy to program in C. >> >>>> >> >>>> >> >>>> >> >>>> I can't say I fully understand the ALM, but I think it functions as a >> >>>> lot more than just a pair of 4 input LUTs. It will do that without any >> >>>> issue. But it will do a lot more and I expect this is used to great >> >>>> advantage in a CPU. I'd say the ALM is equivalent to between 3 and 4 >> >>>> LUT4s depending on the design. I guess it is hard to compare between >> >>>> different device types. >> >>>> >> >>> >> >>> No, ALM is close to two 4-input LUTs. May be, a bit more when implementing complex tightly-coupled logic with high internal complexity to fanout ratio. May be, a bit less, when implementing simple things with lots of registers and high fanout. >> >>> >> >>> For sake of the argument, I compiled Nios2e for Cyclone4, which has more old-fashioned architecture - 676 LCs + 2 M9Ks. >> >>> I also dug out my real-world design from many years ago that embeds Nios2e into Cyclone2. It is even smaller at 565 LCs + 2 M4Ks. >> >>> >> >>> >> >>>> >> >>>> >> >>>> >> >>>> >> >>>>> Reimplementing Nios2 in minimal number of LUTs, e.g. trading memory for fabric, could be an interesting exercise, well suitable for coding competition. But, probably, illegal :( >> >>>> >> >>>> Yes, there are always lots of tradeoffs to be considered. >> >>>> >> >>> >> >>> My point is - if you don't need performance and can use embedded memories then you can design useful 32-bit RISC CPU which would be non-trivially smaller than 600 LCs. >> >>> Nios2e core that I took as example is small, but hardly minimalistic. It implements full Nios2 architecture including several parts that you probably don't need. In particular: >> >>> - everything related to interrupts and exceptions >> >>> - support for big program address space >> >>> - ability to run execute programs from any memories others than on-chip SRAM >> >> >> >> If the size of the NIOS2 is as small as you say, > > Nios2e is small. And slow. Nios2s and Nios2f aren't small.
Slow is a relative term. I expect NIOS is designed for the instruction set rather than for the implementation. From your description the s and f versions burn logic to get speed while the e version is the minimum hardware that can do the job. This is not my idea of how to make an embedded core. I would take the approach of designing a CPU which uses minimal resources as part of its architecture and uses an instruction set that is adequate and efficient rather than being optimized for a language. I am accustomed to writing assembly language code and even micro code for bit slice processors.
>> then that only leaves >> two issues with using the NIOS2 in my FPGA designs. The first is that I >> don't need 32 bit data paths in addition to the large memory address >> bus. I assume this means the instructions are not so compact using more >> on chip memory than desired. > > Yes, Nios2 code density is poor. About the same as MIPS32, may be, just a little bit better. Similar to PPC. Measurably worse than "old" ARM. More than 1.5x worse than Thumb2.
I can tell by the terms you use that you are thinking in terms of C programming and larger code bases than what I typically do. In particular the code for this job would be not far removed from the hardware and in fact would need to be written to work very efficiently with the hardware to meet the hard, real time constraints involved. This is not your typical C program.
>> But the really big issue with using the NIOS2 is not technical, Altera >> won't let you use it on anything that isn't an Altera part. So in >> reality this is a non-starter no matter how good the NIOS2 is technically. >> > > I don't understand why. > If you code in C then porting non-hardware-specific parts of your code from Nios2 to any other little-endian 32-bit processor with octet-addressable memory will take very little time. Much much less than porting hardware-specific parts of code from, say, one ARM-Cortex SoC or MCU to another ARM-Cortex SoC or MCU. > If you thought about it in advance, then even porting to big-endian 32-bitter is a non-issue,
Yes, you are thinking along very different lines than I am. The idea is not to port the code, but to port the processor. Then there is virtually no work involved other than recompiling the HDL.
> After all, we are talking about few KLOCs, at worst, few tens KLOCs. Unless you code in asm, the CPU-related part of porting sounds as absolute non-issue. Esp. if you use gcc on both of your target.
Probably not even a single KLOC, lol. All I am doing is replacing some hardware functions with software. Use the ALU and data paths of the CPU to replace the logic and data paths of dedicated hardware. Not tons of work, but the timing is important. So once it is written, working and, more importantly, verified, I want to never have to touch the code again, just as if it were hardware (well, gateware). So the processor would need to be ported to whatever device this is implemented in.
> Or, may be, you wanted to say that Nios2 is unsuitable if your original design not based on Altera FPGA? That's, of course, is true. > But, then again, why would you *want* to use Nios2 outside of Altera realm? Other vendors have their own 32-bit soft core solutions. I didn't try them, but would think that in most aspects their solutions are similar to Nios2. Or, as in case of Microsemi, they have licensing agreement with ARM which make Cortex-M1 affordable for low volume products. > > In any case, unless the volumes are HUGE, "roll your own soft core" does not sound to me as a right use of developer's time. The only justification for it that I can see about is personal enjoyment.
A CPU design can be as hard or as easy as you want. If you must have C support there is the ZPU, which was designed explicitly for that, but I don't think it is a good match for deterministic real time apps. I have worked on a couple of versions of a stack-based processor design which is reasonably efficient. I have some new ideas for something a bit more novel. We'll see what happens. This is all due to the EOL from Lattice; we have until November to get a last-time buy in, and a new design won't be needed until those parts are used up. So I've likely got a year or so. -- Rick
On 8/29/2013 4:37 PM, jg wrote:
>> >> I think you mean Lattice offers a part in the QFN32. I only found the >> XO2-256. A selection guide that isn't even a year old says they offer >> an iCE40 part in this package, but it doesn't show up in the data sheet >> from 2013. I guess the product line is still a little schizo. They >> haven't finished cleaning house and any part or package could be next. > > The part code for this is ICE40LP384-SG32 > Showing on price lists, but still 0 in the stock column. > Mouser says 100 due on 9/30/2013
Just goes to show, you have to keep up on the data sheets. They just released a new one last week, 8/22/2013. This one includes the 32 pin QFN. Still, it is the poor stepchild of the family with no memory at all other than the FFs. Actually, I looked back through my history of data sheets and I must have had a brain cramp, they all show the QFN32.

I have been looking at these parts for some time and I never realized they don't include distributed RAM using the LUTs. This part was not designed by Lattice, so I guess this may still be covered by patent. Lattice has a license on many Xilinx-owned patents because they bought the Orca line from Lucent, who had gotten all sorts of licensing from Xilinx in a weak moment. Not that this has hurt Xilinx much, but it is so out of character for them. I'll never understand why they licensed their products to Lucent. Maybe some huge customer required a second source for the 3000 and 4000 series. Or maybe it was just a huge wad of cash Lucent waved under their noses. Likely we'll never know.

The point is I'm not nearly as enamored with the iCE40 parts as I was a year ago. They dropped the 600 LUT member of the family and replaced it with this 384 LUT member. At the same time they raised the quiescent current spec for the 1k part from 40 uA to 100 uA. The entire iCE65 product line was dropped (which had even lower static current). They just can't seem to pick a direction and stick with it.
>> In fact, that is an option, to add an MCU for >> most of the I/O and processing, then use something like the XO2-256 in a >> QFN32 to do the high speed stuff. I'm just not sure I can fit the >> design in 256 LUTs. Maybe the QFN32 is small enough I can use two? 5x5mm! > > Try it and see. I found the XO2-256 seems to pack full quite well, and the tools are ok to use, so you can find out quite quickly. > I did a series of capture counters in XO2-256, and once it worked, I increased the the width to fill more of the part. > IIRC it got into the 90%+ with now surprises. > > I've been meaning to compare the ICE40LP384 with the XO2-256, as the iCE40 cell is more primitive, it may not fit more.
"Try it" is not so simple. The existing design is all logic. To "try it" requires repeating the design with a dichotomy of slow functions in software, fast functions in hardware and interfaces which will allow it all to function as a whole. It's not a huge project, but some of the functions (like a buffer size controlled FLL) might be a bit tricky to get right in software and may need to remain in gateware. Without block RAM this is hard. The beauty of doing it all in the FPGA is that the entire design can be run in one VHDL simulation. If the processor were integrated into the FPGA, then we are back to a single simulation, schweet! I'll more than likely go with one of the BGA packages, possibly the BGA256 because of the large ball spacing. This gives fairly relaxed design rules to the PCB. That then opens up the possibilities to a wide range of very capable parts. We'll see... -- Rick
rickman wrote:

> I have been looking at these parts for some time and I never > realized they don't include distributed RAM using the LUTs.
Also of note, the ICE40 Block RAM's two ports consist of one read-only port, and one write-only port; vs. the two independent read+write ports of many other FPGA families.
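In HDL terms that one-write-port / one-read-port structure is the usual simple-dual-port inference template. A minimal sketch (entity and generic names are illustrative; whether a given tool actually maps it onto the iCE40's 4 kbit EBR depends on the synthesis flow):

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity sdp_ram is
      generic ( AW : natural := 8;     -- 256 x 16 = one 4 kbit block
                DW : natural := 16 );
      port ( clk   : in  std_logic;
             we    : in  std_logic;
             waddr : in  std_logic_vector(AW-1 downto 0);  -- write-only port
             wdata : in  std_logic_vector(DW-1 downto 0);
             raddr : in  std_logic_vector(AW-1 downto 0);  -- read-only port
             rdata : out std_logic_vector(DW-1 downto 0) );
    end entity sdp_ram;

    architecture rtl of sdp_ram is
      type ram_t is array (0 to 2**AW - 1) of std_logic_vector(DW-1 downto 0);
      signal ram : ram_t;
    begin
      process (clk)
      begin
        if rising_edge(clk) then
          if we = '1' then
            ram(to_integer(unsigned(waddr))) <= wdata;  -- port A: write only
          end if;
          rdata <= ram(to_integer(unsigned(raddr)));    -- port B: read only
        end if;
      end process;
    end architecture rtl;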
> Lattice has a license on many Xilinx owned patents because > they bought the Orca line from Lucent who had gotten all > sorts of licensing from Xilinx in a weak moment.
<snip>
> I'll never understand why they licensed their products to Lucent.
I'd reckon AT&T/Lucent had a large semiconductor patent portfolio with which to apply strategic "leverage" for a favorable cross-licensing agreement.
> If the processor were integrated into the FPGA, then we > are back to a single simulation, schweet!
As a yardstick, a system build for my homebrew RISC, including 4 Kbyte BRAM, UART and I/O, fits snugly into one of the 1280 LUT4 XO2 devices:

: Number of logic LUT4s: 890
: Number of distributed RAM: 66 (132 LUT4s)
: Number of ripple logic: 110 (220 LUT4s)
: Number of shift registers: 0
: Total number of LUT4s: 1242
:
: Number of block RAMs: 4 out of 7 (57%)

The core proper (32 bit datapath, 16 bit instructions) is currently ~800 LUT4 in its default configuration. [ I miss TBUF's when working on processor datapaths. ]

I don't have the XO2 design checked in, but the similar XP2 version is in the following code repository, under trunk/hdl/systems/evb_lattice_xp2_brevia :

http://code.google.com/p/yard-1/

The above is still very much a work-in-progress, but far enough along to use for small assembly projects ( note that interrupts are currently broken ).

-Brian
On 9/2/2013 9:56 PM, Brian Davis wrote:
> rickman wrote: > >> I have been looking at these parts for some time and I never >> realized they don't include distributed RAM using the LUTs. > > Also of note, the ICE40 Block RAM's two ports consist of > one read-only port, and one write-only port; vs. the two > independent read+write ports of many other FPGA families.
The iCE family of products has a number of shortcomings compared to the larger parts sold elsewhere, but for a reason: the iCE lines are very, very low power. You can't do that if you have a lot of "fat" in the hardware. So they cut to the bone. This is not the only area where the parts come up a little short. The question is how much does it matter? For a long time I've heard how brand X or A or whatever is better because of this feature or that feature. So the iCE line has few of these fancy features; how well do designs work in them?
>> Lattice has a license on many Xilinx owned patents because >> they bought the Orca line from Lucent who had gotten all >> sorts of licensing from Xilinx in a weak moment. > <snip> >> I'll never understand why they licensed their products to Lucent. > > I'd reckon AT&T/Lucent had a large semiconductor patent > portfolio with which to apply strategic "leverage" for a > favorable cross-licensing agreement.
Possible, but I don't think so. Any number of folks could have had semiconductor patents and no one else got anything like this. I would speculate that Xilinx needed a second source for some huge customer or maybe they were at a critical point in the company's growth and just needed a bunch of cash (as opposed to cache). Who knows?
>> If the processor were integrated into the FPGA, then we >> are back to a single simulation, schweet! > > As a yardstick, a system build for my homebrew RISC, > including 4 Kbyte BRAM, UART and I/O, fits snugly into > one of the 1280 LUT4 XO2 devices: > > : Number of logic LUT4s: 890 > : Number of distributed RAM: 66 (132 LUT4s) > : Number of ripple logic: 110 (220 LUT4s) > : Number of shift registers: 0 > : Total number of LUT4s: 1242 > : > : Number of block RAMs: 4 out of 7 (57%) > > The core proper (32 bit datapath, 16 bit instructions) > is currently ~800 LUT4 in its' default configuration. > [ I miss TBUF's when working on processor datapaths.] > > I don't have the XO2 design checked in, but the similar > XP2 version is in the following code repository, under > trunk/hdl/systems/evb_lattice_xp2_brevia : > > http://code.google.com/p/yard-1/ > > The above is still very much a work-in-progress, but > far enough along to use for small assembly projects > ( note that interrupts are currently broken ).
The trick to datapaths in CPU designs is to minimize the number of inputs onto a "bus" which is implemented as multiplexers. Minimizing inputs gains speed and minimizes logic. When possible, put the on-chip RAM to good use by folding the muxes into it. I got sidetracked on my last iteration of a CPU design which was going to use a block RAM as the "register file" and stack in one. Since then I've read about some other designs which use similar ideas, although not identical. Why did you roll your own RISC design when each FPGA maker has their own? The Lattice version is even open source. -- Rick
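A minimal sketch of that register-file-in-RAM idea (names and widths are illustrative, not from any posted design): the operand select becomes a RAM address, so the fabric never builds a wide per-bit mux:

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity regfile_ram is
      port ( clk   : in  std_logic;
             we    : in  std_logic;
             wsel  : in  std_logic_vector(3 downto 0);   -- 16 registers
             wdata : in  std_logic_vector(15 downto 0);
             rsel  : in  std_logic_vector(3 downto 0);   -- operand select is a
             rdata : out std_logic_vector(15 downto 0) );-- RAM address, not a 16:1 mux
    end entity regfile_ram;

    architecture rtl of regfile_ram is
      type rf_t is array (0 to 15) of std_logic_vector(15 downto 0);
      signal rf : rf_t;
    begin
      process (clk)
      begin
        if rising_edge(clk) then
          if we = '1' then
            rf(to_integer(unsigned(wsel))) <= wdata;
          end if;
          rdata <= rf(to_integer(unsigned(rsel)));  -- registered read port
          -- (a second read port is usually obtained by duplicating the RAM)
        end if;
      end process;
    end architecture rtl;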
rickman wrote:
> >>> I'll never understand why they licensed their products to Lucent. >> >> I'd reckon AT&T/Lucent had a large semiconductor patent >> portfolio with which to apply strategic "leverage" for a >> favorable cross-licensing agreement. > > Possible, but I don't think so. Any number of folks could > have had semiconductor patents and no one else got anything > like this. I would speculate that Xilinx needed a second source >
There was definitely a second source in the XC3000 days, first from MMI (bought by AMD), later AT&T; but I don't remember there being anyone second-sourcing the XC4000.

IIRC, as Xilinx introduced the XC4000, AT&T went their own way with the ORCA, with similar features (distributed RAM, carry chains), but using the Neocad software. My speculation is that at this juncture, AT&T leveraged rights to the Xilinx FPGA patents.

Back in 1995, the AT&T press release responding to the Neocad acquisition was re-posted here:

https://groups.google.com/forum/message/raw?msg=comp.arch.fpga/Oa92_X3iDao/w63G0Z4dlCcJ

and stated:

" When AT&T Microelectronics decided not to second source
" the Xilinx 4000 family of FPGAs, we accelerated the
" introduction of the ORCA family.

-----------------
> The trick to datapaths in CPU designs is to minimize > the number of inputs onto a "bus" which is implemented > as multiplexers.
Yes, that's why I miss the TBUF's :) In the XC4000/Virtex days, the same 32 bit core fit into 300-400 LUT4's, and a good number of TBUF's. The growth to ~800 LUT4 is split between the TBUF replacement muxes and new instruction set features.
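Roughly, the LUT replacement for a TBUF bus is a one-hot AND-OR merge. A sketch with made-up signal names (not from the YARD sources):

    library ieee;
    use ieee.std_logic_1164.all;

    entity onehot_bus is
      port ( en_a, en_b, en_c    : in  std_logic;  -- one-hot "output enables"
             src_a, src_b, src_c : in  std_logic_vector(15 downto 0);
             dbus                : out std_logic_vector(15 downto 0) );
    end entity onehot_bus;

    architecture rtl of onehot_bus is
      -- replicate the enable across the bus width, like a TBUF output enable
      function gate(s : std_logic_vector; en : std_logic) return std_logic_vector is
        variable r : std_logic_vector(s'range);
      begin
        for i in s'range loop
          r(i) := s(i) and en;
        end loop;
        return r;
      end function gate;
    begin
      -- the shared bus that TBUF's once drove becomes a wide OR of gated sources
      dbus <= gate(src_a, en_a) or gate(src_b, en_b) or gate(src_c, en_c);
    end architecture rtl;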
> Why did you roll your own RISC design when each FPGA > maker has their own?
When the YARD core blinked its first LED in 1999, there wasn't much in the way of free vendor RISC IP. Being a perpetually-unfinished spare-time project, I never got the loose ends tidied up enough to make the sources available until recently.
> > The Lattice version is even open source. >
At the initial announcement, yes; but when I looked a couple years ago, the Lattice Mico source files had been lawyered up with a "Lattice Devices Only" clause, see the comments on this thread: http://latticeblogs.typepad.com/frontier/2006/08/open_source.html -Brian
On 9/3/2013 6:27 PM, Brian Davis wrote:
> rickman wrote: >> >>>> I'll never understand why they licensed their products to Lucent. >>> >>> I'd reckon AT&T/Lucent had a large semiconductor patent >>> portfolio with which to apply strategic "leverage" for a >>> favorable cross-licensing agreement. >> >> Possible, but I don't think so. Any number of folks could >> have had semiconductor patents and no one else got anything >> like this. I would speculate that Xilinx needed a second source >> > > There was definitely a second source in the XC3000 days, > first from MMI (bought by AMD), later AT&T; but I don't > remember there being anyone second sourcing the XC4000 > > IIRC, as Xilinx introduced the XC4000, AT&T went their > own way in the ORCA, with similar features (distributed RAM, > carry chains), but using the Neocad software. > > My speculation is that at this juncture, AT&T leveraged > rights to the Xilinx FPGA patents. > > Back in 1995, the AT&T press release responding to the > Neocad acquisition was re-posted here: > > https://groups.google.com/forum/message/raw?msg=comp.arch.fpga/Oa92_X3iDao/w63G0Z4dlCcJ > > and stated: > " > " When AT&T Microelectronics decided not to second source > " the Xilinx 4000 family of FPGAs, we accelerated the > " introduction of the ORCA family. > "
Yes, that is what we are discussing. Why did *Xilinx* give out the family jewels to Lucent? We know it happened, the question is *why*?
> ----------------- > >> The trick to datapaths in CPU designs is to minimize >> the number of inputs onto a "bus" which is implemented >> as multiplexers. > > Yes, that's why I miss the TBUF's :) > > In the XC4000/Virtex days, the same 32 bit core fit into > 300-400 LUT4's, and a good number of TBUF's. > > The growth to ~800 LUT4 is split between the TBUF > replacement muxes and new instruction set features.
My understanding is that TBUFs may have been a good idea when LUT delays were 5 ns and routing was another 5 to 10 ns between LUTs, but as they made the devices more dense and faster they found the TBUFs just didn't scale in the same way; in fact the speed got worse! The capacitance being driven didn't go down much, and the TBUFs needed to scale down, which meant they had less drive. So they would actually have gotten slower. No, they are gone because TBUFs just aren't your friend when you want to make a dense, fast chip.
>> Why did you roll your own RISC design when each FPGA >> maker has their own? > > When the YARD core blinked it's first LED in 1999, > there wasn't much in the way of free vendor RISC IP. > > Being a perpetually-unfinished spare-time project, > I never got enough loose ends tidied up enough to > make the sources available until recently.
Ok, that makes sense. I rolled my first CPU around 2002 and, like you, it may have been used, but still is not finished.
>> The Lattice version is even open source. >> > At the initial announcement, yes; but when I looked > a couple years ago, the Lattice Mico source files > had been lawyered up with a "Lattice Devices Only" > clause, see the comments on this thread: > > http://latticeblogs.typepad.com/frontier/2006/08/open_source.html
Oh, that is a horse of a different color. So the Lattice CPU designs are out! No big loss. The 8-bitter doesn't have a C compiler (not that I care) and good CPU designs are a dime a dozen... I guess, depending on your definition of "good". -- Rick