
high bandwidth Ethernet communication

Started by eliben September 5, 2007
On Wed, 05 Sep 2007 18:06:16 -0700, Janaka <janakas@optiscan.com>
wrote:

> We are using the MPC8349E at a 400 MHz core clock and got only 480 Mbit/s
> sustained UDP data rate. These processors are marketed as communications
> processors but only have low-level HW support (at the Ethernet layer).
> All the upper-level IP and UDP protocols are handled in software (when
> running Linux), so it takes up CPU time. The same setup on two desktop
> PCs running Linux yields 840 Mbit/s sustained UDP rate.
If the OP requires only dedicated point-to-point connectivity, why bother with the IP wrapper? Just send raw Ethernet frames with MAC addressing. Apparently that MPC has some modern version of the QUICC co-processor (as found on the MC68360), in which it is quite easy to set up one BD (buffer descriptor) for the (possibly fixed) header and another for the actual data. The co-processor assembles the frame from the fragments, sends it autonomously, appends the CRC and then searches for the next ready frame to send, without any further main-processor intervention.

The hard thing is to get the transmit data into the transmit buffers fast enough, but for direct port-to-port copying there should not be much need to move the actual data around in memory.

Paul
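For the PC end of such a link, raw frames are also easy to send from Linux with packet(7). A minimal sketch, assuming a Linux host; the interface name, destination MAC and EtherType below are placeholders, and the NIC appends the FCS:

#include <arpa/inet.h>        /* htons */
#include <linux/if_ether.h>   /* ETH_ALEN, ETH_DATA_LEN, ETH_FRAME_LEN */
#include <linux/if_packet.h>  /* struct sockaddr_ll */
#include <net/if.h>           /* if_nametoindex */
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Placeholder destination MAC and a "local experimental" EtherType. */
static const unsigned char dst_mac[ETH_ALEN] = { 0x02, 0, 0, 0, 0, 0x01 };
#define RAW_ETHERTYPE 0x88B5

int send_raw_frame(const char *ifname, const unsigned char src_mac[ETH_ALEN],
                   const unsigned char *payload, size_t len)
{
    if (len > ETH_DATA_LEN)
        return -1;

    int fd = socket(AF_PACKET, SOCK_RAW, htons(RAW_ETHERTYPE));
    if (fd < 0)
        return -1;

    /* Frame = dst MAC, src MAC, EtherType, payload.  The NIC adds the CRC. */
    unsigned char frame[ETH_FRAME_LEN];
    memcpy(frame, dst_mac, ETH_ALEN);
    memcpy(frame + ETH_ALEN, src_mac, ETH_ALEN);
    frame[12] = RAW_ETHERTYPE >> 8;
    frame[13] = RAW_ETHERTYPE & 0xff;
    memcpy(frame + 14, payload, len);

    struct sockaddr_ll addr;
    memset(&addr, 0, sizeof(addr));
    addr.sll_family  = AF_PACKET;
    addr.sll_ifindex = if_nametoindex(ifname);
    addr.sll_halen   = ETH_ALEN;
    memcpy(addr.sll_addr, dst_mac, ETH_ALEN);

    int ok = sendto(fd, frame, 14 + len, 0,
                    (struct sockaddr *)&addr, sizeof(addr)) >= 0;
    close(fd);
    return ok ? 0 : -1;
}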
I suggest that you go back and read John McCaskill's response.
It would probably help to discuss things with a network wizard
(especially one who knows something about hardware).
You want a low-level protocol geek, not a web designer.

I think the real question is what happens when a packet
gets lost?  If you are using TCP, you have to buffer
all the data until it gets ACKed.  If you are using UDP,
you drop some data.

UDP in send-only mode doesn't really require a stack.


> So where is the actual UDP communication implemented ? In Linux ? What
> processor is it running on ? Is it an external CPU or the built-in PPC
> of Virtex 4 ?
If I were doing this (or something like what I think you are doing), I would try to do all the UDP in the FPGA. The header is just a bunch of constants. You probably want a sequence number in your payload. Then you have to compute the CRC. The whole thing is well specified, and you can be sure it will run fast enough as long as the network doesn't get congested. No ACKs, just fire and forget.

An alternative approach is to get the data into a PC somehow, and do the UDP/whatever work from that PC. As somebody else already suggested, one "easy" way to get the data into a PC would be to use Ethernet on a point-to-point link. You don't even need a CRC. This is easy for the hardware. The software guys might not like it: they have to steal the Ethernet port from the software stack and write a driver. You might look at tcpdump and see how it handles packets with CRC errors.

There are various PCI boards with an FPGA on them. If you can get your data onto one of those cards, then you can DMA it into memory. That still needs software, but it is a slightly different type of software.

--
These are my opinions, not necessarily my employer's.  I hate spam.
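As a rough C model of the "header is just a bunch of constants" idea: addresses and ports below are placeholders, the UDP checksum is legitimately sent as zero over IPv4 (which is what keeps the fire-and-forget FPGA version so simple), and in hardware these same bytes would come from registers while the MAC appends the Ethernet CRC. The Ethernet header goes in front of this buffer.

#include <stdint.h>
#include <string.h>

static uint16_t ip_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)(hdr[i] << 8) | hdr[i + 1];
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* buf must have room for 20 + 8 + 4 header/sequence bytes plus the payload. */
size_t build_udp_packet(uint8_t *buf, uint32_t seq,
                        const uint8_t *payload, uint16_t paylen)
{
    uint16_t udp_len = 8 + 4 + paylen;           /* UDP hdr + seq + data   */
    uint16_t ip_len  = 20 + udp_len;             /* IP hdr + UDP           */

    static const uint8_t ip_const[20] = {
        0x45, 0x00, 0, 0,        /* ver/IHL, TOS, total length (patched)   */
        0x00, 0x00, 0x40, 0x00,  /* ID, flags: don't fragment              */
        0x40, 0x11, 0, 0,        /* TTL 64, protocol 17 (UDP), checksum    */
        192, 168, 1, 10,         /* source IP  (placeholder)               */
        192, 168, 1, 20          /* dest   IP  (placeholder)               */
    };
    memcpy(buf, ip_const, 20);
    buf[2] = ip_len >> 8;  buf[3] = ip_len & 0xff;
    uint16_t csum = ip_checksum(buf, 20);
    buf[10] = csum >> 8;   buf[11] = csum & 0xff;

    buf[20] = 0x30; buf[21] = 0x39;              /* src port 12345         */
    buf[22] = 0x30; buf[23] = 0x39;              /* dst port 12345         */
    buf[24] = udp_len >> 8; buf[25] = udp_len & 0xff;
    buf[26] = 0; buf[27] = 0;                    /* UDP checksum 0 = none  */

    buf[28] = (uint8_t)(seq >> 24);              /* sequence number in the */
    buf[29] = (uint8_t)(seq >> 16);              /* first payload bytes    */
    buf[30] = (uint8_t)(seq >> 8);
    buf[31] = (uint8_t)seq;
    memcpy(buf + 32, payload, paylen);
    return 32u + paylen;
}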
On Sep 6, 9:52 am, Paul Keinanen <keina...@sci.fi> wrote:
> On Wed, 05 Sep 2007 18:06:16 -0700, Janaka <jana...@optiscan.com> wrote:
>
> > [...]
>
> If the OP required only something dedicated point to point
> connectivity, why bother with the IP wrapper, just send raw Ethernet
> frames with MAC addressing ?
I wondered about that, actually. But working on the MAC level is very inflexible. For example:

1) What if the client computer gets replaced by an equivalent computer? Each NIC has a unique MAC address, so I'll have to reconfigure my sender, or set up some manual MAC discovery protocol.

2) If the client is a PC of some sort, working at the MAC packet level isn't too simple, as the networking APIs don't provide that level. A separate driver for the NIC would have to be used, or whatever.

3) If I want to advance to a more complicated network, such as one with a few clients, working at the IP level is much more convenient, as I can set up a router with all the niceties it brings - multicasts, groups, etc.

Eli
On Thu, 06 Sep 2007 11:19:09 -0000, eliben <eliben@gmail.com> wrote:

>On Sep 6, 9:52 am, Paul Keinanen <keina...@sci.fi> wrote:
>> If the OP required only something dedicated point to point
>> connectivity, why bother with the IP wrapper, just send raw Ethernet
>> frames with MAC addressing ?
>
> I wondered about that, actually. But working on the MAC level is very
> inflexible. For example:
>
> 1) What if the client computer gets replaced by an equivalent
> computer. Each NIC has a unique MAC address, and so I'll have to
> reconfigure my sender, or set up some manual MAC discovery protocol.
The "manual MAC discovery protocol" could be ARP, which is simple to implement (manually creating the request IP header) and you get the MAC address of the other partner. After that, you do not have to bother about any IP addresses in the message headers in the actual high speed data transfers. Only if you send the data to some hot standby redundant system, in which the MAC address can change at any time, but again, you just would have to repeat the ARP protocol query.
> 2) If the client is a PC of some sort, working on the MAC packet level
> isn't too simple, as the networking APIs don't provide that level. A
> separate driver to the NIC should be used, or whatever.
I haven't written any raw Ethernet protocols in two decades, but in those days setting the receiver into promiscuous mode was all that was needed. I still assume that current Ethernet cards support promiscuous mode, since there are a lot of Ethernet and TCP/UDP/IP analysing programs working with standard Ethernet adapters. Are these analysing programs using some dedicated driver stacks?

Given the cost of the system the OP described, installing an extra network card in the receiving PC would not be a cost issue. Thus, one NIC could handle the fast traffic in promiscuous mode, while the other NIC(s) could handle ordinary network traffic.
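On Linux this can be done per-socket through packet(7); a small sketch, assuming fd is an AF_PACKET socket (though, as noted further down the thread, promiscuous mode is not actually needed if the frames are unicast to the NIC's own MAC address):

#include <linux/if_packet.h>  /* struct packet_mreq, PACKET_MR_PROMISC */
#include <net/if.h>           /* if_nametoindex */
#include <string.h>
#include <sys/socket.h>

/* Put one interface into promiscuous mode for the lifetime of the socket. */
int enable_promisc(int fd, const char *ifname)
{
    struct packet_mreq mreq;
    memset(&mreq, 0, sizeof(mreq));
    mreq.mr_ifindex = if_nametoindex(ifname);
    mreq.mr_type    = PACKET_MR_PROMISC;
    return setsockopt(fd, SOL_PACKET, PACKET_ADD_MEMBERSHIP,
                      &mreq, sizeof(mreq));
}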
> 3) If I want to advance to a more complicated network, such as one
> with a few clients, working on the IP level is much more convenient as
> I can set up a router with all the niceties it brings - multicasts,
> groups, etc.
MAC broadcasts work well with hubs. This kind of MAC broadcast is used in some producer/consumer-model, Ethernet-based industrial networks these days.

Paul
On Thu, 06 Sep 2007 05:20:14 -0000, eliben <eliben@gmail.com> wrote:

>> My first choice would be some other, more embeddable, processor running
>> off to the side. A PowerPC from Freescale, or an ARM processor from just
>> about anybody, comes to mind. I suspect that even a modest such processor
>> would get some pretty high speeds if that's all it was doing.
>>
>> You may have to bite the bullet and write your own stack that's shared
>> between a processor and the FPGA. I know practically nothing about
>> TCP/IP, but I'm willing to bet that once you've signed up to writing
>> or modifying your own stack there are some obvious things to put into the
>> FPGA to speed things up.
>
> Thanks, this is the design I'm now leaning towards. However, I have a
> concern regarding the high-speed communication between the FPGA and
> the outside processor. How is it done - using some high-speed external
> DDR memory?
If the FPGA were a Virtex-II Pro, or a V4FX or so, the PowerPC wouldn't be external. Mind you, after designing logic, anything running on the PowerPC seems painfully slow...

- Brian
On 2007-09-06, eliben <eliben@gmail.com> wrote:

> 2) If the client is a PC of some sort, working on the MAC packet level
> isn't too simple,
It's easy:

  $ man packet
> as the networking APIs don't provide that level.
Sure they do. See above. Of course it sucks trying to do it under Windows, but it sucks trying to do _anything_ under Windows. ;)
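A minimal receive-side sketch of what packet(7) gives you: bind an AF_PACKET socket to one interface and read whole frames, bypassing the IP/UDP stack entirely. The EtherType and interface name are placeholders, matching the send-side sketch earlier in the thread:

#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <net/if.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int open_raw_rx(const char *ifname, uint16_t ethertype)
{
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ethertype));
    if (fd < 0)
        return -1;

    struct sockaddr_ll addr;
    memset(&addr, 0, sizeof(addr));
    addr.sll_family   = AF_PACKET;
    addr.sll_protocol = htons(ethertype);
    addr.sll_ifindex  = if_nametoindex(ifname);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;   /* recv() now returns one full Ethernet frame per call */
}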
> A separate driver to the NIC should be used, or whatever.
>
> 3) If I want to advance to a more complicated network, such as one
> with a few clients, working on the IP level is much more convenient as
> I can set up a router with all the niceties it brings - multicasts,
> groups, etc.
Yup. One of the products I work on started out with MAC-level networking. It's fast and has very low overhead, but there are always going to be customers who want IP networking. So now the product will do either MAC networking or TCP networking (or both, actually).

Beware of relying on the Ethernet CRC. I've run across two different uController/MAC combinations where it wasn't reliable.

--
Grant Edwards
On 2007-09-06, Paul Keinanen <keinanen@sci.fi> wrote:

>> 2) If the client is a PC of some sort, working on the MAC packet level
>> isn't too simple, as the networking APIs don't provide that level. A
>> separate driver to the NIC should be used, or whatever.
>
> I haven't written any raw Ethernet protocols in two decades, but in
> those days setting the receiver into promiscuous mode was all that was
> needed.
There's no need for promiscuous mode. None of the MAC packet level products I've worked on used promiscuous mode at all.
> Given the cost of the system the OP described, installing an extra
> network card in the receiving PC would not be a cost issue. Thus, one
> NIC could handle the fast traffic in promiscuous mode, while the other
> NIC(s) could handle ordinary network traffic.
I don't see what promiscuous mode has to do with it. The MAC-level protocols I worked with were all still unicast.

--
Grant Edwards
On Sep 6, 12:22 am, eliben <eli...@gmail.com> wrote:
> > If you have the choice between UDP and TCP, UDP is much simpler and
> > fits an FPGA well. The big issue in choosing between the two is if you
> > require the guaranteed delivery of TCP, or can tolerate the potential
> > packet loss of UDP.
> >
> > As an example, we make a card that acquires real time data in a custom
> > protocol that is wrapped in UDP. We use a Xilinx Virtex-4 FX60, and a
> > protocol offload engine that uses the Xilinx PicoBlaze soft processor
> > to deal with the protocol stack. The PicoBlaze is an 8-bit soft
> > processor. It looks at each incoming packet and reads the header to
> > see if it is one of the real time streams we are trying to offload.
> > If it is, it sends the header to one circular buffer in memory and the
> > data to another circular buffer. If it is not, it sends it to a
> > kernel buffer and we let the Linux network stack deal with it.
> >
> > With this setup, we can consume data at over 90 MB/sec per Gigabit
> > Ethernet port. The data part of the packet is 1024 bytes, and each
> > GigE port has its own PicoBlaze dedicated to it.
>
> So where is the actual UDP communication implemented ? In Linux ? What
> processor is it running on ? Is it an external CPU or the built-in PPC
> of Virtex 4 ?
We are using one of the embedded PowerPCs to run Linux, and one PicoBlaze soft processor per EMAC in the design.

As each packet exits the EMAC, its header is examined by software that is running on the PicoBlaze. The PicoBlaze is running a very simple stack written in assembly language. As it looks at each layer of the header, it makes a decision to do one of several things. At the Ethernet level, it is deciding if it should just throw the packet away, or pass it on to the next layer. At the IP and UDP layers, it is deciding if the packet belongs to the protocol that we are offloading, and is a stream that we have requested. If the packet does belong to the protocol that we are offloading, the PicoBlaze sets up a PLB DMA engine to send it down one data path. If the packet does not belong to the protocol we are offloading, then the PicoBlaze sends it down another data path and it is given to the Linux kernel to deal with.

So for the protocol that we are offloading, the entire Ethernet, IP, UDP, and custom protocol stack is implemented in PicoBlaze assembly code. For everything else, the stack is in Linux. Since the data we are offloading is multicast data, this lets us have the PicoBlaze deal with the simple but high-speed UDP packets, and the PowerPC running Linux deal with the IGMP messaging required to join, leave and maintain membership in a multicast group.

The PicoBlaze is just running a single-threaded loop of code that polls for input or output data, and then deals with it. Its program is loaded from the PowerPC, taking advantage of the fact that the BRAM holding its program is dual ported. The PowerPC also tells the PicoBlaze what streams of data it is supposed to be acquiring, and what buffers in DDR2 memory to write the data to. The PicoBlaze will generate an interrupt once it has written a certain number of packets to the buffer, so the PowerPC just sees big chunks of data showing up in the buffers and does not have to deal with each packet. The PowerPC only needs to spend 1 or 2 percent of its compute to deal with acquiring each stream of data.
> > I did notice that you want to send GigE instead of receive it like we
> > are doing, but this method should work for sending a custom protocol
> > wrapped in UDP with some minor changes.
> >
> > How is the GigE that you are sending the data over connected? Is it
> > point to point, a dedicated network, or something else?
>
> We can assume for the sake of discussion that it is point to point,
> since the network is small and we're likely to use a fast switch to
> ensure exclusive links between systems.
>
> Eli
With this setup, you should be able to have an error rate that is incredibly low. As long as the possibility of a lost packet is not catastrophic, UDP would be a very good match. The protocol we offload has sequence numbers in it, so we know if we have lost a packet. The data is being used for realtime signal processing, so losing a packet just looks like a burst of noise.

Regards,
John McCaskill
www.fastertechnology.com
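A compilable C sketch of the per-packet decision John describes above; the real implementation is PicoBlaze assembly driving PLB DMA engines, and the field names and stream-table layout here are invented for illustration:

#include <stdbool.h>
#include <stdint.h>

enum verdict { TO_LINUX, TO_OFFLOAD_RINGS };

struct pkt_hdr {              /* fields the PicoBlaze pulls from the headers  */
    uint16_t ethertype;       /* 0x0800 = IPv4                                */
    uint8_t  ip_proto;        /* 17 = UDP                                     */
    uint32_t dst_ip;          /* multicast group of the stream                */
    uint16_t dst_port;
};

struct stream {               /* one entry per stream the PowerPC requested   */
    uint32_t group_ip;
    uint16_t port;
    bool     active;
};

enum verdict classify(const struct pkt_hdr *h,
                      const struct stream *streams, int nstreams)
{
    if (h->ethertype != 0x0800 || h->ip_proto != 17)
        return TO_LINUX;                 /* not UDP/IPv4: kernel buffer, let
                                            Linux (IGMP etc.) deal with it    */
    for (int i = 0; i < nstreams; i++)
        if (streams[i].active &&
            streams[i].group_ip == h->dst_ip &&
            streams[i].port     == h->dst_port)
            return TO_OFFLOAD_RINGS;     /* header -> header ring,
                                            payload -> data ring, via DMA     */
    return TO_LINUX;                     /* UDP, but not a requested stream   */
}

In the real design the TO_OFFLOAD_RINGS case becomes two DMA transfers, one to the header ring and one to the data ring in DDR2, and the PowerPC is interrupted only after a batch of packets has been written, as described above.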
On Sep 6, 4:45 pm, John McCaskill <jhmccask...@gmail.com> wrote:
> [...]
>
> The PicoBlaze is just running a single-threaded loop of code that
> polls for input or output data, and then deals with it. Its program
> is loaded from the PowerPC, taking advantage of the fact that the BRAM
> holding its program is dual ported.
At what frequency does the PicoBlaze run? It must be pretty fast to deal with packets at this bandwidth. Or does the fact that it is only examining the frame headers save you from needing that speed?

Thanks
Eli
On Sep 6, 1:25 pm, eliben <eli...@gmail.com> wrote:
> [...]
>
> At what frequency does the PicoBlaze run? It must be pretty fast to
> deal with packets at this bandwidth. Or does the fact that it is only
> examining the frame headers save you from needing that speed?
We run the EMAC 8 bits wide at 125 MHz, and the PicoBlaze at 62.5 MHz using a divided-by-two version of the EMAC clock. The PicoBlaze takes two cycles per instruction, and the packets we are offloading are a bit over 1 KB, so we have about 512 instructions to deal with an offloaded packet and other overhead. Dealing with a non-offloaded packet takes the shortest path through the code, to keep up the number of packets per second we can handle. The network the data is on is tightly controlled, so there is very little on it that is not the protocol we are offloading - mostly just IGMP packets for dealing with the multicast groups, and they are at a very low rate.

You are correct in that we do not look at the entire packet with the PicoBlaze, just the header. Once it has determined that it wants to offload a packet, it then has a little bit more work to do to calculate addresses and load them into the DMA engine. To make sure that we do not drop packets, we just need to make sure that the longest path through the code takes less time than roughly how long it takes to receive a packet. We have a FIFO between the EMAC and the DMA engine, so we can smooth things out a bit.

Regards,
John McCaskill
www.fastertechnology.com