FPGARelated.com

bare-metal ZYNQ

Started by John Larkin June 12, 2019
On Wednesday, June 12, 2019 at 7:34:09 PM UTC-4, John Larkin wrote:
> Assume I'm a pointy-haired boss trying to help one of my guys.
>
> I think that...
>
> The Xilinx ZYNQ (FPGA+ARM on a chip) has a hard boot loader. It
> figures out what the boot device is (serial flash, SD card, whatever)
> and reads in a secondary boot program, which the Xilinx tools provide
> as part of a build. That loader then reads the entire FPGA config
> bitstream into DRAM, and sets up a giant DMA transfer to configure the
> FPGA. That's all standard in the tools.
>
> But what if there's no DRAM? My guy thinks he will have to write his
> own ARM application, which is booted at load time, and inside that
> would be a routine to read from the boot media and configure the FPGA
> in chunks, using a small uP RAM buffer, maybe DMA or maybe not. He
> figures he could do that in a few days.
>
> Seems to me that Xilinx should support booting up a ZYNQ without DRAM.
> Does the tool chain support that (people here think not) or is there
> some loader already coded somewhere?
>
> (Our support, through a distributor, isn't very good.)
>
> Thanks
>
> --
> John Larkin Highland Technology, Inc
> picosecond timing precision measurement
>
> jlarkin att highlandtechnology dott com
> http://www.highlandtechnology.com
You're in the Bay Area, right? I've had good experiences with the Avnet Xilinx FAEs out there. I can put you in touch with some folks if you need.

There is a reference design for running out of On-Chip Memory (OCM):
https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842377/Zynq-7000+AP+SoC+Boot+-+Booting+and+Running+Without+External+Memory+Tech+Tip

It's worth reviewing UG585 - Zynq-7000 SoC TRM, Chapter 6: Boot and Configuration as well. Section 6.4 - Device Boot and PL Configuration is particularly helpful here.
https://www.xilinx.com/support/documentation/user_guides/ug585-Zynq-7000-TRM.pdf

In a lot of cases, the PL is configured by the FSBL, which can be run from OCM, and it's rather trivial to enable that in the Xilinx SDK.
On a sunny day (Thu, 13 Jun 2019 08:09:16 -0700) it happened John Larkin
<jjlarkin@highlandtechnology.com> wrote in
<agp4getd0ualcf8f9sl2s5ug2541t03e7g@4ax.com>:

>On Thu, 13 Jun 2019 06:14:09 GMT, Jan Panteltje
><pNaOnStPeAlMtje@yahoo.com> wrote:
>
>>On a sunny day (Wed, 12 Jun 2019 16:32:35 -0700) it happened John Larkin
>><jjlarkin@highland_snip_technology.com> wrote in
>><qm13gel4ifba24lb4p8gdeeusufc2b433b@4ax.com>:
>>
>>>But what if there's no DRAM?
>>
>>That thing runs Linux?
>>Does not Linux use the DRAM?
>>
>>If not using Linux and DRAM then a simpler cheaper FPGA board?
>
>I said "bare metal."
>
>Separate FPGA and CPU chips is an option that we use a lot already,
>but it needs a chip-chip parallel interface that uses a lot of balls,
>or a slow SPI link.
>
>The NXP uP that we usually use for this combo, LPC3250, looks to be
>EOL, so we're looking for a next-generation product platform.
OK, just did a read of the 80-page datasheet of the LPC3250. While reading I was thinking about the chip in the Raspberry Pi, the Broadcom BCM2835 to BCM2837, but that has no ADC, though it does have HDMI out. There exists an FPGA plug-in board for the Raspberry Pi.

It is a pity that so many things go EOL in a short time; OTOH it is a throw-away society. And very strong competition does kill some products.

It all depends on what you want to do.

A Raspberry Pi plus some external ADC: $35 + ?? A VERY powerful platform, really: GCC compiler, Linux, lots of I/O. USB, Ethernet, HDMI, analog video out, analog audio out, GPIO for extra boards, SD card, camera interface, logic-level serial, PWM, PLL frequency generators, and although there is a new model every year, the basics stay more or less the same: quad-core now, lots of DRAM, availability...

Depends on what you call 'bare metal' these days.
https://en.wikipedia.org/wiki/Raspberry_Pi

I have several in use...

It is sort of moving to an ever higher level of integration.
On 2019-06-13 17:09, John Larkin wrote:
> Separate FPGA and CPU chips is an option that we use a lot already,
> but it needs a chip-chip parallel interface that uses a lot of balls,
> or a slow SPI link.
>
> The NXP uP that we usually use for this combo, LPC3250, looks to be
> EOL, so we're looking for a next-generation product platform.
The chip-chip parallel interface is quickly becoming a chip-chip serial interface, now that most higher-end embedded CPUs have PCIe. The NXP i.MX series has many variants with PCIe. So do many DSPs from TI.

It looks like PCIe is nowadays becoming the go-to interface both between CPU and DSP and between CPU (or DSP) and FPGA. Few balls and high speed.

For CPU-DSP, the application CPU is typically the root complex and the DSP(s) is (are) typically the endpoint(s). The endpoint side can send interrupt packets when it has data (or otherwise requires attention).

Regards
Dimitrij
On 6/12/19 7:32 PM, John Larkin wrote:
> [...]
> The Xilinx ZYNQ (FPGA+ARM on a chip) has a hard boot loader. It
> figures out what the boot device is (serial flash, SD card, whatever)
> and reads in a secondary boot program, which the Xilinx tools provide
> as part of a build. That loader then reads the entire FPGA config
> bitstream into DRAM, and sets up a giant DMA transfer to configure the
> FPGA. That's all standard in the tools.
>
> But what if there's no DRAM? My guy thinks he will have to write his
> own ARM application, which is booted at load time, and inside that
> would be a routine to read from the boot media and configure the FPGA
> in chunks, using a small uP RAM buffer, maybe DMA or maybe not. He
> figures he could do that in a few days.
>
> Seems to me that Xilinx should support booting up a ZYNQ without DRAM.
> Does the tool chain support that (people here think not) or is there
> some loader already coded somewhere?
> [...]
It has been a while since I used that chip, but my memory is that what you are describing is the two-stage boot loading process.

There is a first-level boot loader in the device's internal ROM (the BootROM) that loads a program into the internal SRAM of the part from a limited selection of sources (mostly limited to what you can load from with a simple boot loader). This program is often just a second-level boot loader (the FSBL, in Xilinx terms), but it could also be a simple 'bare metal' program.

The second-level boot loader generally has the ability to configure DRAM and to put the program it is loading into it, but it does not need to. The other task normally done by the boot loader is to load the configuration data into the FPGA, but that can also be put off till later.

When booting to Linux, the second-level boot loader actually just loaded U-Boot, and then U-Boot loaded Linux and started it. U-Boot and Linux required DRAM, and much of the documentation assumes going to Linux, but the tools did support other configurations.
On 13/06/2019 18:35, Jan Panteltje wrote:
> [...]
> A Raspberry plus some external ADC 35$ + ??
> VERY powerful platform, really, GCC compiler, Linux, lots of I/O.
> USB, Ethernet, HDMI, analog video out, analog audio out,
> GPIO for extra boards... SDcard, camera interface, logic level serial, PWM,
> PLL frequency generators, and although every year a new model, the
> basics stay more or less the same, quadcore now, lots of DRAM,
> availability...
> [...]
Jan - do you know of a good, simple and fast way to get the Pi to exchange data with an adjacent chip (uP or FPGA)?

Using USB or Ethernet doesn't count as simple (or very fast for small data packets).

MK
In comp.arch.fpga John Larkin <jjlarkin@highland_snip_technology.com> wrote:
> Seems to me that Xilinx should support booting up a ZYNQ without DRAM.
> Does the tool chain support that (people here think not) or is there
> some loader already coded somewhere?
Hmm... it's not the same, but on the Intel Cyclone V parts (and others, I think) there's just a FIFO. You can push in bitstream words, and configuration only takes effect when the full bitstream has been provided and it passes some kind of checks.

The Zynq appears to drive such a process via DMA - see the PCAP in chapter 6 here:
https://www.xilinx.com/support/documentation/user_guides/ug585-Zynq-7000-TRM.pdf

It doesn't say as much, but I wonder if it's possible to do the transfer as chunked DMA. The Linux driver probably has to chunk anyway, given that the RAM buffer you want to transfer may not be in contiguous physical memory.

As to support for this in the tools, without DRAM you're probably running a custom OS, so there's a limit to what they can do.

On the Arria 10 one 'normal' boot process is: ROM bootloader reads SD card, starts u-boot, which writes the FPGA bitstream and then boots Linux. Now you mention it, I think u-boot must be running without DRAM, because the DRAM pins are only configured by the bitstream. So it could be worth looking to see if a similar process works on Zynq. (Instead of SD card, QSPI and other storage are also selectable.)

Theo
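To make the "configure the FPGA in chunks from a small RAM buffer" idea concrete, here is a rough model of the loop in Python. It is only a sketch of the control flow, not Zynq code: `read_at` and `push_chunk` are hypothetical stand-ins for "read from boot media" and "hand one buffer to the configuration DMA", the image size is made up, and it simply assumes the configuration port accepts a bitstream split across several transfers, which is exactly the open question above. The only hardware figure used is the 256 KB of on-chip memory on the Zynq-7000.

```python
"""Model of streaming an FPGA bitstream in small chunks (no DRAM)."""
from typing import Callable, Iterator

OCM_BYTES = 256 * 1024   # Zynq-7000 on-chip memory; the buffer must fit inside it
WORD = 4                 # assumption: transfers are counted in 32-bit words


def chunks(read_at: Callable[[int, int], bytes],
           total: int, buf_size: int) -> Iterator[bytes]:
    """Yield successive pieces of the image, each at most buf_size bytes."""
    if buf_size <= 0 or buf_size % WORD or buf_size > OCM_BYTES:
        raise ValueError("buffer must be word-aligned and fit in OCM")
    offset = 0
    while offset < total:
        length = min(buf_size, total - offset)
        yield read_at(offset, length)      # e.g. a read from QSPI flash or SD
        offset += length


def load_pl(read_at: Callable[[int, int], bytes], total: int, buf_size: int,
            push_chunk: Callable[[bytes], None]) -> int:
    """Feed every chunk to push_chunk and return the number of transfers."""
    count = 0
    for piece in chunks(read_at, total, buf_size):
        # Real code would start one DMA transfer here and wait for its
        # "done" flag before refilling the buffer.
        push_chunk(piece)
        count += 1
    return count


if __name__ == "__main__":
    image = bytes(i & 0xFF for i in range(1_000_000))   # made-up image
    received = bytearray()
    n = load_pl(lambda off, ln: image[off:off + ln],
                len(image), 16 * 1024, received.extend)
    print(n, "transfers, intact:", bytes(received) == image)
```

The buffer size is the only tuning knob: a larger buffer means fewer transfers but less OCM left for the loader itself.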
On a sunny day (Fri, 14 Jun 2019 09:11:36 +0100) it happened Michael Kellett
<mk@mkesc.co.uk> wrote in <98SdnfMZpZZdy57AnZ2dnUU78LXNnZ2d@giganews.com>:

>Jan - do you know of a good, simple and fast way to get the Pi to
>exchange data with an adjacent chip (uP or FPGA).

Sure, first for FPGA there is this:
http://www.latticesemi.com/en/Products/DevelopmentBoardsAndKits/RaspberryPiFPGA
This connects via GPIO.

I notice a lot more big names now have FPGA stuff for the Raspberry Pi. Just google 'raspberry FPGA board'.

Depending on your definition of 'fast' with a micro, the Pi has logic-level RS232 via /dev/ttyAMA0, also hardware SPI (or software SPI of course), and I2C the same.

Here it is used as a large LED matrix display driver:
http://panteltje.com/panteltje/raspberry_pi_FDS132_matrix_display_driver/index.html

You can also use 8 bits from GPIO and do byte-level transfers. A typical example of 'fast' is this:
http://panteltje.com/panteltje/raspberry_pi_dvb-s_transmitter/
That also uses a hardware FIFO buffer to get a smoothly timed data stream even during OS task switching.

An 8-bit (or wider) transfer with handshake will work with most micros.

Here the Pi as JTAG programmer:
http://panteltje.com/panteltje/raspberry_pi/

Stepper motor driver, lots of other I2C chips..
http://panteltje.com/panteltje/xgpspc/index.html
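As an illustration of the byte-wide GPIO transfer with handshake mentioned above, here is a toy simulation in Python. It is a sketch under assumptions, not code for the Pi: the wire names `strobe` and `ready` are made up for this example, both sides are modelled as simple step functions, and real hardware would add setup/hold timing that is ignored here.

```python
"""Toy simulation of a byte-wide parallel link with a two-wire handshake."""


class Link:
    """Shared wires: 8 data lines, a 'new data' strobe and a 'ready' line."""

    def __init__(self):
        self.data = 0
        self.strobe = False   # driven by the sender (the Pi)
        self.ready = True     # driven by the receiver (FPGA or micro)


def sender_step(link, pending):
    """Drive the next byte when the receiver is ready, then drop the strobe."""
    if pending and link.ready and not link.strobe:
        link.data = pending.pop(0)
        link.strobe = True
    elif link.strobe and not link.ready:
        link.strobe = False   # receiver has latched the byte


def receiver_step(link, received):
    """Latch a byte on strobe, then signal ready again once strobe is low."""
    if link.strobe and link.ready:
        received.append(link.data)
        link.ready = False
    elif not link.strobe and not link.ready:
        link.ready = True


def transfer(payload):
    """Run both sides in lock step until every byte has arrived."""
    link, pending, received = Link(), list(payload), []
    steps = 0
    while len(received) < len(payload):
        sender_step(link, pending)
        receiver_step(link, received)
        steps += 1
    return bytes(received), steps


if __name__ == "__main__":
    data, steps = transfer(b"hello")
    print(data, steps)
```

Each byte costs a full strobe-high, strobe-low cycle in this model, which is why the achievable byte rate is tied to how fast the sender can toggle its pins.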
>Using USB or Ethernet
>doesn't count as simple (or very fast for small data packets.)

USB is slow on my older Raspberries at least; Ethernet is OK. I would prefer Ethernet in some applications because of the galvanic isolation.

What is simple? Everything is simple once you have dunnit.
On 14/06/2019 10:44, Jan Panteltje wrote:
> [...]
> Depending on your definition of 'fast' with a micro,
> the Pi had logic level RS232 via /dev/ttyAMA0,
> also hardware SPI (or software SPI of course), i2c the same.
> [...]
> 8 bits (or more) transfer with handshake will work with most micros.
> [...]
>> Using USB or Ethernet
>> doesn't count as simple (or very fast for small data packets.)
>
> USB is slow on my older Raspberries at least, ethernet is OK.
> I would prefer ethernet in some applications because of the galvanic isolation.
>
> What is simple?
> Everything is simple once you have dunnit.
Thanks for the stuff Jan, I don't think I explained quite what I meant by fast (although I did say that Ethernet wasn't fast enough).

So fast for me, for the applications I have in mind, is:

round trip < 1 us (less than 50 ns preferred) - easy to do with an FPGA memory-mapped to the uP and pretending to be a RAM - but I don't see how to do it on a Pi.
Sustained data transfer rate > 100 MiB per second in both directions simultaneously.

You can do this kind of stuff with the PRUs on the Beagleboards, but it would be nice if it were possible on a Pi.

Simple means (in this context) not using lots of other fancy chips over and above the FPGA and not needing to use a GHz serial interface (although if the Pi had one spare that I don't know about I might have a go).

I had wondered if the camera or audio interfaces might be re-purposed.

MK
On a sunny day (Sat, 15 Jun 2019 13:14:27 +0100) it happened Michael Kellett
<mk@mkesc.co.uk> wrote in <TOWdnRE_0_exfJnAnZ2dnUU78e_NnZ2d@giganews.com>:

>[...]
>So fast for me, for the applications I have in mind is:
>
>round trip < 1us (less than 50ns preferred) - easy to do with FPGA
>memory mapped to uP and pretending to be a RAM - but I don't see how to
>do it on a Pi.
>Sustained data transfer rate > 100MiB per second in both directions
>simultaneously.
>[...]
>I had wondered if the camera or audio interfaces might be re-purposed.
>
>MK
Audio I do not think is usable for that, but who knows... AFAIK the camera interface is from camera to board, so one way.

What I meant is: if you have 8 or more GPIO pins, say a byte, then there is nothing stopping you from putting a byte on them and using one more pin as a 'new data' handshake. The FPGA would see the handshake, read the byte, and then set a ready pin; the Pi would then output the next byte. Now you have 10 pins and maximum I/O speed. Same for 16 bits: 18 pins.

The throughput problem is set by the Pi's Linux multitasker: it will interrupt the stream every now and then for a few milliseconds at least. That is where you need the FIFO. But that FIFO can be in FPGA RAM, so no external logic is needed, unlike what I do here:
http://panteltje.com/panteltje/raspberry_pi_dvb-s_transmitter/
I did consider doing that in FPGA, but that seemed a bit of overkill in this case.

So then maximum speed boils down to how fast the Pi can really output data. I have not tested that, as I never was close to that limit. It is simple to test: write some I/O pin toggle routine in asm, or even C, and look at the scope.

  loop:
    out 0x00
    out 0xff
    goto loop

Pi has DMA; I have not used it myself. Here is some discussion:
https://www.raspberrypi.org/forums/viewtopic.php?t=8376

Maybe this is of more use to you:
https://github.com/hzeller/rpi-gpio-dma-demo
They did the pin toggle and:
<quote>
The resulting output wave on the Raspberry Pi 1 of 22.7Mhz, the Raspberry Pi 2 reaches 41.7Mhz and the Raspberry Pi 3 65.8 Mhz.
<end quote>

Fast enough?
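A back-of-envelope check of those figures against the "> 100 MiB per second" target, as a small Python sketch. The assumptions are mine, not measurements: one data word goes out per full cycle of the quoted square wave (strobe high, then strobe low), the cost of reading back the ready pin is ignored, and only one direction is counted. The FIFO helper just multiplies data rate by stall time, using the "few milliseconds" scheduler interruptions mentioned above.

```python
"""Rough GPIO throughput and FIFO sizing estimate (one direction only)."""

MIB = 1024 * 1024

# Square-wave frequencies quoted above from the rpi-gpio-dma-demo page (Hz).
TOGGLE_HZ = {"Pi 1": 22.7e6, "Pi 2": 41.7e6, "Pi 3": 65.8e6}


def throughput_mib_s(toggle_hz: float, bus_bits: int) -> float:
    """Upper bound, assuming one bus-wide word per full output cycle."""
    return toggle_hz * bus_bits / 8 / MIB


def fifo_bytes(rate_bytes_s: int, stall_ms: int) -> int:
    """Bytes a FIFO must hold to ride out one scheduler stall (rounded up)."""
    return -(-rate_bytes_s * stall_ms // 1000)


if __name__ == "__main__":
    for board, hz in TOGGLE_HZ.items():
        for bits in (8, 16):
            print(f"{board}, {bits}-bit bus: {throughput_mib_s(hz, bits):.1f} MiB/s")
    print("FIFO for 1 MB/s and a 5 ms stall:", fifo_bytes(1_000_000, 5), "bytes")
```

Under these optimistic assumptions only the Pi 3 figure with a 16-bit bus clears 100 MiB/s, and that is before handshake overhead and before counting the return direction, so real numbers would be lower.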
On 15/06/19 13:14, Michael Kellett wrote:
> [...]
> So fast for me, for the applications I have in mind is:
>
> round trip < 1us (less than 50ns preferred) - easy to do with FPGA memory mapped
> to uP and pretending to be a RAM - but I don't see how to do it on a Pi.
> Sustained data transfer rate > 100MiB per second in both directions simultaneously.
You could do that on the XMOS xCORE devices. They are fast enough to take the 100Mb/s serial ethernet traffic, and process it in software. *And* be doing other things at the same time, guaranteed by design (not tests!) :) The IDE states the maximum timing between two points, e.g. two i/o operations, or loop times. That's possible since there are no caches, no interrupts, and the latency is <100ns (much less in my experience).
> You can do this kind of stuff with the Prus on the Beagleboards but it would be
> nice if it were possible on a Pi.
>
> Simple means (in this context) not using lots of other fancy chips over and
> above the FPGA and not needing to use a GHz serial interface. (although if the
> PI had one spare that I don't know about I might have a go.)
The xCORE i/o is very nice, easy to use, and is similar to FPGAs in flexibility (e.g. SERDES, or strobes, or...)
> I had wondered if the camera or audio interfaces might be re-purposed.