FPGARelated.com
Forums

Re: Help with a face recognition system

Started by Matthew Hicks March 31, 2007
We (the Illiac 6 research group at the University of Illinois) are currently
working on porting one of the more popular face recognition programs for
an application that is going to run on our "Communications Supercomputer",
which includes a Virtex-II Pro FPGA. There is no benefit in implementing
a processor in the FPGA and then running the existing C code on it, because
there is no way it will come close to the performance of a processor in an
ASIC. Your only option for using the FPGA to get a speedup is to go back to
the roots of the algorithm(s), find the parallelism, and write HDL code to
exploit that parallelism. That is the phase we are currently in. Remember
that an algorithm that may not be the best in a sequential environment may
shine in a highly parallel environment, so it's best to examine every
candidate algorithm for possible parallel structure.
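To make the point about going back to the roots of the algorithm concrete, consider a dot product, the basic operation in PCA-based matching. A sequential loop on a CPU chains N multiply-adds one after another. In FPGA fabric, the N multiplications can run side by side and feed an adder tree whose depth grows only as log2(N). The Python model below is a hypothetical sketch (not HDL, and not code from this thread). `adder_tree_sum` and `dot_product_tree` are illustrative names, and the model simply counts adder levels to show the difference in latency.

```python
from typing import List, Sequence, Tuple


def adder_tree_sum(values: Sequence[int]) -> Tuple[int, int]:
    """Sum `values` pairwise, level by level, like a hardware adder tree.

    Returns (total, depth), where depth is the number of adder levels,
    i.e. the latency in adder stages if each level is one pipeline stage.
    """
    if not values:
        return 0, 0
    level: List[int] = list(values)
    depth = 0
    while len(level) > 1:
        # Every addition within one level is independent of the others,
        # so in fabric they would all happen in the same clock cycle.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # an odd element passes through unchanged
        level = nxt
        depth += 1
    return level[0], depth


def dot_product_tree(a: Sequence[int], b: Sequence[int]) -> Tuple[int, int]:
    """Dot product modelled as one multiplier per element feeding an adder tree."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    products = [x * y for x, y in zip(a, b)]  # all multiplies in parallel
    return adder_tree_sum(products)
```

For eight inputs, `dot_product_tree([1, 2, 3, 4, 5, 6, 7, 8], [1] * 8)` returns `(36, 3)`: the sum is ready after three adder levels instead of a chain of seven additions. A real design would also have to budget multipliers, registers and routing, which this model ignores.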


---Matthew Hicks


> Peter,
>
> First of all, thanks for replying :-).
>
> We (team of 5) spent the last semester exploring and evaluating the
> different algorithms. We came to the conclusion that PCA would be the
> best compromise between complexity and accuracy, with the
> recommendation that Neural Networks be added into it if possible, to
> increase the accuracy even further.
>
> We did evaluate different implementations of the algorithm, some on
> MATLAB and some running natively, and the performance was as we
> expected: not suitable for real-time, especially for large face
> databases.
>
> This only confirmed what was suggested at the beginning of our
> research, which was the main reason we wanted to explore an FPGA
> implementation to make up for the performance shortcomings while
> maintaining (or better, increasing) the level of accuracy.
>
> Right now we have in fact divided up the work between us, and 3 of my
> team are working on the algorithm (it being the main focus of the
> project), while one is working on the neural network, and myself
> working on the FPGA.
>
> What remains to be determined is the kind of design that the algorithm
> will be implemented with. Whether it would be solely VHDL, a soft
> microprocessor and asm/C/C++ code, or some combination of the two.
> And, of course, whether the hardware currently at hand would support
> that design.
>
> Also, after sending my earlier post, I stumbled upon a project on
> opencores.org called Java Optimized Processor
> (http://www.opencores.com/projects.cgi/web/jop/overview), which is
> essentially a soft processor for Java bytecode. I was wondering how
> good/robust/flexible it is, and whether someone here has actually used
> it in any way on actual hardware. I might try to implement it and
> download it to my FPGA, if I can get it to compile on Webpack 9.1i
> without trouble.
>
> I did check out the Xilinx University Program. The "Virtex-II Pro
> Development System" seems like an extreme overkill in our case, since
> the use of FPGA isn't standard curriculum in our faculty; we are
> mainly doing this as a unique, single-case approach. It seems like an
> excellent choice for an engineering faculty, though. Unfortunately, I
> don't think our faculty (Computer Science) would be willing to make
> such a purchase based on a single case requirement, especially taking
> into account the relatively high currency exchange rate (1 USD ~= 5.7
> EGP).
>
> Thanks again for your response, and I hope I haven't bored you with my
> long reply...
>
> Best Regards,
> Islam Ossama
> On Mar 31, 5:17 am, "Peter Alfke" <a...@sbcglobal.net> wrote:
>
>> Islam,
>> If I were you, I would:
>> first explore the best algorithm
>> then see whether it can be implemented on a (any!) microprocessor,
>> achieving reasonable performance.
>> If the microprocessor implementation is too slow, I would look at a
>> way to speed it up with an FPGA, and I would use an existing board of
>> reasonable performance.
>> The Xilinx University Program has a better board, based on
>> Virtex-IIPro, that is surprisingly inexpensive for universities.
>> I would contact the Xilinx University Program about details.
>> Challenging project!
>> Peter Alfke
Matthew,

Parallelism was also a factor in choosing PCA for the FPGA
implementation, and Composite-PCA can increase that parallelism even further.
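As an illustration of where that parallelism sits, here is a minimal sketch assuming the classic eigenface formulation of PCA (the thread doesn't say which variant the team uses, and Composite-PCA is not modelled here). Projecting a flattened, mean-subtracted face onto K eigenfaces amounts to K dot products that are completely independent of each other, so each could be given its own multiply-accumulate pipeline. The function name `project_face` and the tiny input vectors are hypothetical.

```python
from typing import List, Sequence


def project_face(face: Sequence[float], mean_face: Sequence[float],
                 eigenfaces: Sequence[Sequence[float]]) -> List[float]:
    """Eigenface projection: weight_k = u_k . (face - mean_face).

    `face` and `mean_face` are flattened images of P pixels; `eigenfaces`
    holds K vectors of length P. The K dot products share the same input
    but never depend on each other's results, so they can run concurrently.
    """
    if len(face) != len(mean_face):
        raise ValueError("face and mean_face must have the same length")
    centered = [x - m for x, m in zip(face, mean_face)]  # one subtract per pixel
    weights = []
    for u in eigenfaces:
        if len(u) != len(centered):
            raise ValueError("every eigenface must have one entry per pixel")
        # Independent of every other eigenface: a natural unit for one MAC pipeline.
        weights.append(sum(u_i * c_i for u_i, c_i in zip(u, centered)))
    return weights
```

With `face = [3, 4, 5, 6]`, `mean_face = [1, 1, 1, 1]` and the two eigenfaces `[1, 0, 0, 0]` and `[0, 0.5, 0.5, 0]`, the call returns `[2, 3.5]`. Real eigenfaces are orthonormal vectors computed from a training set; these toy vectors only exercise the arithmetic.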

What you said about the speed of a C implementation is a very good
point, so we'll probably take your suggestion and do it entirely in
VHDL. The part of the team working on the algorithm is already
breaking it down into parallel parts; hopefully this will make the
algorithm really "shine", as we definitely need it to. Also, for the
sake of comparison, I'm thinking we can implement the same algorithm
on a standard PC with threading, run it at real-time priority, and
compare the results to see what was gained through the FPGA
implementation. I'm sure the results will be interesting either way.
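A PC baseline of the kind described could split the face database across threads, let each thread find its closest match, and then reduce the per-thread winners. The sketch below shows only that partition-and-reduce structure; `parallel_match`, the contiguous chunking scheme and the squared-Euclidean distance are assumptions for illustration. Note that in CPython the global interpreter lock keeps pure-Python threads from running this loop truly in parallel, so a real timing comparison against the FPGA would want native threads (e.g. C/C++) or separate processes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Sequence, Tuple


def _best_in_chunk(probe: Sequence[float], gallery: Sequence[Sequence[float]],
                   start: int, stop: int) -> Tuple[float, int]:
    """Nearest neighbour (squared Euclidean distance) within gallery[start:stop]."""
    best = (float("inf"), -1)
    for i in range(start, stop):
        d = sum((p - g) ** 2 for p, g in zip(probe, gallery[i]))
        if d < best[0]:
            best = (d, i)
    return best


def parallel_match(probe: Sequence[float], gallery: Sequence[Sequence[float]],
                   workers: int = 4) -> Tuple[float, int]:
    """Split the gallery into contiguous chunks, search them concurrently,
    then reduce the per-chunk winners to one (distance, index) pair."""
    n = len(gallery)
    if n == 0:
        raise ValueError("empty gallery")
    workers = max(1, workers)
    step = -(-n // workers)  # ceiling division so every entry is covered
    bounds = [(s, min(s + step, n)) for s in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda b: _best_in_chunk(probe, gallery, *b), bounds)
        return min(results)  # tuples compare by distance, then by index
```

Because `min` compares `(distance, index)` tuples, ties go to the lowest index, so the result is the same for any number of workers. That makes it easy to check a threaded run against a single-threaded one before comparing timings.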

And in response to Jim's question, this is more of a research project,
so naturally we want it to support as large a database as possible.
The plan is to keep testing it with databases of increasing size until
we find the largest size it can handle without breaking the real-time
requirement.
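Before testing, a rough ceiling on database size can be estimated on paper. Assuming the classic eigenface pipeline, matching one probe costs about K*P multiply-accumulates (MACs) for the projection (P pixels, K eigenfaces) plus N*K for a brute-force comparison against N stored faces. Solving K*P + N*K <= R/f for N, with R the sustained MAC rate and f the frame rate, gives an upper bound. The sketch below encodes that arithmetic; `max_gallery_size` is a hypothetical helper, every number fed to it is a placeholder rather than a measurement of any board, and it ignores face detection, preprocessing and memory bandwidth, any of which may dominate in practice.

```python
def max_gallery_size(macs_per_second: int, frames_per_second: int,
                     pixels: int, components: int) -> int:
    """Rough upper bound on the number of stored faces N that fit a frame budget.

    Cost per probe, in multiply-accumulates (MACs):
        projection:  components * pixels
        matching:    N * components   (brute-force nearest neighbour)
    Solve components * pixels + N * components <= macs_per_second / frames_per_second
    for N. Returns 0 if even the projection does not fit.
    """
    if frames_per_second <= 0 or components <= 0:
        raise ValueError("frames_per_second and components must be positive")
    budget = macs_per_second // frames_per_second  # MACs available per frame
    remaining = budget - components * pixels       # left over after projection
    if remaining <= 0:
        return 0
    return remaining // components
```

For instance, a hypothetical 10^9 MAC/s at 25 frames/s with 100x100-pixel images and K = 50 gives `max_gallery_size(1_000_000_000, 25, 10_000, 50) == 790_000`. The bound is only a starting point for the incremental tests described above.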

Thanks all for your responses...


Best Regards,
Islam Ossama


On Mar 31, 9:23 am, Matthew Hicks <mdhic...@uiuc.edu> wrote:
> Your only option of utilizing the FPGA for speedup is going to the roots
> of the algorithm(s), finding parallelism, and writing HDL code to exploit
> that parallelism.
Islam Ossama wrote:
> I think it's a very good point what you said about speed concerning a
> C implementation, which means we'll probably take your suggestion and
> do it entirely in VHDL.
You could opt for a hybrid solution: do all the massively parallelizable things in the FPGA fabric (after all, this is what FPGAs are all about when applied to high-speed processing) and do the more sequential/supervisory/etc. stuff on a CPU, preferably one of the PPC405 cores present in V2P (Virtex-II Pro) and 4VFX (Virtex-4 FX) FPGAs. These hard on-chip CPU cores will provide far better performance than any soft CPU you can possibly come up with; the only caveat is that you will have at most two such CPUs available.
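That split can be sketched in software: the per-entry distance computation is data-parallel and could live in the fabric, while picking the best match and deciding whether it is good enough is sequential control that suits a PPC405. The function names, the integer (fixed-point-style) data and the rejection threshold for unknown faces are all hypothetical choices for illustration, not a design described in this thread.

```python
from typing import List, Optional, Sequence


def fabric_distances(probe: Sequence[int],
                     gallery: Sequence[Sequence[int]]) -> List[int]:
    """Data-parallel stage (candidate for FPGA fabric).

    One squared Euclidean distance per stored face. No entry depends on any
    other, so the work could be spread across replicated pipelines.
    Integers stand in for the fixed-point values a fabric design would
    likely use (an assumption).
    """
    return [sum((p - g) ** 2 for p, g in zip(probe, entry)) for entry in gallery]


def cpu_decide(distances: Sequence[int], threshold: int) -> Optional[int]:
    """Sequential/supervisory stage (candidate for an on-chip PPC405).

    Picks the closest stored face and rejects it as "unknown" if its
    distance exceeds the (hypothetical) threshold.
    """
    if not distances:
        return None
    best = min(range(len(distances)), key=distances.__getitem__)
    return best if distances[best] <= threshold else None
```

With `probe = [3, 3]` and stored entries `[0, 0]`, `[10, 10]` and `[3, 4]`, the distances are `[18, 98, 1]`; `cpu_decide` returns `2` for a threshold of 5 and `None` for a threshold of 0. Only the short list of distances has to cross from fabric to CPU, which keeps the interface between the two halves narrow.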
Thanks for the suggestion; I'm seriously taking it into consideration.
I have already contacted the local Xilinx supplier and am working out
the details of getting the XUP board.

I just hope I can live up to the level of this project. All this
hardware stuff is new to me, and I'm kinda starting to long for the
comfort and warmth of software implementations and having the OS take
care of all the dirty details for me. I guess that's why the idea of
using the PPC processors is attractive to me, though I'd still have to
take care of some low-level details myself (unless I load a tiny Linux
kernel on one or both of the processors, maybe? Hmm, it'll definitely
take some careful (re)thinking).

Well, thanks again to everyone, your responses have all been extremely
helpful.


Best Regards,
Islam Ossama


On Apr 2, 7:29 am, "Daniel S." <digitalmastrmind_no_s...@hotmail.com>
wrote:
> You could opt for an hybrid solution... do all the massively parallelizable
> things with FPGA fabric (after all, this is what FPGAs are all about when
> applied to high-speed processing) and do the more
> sequential/supervisory/etc. stuff on a CPU, preferably one of the PPC405
> cores present in V2P and 4VFX FPGAs
Honestly, I think that if you have no hardware experience, this will be
quite a challenge. Not to say that it's impossible, but you're
certainly going to have a few sleepless nights...

In your case, I would rather suggest an all-software solution. One
idea would be to use a PS3 and harness the power of the Cell processor
(you can run Linux on it, I believe). The issue is to parallelize your
algorithm enough to harness the power of the 9 cores (a similar problem
to the one you would face with an FPGA solution). If one PS3 is not
enough, maybe you could use 2, or 4... You could build a small cluster
for not too much money.

My 2¢.

Patrick

On a sunny day (2 Apr 2007 14:45:45 -0700) it happened "Patrick Dubois"
<prdubois@gmail.com> wrote in
<1175550345.009582.122910@y66g2000hsf.googlegroups.com>:

>In your case, I would rather suggest an all-software solution. One
>idea would be to use a PS3 and harness the power of the Cell processor
>(you can run linux on it I believe). The issue is to parallelize your
>algorithm enough to harness the power of the 9 cores
PS3 has only 1 power processor and _6_ SPE cores. http://en.wikipedia.org/wiki/PlayStation_3#Central_processing_unit And it sucks 200W if fully loaded.
On Apr 3, 8:59 am, Jan Panteltje <pNaonStpealm...@yahoo.com> wrote:
> PS3 has only 1 power processor and _6_ SPE cores. > http://en.wikipedia.org/wiki/PlayStation_3#Central_processing_unit
Nope, 1 central PPC core and 8 Synergistic Processor Units:
http://www.research.ibm.com/cell/heterogeneousCMP.html
On a sunny day (3 Apr 2007 08:21:24 -0700) it happened "Patrick Dubois"
<prdubois@gmail.com> wrote in
<1175613684.443163.290410@d57g2000hsg.googlegroups.com>:

>Nope, 1 central PPC core and 8 Synergistic Processor Unit:
>http://www.research.ibm.com/cell/heterogeneousCMP.html
Nope, in the PS3 only 6 are available.
On Apr 3, 2:00 pm, Jan Panteltje <pNaonStpealm...@yahoo.com> wrote:
> Nope, in the PS3 only 6 are available.
Alright, I don't want to argue about this, but I think we can fairly say
that the info on the web is not clear...
Just for fun, here's a link directly from Sony with the PS3 Cell specs :)
http://cell.scei.co.jp/index_e.html
On a sunny day (4 Apr 2007 05:43:42 -0700) it happened "Patrick Dubois"
<prdubois@gmail.com> wrote in
<1175690622.594433.213100@d57g2000hsg.googlegroups.com>:

>Alright, I don't want to argue about this but I think we can fairly
>say that the info on the web is not clear...
OK, all well and good, but here are some facts:
PS3 runs Linux under a 'hypervisor'.
The hypervisor limits access to whatever Sony pleases to allow access to.
One SPE is in use for the PS3 graphics, and there is no way you can touch it from Linux.
The story goes that IBM had yield problems, so Sony settled for chips with one working core less.
That leaves 6 available from Linux.
The Wikipedia article is up to date and quite correct.
Version of Linux that runs on PS3: Yellow Dog Linux.
If you are in Europe, there is a special c't magazine release out with YD Linux for PS3,
including some of the IBM development tools:
https://www.heise.de/kiosk/special/ct/07/01/
That's all I can tell you now.