We (the Illiac 6 research group at the University of Illinois) are currently working on porting one of the more popular face recognition programs for an application that is going to run on our "Communications Supercomputer", which involves a Virtex-II Pro FPGA. There is no benefit in implementing a processor in the FPGA and then running the existing C code on it, because there is no way it will come close to the performance of a processor in an ASIC. Your only option for getting a speedup out of the FPGA is to go to the roots of the algorithm(s), find the parallelism, and write HDL code to exploit that parallelism. That is the phase we are currently in. Remember that an algorithm that may not be the best in a sequential environment may shine in a highly parallel environment, so it's best to look at all algorithms for possible parallel structure.

---Matthew Hicks

> Peter,
>
> First of all, thanks for replying :-).
>
> We (team of 5) spent the last semester exploring and evaluating the different algorithms. We came to the conclusion that PCA would be the best compromise between complexity and accuracy, with the recommendation that Neural Networks be added into it if possible, to increase the accuracy even further.
>
> We did evaluate different implementations of the algorithm, some on MATLAB and some running natively, and the performance was as we expected: not suitable for real-time, especially for large face databases.
>
> This only confirmed what was suggested at the beginning of our research, which was the main reason we wanted to explore an FPGA implementation to make up for the performance shortcomings while maintaining (or better, increasing) the level of accuracy.
>
> Right now we have in fact divided up the work between us: 3 of my team are working on the algorithm (it being the main focus of the project), one is working on the neural network, and I am working on the FPGA.
>
> What remains to be determined is the kind of design that the algorithm will be implemented with: whether it would be solely VHDL, a soft microprocessor and asm/C/C++ code, or some combination of the two. And, of course, whether the hardware currently at hand would support that design.
>
> Also, after sending my earlier post, I stumbled upon a project on opencores.org called Java Optimized Processor (http://www.opencores.com/projects.cgi/web/jop/overview), which is essentially a soft processor for Java bytecode. I was wondering how good/robust/flexible it is, and whether someone here has actually used it in any way on actual hardware. I might try to implement it and download it to my FPGA, if I can get it to compile on Webpack 9.1i without trouble.
>
> I did check out the Xilinx University Program. The "Virtex-II Pro Development System" seems like extreme overkill in our case, since the use of FPGAs isn't standard curriculum in our faculty; we are mainly doing this as a unique, single-case approach. It seems like an excellent choice for an engineering faculty, though. Unfortunately, I don't think our faculty (Computer Science) would be willing to make such a purchase based on a single-case requirement, especially taking into account the relatively high currency exchange rate (1 USD ~= 5.7 EGP).
>
> Thanks again for your response, and I hope I haven't bored you with my long reply...
>
> Best Regards,
> Islam Ossama
>
> On Mar 31, 5:17 am, "Peter Alfke" <a...@sbcglobal.net> wrote:
>
>> Islam,
>> If I were you, I would:
>> first explore the best algorithm
>> then see whether it can be implemented on a (any!) microprocessor,
>> achieving reasonable performance.
>> If the microprocessor implementation is too slow, I would look at a way to speed it up with an FPGA, and I would use an existing board of reasonable performance.
>> The Xilinx University Program has a better board, based on Virtex-II Pro, that is surprisingly inexpensive for universities.
>> I would contact the Xilinx University Program about details.
>> Challenging project!
>> Peter Alfke
Re: Help with a face recognition system
Started by ●March 31, 2007
Reply by ●March 31, 2007
Matthew,

Parallelism was also a factor in choosing PCA for implementation on an FPGA, and Composite-PCA can even increase that parallelism.

I think you made a very good point about the speed of a C implementation, which means we'll probably take your suggestion and do it entirely in VHDL. The part of the team working on the algorithm is already breaking it down into parallel parts; hopefully this will make the algorithm really "shine", as we definitely need it to. Also, for the sake of comparison, I'm thinking we can implement the same algorithm on a standard PC with threading, run it at real-time priority, and compare the results to see what was gained through the FPGA implementation. I'm sure the results would be interesting either way.

And in response to Jim's question, this is more of a research project; so, naturally, we want it to support as large a database as possible. The plan is to keep testing it with databases of increasing size until we reach the maximum size possible without breaking the real-time requirement.

Thanks all for your responses...

Best Regards,
Islam Ossama

On Mar 31, 9:23 am, Matthew Hicks <mdhic...@uiuc.edu> wrote:
> We (Illiac 6 research group at the University of Illinois) are currently working on porting one of the more popular face recognition programs for an application that is going to run on our "Communications Supercomputer", which involves a Virtex II-Pro FPGA. There is no benefit from implementing a processor in the FPGA and then running the existing C code on it, because there is no way it will come close to the performance of a processor in ASIC. Your only option of utilizing the FPGA for speedup is going to the roots of the algorithm(s), finding parallelism, and writing HDL code to exploit that parallelism. That is the phase we are currently in. Remember that an algorithm that may not be the best in a sequential environment may shine in a highly parallel environment, so it's best to look at all algorithms for possible parallel structure.
>
> ---Matthew Hicks
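To make the parallel structure being discussed concrete, here is a minimal pure-Python sketch of PCA (eigenface) matching. It assumes the mean face and the eigenfaces were already computed offline, and it uses a toy 4-pixel "image". The names `project`, `squared_distance`, and `nearest_face`, and all of the data, are hypothetical illustrations and not anyone's implementation from this thread. The point is only that each projection coefficient is an independent dot product and each database comparison is independent of the others, which is the kind of structure one might map onto parallel multiply-accumulate units in HDL.

```python
def project(image, mean_face, eigenfaces):
    """Return the PCA coefficients of `image`.

    Each coefficient is one dot product of the mean-subtracted image with
    one eigenface, so the coefficients can be computed independently.
    """
    centered = [p - m for p, m in zip(image, mean_face)]
    return [sum(c * e for c, e in zip(centered, face)) for face in eigenfaces]


def squared_distance(a, b):
    """Squared Euclidean distance between two coefficient vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_face(probe_coeffs, database):
    """Return the label whose stored coefficients are closest to the probe."""
    return min(database, key=lambda label: squared_distance(probe_coeffs, database[label]))


# Toy data (made up): 4-pixel images and 2 orthonormal "eigenfaces".
mean_face = [2.0, 2.0, 2.0, 2.0]
eigenfaces = [
    [0.5, 0.5, -0.5, -0.5],
    [0.5, -0.5, 0.5, -0.5],
]

# Enrolment: store only the low-dimensional coefficients per person.
database = {
    "alice": project([4.0, 4.0, 0.0, 0.0], mean_face, eigenfaces),
    "bob": project([4.0, 0.0, 4.0, 0.0], mean_face, eigenfaces),
}

# Recognition: project the probe image and pick the closest stored entry.
probe = [3.5, 3.0, 1.0, 0.5]
match = nearest_face(project(probe, mean_face, eigenfaces), database)
print(match)
```

In this sketch the work per probe grows with the number of eigenfaces times the number of pixels (projection) plus the number of database entries times the number of eigenfaces (matching), so the database size mainly stresses the matching step.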
Reply by ●April 2, 2007
Islam Ossama wrote:
> Matthew,
>
> Parallelism was also a factor in choosing PCA for implementation on an FPGA, and Composite-PCA can even increase that parallelism.
>
> I think you made a very good point about the speed of a C implementation, which means we'll probably take your suggestion and do it entirely in VHDL. The part of the team working on the algorithm is already breaking it down into parallel parts; hopefully this will make the algorithm really "shine", as we definitely need it to. Also, for the sake of comparison, I'm thinking we can implement the same algorithm on a standard PC with threading, run it at real-time priority, and compare the results to see what was gained through the FPGA implementation. I'm sure the results would be interesting either way.

You could opt for a hybrid solution: do all the massively parallelizable things in FPGA fabric (after all, this is what FPGAs are all about when applied to high-speed processing) and do the more sequential/supervisory/etc. work on a CPU, preferably one of the PPC405 cores present in V2P and 4VFX FPGAs. These real on-chip CPU cores will provide far better performance than any soft CPU you can possibly come up with; the only caveat is that you will have at most two such CPUs available.
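One way to reason about such a hybrid split is Amdahl's law: if a fraction p of the original runtime is moved into fabric and accelerated by a factor s, the overall speedup is 1 / ((1 - p) + p / s), so the part left on the CPU bounds the total gain. The sketch below just evaluates that formula. The function name `amdahl_speedup` and the fractions and acceleration factor in the loop are made-up illustrative values, not measurements of any PCA implementation.

```python
def amdahl_speedup(parallel_fraction, accel_factor):
    """Overall speedup when `parallel_fraction` of the runtime is
    accelerated by `accel_factor` and the rest is left unchanged."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if accel_factor <= 0.0:
        raise ValueError("accel_factor must be positive")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / accel_factor)


# Hypothetical numbers: how much of the runtime goes into fabric,
# assuming the fabric runs that part 100x faster.
for p in (0.5, 0.9, 0.99):
    print(f"fraction in fabric = {p:.2f} -> overall speedup = {amdahl_speedup(p, 100.0):.2f}x")
```

Under these assumed numbers, moving only half of the work into fabric can never give more than a 2x overall speedup, however fast the fabric is, which is why it matters how much of the algorithm can actually be restructured as parallel hardware.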
Reply by ●April 2, 2007
Thanks for the suggestion, I'm seriously taking it into consideration. I have already contacted the local Xilinx supplier and am working out the details of getting the XUP board.

I just hope I can live up to the level of this project. All this hardware stuff is new to me, and I'm kinda starting to long for the comfort and warmth of software implementations and having the OS take care of all the dirty details for me. I guess that's why the idea of using the PPC processors is attractive to me, though I'd still have to take care of some low-level details myself (unless I load a tiny Linux kernel on one or both of the processors, maybe? Hmmm, it'll definitely take some careful (re)thinking).

Well, thanks again to everyone, your responses have all been extremely helpful.

Best Regards,
Islam Ossama

On Apr 2, 7:29 am, "Daniel S." <digitalmastrmind_no_s...@hotmail.com> wrote:
>
> You could opt for a hybrid solution: do all the massively parallelizable things in FPGA fabric (after all, this is what FPGAs are all about when applied to high-speed processing) and do the more sequential/supervisory/etc. work on a CPU, preferably one of the PPC405 cores present in V2P and 4VFX FPGAs. These real on-chip CPU cores will provide far better performance than any soft CPU you can possibly come up with; the only caveat is that you will have at most two such CPUs available.
Reply by ●April 2, 2007
Honestly I think that if you have no hardware experience, this will be quite a challenge. Not to say that it's impossible, but you're certainly going to have a few sleepless nights...

In your case, I would rather suggest an all-software solution. One idea would be to use a PS3 and harness the power of the Cell processor (you can run Linux on it, I believe). The issue is to parallelize your algorithm enough to harness the power of the 9 cores (a similar problem to the one you would face with an FPGA solution). If one PS3 is not enough, maybe you could use 2, or 4... You could build a small cluster for not too much money.

My 2 ¢.

Patrick
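As a rough illustration of what "parallelizing enough" could look like in software, the sketch below splits a database of stored PCA coefficient vectors across a fixed number of workers, lets each worker find the best match in its own chunk, and then reduces the partial results. Everything here is hypothetical: the names `split`, `best_in_chunk`, and `parallel_nearest`, the default of 6 workers, and the toy data. Threads are used only to show the partitioning structure; in standard CPython the global interpreter lock means CPU-bound pure-Python code like this will not actually run faster with threads, so a real port would use native code or separate processes or cores.

```python
from concurrent.futures import ThreadPoolExecutor


def squared_distance(a, b):
    """Squared Euclidean distance between two coefficient vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_in_chunk(probe, chunk):
    """Return (distance, label) of the closest entry in one chunk."""
    return min((squared_distance(probe, coeffs), label) for label, coeffs in chunk)


def split(entries, n_workers):
    """Deal entries round-robin into at most n_workers non-empty chunks."""
    chunks = [entries[i::n_workers] for i in range(n_workers)]
    return [chunk for chunk in chunks if chunk]


def parallel_nearest(probe, entries, n_workers=6):
    """Search each chunk independently, then reduce the partial minima."""
    if not entries:
        raise ValueError("database is empty")
    chunks = split(entries, n_workers)
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        partial = list(pool.map(lambda chunk: best_in_chunk(probe, chunk), chunks))
    return min(partial)[1]


# Toy database (made up): 20 entries with 2 coefficients each.
entries = [(f"face{i}", [float(i), float(i % 3)]) for i in range(20)]
result = parallel_nearest([7.2, 1.1], entries)
print(result)
```

The same split-search-reduce shape applies whether the workers are threads, SPEs, cluster nodes, or parallel comparison units in an FPGA; only the cost of distributing the probe and collecting the partial results changes.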
Reply by ●April 3, 2007
On a sunny day (2 Apr 2007 14:45:45 -0700) it happened "Patrick Dubois" <prdubois@gmail.com> wrote in <1175550345.009582.122910@y66g2000hsf.googlegroups.com>:

>Honestly I think that if you have no hardware experience, this will be quite a challenge. Not to say that it's impossible, but you're certainly going to have a few sleepless nights...
>
>In your case, I would rather suggest an all-software solution. One idea would be to use a PS3 and harness the power of the Cell processor (you can run Linux on it, I believe). The issue is to parallelize your algorithm enough to harness the power of the 9 cores

PS3 has only 1 power processor and _6_ SPE cores.
http://en.wikipedia.org/wiki/PlayStation_3#Central_processing_unit

And it sucks 200W if fully loaded.
Reply by ●April 3, 2007
On Apr 3, 8:59 am, Jan Panteltje <pNaonStpealm...@yahoo.com> wrote:
> PS3 has only 1 power processor and _6_ SPE cores.
> http://en.wikipedia.org/wiki/PlayStation_3#Central_processing_unit

Nope, 1 central PPC core and 8 Synergistic Processor Units:
http://www.research.ibm.com/cell/heterogeneousCMP.html
Reply by ●April 3, 2007
On a sunny day (3 Apr 2007 08:21:24 -0700) it happened "Patrick Dubois" <prdubois@gmail.com> wrote in <1175613684.443163.290410@d57g2000hsg.googlegroups.com>:

>On Apr 3, 8:59 am, Jan Panteltje <pNaonStpealm...@yahoo.com> wrote:
>> PS3 has only 1 power processor and _6_ SPE cores.
>> http://en.wikipedia.org/wiki/PlayStation_3#Central_processing_unit
>
>Nope, 1 central PPC core and 8 Synergistic Processor Units:
>http://www.research.ibm.com/cell/heterogeneousCMP.html

Nope, in the PS3 only 6 are available.
Reply by ●April 4, 2007
On Apr 3, 2:00 pm, Jan Panteltje <pNaonStpealm...@yahoo.com> wrote:
> On a sunny day (3 Apr 2007 08:21:24 -0700) it happened "Patrick Dubois" <prdub...@gmail.com> wrote in <1175613684.443163.290...@d57g2000hsg.googlegroups.com>:
>
> >On Apr 3, 8:59 am, Jan Panteltje <pNaonStpealm...@yahoo.com> wrote:
> >> PS3 has only 1 power processor and _6_ SPE cores.
> >> http://en.wikipedia.org/wiki/PlayStation_3#Central_processing_unit
> >
> >Nope, 1 central PPC core and 8 Synergistic Processor Units:
> >http://www.research.ibm.com/cell/heterogeneousCMP.html
>
> Nope, in the PS3 only 6 are available.

Alright, I don't want to argue about this, but I think we can fairly say that the info on the web is not clear...
Just for fun, here's a link directly from Sony with the PS3 Cell specs :)
http://cell.scei.co.jp/index_e.html
Reply by ●April 4, 2007
On a sunny day (4 Apr 2007 05:43:42 -0700) it happened "Patrick Dubois" <prdubois@gmail.com> wrote in <1175690622.594433.213100@d57g2000hsg.googlegroups.com>:

>On Apr 3, 2:00 pm, Jan Panteltje <pNaonStpealm...@yahoo.com> wrote:
>> On a sunny day (3 Apr 2007 08:21:24 -0700) it happened "Patrick Dubois" <prdub...@gmail.com> wrote in <1175613684.443163.290...@d57g2000hsg.googlegroups.com>:
>>
>> >On Apr 3, 8:59 am, Jan Panteltje <pNaonStpealm...@yahoo.com> wrote:
>> >> PS3 has only 1 power processor and _6_ SPE cores.
>> >> http://en.wikipedia.org/wiki/PlayStation_3#Central_processing_unit
>> >
>> >Nope, 1 central PPC core and 8 Synergistic Processor Units:
>> >http://www.research.ibm.com/cell/heterogeneousCMP.html
>>
>> Nope, in the PS3 only 6 are available.
>
>Alright, I don't want to argue about this, but I think we can fairly say that the info on the web is not clear...
>Just for fun, here's a link directly from Sony with the PS3 Cell specs :)
>http://cell.scei.co.jp/index_e.html

OK, all well and good, but here are some facts:
PS3 runs Linux in a 'hypervisor'.
The hypervisor limits access to whatever Sony chooses to allow.
One SPE is in use for the PS3 graphics, and there is no way you can touch it from Linux.
The story goes that IBM had yield problems, so Sony settled for chips with one working core less.
That leaves 6 available from Linux.
The Wikipedia article is up to date and quite correct.

Version of Linux that runs on PS3: Yellow Dog Linux.
If you are in Europe, there is a special c't magazine release out with YD Linux for PS3, including some of the IBM development tools:
https://www.heise.de/kiosk/special/ct/07/01/

That is all I can tell you for now.





