FPGARelated.com
Forums

GZIP algorithm in FPGA

Started by lenz February 17, 2004
Hi,

i tried to find some references for lossless compression
algorithms implemented in fpga. I found a lot about 
bitstream compression but (nearly) nothing about fpga 
implementations of lossless compression algorithms.

Which lossless compression algorithms are suited for fpga 
implementation.


Thank you.

lenz wrote:

>Hi, > >i tried to find some references for lossless compression >algorithms implemented in fpga. I found a lot about >bitstream compression but (nearly) nothing about fpga >implementations of lossless compression algorithms. > >Which lossless compression algorithms are suited for fpga >implementation. > >
Some years ago I had a need for a losslessly compressed database, and found some source code for the Lempel-Ziv-Welch compression program (LZW). I extracted the core code from it and inserted it into the database read-write program, and got it working. I didn't dwell on the details, but it looked pretty simple to me. It built a 'dictionary' of multi-byte patters it had seen, and assigned single-byte and then later 9, 10, 11, 12-etc. bit codes for these patterns. These functions seem like they could be done with an FPGA and an external RAM to hold the dictionary. (The dictionary is written in the output stream one entry at a time during compression, and then rebuilt when decompressing. You should be able to find some source code for LZW somewhere. Jon
lenz19@gmx.de (lenz) wrote in message news:<7f673150.0402171059.2cf41c85@posting.google.com>...
> i tried to find some references for lossless compression > algorithms implemented in fpga. I found a lot about > bitstream compression but (nearly) nothing about fpga > implementations of lossless compression algorithms. > > Which lossless compression algorithms are suited for fpga > implementation.
Since you mentioned GZIP, I will start with Lempel-Ziv (LZ) based compression schemes: Huang, Saxena and McCluskey "A Reliable Compressor on Reconfigurable Coprocessors" http://csdl.computer.org/comp/proceedings/fccm/2000/0871/00/08710249abs.htm Jung and Burleson. "Efficient VLSI for Lempel-Ziv Compression in Wireless Data Communication Networks" http://citeseer.nj.nec.com/359751.html It should be noted that the Systolic Array implementatio described in the papers would violate the Stac patents. Basically Stac has patented the notion of using a shift register to store the history buffer. Though I personally believe the concept of storing the dictionary in a shift register to be rather obvious... we have opted not to use a shift register implementation so as to not risk litigation. We are in the process of completing a LZ based compression core that is very similar to LZS, but does not violate the patents. The performance in a Xilinx V2Pro is expected to be on the order of 100M bytes/second, with a 512 byte dictionary. We expect to release this core for Xilinx v2/v2pro/s3 and Altera Stratix by the end of March. These papers should atleast get you started. Large CAMs are not a particularly FPGA friendly structure, but similar structures can be effectively implemented. For decompression, sequential implementations will be just as fast as parallel ones, as on decompression only one location from the dictionary needs to be read per output. For compression, a parallel implementation will allow for processing up to 100M tokens per second. A sequential approach will be much slower. LZ compression is a very good approach for "generic" data, in areas such as storage or networking. Since you did not mention what sort of data you were going to compress, I will make brief mention of non-LZ approaches. IBM's ALDC is something worth studying to see an alternative approach to compression. As far as patents go... the adaptive nature of the algorithm is really quite novel, and dare I say mathematically quite cool. If it is images, one is much better off going with an algorithm that is tailored to this sort of data-set. For example LOCO/JPEG-LS for continuous tone images. We are wrapping up an implementation of this algorithm, and our first hardware implementation is running at 35M 12bit pixels/sec using ~65% of an XC2v1000-4. The challenge with this algorithm is in the pipelining, as each pixel prediction, error calculation, and statistics update, must be completed prior to processing the next pixel. We expect the final implementation to be quite a bit faster and slightly smaller. One of the challenges faced implementing compression algorithms in hardware is that the algorithms are most often specified in the standards documents in C-pseudocode. So implementation often requires translating a substantially serial implementation into a pipelineable design. The more significant challenge with implementing these algorithms in hardware is that each pixel/token, is completely dependent upon the error and updated statistics from the previous pixel/token. If you can identify what this minimal loop is, and put all other logic into pipelines before and after this loop, your performance will be limited solely by the minimal loop. Most of these algorithms look at minimizing the total compute required, rather than minimizing the complexity of this minimal loop. The former will dictate software performance, the later hardware, in optimal implementations. As an example JPEG-LS in near lossless mode is only marginal slower in software, but about 50% as fast when implented in hardware, each as compared to JPEG-LS in lossless mode. I hope these comments atleast offer a brief introduction. I also hope that my comments were not too commercial in nature, having mentioned two cores that we are getting ready to release. Regards, Erik Widding. --- Birger Engineering, Inc. -------------------------------- 617.695.9233 100 Boylston St #1070; Boston, MA 02116 -------- http://www.birger.com
Hello,

lenz19@gmx.de (lenz) writes:

> Which lossless compression algorithms are suited for fpga > implementation.
If you know in advance information on the type of data (e.g. english language) then a Huffman Coding could be a quick and small solution. It could be e.g. easily done with Lookuptable. Florian -- int m,u,e=0;float l,_,I;main(){for(;1840-e;putchar((++e>907&&942>e?61-m:u) ["\t#*fg-pa.vwCh`lwp-e+#h`lwP##mbjqloE"]^3))for(u=_=l=0;79-(m=e%80)&& I*l+_*_<6&&26-++u;_=2*l*_+e/80*.09-1,l=I)I=l*l-_*_-2+m/27.;}
"Florian-Wolfgang Stock" <f.stock@tu-bs.de> wrote in message
news:40334504$0$17577$9b4e6d93@newsread4.arcor-online.net...

> int m,u,e=0;float l,_,I;main(){for(;1840-e;putchar((++e>907&&942>e?61-m:u) > ["\t#*fg-pa.vwCh`lwp-e+#h`lwP##mbjqloE"]^3))for(u=_=l=0;79-(m=e%80)&& > I*l+_*_<6&&26-++u;_=2*l*_+e/80*.09-1,l=I)I=l*l-_*_-2+m/27.;}
Nice piece of code and it even compiles and works :) I am not sure I fully understand the output though.. :) /Mikhail -- To reply directly: matusov at square peg ca (join the domain name in one word and add a dot before "ca")
"MM" <mbmsv@yahoo.com> writes:

> "Florian-Wolfgang Stock" <f.stock@tu-bs.de> wrote in message > news:40334504$0$17577$9b4e6d93@newsread4.arcor-online.net... > >> int m,u,e=0;float l,_,I;main(){for(;1840-e;putchar((++e>907&&942>e?61-m:u) >> ["\t#*fg-pa.vwCh`lwp-e+#h`lwP##mbjqloE"]^3))for(u=_=l=0;79-(m=e%80)&& >> I*l+_*_<6&&26-++u;_=2*l*_+e/80*.09-1,l=I)I=l*l-_*_-2+m/27.;} > > Nice piece of code and it even compiles and works :) I am not sure I fully > understand the output though.. :)
the part with the email address is obvious, the thing around is called mandelbrot-set (in german also called "Apfelmaennchen") - And no, I havnt implemented it yet on a fpga (but its just some complex calculation, so it should not be difficult :) Florian -- int m,u,e=0;float l,_,I;main(){for(;1840-e;putchar((++e>907&&942>e?61-m:u) ["\t#*fg-pa.vwCh`lwp-e+#h`lwP##mbjqloE"]^3))for(u=_=l=0;79-(m=e%80)&& I*l+_*_<6&&26-++u;_=2*l*_+e/80*.09-1,l=I)I=l*l-_*_-2+m/27.;}
> We are in the process of completing a LZ based compression core that > is very similar to LZS, but does not violate the patents. The > performance in a Xilinx V2Pro is expected to be on the order of 100M > bytes/second, with a 512 byte dictionary. We expect to release this > core for Xilinx v2/v2pro/s3 and Altera Stratix by the end of March.
Good info Erik. How does the above performance compare to a 2 or 3 GHz 64 bit processor? Steve
Florian-Wolfgang Stock <f.stock@tu-bs.de> wrote in message news:<40334504$0$17577$9b4e6d93@newsread4.arcor-online.net>...
> Hello, > > lenz19@gmx.de (lenz) writes: > > > Which lossless compression algorithms are suited for fpga > > implementation. > > If you know in advance information on the type of data (e.g. english > language) then a Huffman Coding could be a quick and small > solution. It could be e.g. easily done with Lookuptable.
Here is an implementation from 2000. http://www.sulimma.de/prak/ss00/projekte/huffman/Huffman.html It was a student project for my lab course. Kolja Sulimma
news@sulimma.de (Kolja Sulimma) writes:

> Florian-Wolfgang Stock <f.stock@tu-bs.de> wrote in message > news:<40334504$0$17577$9b4e6d93@newsread4.arcor-online.net>... >> Hello, >> >> lenz19@gmx.de (lenz) writes: >> >> > Which lossless compression algorithms are suited for fpga >> > implementation. >> >> If you know in advance information on the type of data (e.g. english >> language) then a Huffman Coding could be a quick and small >> solution. It could be e.g. easily done with Lookuptable. > > Here is an implementation from 2000. > http://www.sulimma.de/prak/ss00/projekte/huffman/Huffman.html > It was a student project for my lab course.
Exactly something like that I had in mind as I talked from it. The Restriction with with the advance information could be circumvent by dynamic generating the Lookuptable (here is the drawback, that you need 2 Passes over the stuff you want to code). Florian -- int m,u,e=0;float l,_,I;main(){for(;1840-e;putchar((++e>907&&942>e?61-m:u) ["\t#*fg-pa.vwCh`lwp-e+#h`lwP##mbjqloE"]^3))for(u=_=l=0;79-(m=e%80)&& I*l+_*_<6&&26-++u;_=2*l*_+e/80*.09-1,l=I)I=l*l-_*_-2+m/27.;}
Florian-Wolfgang Stock wrote:
> > news@sulimma.de (Kolja Sulimma) writes: > > > Florian-Wolfgang Stock <f.stock@tu-bs.de> wrote in message > > news:<40334504$0$17577$9b4e6d93@newsread4.arcor-online.net>... > >> Hello, > >> > >> lenz19@gmx.de (lenz) writes: > >> > >> > Which lossless compression algorithms are suited for fpga > >> > implementation. > >> > >> If you know in advance information on the type of data (e.g. english > >> language) then a Huffman Coding could be a quick and small > >> solution. It could be e.g. easily done with Lookuptable. > > > > Here is an implementation from 2000. > > http://www.sulimma.de/prak/ss00/projekte/huffman/Huffman.html > > It was a student project for my lab course. > > Exactly something like that I had in mind as I talked from it. The > Restriction with with the advance information could be circumvent by > dynamic generating the Lookuptable (here is the drawback, that you > need 2 Passes over the stuff you want to code).
Isn't there a variation that generates the table on the fly? I don't know any details, but wouldn't they have this same problem in modem compression, you can only see the data once? Or do they buffer up a block before compressing? Back in the early days, an engineer talked to me about the possibility of patenting a method that worked like this. -- Rick "rickman" Collins rick.collins@XYarius.com Ignore the reply address. To email me use the above address with the XY removed. Arius - A Signal Processing Solutions Company Specializing in DSP and FPGA design URL http://www.arius.com 4 King Ave 301-682-7772 Voice Frederick, MD 21701-3110 301-682-7666 FAX