Hi, I want to know how many fields Huffman encoding thoery are used. As I can counted, 1. In Window operating system, similar encoding is used to conpress text files; 2. Text compression in ZIP files; 3. JPEG fram compression; 4. MPEG format compression? which one uses it? Wireless communication? Any other applications? Thank you. Weng
Where are Huffman encoding applications?
Started by ●August 1, 2006
Reply by ●August 1, 20062006-08-01
Weng Tianxiang wrote:> Hi, > I want to know how many fields Huffman encoding thoery are used. > > As I can counted, > 1. In Window operating system, similar encoding is used to conpress > text files; > 2. Text compression in ZIP files; > 3. JPEG fram compression; > 4. MPEG format compression? which one uses it? > > Wireless communication? > > Any other applications? > > Thank you. > > WengI am sure in JPEG and MPEG.
Reply by ●August 1, 20062006-08-01
On 1 Aug 2006 04:02:31 -0700, "Weng Tianxiang" <wtxwtx@gmail.com> wrote in comp.programming:> Hi, > I want to know how many fields Huffman encoding thoery are used. > > As I can counted, > 1. In Window operating system, similar encoding is used to conpress > text files; > 2. Text compression in ZIP files; > 3. JPEG fram compression; > 4. MPEG format compression? which one uses it? > > Wireless communication? > > Any other applications? > > Thank you. > > WengFAX transmission, although the Huffman codes are predefined and not generated dynamically from the data. -- Jack Klein Home: http://JK-Technology.Com FAQs for comp.lang.c http://c-faq.com/ comp.lang.c++ http://www.parashift.com/c++-faq-lite/ alt.comp.lang.learn.c-c++ http://www.contrib.andrew.cmu.edu/~ajo/docs/FAQ-acllc.html
Reply by ●August 1, 20062006-08-01
Hi Jack, Fax application cannot be counted as full Huffman encoding. The reason is the statistics for characters are not dynamically collected. Thank you. Weng Jack Klein wrote:> On 1 Aug 2006 04:02:31 -0700, "Weng Tianxiang" <wtxwtx@gmail.com> > wrote in comp.programming: > > > Hi, > > I want to know how many fields Huffman encoding thoery are used. > > > > As I can counted, > > 1. In Window operating system, similar encoding is used to conpress > > text files; > > 2. Text compression in ZIP files; > > 3. JPEG fram compression; > > 4. MPEG format compression? which one uses it? > > > > Wireless communication? > > > > Any other applications? > > > > Thank you. > > > > Weng > > FAX transmission, although the Huffman codes are predefined and not > generated dynamically from the data. > > -- > Jack Klein > Home: http://JK-Technology.Com > FAQs for > comp.lang.c http://c-faq.com/ > comp.lang.c++ http://www.parashift.com/c++-faq-lite/ > alt.comp.lang.learn.c-c++ > http://www.contrib.andrew.cmu.edu/~ajo/docs/FAQ-acllc.html
Reply by ●August 2, 20062006-08-02
> Fax application cannot be counted as full Huffman encoding. The reason > is the statistics for characters are not dynamically collected.Also for JPEG, etc. in most cases the predefined default Huffman-tables are used (Althought the file-format would support custom tables). I think the term "Huffman" is also widely used for such "static" applications. Regards, Thomas P.S.: I am wondering what the reason for your question is? www.entner-electronics.com
Reply by ●August 2, 20062006-08-02
Hi Thomas, When Huffman frequency table is available, the decoding procedure is stright forward. I know many electronic books use Huffman encoding method to compress the book full context. Usually in the first pass of full context, Huffman frequency table is established , in the 2nd pass, a Hash function is used to locate a new word entry, then encode, based on the Huffman frequency table, and output it. I want to know how Hash function is used in the 2nd pass. Any books or tips? Thank you. Weng Thomas Entner wrote:> > Fax application cannot be counted as full Huffman encoding. The reason > > is the statistics for characters are not dynamically collected. > > Also for JPEG, etc. in most cases the predefined default Huffman-tables are > used (Althought the file-format would support custom tables). I think the > term "Huffman" is also widely used for such "static" applications. > > Regards, > > Thomas > > P.S.: I am wondering what the reason for your question is? > > www.entner-electronics.com
Reply by ●August 2, 20062006-08-02
Weng Tianxiang wrote:> Hi Thomas, > When Huffman frequency table is available, the decoding procedure is > stright forward. > > I know many electronic books use Huffman encoding method to compress > the book full context. > > Usually in the first pass of full context, Huffman frequency table is > established , in the 2nd pass, a Hash function is used to locate a new > word entry, then encode, based on the Huffman frequency table, and > output it. > > I want to know how Hash function is used in the 2nd pass. >I think you are asking: how do I generate the huffman codes given the frequency of each symbol. If so look at: http://en.wikipedia.org/wiki/Huffman_coding in the section "basic technique". Cheers, Andy.
Reply by ●August 2, 20062006-08-02
Andy, No. I want to know the hash function. For example, I have a Huffman encoding table as follows: name Fre encode ... Andy 15, "011", Ray 20, "110", ... Now a text string "Andy Ray wrote" is incoming, I want to know how to match the incoming "Andy" and the table entry "Andy". One must establish a 1-to-1 relationship between the incoming word and the table entry. What I can imagine to do in software is to create a hash function, then if there are more entries responding to one entry, further string match must be made. I am wondering about whether or not a hardware design can does it. Thank you. Weng Andy Ray wrote:> Weng Tianxiang wrote: > > Hi Thomas, > > When Huffman frequency table is available, the decoding procedure is > > stright forward. > > > > I know many electronic books use Huffman encoding method to compress > > the book full context. > > > > Usually in the first pass of full context, Huffman frequency table is > > established , in the 2nd pass, a Hash function is used to locate a new > > word entry, then encode, based on the Huffman frequency table, and > > output it. > > > > I want to know how Hash function is used in the 2nd pass. > > > > > I think you are asking: how do I generate the huffman codes given the > frequency of each symbol. If so look at: > > http://en.wikipedia.org/wiki/Huffman_coding > > in the section "basic technique". > > Cheers, > > Andy.
Reply by ●August 2, 20062006-08-02
Weng Tianxiang wrote:> Andy, > No. I want to know the hash function. > > For example, I have a Huffman encoding table as follows: > > name Fre encode > ... > Andy 15, "011", > Ray 20, "110", > ... > > Now a text string "Andy Ray wrote" is incoming, I want to know how to > match the incoming "Andy" and the table entry "Andy". > > One must establish a 1-to-1 relationship between the incoming word and > the table entry. > > What I can imagine to do in software is to create a hash function, then > if there are more entries responding to one entry, further string match > must be made.A trie might be an option. I once had a conversation with someone who worked at a company that needed to match strings quickly in hardware and I believe they might've used a trie for that, although I could be remembering wrong since it was a few years ago. If you haven't encountered tries, think of them as deterministic finite-state automata shaped like a tree, where the root is the start state, and transitions (edges) from one node to the next correspond to symbols on your input. For example, if you want to recognize the strings "car", "cat", and "cup", your trie would look like this: (start)--->( )-+--->( )----->(end) c | u p | +--->( )-+--->(end) a | r | +--->(end) t Basically, you start at the start state, absorb a token and follow the edge associated with it, and repeat until you either don't have a matching edge or hit an end state. The nice thing about a trie is that you don't have to worry about collisions or worst-case behavior. The time to match is a linear function of the length of the string you're matching. A straightforward trie where every input symbol is a character (hence, probably 8 bits or maybe even larger) might be a little wasteful of space. But there are ways around that. The most obvious one is to say that the input is a string of bits and each bit is an input symbol. This makes your tree 8 times as deep as with 8-bit characters as input symbols, but it has the advantage that each node can have no more than 2 children (as compared to 256). A nice middle ground might be to make your input symbols pairs of bits or sequences of 4 bits. - Logan
Reply by ●August 2, 20062006-08-02
Hi Logan, Your method is interesting, but is far from what I am looking for. It is too slow. My target is 64-bit inputs on every clock and one must find its entry within 1-2 clocks and resolve entry conflicts if any at the same time. This design, if succeeded, can be widely used for any dynamic Huffman encoding. I think there is no such hardware design so far in the world and why I found that almost all Huffman encoding applications are static as in FAX, electronic books, ZIP files, JPEG, MPEG and so on. Thank you. Weng Logan Shaw wrote:> Weng Tianxiang wrote: > > Andy, > > No. I want to know the hash function. > > > > For example, I have a Huffman encoding table as follows: > > > > name Fre encode > > ... > > Andy 15, "011", > > Ray 20, "110", > > ... > > > > Now a text string "Andy Ray wrote" is incoming, I want to know how to > > match the incoming "Andy" and the table entry "Andy". > > > > One must establish a 1-to-1 relationship between the incoming word and > > the table entry. > > > > What I can imagine to do in software is to create a hash function, then > > if there are more entries responding to one entry, further string match > > must be made. > > A trie might be an option. I once had a conversation with someone who > worked at a company that needed to match strings quickly in hardware > and I believe they might've used a trie for that, although I could be > remembering wrong since it was a few years ago. > > If you haven't encountered tries, think of them as deterministic > finite-state automata shaped like a tree, where the root is the > start state, and transitions (edges) from one node to the next > correspond to symbols on your input. > > For example, if you want to recognize the strings "car", "cat", > and "cup", your trie would look like this: > > > (start)--->( )-+--->( )----->(end) > c | u p > | > +--->( )-+--->(end) > a | r > | > +--->(end) > t > > Basically, you start at the start state, absorb a token and follow > the edge associated with it, and repeat until you either don't have > a matching edge or hit an end state. > > The nice thing about a trie is that you don't have to worry about > collisions or worst-case behavior. The time to match is a linear > function of the length of the string you're matching. > > A straightforward trie where every input symbol is a character (hence, > probably 8 bits or maybe even larger) might be a little wasteful of > space. But there are ways around that. The most obvious one is to > say that the input is a string of bits and each bit is an input > symbol. This makes your tree 8 times as deep as with 8-bit characters > as input symbols, but it has the advantage that each node can have > no more than 2 children (as compared to 256). A nice middle ground > might be to make your input symbols pairs of bits or sequences of > 4 bits. > > - Logan






