gif (LZW) patent expires tomorrow
The <anhref="http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=/netahtml/srchnum.htm&r=1&f=G&l=50&s1=4,464,650.WKU.&OS=PN/4,464,650&RS=PN/4,464,650”>patentnheld by Unisys on the Lempel-Ziv-Welch (LZW) compression schemenexpires tomorrow June 20th, 2003. This is a good thing since thencompression is used in the gif image format and the Unixncompress among other things.
Unisys probably deserves its place as the first whipping boy fornsoftware patents. Set aside the fact that an algorithm is not in andnof itself an implementation so it should not be patentable as anmachine or the matter that by granting such patents the Patent Officenpaves the way for broad and speculative land rush tactics (thinknAmazon, among others). Unisys irked a lot of people with what lookednlike a bald faced money grab.
A little background: Lempel and Ziv worked at Sperry and whilenthere were issued a patent for the LZ78 scheme. Welch had worked fornSperry and published his enhancements after leaving them but Sperrynwas issued a patent for the work. Sperry merged with Burroughs to formnUnisys in 1986. You can pull all this from some well crafted googlingnand the Unisys website.
What happened next is open to dispute. In 1987, Compuservenintroduced the gif format which made use of the LZW compressionnscheme. Either Compuserve didn't do its homework or Unisys turned anblind eye but the result was that gif came into common use and Unisysndid not enforce its patent regarding gif until 1993. That's whennpeople got angry- the world wide web was taking off and most imagesnwere in gif format, a format they assumed was free to use, but herenwas Unisys claiming that they were due for every encoder and decodernwritten. An interesting footnote, IBM also holds patents onncompression and, it's reputed, on earlier and later work thatninvalidates or overlaps those owned by Unisys. IBM seemed to choose tonignore the small fry. In any event, the timing couldn't have beennworse and Unisys got a well-deserved black eye among technicalnpeople.
The above linked patent, if you tried to read through it, describesnan implementation in hardware. The algorithm, first published in 1977,ncan be pretty easily explained:nn
Starting at the beginning of the data, look for unique strings. Ifn the string is new, write it to a table. Replace the string with an single character code representing the index to the table for thatn sequence in the output. Continue through the data matchingn strings, adding news ones to the table and outputing codes.
nnYou can express the above algorithm in fifteen or twenty lines ofnpseudocode.
Getting even farther afield… it naturally isn't quite that simplenin practice. For one thing, you want to only replace strings of morenthan two bytes in size- otherwise there wouldn't be any savings-nmaking the algorithm trivially more complicated. Another limitation isnthat the table is going to be of limited size- there are only so manynunique strings you can recognize so a largely random file is going tonblow your table and ruin the resulting compression. One more is thatnyou need to reserve a particular bit pattern as a flag that thenbyte(s) following it represent an entry in the table, so for somensequences you will end up expanding single bytes into multiples. Andnyou'll need to reserve a bit pattern to represent the end of thenfile. It's still reasonable to implement and if you do like a lot ofnimplementors, you use it with <anhref="http://www.nist.gov/dads/HTML/huffmanEncoding.html”>Huffmannencoding or RunnLength Encoding you can squeeze the data even further.