Cheating the LZW

Originally published on: Fri, 24 Apr 2009 00:10:27 +0000

In the early days of the web, many web pages sported hit-counters that would generate a graphical meter indicating the number of visitors who had viewed the page.

I thought that it would be a nice thing to have on my own site, so I set out to build my own.

I had decided to try using Perl to build my counter as it had become my language of choice when developing other CGI’s.

Some of the popular hit-counters would accept libraries of JPEG’s or GIF’s representing various kinds of digits and would then use the GD graphics library to fuse the appropriate digit images together into a single image.

I opted to take a different route.

I had decided to use the GIF format for the output image. I wanted a lossless format that could represent a small, monochrome bitmap containing six eight-by-sixteen-pixel characters.

If one has a binary character-set available, building a raw bitmap of a stream of digits is relatively painless. A monochrome bitmap eight by sixteen by two (colors) by 6 (characters) can be represented in 768 bits ( 96 bytes ) uncompressed. That’s pretty tiny!

Unfortunately, the GIF format did not permit a flag to indicate that the bitmap image would be an uncompressed bitmap. So, I was going to have to employ the LZW compression algorithm on this tiny blob of data. …or WAS I ?

The LZW algorithm uses a protocol of sorts that must be observed by both the compressor and the decompressor. An initial code-size must be agreed upon by both prior to compression or decompression. This code-size value indicates the bit-width for each code in the I/O stream. ( Like many compression algorithms, the LZW leverages variable-length bit I/O. )

The code-size must allow for two codes; clear and end-of-input.

I decided on a code-size of three bits and allowed the LZW special characters to push the code-size up to four bits … which is easier to handle in bit-shift operations. The clear-code would become eight and the EOI code would be nine as three bits can represent the numbers zero through seven.

For each bit in the generated bitmap, I wrote both a nybble with a value representing a single pixel, …a one or a zero followed by a nybble representing the clear-code.

The bit-image was emitted without any true compression. In fact, using the LZW as a protocol, the stream of raw data grew in size ( four bits to represent a monochrome pixel instead of one plus an extra four bits for each clear code. ) This process uses a byte to represent a single pixel when both nybbles are coupled.

At the end of stream, I needed to emit the EOI character, so I simple wrote a byte with the value 9*16, shifting the nine four pixels to the left so that the decompressor would stop processing the bitmap.

The Perl code below illustrates this technique by creating a GIF file named number.gif with the digits “123456” represented as a bitmapped image.

Sample digits

number

You might change the value of $count to something other than “123456”, but it must be six characters in length and must be comprised of all digits.

The generated image is always 805 bytes in size. 768 bytes for the bitmap, one for the EOI code, and thirty-six bytes for the GIF header and footer.

# License: MIT / X11
# Copyright (c) 2009 by James K. Lawless
#
# Permission is hereby granted, free of charge, to any person
# obtaining a copy of this software and associated documentation
# files (the "Software"), to deal in the Software without
# restriction, including without limitation the rights to use,
# copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following
# conditions:
#
# The above copyright notice and this permission notice shall be
# included in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
# OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
# HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
# WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
# OTHER DEALINGS IN THE SOFTWARE.
@charset=(0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,
56,24,124,124,12,254,56,254,124,124,
108,56,198,198,28,192,96,198,198,198,
198,120,6,6,60,192,192,6,198,198,
198,24,12,6,108,192,192,6,198,198,
214,24,24,60,204,252,252,12,124,126,
214,24,48,6,254,6,198,24,198,6,
198,24,96,6,12,6,198,48,198,6,
198,24,192,6,12,6,198,48,198,6,
108,24,198,198,12,198,198,48,198,12,
56,126,254,124,30,124,124,48,124,120,
0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0);

# "clear" code
$clr=8;

# end-of-input code in left-nybble of byte
$eoi=9*16;

# must be 6-digits
$count="123456";

open(FIL,">number.gif");
binmode(FIL);
# GIF 87 header
print FIL "GIF87a" ;
print FIL pack("CC",48,0); # Raster width 48
print FIL pack("CC",16,0); # Raster height 16

# Colormap, numbr bits color res -1, 0, bits per pixel - 1
# 1 000 0 000
print FIL pack("C",128);

# Background color index
print FIL pack("C",0);
# dummy byte
print FIL pack("C",0);

# Color Map
# Black
print FIL pack("CCC",0,0,0);

# White
print FIL pack("CCC",255,255,255);

# Image descriptor
# Separator
print FIL ',' ;

# Left
print FIL pack("CC",0,0);

# Top
print FIL pack("CC",0,0);

# Width
print FIL pack("CC",48,0);

# Height
print FIL pack("CC",16,0);

# Use global color map=0, sequential image = 0, 000, pixel -1 bpp
# 0 0 000 000
print FIL pack("C",0);

# Raster data
# Code size 3 ( LZW will use 4), this makes clear code 8!, EOI 9
print FIL pack("C",3);

# here's where we'll need to send the data out in blocks. It will be

$blkcount=0; # total blocks so far
$blkinprog=0; # block in progress

for($j=0;$j<16;$j++) {
  for($i=0;$i<6;$i++) {
    for($bits=128;$bits>=1;$bits=$bits/2) {
      # Determine bit settings
      $b=substr($count,$i,1)-'0';
      if( ($charset[$j*10+$b] & $bits) == $bits ) {
        $bitout = 16;
      }
      else {
        $bitout = 0;
      }
      if( $blkinprog == 0 ) {
        if($blkcount<3) {
          $blkinprog=240;
        }
        else {
          $blkinprog=49;
        }
        print FIL pack("C",$blkinprog);
        $blkcount++;
      }
      print FIL pack("C",$bitout+$clr);
      $blkinprog--;
    }
  }
}
print FIL pack("C",$eoi);

# Zero length for block
print FIL pack("C",0);
# Terminator
print FIL ";";

close(FIL);
Advertisements

About Jim Lawless

I've been programming computers for about 36 years ... 30 of that professionally. I've been a teacher, I've worked as a consultant, and have written articles here and there for publications like Dr. Dobbs Journal, The C/C++ Users Journal, Nuts and Volts, and others.
This entry was posted in Programming and tagged , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s