Monday, September 8, 2003


I'm going to complain here about my honours data compression assignment. If any honours students doing cse457 read this blog and haven't handed in their assignment yet, they may benefit or sympathise.

I have just experienced something that occurs frequently in the IT industry: a program that does not match its documentation or specifications. This program was written in awk, which I would guess none of the honours students are familiar with, and the assignment required us to modify it to analyse files and create a model for data compression (and find out the number of bits that file would take up).

The steps explained in the assignment sheet were very hard to understand, and the program we were provided with did not follow those steps. If we didn't know awk, how were we to tell? The output of the program is hideous, especially for special characters (\n \t etc).

But suppose we fixed the program and translated it into a language that we could understand (in this case Perl, because of its parsing capabilities). The new program creates various distributions, computes entropies, and calculates the final number of bits that this scheme would use to represent the analysed file, all according to the assignment sheet (not following the supplied program).
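
To give a rough idea of the "compute entropies, then count bits" step, here is a minimal sketch in Python (not the Perl translation, and not the assignment's actual model). It assumes the simplest possible scheme, a single distribution over individual characters; the function name entropy_bits is made up for illustration.

```python
import math
from collections import Counter

def entropy_bits(data):
    """Return (entropy in bits per symbol, estimated total bits) for data.

    Assumption: an order-0 model, i.e. one relative frequency
    distribution over single symbols. The assignment's real scheme
    uses several distributions and is not reproduced here.
    """
    counts = Counter(data)
    total = len(data)
    # Shannon entropy: H = -sum(p * log2(p)) over the observed symbols
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return h, h * total

h, bits = entropy_bits("abracadabra")
print(f"{h:.3f} bits/symbol, about {bits:.1f} bits in total")
```

The total is just the entropy multiplied by the number of symbols, which is the best any code could do if both ends already agreed on the distribution.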

However, the process itself is not accurate, because it assumes that the decoder (receiver) can 'mind-read' and find out what those distributions are before receiving the message! It doesn't take into account the number of bits needed to send the relative frequency distributions for each of the parts (hmmm... sounds like MML doesn't it). Stupid, stupid, stupid.
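
To make the complaint concrete, here is a small Python sketch of a two-part message length: the bits for the data given the distribution, plus the bits needed to send the distribution itself. The way the table is encoded here (one fixed-width count for each of 256 possible byte values) is purely my assumption for illustration, and message_length_bits is a hypothetical name; a real scheme would encode the model far more cleverly.

```python
import math
from collections import Counter

def message_length_bits(data: bytes, alphabet_size: int = 256):
    """Return (data_bits, model_bits) for an order-0 byte model.

    Assumption: the model is sent as one count per possible symbol,
    each count stored in a fixed number of bits large enough to hold
    any value from 0 to len(data).
    """
    n = len(data)
    counts = Counter(data)
    # Cost of the data when the decoder already knows the frequencies
    data_bits = -sum(c * math.log2(c / n) for c in counts.values())
    # Cost of telling the decoder the frequencies in the first place
    count_width = math.ceil(math.log2(n + 1))
    model_bits = alphabet_size * count_width
    return data_bits, model_bits

data_bits, model_bits = message_length_bits(b"aabb")
print(f"data: {data_bits:.0f} bits, model: {model_bits} bits")
```

For a short file the model cost dwarfs the data cost, which is exactly the part the assignment's bit count leaves out.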

Now I have to deal with the last two questions, which again involve modifying provided code, this time to compress binary images. I am desperately hoping the code is understandable and matches the documentation.

(Note - this assignment has been the same since at least 1996 - nothing has changed).

