XMLTagsEditHistoryDiscussion (7)

  1. Introduction
  2. Results
  3. DNA to RNA
    1. Bugs during the contest
    2. DNA representation
      1. Structure
      2. BufferSrc
      3. Prepending information
      4. Optimize skips
    3. Protect
    4. Additional features
    5. Download
  4. RNA to image
    1. Features
    2. Download
  5. Looking for hints
  6. Draw polygons
    1. Generating mask of differences
    2. Generating histogram
    3. Generating filter for color with mask
    4. Drawing polygon
    5. Generating RNA to set the bucket to a given color
    6. Generating RNA sequences for polygon
    7. RNA → DNA
    8. Compressing DNA
    9. Verify results
    10. Rinse and repeat
  7. Members
  8. Tools
  9. Source code
  10. Conclusions
  11. Discussion
  12. Writeups of other teams

Introduction

This page documents the performance of team FUN in trying to save Endo in the ICFP programming contest in 2007.

The latest available version of this document is available at:

We are a group of friends that have always felt a little alien. That is, partially, why our group name is “Freaks Unidos”. We believe we may share a few genes with our Fuun friend: we're eagerly waiting until our DNA is sequenced to look for sequences we may share with Endo.

We had *lots* of fun during the contest. We encourage everyone to participate next year.

Results

We got to position 31. We are satisfied with this result, and slight improvement over last year.

The following is our final image, which we produce with a DNA prefix of length close to 16KB:

We couldn't get past the “1337” clue and we had to hack RNA directly. We wasted a lot of time trying to get the clues and failed miserably. We couldn't get even one of the repair guide pages out of Endo's DNA.

Yet we managed to get to position 31!

We made a DNA→RNA program that runs around 44K iterations per second on regular hardware. We also created a nifty technique to produce DNA prefixes that draw polygons of a certain color on top of the image for the “rotated towards the sun” prefix. And we have a way to compress repeated sequences of DNA.

Read on to know the details.

DNA to RNA

We implemented the DNA to RNA render as C code. It runs very fast: in a regular laptop (Mobile AMD Sempron(tm) Processor 3500+ 1800 Mhz) it takes 43 seconds to execute the 1891886 iterations required to generate the image from the empty prefix (around 44K iterations per second). It uses around 20M of memory during the entire execution.

It reads the prefix for the DNA from its standard input (and Endo's DNA from a static file) and produces the RNA to its standard output.

Of course, being a distributed team, we started working on multiple implementations. Alejo started in a Scheme implementation; it was the first one to work but it turned out to be very slow, evaluating an iteration in around 0.7 seconds. Nelson, Luis Felipe and Juan Fernando started working in a C version that they never finished. Sergio and Sebastián started working in a Common Lisp implementation. In the meantime, Alejo rewrote his implementation in C and made all the optimizations described, which he hadn't made to his Scheme implementation. Soon the team decided to focus on this implementation and the others were discarded.

Bugs during the contest

There was a bug in our implementation that we only managed to find after the contest, based on a trace for the self test image that Clive Gifford, from the K.I.S.S. team, kindly produced for us.

The self test image looks like this with our bug:

self-check

We tried hard to find the bug but it eluded us.

For a while we also suspected the RNA→image code (described below), but as it was much simpler, it seemed to us very unlikely to have bugs in the very little code it has that mentions the movement direction.

As you can see, it gets triggered between the “Length of empty match” ad the “Failed search”. We tried to verify these two cases (searching for an empty string and what happens when a search fails) and couldn't find the problem.

Interestingly, generating the image using Alejo's alternate 100% Scheme implementation (both the DNA → RNA and the RNA → image transformations) produced exactly the same problem.

The problem was triggered when you did an asnat of an element not in the environment: our previous version would just skip this, instead of encoding the length of the empty string. Alejo overlooked the specification and no one in the team managed to find the error.

Luckily, this restriction did not seem to affect us. We guess it didn't have anything to do with our failure to decode the hints and extract the pages from the repair guide.

DNA representation

Structure

To represent the DNA we used a singly linked list (actually it is a doubly-linked list but we only read it forwards; the reason it is a doubly-linked list is that we are reusing our list structure for all the lists our code needs, and in some parts, not for the DNA, we do need to traverse our list backwards). Each element in the list has a character pointer, a length and a pointer to the BufferSrc structure that we use for this buffer.

BufferSrc

The BufferSrc structures are used to keep track of all the buffers required by the program. The structures hold a pointer to the buffer and a counter of references:

typedef struct {
  char *buffer;
  int references;
} BufferSrc;

Prepending information

Using a BufferSrc allows us to very quickly prepend information to the DNA. To prepend information:

Optimize skips

We found that as DNA is executed, the list tends to grow and the performace tends to drop (as skips have to iterate over the list). To solve this, the implementation periodically checks to see if the avarage length of the buffers in the list is under a give threshold; if so, the whole DNA is copied into a new BufferSrc. We want to do this often enough that performance won't drop as the skips become slow, but not often enough that we end up spending all the time copying the bases around. We found that optimizing the DNA when the number of characters drops under 50000 was a good solution.

Protect

We assumed that we would see lots of protects with big l values (and assumption that we haven't verified) so we turned the iteration around:

That way we don't have to copy bases around L times, just once.

Of course, the case for l being zero is optimized and just prepends the string as described above, without copying any actual bases.

Additional features

Our implementation has some useful features:

Download

You can view the source code for our implementation here:

It includes the fix for the bug that we applied after the contest ended.

RNA to image

Our program that renders RNA to an image was also written in C. It reads the RNA from its standard input and produces a PNM image to its standard output. It takes 2.37 seconds to generate the image resulting from the empty DNA prefix.

The implementation was straightforward. We turned fill/tryfill into iterative versions because we were getting Stack overflows, but they were probably caused by bugs in our code.

The RNA to image render was implemented by Alejo and Sebastián originally in Scheme. However, Alejo decided to rewrite it in C to speed up things (the originally implementation took a long time to write the final image!).

Features

It has some useful features:

Download

You can download the source code for our RNA → image program from:

Looking for hints

We got the the image that you get with the empty prefix, and it had two prefixes. It allowed us to know that there was a manual for repairing endo, and with the second prefix we got the image that we used as a base for the process we are about to describe.

We tried to get the manual pages, and we generated all the pages. Well, we tried to.

We noticed that some wouldn't end rendering, and we thought it was a bug, because we also thought it was some kind of infinite loop. We are yet to confirm that those programs will indeed end. We even documented which pages would lead to the thought-to-be endless loop. But we never ran them again.

We also noticed that some teams submitted images with a similar risk, but with a prefix of length 27. And we found it with simple brute force (not much), by omitting one character each time. There were just 19 different prefixes and when we evaluated them with our implementation we noticed that it was identical and sumbitted it. This was done by Nelson, alone.

Alejo also generated the the images that you get with all possible strings of length 6 of less. He didn't find anything useful.

We used MD5 hashes to keep track of differences in the images (stored as PNG files).

Draw polygons

Around 8 hours before the end of the contest, with only Luis Felipe, Juan Felipe, Nelson and Alejo available, it became obvious that looking for hints from the manual was leading us nowhere. We decided to start exploring another approach, that of drawing directly over the image generated by the towards-the-sun prefix (“IIPIFFCPICPCIICICIICIPPPPIIC”).

In order to do this we had to solve several problems and combine the pieces.

Generating mask of differences

First we made a simple program that, given two images in PNM format, generates a third with the mask of differences. The following is one such image, that of the differences between the target image and the towards-sun image (white pixels are those where the color doesn't match):

differences-target-towards-sun

The image also informs us of the number of pixels that don't match, which we used to determine, given a prefix of a certain length, whether it decreased or increased the risk. In the above image there are 119.456 different pixels.

The source code we used to generate this, implemented by Alejo, is available here:

Both images need to be in PNM's P6 format.

Generating histogram

Then, given a certain mask produced by the above program, we created a program that reads it along with the original image and prints the histogram of the colors occuring after the mask is applied to the image.

For example, the following are the first 10 colors it prints with the above mask and the target image:

(212, 246, 255): 4549
(226, 132, 50): 4163
(79, 184, 56): 2891
(7, 146, 32): 2700
(217, 217, 218): 2364
(226, 132, 49): 2122
(220, 220, 220): 2109
(226, 132, 52): 2038
(209, 243, 253): 2029
(78, 181, 56): 1935

This would allow us to decide which were the colors worth optimizing the most.

The source code implementing this, programmed by Alejo, is available here:

One restriction is that the target image needs to be in PNM's P6 format and the mask in PNM's P5.

Generating filter for color with mask

Now that we have candidates for polygons, we need some procedure to know where in the target image they occur. To do this we created the mask-color program. It reads the target image and the mask of differences (generated by the find_mask program as described above) with the current image and outputs an image where a pixel is white if it is included in the mask and it has the color specified in the target image.

We combine this image with the original mask to produce a PNG with three pixels:

The following is the image generated by the program for the most common color, according to the histogram above (note that, although your browser probably doesn't show it, this image *has* transparent pixels, lots of them). This color, as you can see, corresponds to many pixels in the clouds:

most-common-color

The code to do this was written by Alejo and is available at:

Drawing polygon

Now, based in an image such as the one above, we would create a sequence of dots that would produce a polygon. We will draw the contourn of the polygon and fill it with the color.

Our goal was:

To do this we used The GIMP, mostly. For the cloud above we produced the following sequence of points:

(356 100) (341 89) (361 32) (448 31) (470 88) (450 100)

For each polygon we also produce one inner point, close to the first point in the polygon. In this case we produce the following:

(356 99)

Generating RNA to set the bucket to a given color

Now we have another problem: we know the color of the polygon (from the histogram) but we need to generate a sequence of valid AddColor RNA instructions such that the averages in the bucket match the desired color.

This turned out to be relatively easy. This was solved independently both by Alejo and Luis Felipe. We used Scheme code for this:

; Average the integers in list c:

(define (average c)
  (quotient (fold + 0 c) (length c)))

; Find a list of values 0 or 255 such that their average is color,
; given that the list already includes the values in current:

(define (find-one color current)
  (cond
    ((= color 255) (list 255))
    ((= color 0) (list 0))
    ((null? current) (find-one color (list 0 255)))
    ((< (average current) color)
     (find-one color (cons 255 current)))
    ((> (average current) color)
     (find-one color (cons 0 current)))
    (else current)))

; Given a list of RGB values, produce a list of standard colors
; such that their average produces the desired color.
; We sort the list so equal operations will be next to each
; other (in order to compress the DNA):

(define (colors col)
  (let ((orig (map (cut find-one <> '()) col)))
    (sort
      (let loop ((current orig))
        (cons (map car current)
              (if (every (compose null? cdr) current)
                '()
                (loop (map (lambda (x orig)
                             (if (null? (cdr x))
                               orig
                               (cdr x)))
                           current
                           orig)))))
      (key< (lambda (c) (+ (* 255 255 (car c)) (* 255 (cadr c)) (caddr c)))))))

; Print the RNA operations that set the bucket to the state
; described by the list of colors col:

(define (color->rna col)
  (format #t "PIIPICP empty bucket~%")
  (for-each
    (lambda (col)
      (format #t "PIPII~A add color ~A~%"
              (cond
                ((equal? col '(0 0 0)) "IC")
                ((equal? col '(255 0 0)) "IP")
                ((equal? col '(0 255 0)) "CC")
                ((equal? col '(255 255 0)) "CF")
                ((equal? col '(0 0 255)) "CP")
                ((equal? col '(255 0 255)) "FC")
                ((equal? col '(0 255 255)) "FF")
                ((equal? col '(255 255 255)) "PC")
                (else (error "Invalid color" col)))
              col))
    col))

We don't care about transparency: we always draw fully opaque polygons.

Based on the code above, you can print the RNA that needs to be executed to set the current color:

(color->rna (colors '(212 246 255)))

Of course, the length of the resulting list is the least common multiple of the lengths of the lists produced by the find-one function.

While we suspected that it was possible to optimize this further to produce shorter sequences with more repetitions (for example, fold a “red”, “green” and “blue” into a “while”), we didn't explore this.

Generating RNA sequences for polygon

Now, given the sequence of points above, we need to produce a sequence of RNA instructions that will draw lines connecting them (including the last and the first one), move to the internal pixel and do a fill. The specific sequence is as follows:

The move between points is optimized to minimize the number of turns (this will allow us to compress the DNA sequence further, as described below).

We are assuming that this code will be executed *after* the towards-the-sun RNA code has been executed. In order to know the actual sequences, we need to know what the final state is, right after towards-the-sun has finished executing (ie. the current direction and position). Luckily, our RNA → image implementation shows this.

The following is the Scheme code we used, which Alejo wrote. It can obviously be optimized for readability (rna-moves-x and rna-moves-y should probably be folded into one function):

(use format-modular srfi-1 orders)

(define *dir* 'e)
(define *pos* '(0 595))
(define *mark* '(0 0))

; Produce the steps to draw polygon seq and fill it in the point
; inner:

(define (draw-polygon seq inner)
  (rna-moves (car seq))
  (new-bitmap)
  (let loop ((s (cdr seq)))
    (cond
      ((null? s)
       (set-mark)
       ; return to start
       (rna-moves (car seq))
       (line)
       (rna-moves inner)
       (tryfill)
       (op-compose))
      (else
        (set-mark)
        (rna-moves (car s))
        (line)
        (loop (cdr s))))))

(define (rna-moves to)
  ; optimize order of moves
  (cond
    ((or (and (> (car to)  (car  *pos*)) (eq? *dir* 'e))
         (and (> (cadr to) (cadr *pos*)) (eq? *dir* 'n))
         (and (< (cadr to) (cadr *pos*)) (eq? *dir* 's)))
     (rna-moves-x (car to))
     (rna-moves-y (cadr to)))
    (else
     (rna-moves-y (cadr to))
     (rna-moves-x (car to)))))

(define (rna-moves-x dst)
  (cond
    ((or (and (> dst (car *pos*)) (eq? *dir* 'w))
         (and (< dst (car *pos*)) (eq? *dir* 'e)))
     (turn-clockwise)
     (turn-clockwise))
    ((or (and (> dst (car *pos*)) (eq? *dir* 'n))
         (and (< dst (car *pos*)) (eq? *dir* 's)))
     (turn-clockwise))
    ((or (and (> dst (car *pos*)) (eq? *dir* 's))
         (and (< dst (car *pos*)) (eq? *dir* 'n)))
     (turn-counter-clockwise)))
  (let loop ()
    (unless (= dst (car *pos*))
      (move)
      (loop))))

(define (rna-moves-y dst)
  (cond
    ((or (and (> dst (cadr *pos*)) (eq? *dir* 'n))
         (and (< dst (cadr *pos*)) (eq? *dir* 's)))
     (turn-clockwise)
     (turn-clockwise))
    ((or (and (> dst (cadr *pos*)) (eq? *dir* 'e))
         (and (< dst (cadr *pos*)) (eq? *dir* 'w)))
     (turn-clockwise))
    ((or (and (> dst (cadr *pos*)) (eq? *dir* 'w))
         (and (< dst (cadr *pos*)) (eq? *dir* 'e)))
     (turn-counter-clockwise)))
  (let loop ()
    (unless (= dst (cadr *pos*))
      (move)
      (loop))))

(define (new-bitmap)
  (format #t "PCCPFFP add bitmap~%"))

(define (op-compose)
  (format #t "PFFPCCP compose~%"))

(define (turn-clockwise)
  (set! *dir* (case *dir* ((n) 'e) ((e) 's) ((s) 'w) ((w) 'n)))
  (format #t "PFFFFFP turn clockwise: ~A~%" *dir*))

(define (turn-counter-clockwise)
  (set! *dir* (case *dir* ((n) 'w) ((e) 'n) ((s) 'e) ((w) 's)))
  (format #t "PCCCCCP turn counter clockwise: ~A~%" *dir*))

(define (move)
  (case *dir*

    ((n)
     (set-car! (cdr *pos*) (- (cadr *pos*) 1)))

    ((e)
     (set-car! *pos* (+ (car *pos*) 1)))

    ((s)
     (set-car! (cdr *pos*) (+ (cadr *pos*) 1)))

    ((w)
     (set-car! *pos* (- (car *pos*) 1))))
  (format #t "PIIIIIP new position is ~A (dir ~A)~%" *pos* *dir*))

(define (set-mark)
  (set! *mark* (apply list *pos*)) ; fresh copy
  (format #t "PCCIFFP set mark at ~A~%" *mark*))

(define (line)
  (format #t "PFFICCP line from ~A to ~A~%" *mark* *pos*))

(define (tryfill)
  (format #t "PIIPIIP fill at ~A~%" *pos*))

With this code we can produce the sequence of steps for our above polygon with the following expresion:

(draw-polygon 
  '((356 100) (341 89) (361 32) (448 31) (470 88) (450 100))
  '(356 99))

RNA → DNA

Now we need to convert the RNA back to its associated DNA sequence and, very important, make sure it gets executed *after* towards-the-sun has been executed. That is, we need to produce a prefix that includes a DNA sequence encoding the RNA code for our polygon and color and we need to make it control the machine to run towards-the-sun and, only once it has finished executing, the DNA producing our RNA.

This turns out to be easy to do. For every RNA sequence, we simply prepend it with “III” to produce a DNA code that, when executed, will output our original RNA. Call this sequence OPERATIONS.

Our prefix is the concatenation of PATTERNS, TEMPLATES, OPERATIONS, TAIL and SUN defined as follows:

TAIL
The harmless string ICFPICFPICFP, which doesn't occur anywhere else in our prefix.
PATTERN
Encodes the patterns (?TAIL)(?"IFPICFPPCIFFPIFPICFPPCIFP"), followed by a IFC to finish the patterns. The first set of parenthesis will match OPERATIONS, right until the end. The second searches for the last bases occuring in Endo's DNA.
TEMPLATE
Encodes the patterns \1\0, followed by a IFC to finish the templates.
SUN
The prefix that produces the “towards-the-sun” image.

Note that after the first iteration the resulting DNA will be the concatenation of:

As such, right after Endo's DNA has produced all the RNA for generating the rotated sun image, it will pass control to OPERATIONS, which will produce our desired RNA sequences.

Compressing DNA

We soon discovered that the OPERATIONS sequence was growing a lot. For example, we frequently had to perform between 100 and 300 different AddColor operations; we also had lots of long chains of Move operations. Since we needed 10 DNA bases for each RNA instruction, we need to correct at least one pixel for every operation. While this is doable for a few large polygons of homogeneous colors (such as, of course, the clouds), the risk usually grows.

We quickly discovered a way to compress repeated sequences in the DNA, using a pattern to match the big sequence and a template to insert it back to the DNA multiple times. This works very well precissely because we tend to have a lot of identical operations in sequence (and we can, and indeed did, reorder some, for example the AddColor operations, to increase the amount of compression).

Of course, the compression is recursive. For example, if you have 25 consecutive “move” operations, a simplified version of our compress algorithm would do this:

  compress ( “move”, 25 )
= “move” : compress ( “move”, 24 )
= “move” : duplicate( compress ( “move”, 12 ) )
= “move” : duplicate( duplicate ( compress ( “move”, 6 ) ) )
= ...

Applying this rule greatly reduced the length of our prefixes.

The “duplicate” rule works simply:

In practice, the “compress” function was more sophisticated than that. For example, sometimes it “triplicated” or “quadruplicated” the string (by including more occurences of the template \0).

For example, a sequence of 200 “move” operations required 2000 DNA bases without compression (200 IIIPIIIIIIIP sequences). Our compressor successfully compresses it down to 142 bases, 14% of its size:

IIPIPIIIICICPIICIICIFPPIFPPIFPPIFPPIFPPIFPPIFPPIFPPIFPPIFPPIICIIPIPIICI
CPIICIICIFPPIFPPIFPPIFPPIFPPIFPPIFPPIFPPIFPPIFPPIICIIIPIIIIIPIIIPIIIIIP

The algorithm for this was designed by Alejo, Nelson and Luis Felipe, and implemented in C by Alejo. We think it can still be tweaked further. It is available here:

Verify results

With all the above we can draw a polygon over the clouds having its most common color in the target image:

clouds-polygon

We use find_mask to see if the improvement in the number of pixels that match the target times 10 is less than the increase in the size of the DNA and decide whether the polygon is worth drawing.

Rinse and repeat

Now we need to repeat this process:

Note that most of the programs read information from standard input and write to standard output. As such, we often found ourselves doing long commands such as:

$ csi -s seq-to-rna.scm | ./rna-to-dna | ./dna-to-rna | \
   ./rna-to-image | ./pnmtopng >result.png

In the end, we run out of time while drawing polygons. We draw the following (which you can see in our final image):

Members

The following are the members of team FUN, with links to their blogs and to posts they've made in their blogs about the contest:

Tools

We used the following tools:

Source code

The following is the code we used:

We also used some Scheme code which we included verbatim above in this document.

Conclusions

We had plenty of fun with the contest :-)

It was a very interesting challenge, generally very well conceived. We only regret that the “1337 puzzle” was somewhat cryptic and stopped us from finding the remaining puzzles.

The following things worked very well for us:

The following are things that we think we should improve:

Discussion

Writeups of other teams

TODO: Add more teams.

Last update: 2008-03-27 (Rev 13877)

svnwiki $Rev: 12966 $