Automatic documentation of code


In response to cmarguel’s blog entry (which was probably a joke, but
I might as well go ahead… =) ):

I think, however, that this could have much more
potential. Can we make the program learn to recognize patterns? “This
program sorts an array.” “This program creates a socket whose port
number is the sum of two numbers.” Sure, it would be a weak program at
first, but imagine if it works well! A new generation of lazy
programmers would be born!

I attended this year’s natural language symposium at La Salle, and one
of the student groups proposed the exact same system. They’d written a
program that translated a C program into an English description. It was
a literal translation: “assign C to ….”, “if b is true then execute
block A, else execute block B. Start of block A… end of block A.
Start of block B… end of block B.”
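To make the idea of a literal translation concrete, here is a minimal sketch. It is not the students' program: it assumes Python instead of C (the standard `ast` module does the parsing; `ast.unparse` needs Python 3.9+), handles only assignments and if/else, and the function name `translate` is made up for illustration.

```python
import ast
import itertools


def translate(source: str) -> list[str]:
    """Literally translate a tiny subset of Python into English, one line per step."""
    labels = (chr(c) for c in itertools.count(ord("A")))  # block names A, B, C, ...
    out: list[str] = []

    def walk(statements: list[ast.stmt], depth: int) -> None:
        pad = "  " * depth
        for node in statements:
            if isinstance(node, ast.Assign):
                targets = ", ".join(ast.unparse(t) for t in node.targets)
                out.append(f"{pad}assign {ast.unparse(node.value)} to {targets}")
            elif isinstance(node, ast.If):
                a, b = next(labels), next(labels)
                out.append(f"{pad}if {ast.unparse(node.test)} is true then "
                           f"execute block {a}, else execute block {b}")
                # Describe the "then" body as block a and the "else" body as block b.
                for name, body in ((a, node.body), (b, node.orelse)):
                    out.append(f"{pad}start of block {name}")
                    walk(body, depth + 1)
                    out.append(f"{pad}end of block {name}")
            else:
                # Anything else is just echoed back as a statement to execute.
                out.append(f"{pad}execute {ast.unparse(node)}")

    walk(ast.parse(source).body, 0)
    return out


example = """
c = a + b
if flag:
    x = c
else:
    x = 0
"""
for line in translate(example):
    print(line)
```

For the example above it prints "assign a + b to c", then "if flag is true then execute block A, else execute block B", followed by the start and end of each block with its indented assignment. It mirrors the code line by line, which is exactly why it says nothing about what the program is for.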

For their thesis, they planned to make the program recognize common
algorithms such as swap, bubble sort and linear search. If students
can learn those in their first year of computing, shouldn’t a computer
be able to recognize those patterns with just a little more coding? In
fact, their project was even more ambitious. Given source code with
mistakes, their program was to recognize the attempted algorithm and
point out the errors in implementation.

The question-and-answer portion exposed the problems. Recognizing an
algorithm through source-code analysis is hard. Why? There are so many
different ways to write a bubble sort. Do you bubble the smallest
elements up, or bubble the largest elements down? Will you use two
loops? One loop? Loop going up? Loop going down? How do you do the
swap? The most promising approach would be to reduce the source code
to logical elements and then match it with a database of previously
checked answers, combining errors from several answers if necessary.
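Here's a rough sketch of the "reduce to logical elements, then match" idea, again in Python instead of C and with a made-up, one-entry database. It assumes that renaming every user-defined identifier to a placeholder (`v0`, `v1`, ...) is a good enough reduction. The class and function names (`Canonicalize`, `fingerprint`, `recognize`) are hypothetical. This catches a student who only renamed variables, but a bubble sort whose inner loop counts down still comes out as unknown, which is exactly the problem raised in the Q&A.

```python
import ast
import builtins


class Canonicalize(ast.NodeTransformer):
    """Rename user-defined identifiers to v0, v1, ... in order of first appearance.

    Built-in names such as len and range are kept, since they carry meaning.
    """

    def __init__(self) -> None:
        self.names: dict[str, str] = {}

    def canon(self, name: str) -> str:
        if hasattr(builtins, name):
            return name
        return self.names.setdefault(name, f"v{len(self.names)}")

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.AST:
        node.name = self.canon(node.name)
        self.generic_visit(node)  # then rename the arguments and the body
        return node

    def visit_arg(self, node: ast.arg) -> ast.AST:
        node.arg = self.canon(node.arg)
        return node

    def visit_Name(self, node: ast.Name) -> ast.AST:
        node.id = self.canon(node.id)
        return node


def fingerprint(source: str) -> str:
    """Reduce source code to a structural string that ignores identifier names."""
    tree = Canonicalize().visit(ast.parse(source))
    return ast.dump(tree)


REFERENCE = """
def bubble_sort(a):
    n = len(a)
    for i in range(n):
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a
"""

# A tiny, hypothetical "database" of previously checked answers.
KNOWN = {fingerprint(REFERENCE): "bubble sort"}


def recognize(source: str) -> str:
    return KNOWN.get(fingerprint(source), "unknown")


# Same algorithm, different names: recognized.
RENAMED = """
def sort_list(xs):
    size = len(xs)
    for p in range(size):
        for q in range(size - 1 - p):
            if xs[q] > xs[q + 1]:
                xs[q], xs[q + 1] = xs[q + 1], xs[q]
    return xs
"""

# Still a bubble sort, but the inner loop counts down: not recognized.
COUNTING_DOWN = """
def bubble_sort(a):
    n = len(a)
    for i in range(n):
        for j in range(n - 1, i, -1):
            if a[j - 1] > a[j]:
                a[j - 1], a[j] = a[j], a[j - 1]
    return a
"""

print(recognize(RENAMED))        # bubble sort
print(recognize(COUNTING_DOWN))  # unknown
```

A real system would need a much richer reduction (loop direction, swap idioms, one loop versus two) or many more stored answers per algorithm.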

What about the literal translation of the program? Wouldn’t that
already help students understand their code better? Beginners who have
a hard time figuring out which statements belong to a block might be
able to use that kind of tool, but they eventually need to learn how
to indent code properly and how to read control structures. Besides,
they’d probably benefit more from a zoomable flowchart.

Documentation should not simply repeat what code already says. Rather,
documentation should make things clearer for users by answering
questions like “How do you use this function?” and “What do you need
to keep in mind when using this function?”. Comments in your source
code can also explain what other approaches you’ve tried and what traps
you need to avoid. Good documentation goes beyond code and shows us
the big picture.
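As a hypothetical example, a comment like "loop from 0 to len(items) in steps of size" only repeats the code. The docstring in this sketch tries to answer the questions above instead: how to call the function, what to keep in mind, and what alternative exists. The `chunks` function itself is invented for illustration.

```python
def chunks(items: list, size: int) -> list[list]:
    """Split items into consecutive lists of at most `size` elements.

    How to use it:
        chunks([1, 2, 3, 4, 5], 2) returns [[1, 2], [3, 4], [5]]

    Keep in mind:
        * The last chunk may be shorter than `size`. Callers that need
          equal-length chunks must pad or drop it themselves.
        * `size` must be at least 1; otherwise ValueError is raised.
        * The chunks are new (shallow-copied) lists, so appending to one
          does not change `items`.

    Other approaches:
        itertools.batched() (Python 3.12+) does the same lazily, but it
        yields tuples rather than lists.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


print(chunks([1, 2, 3, 4, 5], 2))
```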


Hmm. Hey, that zoomable flowchart idea looks cool. If people still
don’t have final projects by now, there’s a project idea for you… =)
If a visualizer for your favorite programming language already exists,
pick your next favorite one.

You can comment with Disqus or you can e-mail me at sacha@sachachua.com.