Data In, Data Out

When you start programming, everything you do is challenging. Putting a window on screen, reading a file, fetching data from a database, storing data in a database: it’s all new. But once you’ve been doing it for a couple of years, you start noticing that a large part of what you do follows patterns. The most obvious one is that most applications are just a fancy way to add data to a database, retrieve data from a database and modify data in a database.

At first this might be exciting, but the excitement wears off after your 100th INSERT statement, particularly if you’re building a web application. For each of your entities (person, board, post, message, …) you have to create an “Add …”, “Edit …” and “List …” screen, all of which validate the input and show an error if it’s wrong. None of it is hard to do, but it’s so much of the same. It is so boring.

How do you make this kind of thing more exciting? Well, you can’t, but you can greatly reduce the time you spend on this kind of boring stuff. Luckily, boring stuff has the property of being much of the same, which tells you that there’s some kind of pattern in there. Think of yourself writing code as a program being executed: most of what you produce is the same, and only the parameters differ from time to time. Know what I’m getting at? Yes, it might be a good idea to write programs that generate programs. Yay, code generating tools!

What would be ideal is a tool that generates all the necessary code for you when you feed it a database schema. It doesn’t have to be perfect, but the more similar code you no longer have to reproduce, the better. This technique, shockingly enough, is called code generation. So, wouldn’t it be cool to write such a code generator for your project? Not only would this save a lot of time in the long run, it’s also much more challenging than writing the same code over and over again.

But how do you go about it? The first thing you have to do is figure out the patterns in your code. Write all the code necessary for two of the objects in your project (the thingies that you had to write add/edit/list screens for before) and see how much of their code overlaps or can be changed to overlap. From this you can extract a kind of template of what a generic piece of code would look like. Then you have to write a parser for your database schema and generate all the files and code using that. Let’s look at a tiny example (in MySQL’s SQL dialect):

CREATE TABLE users (
userid INT(11) NOT NULL auto_increment,
username varchar(20) NOT NULL default '',
password varchar(32) NOT NULL default '',
icq int(11) NOT NULL default '0',
PRIMARY KEY (userid)
);

From this you can extract some information. For example, that we’re dealing with the “users” table. We can also extract the following fields:
* userid, which is a numeric value. This means we can generate code that validates whether the value in this field is actually numeric. Additionally, we could assume that all auto_increment fields don’t have to be shown but are for internal use only.
* username, which is a string of maximum length 20. This is also something we can generate validating code for.
* password, which we don’t know much about, except that its maximum length is 32. It would be more useful if we added some metadata to this field, for example to mark it as being a password field. Metadata? Eek. No, it’s not scary at all, let’s replace the password SQL line with this one:

password varchar(32) NOT NULL default '', # type: password

Now our parser will be able to see that it’s a password field, so it knows it has to ask for the password a second time and check whether the two passwords entered by the user are equal.
* icq, which once again is a numeric value that we can validate.
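
The extraction described above can be sketched in a few lines. This is only a rough illustration, written in Python because the post doesn’t prescribe a language; the names `parse_table`, `COLUMN_RE` and `META_RE` are made up for this example. It assumes one column definition per line and the `# type: ...` comment convention from the password line, so it is nowhere near a full SQL parser.

```python
import re

# The example schema, including the "# type: password" metadata comment.
SCHEMA = """
CREATE TABLE users (
userid INT(11) NOT NULL auto_increment,
username varchar(20) NOT NULL default '',
password varchar(32) NOT NULL default '', # type: password
icq int(11) NOT NULL default '0',
PRIMARY KEY (userid)
);
"""

NUMERIC_TYPES = {"int", "tinyint", "smallint", "bigint"}
# Lines starting with one of these words are not column definitions.
SKIP_WORDS = {"CREATE", "PRIMARY", "UNIQUE", "KEY"}
# Assumed layout: name, SQL type, optional "(size)", then the rest of the line.
COLUMN_RE = re.compile(r"^(\w+)\s+(\w+)(?:\((\d+)\))?(.*)$")
META_RE = re.compile(r"#\s*type:\s*(\w+)")


def parse_table(sql):
    """Return (table_name, list of field dicts) for one CREATE TABLE statement."""
    table = re.search(r"CREATE\s+TABLE\s+(\w+)", sql, re.IGNORECASE).group(1)
    fields = []
    for raw_line in sql.splitlines():
        match = COLUMN_RE.match(raw_line.strip())
        if match is None or match.group(1).upper() in SKIP_WORDS:
            continue
        name, sql_type, size, rest = match.groups()
        is_numeric = sql_type.lower() in NUMERIC_TYPES
        meta = META_RE.search(rest)
        if meta:
            kind = meta.group(1)  # explicit metadata wins over the SQL type
        else:
            kind = "numeric" if is_numeric else "string"
        fields.append({
            "name": name,
            "kind": kind,
            # For INT(11) the 11 is a display width, so only strings get a length.
            "max_length": None if is_numeric or size is None else int(size),
            # Auto-increment columns are treated as internal and not shown.
            "hidden": "auto_increment" in rest.lower(),
        })
    return table, fields


table, fields = parse_table(SCHEMA)
for field in fields:
    print(table, field)
```

The list of dictionaries is all a generator needs: it no longer cares about SQL, only about field names, kinds and limits.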

So, using this information we can now generate HTML/PHP/ASP/JSP/whatever-you-use code. The cool thing is that when the generated code doesn’t work right away, you just change the code generator and regenerate the code. Personally, I find it very cool to have a little script generate a huge bulk of code.
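
To give an idea of what the generating half might look like, here is a small sketch in Python that turns field descriptions into an HTML “Add …” form. The `FIELDS` list is written out by hand so the example stands on its own, and the function names, the form layout and the `add_users` action are assumptions made for this illustration.

```python
from html import escape

# Hand-written field descriptions for the "users" table (illustrative only).
FIELDS = [
    {"name": "userid", "kind": "numeric", "max_length": None, "hidden": True},
    {"name": "username", "kind": "string", "max_length": 20, "hidden": False},
    {"name": "password", "kind": "password", "max_length": 32, "hidden": False},
    {"name": "icq", "kind": "numeric", "max_length": None, "hidden": False},
]


def input_row(name, label, input_type, max_length):
    """Build one labelled <input> line of the form."""
    attrs = f'type="{input_type}" name="{escape(name)}"'
    if max_length is not None:
        attrs += f' maxlength="{max_length}"'
    return f"  <label>{escape(label)} <input {attrs}></label>"


def generate_form(table, fields):
    """Generate the HTML for an "Add ..." screen from the field descriptions."""
    lines = [f'<form method="post" action="add_{escape(table)}">']
    for field in fields:
        if field["hidden"]:
            continue  # internal fields such as auto-increment ids are not shown
        name, limit = field["name"], field["max_length"]
        if field["kind"] == "password":
            # Password fields are asked for twice so the two can be compared.
            lines.append(input_row(name, name, "password", limit))
            lines.append(input_row(name + "_confirm", name + " (again)", "password", limit))
        elif field["kind"] == "numeric":
            lines.append(input_row(name, name, "number", None))
        else:
            lines.append(input_row(name, name, "text", limit))
    lines.append('  <button type="submit">Save</button>')
    lines.append("</form>")
    return "\n".join(lines)


form_html = generate_form("users", FIELDS)
print(form_html)
```

If the form turns out wrong, the fix goes into `generate_form` and every screen built from it is corrected on the next run.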

Now that you have a working code generator, you can edit the generated code to further suit your needs. It would be best not to edit the generated code itself, because regenerating overwrites your changes, and to edit the code around it instead (code that inherits from the generated code), though that’s not always possible. Find out what works best in your situation.
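
One way to keep hand-written changes apart from generated code is sketched below in Python. Both classes are hypothetical: `UsersFormBase` stands in for something a generator could emit for the users table, and `UsersForm` is the part you would write and maintain yourself. The “admin” rule is just an invented example of project-specific logic.

```python
# --- generated code: overwritten every time the generator runs ---
class UsersFormBase:
    MAX_LENGTHS = {"username": 20, "password": 32}
    NUMERIC = ("icq",)

    def validate(self, data):
        """Return a list of error messages for the submitted form data."""
        errors = []
        for name, limit in self.MAX_LENGTHS.items():
            if len(data.get(name, "")) > limit:
                errors.append(f"{name} is longer than {limit} characters")
        for name in self.NUMERIC:
            if not data.get(name, "").isdecimal():
                errors.append(f"{name} must be numeric")
        if data.get("password", "") != data.get("password_confirm", ""):
            errors.append("the two passwords differ")
        return errors


# --- hand-written code: kept in its own file and never regenerated ---
class UsersForm(UsersFormBase):
    def validate(self, data):
        # Reuse every generated check, then add a rule the schema can't express.
        errors = super().validate(data)
        if data.get("username", "").lower() == "admin":
            errors.append("username is reserved")
        return errors


submitted = {"username": "admin", "password": "x", "password_confirm": "x", "icq": "123"}
print(UsersForm().validate(submitted))
```

Regenerating `UsersFormBase` after a schema change leaves `UsersForm` untouched, which is the point of putting the custom code around the generated code rather than in it.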

Code generation is a very powerful paradigm. It’s used a lot in compiler construction. If you want to write a parser for, say, a new programming language, all you have to do is write down the syntax in a particular BNF-like format. Using this file as input, a compiler generator (like “SableCC”:http://www.sablecc.org) will generate a lexer, a parser and sometimes even an AST for you. You don’t have to know exactly what those are; the important thing is that a lot of work is done for you, work that you would otherwise have to do by hand.

It’s all about tooling, baby.