Code Generation and Vendor Lock-In

When you build a code generator you have two basic options:

  1. Generate code to be read, complemented and possibly modified by humans
  2. Generate code purely as a convenient intermediate step toward bytecode/machine code compilation

The first approach seems to be the most common, and it is the most pragmatic option. “Hey, I keep writing the same code over and over, can’t I simply generate part of it and make minor adjustments by hand?” Yes, you can. But then you end up with a maintenance issue: it turns out that the code you generated initially was not quite right, and now what do you do? Regenerate the code and lose all the modifications you made? As the kids say these days: FAIL!

An improved version of this naive approach is the generation gap pattern. The idea here is to generate abstract classes, which you extend in custom code, overriding the parts that you need to override. The result is that you keep generated and manually written code separate, which is a good thing, because you can make changes to the code generator and simply regenerate code without your changes being lost. Usually. Not always, because if you make invasive changes to your code generator, you may generate completely different code altogether: different classes, different methods and so on. Although you do not lose your manually written code, it no longer has any apparent relationship to the code that is generated, and it needs to be rewritten to fit the new style of generated code. Again: FAIL.
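As a rough illustration of the generation gap pattern, here is a minimal sketch in Python. The entity `Customer`, the base class `CustomerBase` and the hook `extra_validation` are made-up names for this example, not output of any real generator.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


# --- generated file: overwritten on every generator run, never edited by hand ---
class CustomerBase(ABC):
    """Hypothetical generator output for a 'Customer' entity."""

    def __init__(self, name: str, email: str) -> None:
        self.name = name
        self.email = email

    def validate(self) -> list[str]:
        # Generated default checks, followed by whatever the subclass adds.
        errors = []
        if not self.name:
            errors.append("name is required")
        if "@" not in self.email:
            errors.append("email is invalid")
        return errors + self.extra_validation()

    @abstractmethod
    def extra_validation(self) -> list[str]:
        """Extension point that handwritten code must implement."""


# --- handwritten file: untouched by regeneration ---
class Customer(CustomerBase):
    def extra_validation(self) -> list[str]:
        if self.email.endswith("@example.com"):
            return ["example.com addresses are not allowed"]
        return []
```

Regenerating `CustomerBase` leaves `Customer` alone. But if a new generator version renames `extra_validation` or drops `CustomerBase` in favor of a different structure, `Customer` is still on disk and simply no longer fits, which is exactly the failure described above.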

These problems led us, in the MoDSE research project (pronounced “modes”), to choose approach 2: generate code only as an intermediate step. This also means we have to do 100% code generation; we hardly ever mix custom and generated code. In the rare edge cases where we have to, we do so only through well-defined, fixed interfaces. One code generator we built using this approach is WebDSL. After you invoke the compiler on your WebDSL program, you do not look at the generated code.
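To make the idea concrete, here is a toy sketch of generation as a pure intermediate step. It is not how WebDSL works internally; the model format, the `HOOKS` table and the `generate`/`build` functions are all assumptions invented for this example. The model is the only thing a developer edits, and custom code enters only through one fixed interface (named hooks).

```python
# Hypothetical model: the only artifact a developer reads or edits.
MODEL = {
    "entity": "Customer",
    "fields": ["name", "email"],
}

# The single fixed interface for custom code: hooks looked up by name,
# each taking one value and returning the value to store.
HOOKS = {
    "format_email": lambda value: value.strip().lower(),
}


def generate(model: dict) -> str:
    """Emit Python source for the model; the output is not meant to be read."""
    fields = model["fields"]
    params = ", ".join(fields)
    lines = [f"class {model['entity']}:", f"    def __init__(self, {params}):"]
    for field in fields:
        # Each field passes through its hook if one exists, else unchanged.
        lines.append(
            f"        self.{field} = hooks.get('format_{field}', lambda v: v)({field})"
        )
    return "\n".join(lines) + "\n"


def build(model: dict, hooks: dict) -> type:
    """Generate, compile and load in one step, like invoking a compiler."""
    namespace = {"hooks": hooks}
    exec(compile(generate(model), "<generated>", "exec"), namespace)
    return namespace[model["entity"]]


Customer = build(MODEL, HOOKS)
customer = Customer("Ada", "  ADA@Example.COM ")
print(customer.email)
```

Because nobody depends on the shape of the generated text, `generate` can be rewritten freely as long as the hook interface and the observable behavior stay the same.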

Now, let’s say you started your own software company and you got your first big customer. Congratulations. You’re going to build the website of a large international corporation. And because you want to be productive and cool ‘n stuff you’re using code generation techniques. Since you were so impressed with the arguments you just read against mixing custom and generated code, you decide to generate 100% of your code, and therefore no longer have to focus on extension and modification techniques. Thou shalt not read generated code. Good for you! You may even choose to use WebDSL. Even better.

But what about your customer? What if you deliver your product? Either you deliver a perfect product that is done and will never have to be changed again — good luck with that. Or, as part of your delivery, you deliver the source code. What source code? Well, not the generated code, because it’s essentially worthless as it’s not intended for human consumption (and in the case of WebDSL, believe me, it’s not). So you deliver the model that was the input of your code generator (e.g. the WebDSL source code). Fantastic. However, your customer is worried. In the future they may need developers familiar with the input language of your code generator to continue work on the product. Where are they going to find such developers? Well, in your company. That’s great for you, but not great for your customer, because you essentially locked them in.

This problem is not specific to programs written in domain-specific languages like WebDSL; it holds for other languages and even frameworks too. Yahoo rewrote its web store application, after buying it, from Lisp to C++ and Perl, because Yahoo engineers were not familiar enough with Lisp. Java web applications written using obscure Java frameworks have similar problems, as did Ruby on Rails applications when Rails was just getting started.

If you produce software for a customer using languages and frameworks that very few other developers “speak”, you’re locking your customer in.

So what’s the solution?

Last year, Jos Warmer of Mod4J gave a talk as part of our model-driven software development course. Mod4J is a set of DSLs for developing administrative enterprise applications in Java. The interesting twist is that they aim to generate code in the same style a developer would have written it by hand. Consequently, when you deliver to your customer, they do not need Mod4J developers to continue development. They may not even care that you used Mod4J to develop the product and simply continue maintaining the generated code. Of course this is not the ideal case, but it’s a fallback option that makes customers feel safer. I have no experience with Mod4J and do not know if it really works that way, but I like the idea. But is it always feasible to do this?

I wanted to try to take this approach for mobl, my DSL for mobile web applications, but it did not work out well. It turns out I missed an essential requirement for this approach to work.

My plan was to first develop a set of frameworks that the generated code would use. As this is a fairly new domain, hardly any such frameworks exist. The first library I developed was persistence.js, which is an ORM library for client-side SQLite databases in JavaScript. A second framework, which I named mobiworks, is a set of jQuery plug-ins that provide HTML encodings of mobl concepts, such as screens and templates.

But then I took a step back. To make this approach successful, what’s the plan of action?

  1. Develop a framework that developers would find useful and usable even without mobl.
  2. Promote that framework by itself and build a community around it, establishing it as the way to build mobile web applications. This solves the “nobody knows this framework” problem.
  3. Build a nice DSL wrapper around the framework.
  4. Promote the DSL, build a community around it.

Yeah, you’ll agree that this was a rather pointless mission to begin with. Why not simply build a community around mobl immediately, making that the best way to build mobile web applications? Then I could drop the whole framework idea altogether.

What I did not realize earlier is that to make the Mod4J “generate human code” approach work there already has to be an established style that developers write their code in. There needs to be an established framework you can target. If there isn’t one, the approach is pointless.
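The difference between “human” output and compiler-only output can be sketched with two emitters over the same model. This is only an illustration in Python (mobl itself targets JavaScript), and the model format and the functions `emit_readable`, `emit_compact` and `load` are hypothetical names made up for this example.

```python
MODEL = {"entity": "Task", "fields": ["title", "done"]}


def emit_readable(model: dict) -> str:
    """Emit code in a style a developer might plausibly write by hand."""
    name, fields = model["entity"], model["fields"]
    lines = [
        f"class {name}:",
        f'    """A {name.lower()} record."""',
        "",
        f"    def __init__(self, {', '.join(fields)}):",
    ]
    lines += [f"        self.{field} = {field}" for field in fields]
    return "\n".join(lines) + "\n"


def emit_compact(model: dict) -> str:
    """Emit terse code meant only for the interpreter, not for people."""
    name, fields = model["entity"], model["fields"]
    args = ",".join(f"a{i}" for i in range(len(fields)))
    body = ";".join(f"s.{field}=a{i}" for i, field in enumerate(fields))
    return f"class {name}:\n def __init__(s,{args}):{body}\n"


def load(source: str, name: str) -> type:
    """Execute generated source and return the class it defines."""
    namespace: dict = {}
    exec(source, namespace)
    return namespace[name]


readable_task = load(emit_readable(MODEL), "Task")("write post", False)
compact_task = load(emit_compact(MODEL), "Task")("write post", False)
# Both emitters yield objects with the same attributes.
print(vars(readable_task) == vars(compact_task))
```

Both outputs behave the same, but only the first could be handed to a customer as maintainable source, and only if it matches a style and framework that developers already recognize.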

So, I shifted gears again, rewriting stuff to generate efficient computer-readable JavaScript code: JavaScript as the assembly code of the mobile web. And it’s much simpler that way. The lock-in problem remains, though; we’ll see if that is actually going to be a problem. Incidentally, if you’re a persistence.js user or are considering using it: don’t worry, I still use it for mobl and will keep working on it.

Comments
  1. Zef,

    when you change the code generator, you might indeed break the API of the generated code. But you also might break the API of the DSL as you most likely have changed its semantics. I think in both cases you'll have to be careful if you don't want to break existing clients.
    In Java land the tooling is still a bit superior to DSL tooling, so fixing broken code is usually better supported in Java than in other languages (but we are all working on it, aren't we ;-)).

  2. Zef Hemel says:

    Well changing the semantics is one thing, but we have changed the generated implementation of WebDSL a few times without changing the semantics. When we switched from the Seam framework to plain Java servlets for instance.

    In the short time I've been working I also changed the generated code style for mobl considerably twice, both times without changing the APIs or semantics, at least not intentionally. This is something you can do without having to worry about custom code. It's still a difficult undertaking, indeed, because you have to preserve semantics, but it's definitely easier because we don't have to take custom code into account.
