On Language Design: Making Expensive Actions Hard

Abstraction is a great thing, but some abstractions make performance completely unpredictable. Object-relational mappers, for instance, are very convenient for getting started, but once performance matters their behavior can be hard to predict. Let's take Hibernate as an example. Java does not support properties; instead, there's a convention of using getter and setter methods for this purpose. The general assumption is that calling these methods has little overhead:

Person p = em.find(Person.class, "somePersonId");
println(p.getName());
println(p.getFather().getName());

Easy enough. But hang on, what did that third line do? It turns out that it had to execute a SQL query to fetch p's fatherly Person object. Not a big deal, you'd say. But what if you have code like this in a for loop iterating over 200 person objects? On the face of it, this should be cheap: you're just doing simple property access, right? However, behind the scenes it executes another query on every iteration, making this simple loop quite expensive at 201 queries (one to load the 200 persons, plus one per father). Sure, you can tell Hibernate to prefetch the father property, and you should, but if you forget, you may not notice what's hogging the database until you start inspecting query logs.
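To make the 201 concrete, here is a small JavaScript simulation of the pattern. It is not Hibernate; FakeDb and its methods are names I made up for illustration, and all the "database" does is count queries:

```javascript
// Hypothetical in-memory stand-in for a database; it only counts queries.
class FakeDb {
  constructor() {
    this.queryCount = 0;
  }

  // One query that returns all persons, with their fathers not yet loaded.
  findAllPersons(n) {
    this.queryCount++;
    const db = this;
    return Array.from({ length: n }, (_, i) => ({
      name: 'person' + i,
      fatherId: 'father' + i,
      _father: null,
      // Looks like cheap property access, but hits the database on first use.
      get father() {
        if (this._father === null) {
          this._father = db.findPersonById(this.fatherId);
        }
        return this._father;
      },
    }));
  }

  // One query per call.
  findPersonById(id) {
    this.queryCount++;
    return { name: id };
  }
}

const lazyDb = new FakeDb();
const lazyFatherNames = [];
for (const p of lazyDb.findAllPersons(200)) {
  lazyFatherNames.push(p.father.name); // innocent-looking, one query per iteration
}
console.log(lazyDb.queryCount); // 201
```

Nothing in the loop body hints that it talks to the database, which is exactly the problem.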

In Java, it is difficult to predict which statements are going to be expensive to execute by inspecting the code.

This is true for practically every programming language, but it's also a shame. It may be useful to have some kind of syntactical overhead for performing expensive operations. If expensive operations are inconvenient to write, will that not encourage programmers to write more efficient code?

It turns out JavaScript has a mechanism for this. JavaScript environments are typically single-threaded. If you only have a single thread available in an interactive environment like the browser, you want to block it as little as possible. Therefore, in JavaScript, expensive operations are performed asynchronously. I complained about asynchronous programming before, but it has the nice property of making explicit which operations are going to be expensive.

Let's see how this affected persistence.js, my asynchronous JavaScript ORM:

persistence.transaction(function(tx) {
  Person.load(tx, "somePersonId", function(p) {
    println(p.name);
    p.fetch(tx, 'father', function(father) {
      println(father.name);
    });
  });
});

That's one piece of annoying code to write, right? Indeed it is, because it's quite expensive to execute too. As you can see, there are three callback functions, which is also the number of database operations required to execute this code: one to start a transaction, one to load the person object and one to load its father object.

So how does this work in loops? We can use the each method on query collections:

persistence.transaction(function(tx) {
  Person.all().each(tx, function(p) {
    println(p.name);
    println(p.father.name);
  });
});

Again, the two callback functions give you a feeling for how expensive this code is: one callback for starting the transaction and another that iterates over the query result. However, if we execute this code we will get an exception:

Property 'father' with id: 0BEEC2CB6AF64A72A7647DF09BCD62C3 not fetched, either prefetch it or fetch it manually.
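An error like this can come from a getter that refuses to load anything on its own. The following is only a sketch of that idea, not how persistence.js is actually implemented; defineRelation, markFetched and the id value are made up for illustration:

```javascript
// Hypothetical helper: defines a relation property that never lazy loads.
function defineRelation(obj, prop, id) {
  let value;
  let fetched = false;
  Object.defineProperty(obj, prop, {
    get() {
      if (!fetched) {
        throw new Error("Property '" + prop + "' with id: " + id +
          ' not fetched, either prefetch it or fetch it manually.');
      }
      return value;
    },
    configurable: true,
  });
  // The loader calls this once the related object has actually arrived.
  return function markFetched(v) {
    value = v;
    fetched = true;
  };
}

const child = { name: 'child name' };
const setFather = defineRelation(child, 'father', 'someFatherId');

let fetchError = null;
try {
  child.father.name; // plain property access, so it fails instead of querying
} catch (e) {
  fetchError = e.message;
}
console.log(fetchError);

setFather({ name: 'father name' }); // what an explicit fetch or prefetch would do
console.log(child.father.name);
```

The point of the design is that reading a property stays cheap: it either returns something already in memory or fails loudly.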

Rather than lazy loading the father property in each iteration, persistence.js simply throws an exception: it does not support lazy loading without syntactical overhead. However, as the error explains, we can fix the problem by prefetching the father property, so we will:

persistence.transaction(function(tx) {
  Person.all().prefetch("father").each(tx, function(p) {
    println(p.name);
    println(p.father.name);
  });
});

This results in efficient, predictable performance in terms of the number of queries executed (1 query for this piece of code).
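Here is a small simulation of where that single query comes from, assuming the prefetch is folded into the same query that loads the persons (for example with a join). The counter and findAllPersonsWithFather are made-up names, and no real database is involved:

```javascript
// Hypothetical query counter; the data is generated in memory.
let prefetchQueryCount = 0;

// Simulates one query that returns every person together with its father,
// the way a join would.
function findAllPersonsWithFather(n) {
  prefetchQueryCount++;
  return Array.from({ length: n }, (_, i) => ({
    name: 'person' + i,
    father: { name: 'father' + i }, // already loaded, a plain property
  }));
}

const prefetchedFatherNames = findAllPersonsWithFather(200).map((p) => p.father.name);
console.log(prefetchQueryCount); // 1
```

For 200 persons that is one query, and the number stays the same no matter how many persons there are.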

Kind of nice, right?