On Language Design: Making Expensive Actions Hard

While abstraction is a great thing, some abstractions are completely unpredictable. For instance, object-relational mappers (ORMs) are very convenient to get started with, but if performance is important, these frameworks can have very unpredictable performance characteristics. Let’s take Hibernate as an example. Java does not support properties; instead, there’s a convention of using getter and setter methods for this purpose. The general assumption is that calling these methods has little overhead:

Person p = em.find(Person.class, "somePersonId");
System.out.println(p.getName());
System.out.println(p.getFather().getName());

Easy enough. But hang on, what did that third line do? It turns out that it had to execute a SQL query to fetch p’s father, another Person object. Not a big deal, you’d say. But what if you have code like this in a for loop iterating over 200 Person objects? On the face of it, this should be cheap; you’re just doing simple property access, right? However, behind the scenes it executes another query on every iteration, making this simple loop quite expensive: 201 queries in total (one to load the 200 persons, plus one per father). Sure, you can tell Hibernate to prefetch the father property, and you should, but if you forget, you may not notice what’s hogging the database until you start inspecting query logs.
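To make the 201 concrete, here is a minimal sketch in JavaScript that simulates lazy loading and counts queries. It is not Hibernate: FakeDb and LazyPerson are hypothetical stand-ins, and each child is given a distinct father so that caching already-loaded fathers could not reduce the count.

```javascript
// Hypothetical in-memory "database" that counts every query it answers.
// This is NOT Hibernate; it only mimics lazy loading to count queries.
class FakeDb {
  constructor(rows) {
    this.rows = new Map(rows.map((row) => [row.id, row]));
    this.queryCount = 0;
  }
  findAll() {
    this.queryCount++; // like SELECT * FROM person
    return [...this.rows.values()];
  }
  findById(id) {
    this.queryCount++; // like SELECT * FROM person WHERE id = ?
    return this.rows.get(id);
  }
}

// Reading .father looks like cheap property access, but the getter runs
// a query the first time it is used on each object.
class LazyPerson {
  constructor(db, row) {
    this.db = db;
    this.name = row.name;
    this.fatherId = row.fatherId;
    this.cachedFather = undefined; // undefined means "not loaded yet"
  }
  get father() {
    if (this.cachedFather === undefined) {
      const row = this.db.findById(this.fatherId);
      this.cachedFather = row ? new LazyPerson(this.db, row) : null;
    }
    return this.cachedFather;
  }
}

// 200 children, each with a distinct father.
const rows = [];
for (let i = 0; i < 200; i++) {
  rows.push({ id: "father" + i, name: "Father " + i, fatherId: null });
  rows.push({ id: "child" + i, name: "Child " + i, fatherId: "father" + i });
}

const db = new FakeDb(rows);
const children = db
  .findAll() // 1 query
  .filter((row) => row.fatherId !== null)
  .map((row) => new LazyPerson(db, row));

const lines = [];
for (const p of children) {
  lines.push(p.name + " / " + p.father.name); // 1 hidden query per child
}

console.log(db.queryCount); // 201
```

Nothing in the loop body hints at the 200 extra round trips; only the counter reveals them.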

In Java, it is difficult to predict which statements are going to be expensive to execute by inspecting the code.

This is true for practically every programming language, and it’s a shame. It may be useful to have some kind of syntactic overhead for performing expensive operations. If expensive operations are inconvenient to write, will that not encourage programmers to write more efficient code?

It turns out JavaScript has a mechanism for this. JavaScript environments are typically single-threaded. If you only have a single thread available in an interactive environment like the browser, you want to block it as little as possible. Therefore, in JavaScript, expensive operations are performed asynchronously. I have complained about asynchronous programming before, but it has the nice property of making explicit which operations are going to be expensive.

Let’s see how this affected persistence.js, my asynchronous JavaScript ORM:

persistence.transaction(function(tx) {
  Person.load(tx, "somePersonId", function(p) {
    console.log(p.name);
    p.fetch(tx, 'father', function(father) {
      console.log(father.name);
    });
  });
});

That’s an annoying piece of code to write, right? Indeed it is, because it’s also quite expensive to execute. As you can see, there are three callback functions, which is also the number of database operations required to execute this code: one to start a transaction, one to load the person object, and one to load its father object.
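The one-callback-per-operation correspondence can be sketched with a mock store that counts operations. The store below is hypothetical and only loosely shaped like the persistence.js calls above; it is not the persistence.js API, and it runs callbacks synchronously to keep the sketch short, whereas a real browser database would invoke them asynchronously.

```javascript
// Hypothetical callback-style store; NOT the persistence.js API.
// Every method that talks to the "database" takes a callback and
// increments the operation counter.
function createStore(rows) {
  const byId = new Map(rows.map((row) => [row.id, row]));
  const store = {
    operations: 0,
    transaction(fn) {
      store.operations++; // starting a transaction is one operation
      fn({}); // dummy transaction object
    },
    load(tx, id, callback) {
      store.operations++; // loading the person is another
      callback(byId.get(id));
    },
    fetch(tx, person, property, callback) {
      store.operations++; // following the reference is a third
      callback(byId.get(person[property + "Id"]));
    },
  };
  return store;
}

const store = createStore([
  { id: "somePersonId", name: "Alice", fatherId: "dadId" },
  { id: "dadId", name: "Bob", fatherId: null },
]);

const printed = [];
store.transaction(function (tx) {
  store.load(tx, "somePersonId", function (p) {
    printed.push(p.name);
    store.fetch(tx, p, "father", function (father) {
      printed.push(father.name);
    });
  });
});

console.log(store.operations); // 3, one per callback
console.log(printed.join(", ")); // Alice, Bob
```

Each nesting level in the source corresponds to exactly one trip to the store, so the shape of the code is a rough cost estimate.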

So how does this work in loops? We can use the each method on query collections:

persistence.transaction(function(tx) {
  Person.all().each(tx, function(p) {
    console.log(p.name);
    console.log(p.father.name);
  });
});

Again, the callback functions give you a feeling for how expensive this code is. This time there are two: one for starting the transaction and one that iterates over the query result. However, if we execute this code, we get an exception:

Property ‘father’ with id: 0BEEC2CB6AF64A72A7647DF09BCD62C3 not fetched, either prefetch it or fetch it manually.
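The design choice behind this message can be sketched as a reference property that either returns an already-loaded object or throws, and never queries on its own. This is a minimal illustration, not persistence.js source code; makePerson and markFetched are hypothetical names, and the sketch makes no claim about which id the real error message reports.

```javascript
// Hypothetical sketch: a reference property that refuses to lazy load.
// It mirrors the behavior described in the text; it is not persistence.js code.
function makePerson(id, name, fatherId) {
  const fetched = new Map(); // property name -> entity loaded earlier
  const person = { id, name, fatherId };
  Object.defineProperty(person, "father", {
    get() {
      if (!fetched.has("father")) {
        // Fail loudly instead of silently issuing a query.
        throw new Error(
          "Property 'father' with id: " + fatherId +
            " not fetched, either prefetch it or fetch it manually."
        );
      }
      return fetched.get("father");
    },
  });
  // A (hypothetical) query layer calls this after prefetching or fetching.
  person.markFetched = function (property, entity) {
    fetched.set(property, entity);
  };
  return person;
}

const child = makePerson("child1", "Alice", "dad1");

let errorMessage = null;
try {
  console.log(child.father.name); // throws: father was never fetched
} catch (e) {
  errorMessage = e.message;
}

child.markFetched("father", makePerson("dad1", "Bob", null));
console.log(child.father.name); // Bob
```

The cost of following the reference is paid explicitly, by whoever calls markFetched, rather than hidden inside the getter.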

Rather than lazily loading the father property in each iteration, persistence.js simply throws an exception: it does not support lazy loading without syntactic overhead. However, as the error message explains, we can fix the problem by prefetching the father property, so let’s do that:

Person.all().prefetch("father").each(tx, function(p) {
  console.log(p.name);
  console.log(p.father.name);
});

This results in efficient and predictable performance in terms of the number of queries executed (one query for this piece of code).
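To contrast with the lazy-loading loop earlier, here is a minimal sketch of the prefetching idea: persons are loaded together with their fathers in a single query, roughly what a SQL join does, so the loop body is plain property access. createDb and allWithFather are hypothetical names; this is not how persistence.js implements prefetch.

```javascript
// Hypothetical sketch of prefetching; not persistence.js code.
// Persons come back with their fathers already attached, in ONE query.
function createDb(rows) {
  const byId = new Map(rows.map((row) => [row.id, row]));
  return {
    queryCount: 0,
    // Roughly: SELECT ... FROM person p JOIN person f ON p.fatherId = f.id
    allWithFather() {
      this.queryCount++; // one round trip for the whole result set
      return rows
        .filter((row) => row.fatherId !== null)
        .map((row) => ({ name: row.name, father: byId.get(row.fatherId) }));
    },
  };
}

// Same data shape as before: 200 children, each with a distinct father.
const rows = [];
for (let i = 0; i < 200; i++) {
  rows.push({ id: "father" + i, name: "Father " + i, fatherId: null });
  rows.push({ id: "child" + i, name: "Child " + i, fatherId: "father" + i });
}

const db = createDb(rows);
const lines = [];
for (const p of db.allWithFather()) {
  lines.push(p.name + " / " + p.father.name); // plain property access, no query
}

console.log(db.queryCount); // 1
```

The loop produces the same 200 lines as the lazy version, but the query count no longer grows with the number of rows.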

Kind of nice, right?


  • CJM
    I have to admit, I don't buy it. You're no longer abstracting the calls and you'll have a hell of a time changing your code if you want to swap out the upstream implementation (what if your persistence provider is in memory?). All I want to do semantically is read a property (why should I create a transaction when no writing is involved anyways?).

    One of the great things about abstraction is that you have reduced duplication of code. Your solution fails at this and requires nearly identical callbacks and transactions and whatnot to be created every time you want to do something (if you continue down this path, it makes sense not to factor the transaction block into its own function, because that would be hiding its real cost to consumers). Another great thing about abstraction is that it makes it possible to improve and change the implementation (and correct defects in it) without needing to modify the consumers. Actually creating all of the objects manually in sequence is well and good (and efficient, sure), but it doesn't exactly lead to code that can be maintained easily.

    It seems to me that your solution is essentially 'make the programmer type more so that they get the idea that something is expensive'. The example that you provided is not necessarily expensive - you yourself mentioned that you could set up prefetching which would make the operation cheap. On different platforms, even without prefetching, the operation isn't necessarily that expensive. For that matter, what do you even mean by expensive (storage, processor cycles, end-to-end time, shared locks) - I'm guessing in terms of time? Even in the cases where it is always expensive, how is your solution better than creating functions with long names like 'getFatherOfObjectPleaseNoteThisTakesALittleWhileButNotQuiteAsLongAsFunctionX' to represent the functions that take longer? If you want approximate time cost information to be available, add it as metadata to the function if your language of choice supports it (eg [ApproximateFunctionTimeCostAttribute(ApproximatedCost.Expensive)]), or tell your developers to test their code (does it run acceptably fast? okay!), or tell them to use a profiler. Making complex or slow functions difficult to call isn't a way to program more efficiently, it's a way to irritate developers and design poor APIs.
  • HolyHaddock
    > In Java, it is difficult to predict which statements are going to be expensive to execute by inspecting the code.

    That's true, but the reason - abstraction - is valuable! Relating your example to The Day Job, I can perform that ORM call deep in a data layer somewhere, pass the Person object for display - and when writing the UI code, I (or another person, or another team) writing the code don't have to know that there is a data layer. In your example, doesn't the data layer either need to know when the UI layer will need the parent, or the UI layer have to know about the fetch function from the data layer?

    I don't really know Javascript, so the rest of this post is very interesting. I like that having taken that design decision, what would be incredibly verbose and impossible in Java is reasonably straightforward here. Can you use those objects in the same ways as you could the equivalent Java objects (ie adding behaviour to the object)? Is their "life" limited to the duration of the async anonymous function, or could they, for example, be dumped in a queue for later use?
  • > That's true, but the reason - abstraction - is valuable!

    Abstraction is very valuable, but this is an example of a leaky abstraction. You *think* you're just doing a cheap getter call, but in fact it is really expensive. But of course, it's a trade off.

    > Can you use those objects in the same ways as you could the equivalent Java objects (ie adding behaviour to the object)? Is their "life" limited to the duration of the async anonymous function, or could they, for example, be dumped in a queue for later use?

    Absolutely, they're just regular objects, like Java objects. You can do whatever you like with them. Because of JavaScript's extremely dynamic nature, you can add any behavior you like to the object.
  • I met someone from Sony Ericsson a while back and he told me about an issue they had with the address book in certain cell phone models. Scrolling through the list was very slow. Someone took a good look at it and it turns out, for every step you scrolled through the address book, about 60 queries were executed.
  • Right, that's exactly the kind of thing you want to avoid.
  • reddit
    TLDR; I don't have implemented lazy fetching, *but* it's to prevent stupidness, look mum I've done an ORM!
  • Thanks for that.