Three Routes to Spaghetti-Free Javascript

(If you are familiar with the problems of moving from synchronous to asynchronous programming, feel free to move to the next section.)

Update: A lot of people misunderstood the main issue: here is another shot at explaining it better.

Let’s build a script that determines the titles of a set of URLs. Let’s start simple: we create a function that takes a URL and returns the title:

var reg = /<title>(.+)<\/title>/mi;

function getTitle(url) {
   var body = "<html><title>My title</title></html>";
   var match = reg.exec(body);
   return match[1];
}

In this first prototype we ignore the fetching of the webpage for now and assume some dummy HTML for testing purposes. We can now call the function:

getTitle("http://whatever.com");

and we’ll get back:

"My title"

So far, so good. Now let’s iterate over an array of URLs and get each of their titles:

var urls = ["http://zef.me", "http://google.com",
            "http://yahoo.com"];
var titles = [];
for(var i = 0; i < urls.length; i++) {
   titles.push(getTitle(urls[i]));
}
console.log(titles);

And the array of resulting titles (all "My title") is printed.

Next, to put the actual URL fetching into getTitle, we need to make an AJAX call (let’s just ignore the same-origin policy here):

function getTitle(url) {
  var xmlHttp = new XMLHttpRequest();
  xmlHttp.open("GET", url, true);
  xmlHttp.send();
  xmlHttp.onreadystatechange = function() {
    if(xmlHttp.readyState==4 && xmlHttp.status==200) {
      // Now we get the body from the responseText:
      var body = xmlHttp.responseText;
      var match = reg.exec(body);
      return match[1];
    }
  };
}

We open an XMLHttpRequest and then attach an event listener on the onreadystatechange event. When the ready-state changes, we check if it’s now set to 4 (done), and if so, we take the body text, apply our regular expression and return the match.

Or do we?

Note that return statement. Where does it return to? Well, it belongs to the event handler function, not the getTitle function, so this doesn’t work. The XMLHttpRequest is performed asynchronously: the request is set up, an event handler is attached and then the getTitle function returns (without a value). Only later, at some point, is the onreadystatechange event triggered and the regular expression applied, and by then nobody is waiting for the returned value.
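To see the effect in isolation, here is a minimal sketch that runs without a browser. It assumes setTimeout as a stand-in for the asynchronous request, and getTitleBroken is a made-up name for illustration:

```javascript
// setTimeout stands in for the asynchronous XMLHttpRequest (an assumption
// made so that this sketch runs outside a browser).
function getTitleBroken(url) {
  setTimeout(function () {
    // This value is returned to the timer machinery, which discards it.
    return "My title";
  }, 10);
  // getTitleBroken itself ends here, without a return statement.
}

var result = getTitleBroken("http://whatever.com");
console.log(result); // undefined
```

The inner return hands "My title" to whoever invoked the handler (the timer here, the browser in the XMLHttpRequest version), and that caller simply throws the value away.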

So, how do we fix this? Well, we can change our function a little bit:

function getTitle(url, callback) {
  var xmlHttp = new XMLHttpRequest();
  xmlHttp.open("GET", url, true);
  xmlHttp.send();
  xmlHttp.onreadystatechange = function() {
    if(xmlHttp.readyState==4 && xmlHttp.status==200) {
      var body = xmlHttp.responseText;
      var match = reg.exec(body);
      callback(match[1]);
    }
  };
}

Now, instead of returning the value, we pass the result to a callback function (the function’s new second argument). When we want to call the function, we have to do it as follows:

getTitle('http://bla.com', function(title) {
  console.log("Title: " + title);
});

That’s annoying, but fair enough.
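One thing this version leaves open is what happens when the request fails or the page has no title: the callback is then never called, or match[1] throws. A possible way to deal with that, sketched here under the assumption that we adopt the node.js-style "error first" callback convention (which the rest of this post does not use), is to funnel every finished response through a small helper. The name handleResponse is hypothetical:

```javascript
var reg = /<title>(.+)<\/title>/mi;

// Hypothetical helper: turns a finished response into exactly one callback
// call. callback(err, title) follows the error-first convention.
function handleResponse(status, body, callback) {
  if (status !== 200) {
    callback(new Error("HTTP status " + status));
    return;
  }
  var match = reg.exec(body);
  if (!match) {
    callback(new Error("No <title> found"));
    return;
  }
  callback(null, match[1]);
}

// Inside getTitle the handler would then become (browser-only, so it is
// left as a comment here):
//
//   xmlHttp.onreadystatechange = function() {
//     if (xmlHttp.readyState == 4) {
//       handleResponse(xmlHttp.status, xmlHttp.responseText, callback);
//     }
//   };

handleResponse(200, "<html><title>My title</title></html>",
  function (err, title) {
    console.log(err ? "Error: " + err.message : "Title: " + title);
  });
```

The point of the design is that the caller always hears back exactly once, whether things went well or not.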

I suppose we also have to adapt our loop now, too. getTitle no longer returns a useful value, so we have to pass a callback function to it. Hmm, how do we do this?

var urls = ["http://zef.me", "http://google.com",
            "http://yahoo.com"];
var titles = [];
for(var i = 0; i < urls.length; i++) {
   getTitle(urls[i], function(title) {
     titles.push(title);
   });
}
console.log(titles);

That looks about right. Except that, when running this code, the last console.log is executed immediately and shows an empty array, because the getTitle calls have not finished yet. Asynchronous code executes in a different order than the code may suggest.

Shame.

We now have to think — what do we prefer? Do we want to have all the URLs fetched simultaneously, or do they have to be fetched in sequence? Implementing it in sequence is more difficult, so let’s do it in parallel. What we’ll do is add a counter!

var urls = ["http://zef.me", "http://google.com",
            "http://yahoo.com"];
var titles = [];
var numFinished = 0;
for(var i = 0; i < urls.length; i++) {
   getTitle(urls[i], function(title) {
     titles.push(title);
     numFinished++;
     if(numFinished === urls.length) {
       // All done!
       console.log(titles);
     }
   });
}

As we get back the getTitle results we increase the numFinished counter, and when that counter has reached the total number of URLs, we’re done — and print the array of titles.
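A subtle detail of the counter version: titles are pushed in the order in which the responses arrive, which need not be the order of the urls array. If the order matters, one option is to store each title at the index of its URL. The following is a sketch under a few assumptions: getTitle is passed in as a parameter, and fakeGetTitle is a made-up stand-in with random delays, so that the code runs without a browser:

```javascript
// Fetch all titles in parallel and keep them in the same order as `urls`.
function getTitlesInOrder(urls, getTitle, done) {
  var titles = new Array(urls.length);
  var numFinished = 0;
  if (urls.length === 0) {
    done(titles);
    return;
  }
  // forEach gives every callback its own `i`; with a plain
  // `for (var i ...)` loop all callbacks would share a single `i`.
  urls.forEach(function (url, i) {
    getTitle(url, function (title) {
      titles[i] = title;
      numFinished++;
      if (numFinished === urls.length) {
        done(titles); // All done!
      }
    });
  });
}

// Stand-in for the real getTitle: answers after a random delay.
function fakeGetTitle(url, callback) {
  setTimeout(function () {
    callback("Title of " + url);
  }, Math.random() * 50);
}

getTitlesInOrder(
  ["http://zef.me", "http://google.com", "http://yahoo.com"],
  fakeGetTitle,
  function (titles) {
    console.log(titles);
  }
);
```

Like the original, this sketch never finishes if one of the calls never calls back.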

Ugh. Let’s not even look at the code to fetch these URLs sequentially.

Centuries of civilization and decades of programming research — and we’re back to this style of spaghetti programming?

There must be ways around this. Indeed, there are — let’s look at three of them.

Route #1: streamline.js

Streamline.js is a simple compiler, implemented in Javascript, that enables you to write your code in a synchronous style. The nice thing about Streamline.js is that it operates on regular Javascript and does not add any new keywords, so you can keep using your favorite editor and other tools. The only thing streamline.js does is give the _ identifier a new meaning. Before I demonstrate it, let’s refactor our code slightly. We’ll create a generic fetchHTML function:

function fetchHTML(url, callback) {
  var xmlHttp = new XMLHttpRequest();
  xmlHttp.open("GET", url, true);
  xmlHttp.send();
  xmlHttp.onreadystatechange = function() {
    if(xmlHttp.readyState==4 && xmlHttp.status==200) {
      callback(xmlHttp.responseText);
    }
  };
}

Now, streamline.js allows us to write our getTitle function as follows:

function getTitle(url, _) {
   var body = fetchHTML(url, _);
   var match = reg.exec(body);
   return match[1];
}

You’ll notice the _ argument there, which represents the callback function. It tells streamline.js that this is an asynchronous function. The next thing you’ll notice is the call to fetchHTML, which, despite being an asynchronous function, is called as if it were a regular synchronous function. The difference? The last argument: _.

Internally, streamline.js transforms this code to something equivalent to this:

function getTitle(url, _) {
   fetchHTML(url, function(body) {
     var match = reg.exec(body);
     return _(match[1]);
   });
}

This transformation is called the continuation-passing style transformation. We can now keep our loop simple as well:

var urls = ["http://zef.me", "http://google.com",
            "http://yahoo.com"];
var titles = [];
for(var i = 0; i < urls.length; i++) {
   titles.push(getTitle(urls[i], _));
}
console.log(titles);

This is basically the same as our original version; the only difference is an additional _ argument to getTitle.
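To get a feeling for what the compiler saves you from writing, here is roughly what such a sequential loop looks like when written by hand in callback style. This is an illustrative sketch, not streamline.js’s actual output; getTitlesSequentially and fakeGetTitle are hypothetical names, and the stand-in uses setTimeout so that the code runs without a browser:

```javascript
// Hand-written sequential loop: the next request starts only after the
// previous one has called back.
function getTitlesSequentially(urls, getTitle, done) {
  var titles = [];
  function next(i) {
    if (i === urls.length) {
      done(titles); // the code "after the loop" goes here
      return;
    }
    getTitle(urls[i], function (title) {
      titles.push(title);
      next(i + 1); // plays the role of i++ plus the jump back to the loop test
    });
  }
  next(0);
}

// Stand-in for the real getTitle (an assumption for this demo).
function fakeGetTitle(url, callback) {
  setTimeout(function () {
    callback("Title of " + url);
  }, 10);
}

getTitlesSequentially(
  ["http://zef.me", "http://google.com", "http://yahoo.com"],
  fakeGetTitle,
  function (titles) {
    console.log(titles);
  }
);
```

The for loop has dissolved into a recursive helper function, which is exactly the kind of restructuring the continuation-passing style transformation does for you.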

Not bad, huh? Streamline.js also has some nice functions to enable a parallel version of this code.

Name: streamline.js, a Javascript preprocessor (integrates nicely with node.js too)
License: MIT

Route #2: mobl

My own project, mobl, is a language to rapidly develop mobile web applications. Although it’s not Javascript, the syntax of its scripting language is similar. Since mobl is typed, it is easy for the compiler to infer whether a function is asynchronous or not, which leads to code that is slightly cleaner than with streamline.js:

function getTitle(url : String) : String {
   var body = fetchHTML(url);
   var match = reg.exec(body);
   return match.get(1);
}

and the loop:

var urls = ["http://zef.me", "http://google.com",
            "http://yahoo.com"];
var titles = Array<String>();
foreach(url in urls) {
   titles.push(getTitle(url));
}
log(titles);

As with streamline.js, a continuation-passing style transformation is performed by the compiler to produce asynchronous Javascript code.

Mobl is aimed at the mobile web domain, it is a whole new language to learn, and it doesn’t currently support concurrent execution of asynchronous calls. Nevertheless, unlike with streamline.js, there are no special _ variables to pass around.

Name: mobl, new language, browser only.
License: MIT

Route #3: StratifiedJS

The most powerful option is StratifiedJS. It extends the Javascript language with various structured concurrency features using a few new language constructs such as waitfor, and, or and retract. To fully understand its expressive power, it’s a good idea to have a look at these excellent interactive OSCON slides.

Here’s the code for StratifiedJS:

function getTitle(url) {
   var body = fetchHTML(url);
   var match = reg.exec(body);
   return match[1];
}

and the loop:

var urls = ["http://zef.me", "http://google.com",
            "http://yahoo.com"];
var titles = [];
for(var i = 0; i < urls.length; i++) {
   titles.push(getTitle(urls[i]));
}
console.log(titles);

As you can see, this is basically exactly how you’d want to write the code. Compared to our original version, the only thing that changed was adding the fetchHTML call, as it should be.

With some effort I was able to capture the Javascript code that this code fragment is translated to. Here’s the code generated for the getTitle function:

function getTitle(url) {
    var body, match;
    return __oni_rt.exseq(arguments, this, 'whatever.js',
      [1, __oni_rt.Scall(3, function (_oniX) {
        return body = _oniX;
    }, __oni_rt.Nb(function (arguments) {
        return fetchHTML(url)
    }, 2)), __oni_rt.Scall(4, function (_oniX) {
        return match = _oniX;
    }, __oni_rt.Nb(function (arguments) {
        return reg.exec(body)
    }, 3)), __oni_rt.Nb(function (arguments) {
        return __oni_rt.CFE('r', match[1]);
    }, 5)])
}

and the loop:

var urls, titles, i;
__oni_rt.exseq(this.arguments, this, 'whatever.js',
 [0, __oni_rt.Nb(function (arguments) {
    urls = ["http://zef.me", "http://google.com",
            "http://yahoo.com"];
    titles = [];
}, 4), __oni_rt.Seq(0, __oni_rt.Nb(function (arguments) {
    i = 0;
}, 8), __oni_rt.Loop(0, __oni_rt.Nb(function (arguments) {
    return i < urls.length
}, 5), __oni_rt.Nb(function (arguments) {
    return i++
}, 5), __oni_rt.Fcall(1, 6, __oni_rt.Scall(6, function(l){
    return [l, 'push'];
}, __oni_rt.Nb(function (arguments) {
    return titles
}, 6)), __oni_rt.Nb(function (arguments) {
    return getTitle(urls[i])
}, 6)))), __oni_rt.Nb(function (arguments) {
    return console.log(titles)
}, 8)])

What worries me somewhat about this generated code is that it seems rather heavy on the number of functions that are generated. Basically every expression is turned into a function passed to another function in the StratifiedJS runtime. This seems rather expensive. I haven’t done any performance benchmarking on this, so maybe it’s not as bad as I think.

Of the three, StratifiedJS is definitely the most flexible and allows you to write the cleanest code. The drawback is that it extends the Javascript language (unlike streamline.js), which could break your current tool chain. In addition, the produced code is likely to be slower than that of the other two solutions.

Name: StratifiedJS, extension of Javascript.
License: MIT (although source code is only available in a minified version at the moment)

Conclusion

So there you go. Three ways to write clean synchronous code and produce efficient asynchronous Javascript code. The fact is that picking any of these requires a compiler of some kind to be added to your tool chain (although StratifiedJS performs this compilation at run-time), which may or may not be a problem.

A drawback of code generation in any shape or form is debugging. If something goes wrong, the code you’ll be debugging is generated Javascript code. StratifiedJS attempts to include original line numbers when exceptions occur, which helps. A fork of streamline.js attempts to maintain the line numbers in generated code.

In the end it’s all a trade-off. A different route would be to use a library like async.js that, while not “fixing the language”, gives you an API that lets you at least write asynchronous code in a more readable manner.