Three Routes to Spaghetti-Free Javascript

(If you are familiar with the problems of moving from synchronous to asynchronous programming, feel free to move to the next section.)

Update: A lot of people misunderstood the main issue: here is another shot at explaining it better.

Let’s build a script that determines the titles of a set of URLs. Let’s start simple: we create a function that takes a URL and returns its title:

var reg = /<title>(.+)<\/title>/mi;

function getTitle(url) {
   var body = "<html><title>My title</title></html>";
   var match = reg.exec(body);
   return match[1];
}

In this first prototype we ignore fetching the web page for now and, for testing purposes, assume some dummy HTML. We can now call the function:

getTitle("http://whatever.com");

and we’ll get back:

"My title"

So far, so good. Now let’s iterate over an array of URLs and get each of their titles:

var urls = ["http://zef.me", "http://google.com",
            "http://yahoo.com"];
var titles = [];
for(var i = 0; i < urls.length; i++) {
   titles.push(getTitle(urls[i]));
}
console.log(titles);

And the array of resulting titles (all "My title") is printed.

Next, to add the actual URL fetching to getTitle, we need to make an AJAX call (let’s just ignore the same-origin restriction here):

function getTitle(url) {
  var xmlHttp = new XMLHttpRequest();
  xmlHttp.open("GET", url, true);
  xmlHttp.onreadystatechange = function() {
    if(xmlHttp.readyState==4 && xmlHttp.status==200) {
      // Now we get the body from the responseText:
      var body = xmlHttp.responseText;
      var match = reg.exec(body);
      return match[1];
    }
  };
  xmlHttp.send();
}

We open an XMLHttpRequest and then attach an event listener on the onreadystatechange event. When the ready-state changes, we check if it’s now set to 4 (done), and if so, we take the body text, apply our regular expression and return the match.

Or do we?

Note that return statement. Where does it return to? It belongs to the event handler function, not to getTitle, so this doesn’t work. The XMLHttpRequest is performed asynchronously: the request is set up, an event handler is attached, and then getTitle returns (with undefined, since it has no return statement of its own). Later, at some point, the onreadystatechange event is triggered and the regular expression is applied, but by then nobody is waiting for the result.
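To see this outside a browser, here is a minimal sketch that runs under node.js. XMLHttpRequest is not available there, so it assumes a hypothetical fakeFetchHTML that uses setTimeout to deliver canned HTML on a later turn of the event loop, just as the real request would.

```javascript
var reg = /<title>(.+)<\/title>/mi;

// Hypothetical stand-in for an asynchronous XMLHttpRequest: the handler
// receives canned HTML on a later turn of the event loop.
function fakeFetchHTML(url, handler) {
  setTimeout(function() {
    handler("<html><title>Title of " + url + "</title></html>");
  }, 10);
}

function getTitle(url) {
  fakeFetchHTML(url, function(body) {
    var match = reg.exec(body);
    return match[1]; // returns from this handler, not from getTitle
  });
  // getTitle reaches its end before the handler has run,
  // so its own return value is undefined.
}

var result = getTitle("http://zef.me");
console.log(result); // undefined
```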

So, how do we fix this? Well, we can change our function a little bit:

function getTitle(url, callback) {
  var xmlHttp = new XMLHttpRequest();
  xmlHttp.open("GET", url, true);
  xmlHttp.onreadystatechange = function() {
    if(xmlHttp.readyState==4 && xmlHttp.status==200) {
      var body = xmlHttp.responseText;
      var match = reg.exec(body);
      callback(match[1]);
    }
  };
  xmlHttp.send();
}

Now, instead of returning the value, we pass the result to a callback function (the function’s new second argument). When we want to call the function, we have to do it as follows:

getTitle('http://bla.com', function(title) {
  console.log("Title: " + title);
});

That’s annoying, but fair enough.

I suppose we have to adapt our loop now, too. getTitle no longer returns a useful value, so we have to pass a callback function to it. Hmm, how do we do this?

var urls = ["http://zef.me", "http://google.com",
            "http://yahoo.com"];
var titles = [];
for(var i = 0; i < urls.length; i++) {
   getTitle(urls[i], function(title) {
     titles.push(title);
   });
}
console.log(titles);

That looks about right, except that when we run this code, the final console.log executes immediately and prints an empty array, because none of the getTitle calls have finished yet. Asynchronous code executes in a different order than the code may suggest.
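The ordering problem can be reproduced under node.js with a small sketch. It assumes a hypothetical fakeFetchHTML that uses setTimeout in place of XMLHttpRequest, records how many titles exist right after the loop, and checks again once the simulated requests have completed.

```javascript
var reg = /<title>(.+)<\/title>/mi;

// Hypothetical asynchronous fetch: delivers canned HTML after 10 ms.
function fakeFetchHTML(url, callback) {
  setTimeout(function() {
    callback("<html><title>Title of " + url + "</title></html>");
  }, 10);
}

function getTitle(url, callback) {
  fakeFetchHTML(url, function(body) {
    callback(reg.exec(body)[1]);
  });
}

var urls = ["http://zef.me", "http://google.com", "http://yahoo.com"];
var titles = [];
for (var i = 0; i < urls.length; i++) {
  getTitle(urls[i], function(title) {
    titles.push(title);
  });
}

// No callback has run yet at this point.
var lengthRightAfterLoop = titles.length;
console.log(lengthRightAfterLoop); // 0

setTimeout(function() {
  // By now the simulated 10 ms fetches have completed.
  console.log(titles.length); // 3
}, 50);
```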

Shame.

We now have to think: what do we prefer? Do we want all the URLs fetched simultaneously, or do they have to be fetched in sequence? Implementing it in sequence is more difficult, so let’s do it in parallel. What we’ll do is add a counter!

var urls = ["http://zef.me", "http://google.com",
            "http://yahoo.com"];
var titles = [];
var numFinished = 0;
for(var i = 0; i < urls.length; i++) {
   getTitle(urls[i], function(title) {
     titles.push(title);
     numFinished++;
     if(numFinished === urls.length) {
       // All done!
       console.log(titles);
     }
   });
}

As the getTitle results come back, we increment the numFinished counter; when that counter reaches the total number of URLs, we’re done and print the array of titles.
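One detail the counter version glosses over: titles are pushed in the order the responses arrive, which need not match the order of urls. A minimal sketch that keeps the two aligned stores each title at its URL’s index. It assumes a hypothetical fakeFetchHTML whose fixed delays make later URLs finish first, so the reordering is visible.

```javascript
var reg = /<title>(.+)<\/title>/mi;

// Hypothetical delays: the last URL answers first.
var delays = { "http://zef.me": 30, "http://google.com": 20, "http://yahoo.com": 10 };
function fakeFetchHTML(url, callback) {
  setTimeout(function() {
    callback("<html><title>Title of " + url + "</title></html>");
  }, delays[url]);
}

function getTitle(url, callback) {
  fakeFetchHTML(url, function(body) {
    callback(reg.exec(body)[1]);
  });
}

var urls = ["http://zef.me", "http://google.com", "http://yahoo.com"];
var titles = new Array(urls.length);
var numFinished = 0;
var done = false;

// forEach gives each callback its own index, unlike the shared `var i`.
urls.forEach(function(url, index) {
  getTitle(url, function(title) {
    titles[index] = title; // slot by position, not by arrival order
    numFinished++;
    if (numFinished === urls.length) {
      done = true;
      console.log(titles);
    }
  });
});
```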

Ugh. Let’s not even look at the code to fetch these URLs sequentially.
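For the curious, here is a minimal sketch of what the sequential version tends to look like: the for loop turns into a recursive function that starts the next fetch from inside the previous fetch’s callback. It again assumes a hypothetical fakeFetchHTML, which also counts how many requests are in flight, so we can check they really run one at a time.

```javascript
var reg = /<title>(.+)<\/title>/mi;

// Hypothetical asynchronous fetch that tracks concurrent requests.
var inFlight = 0;
var maxInFlight = 0;
function fakeFetchHTML(url, callback) {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  setTimeout(function() {
    inFlight--;
    callback("<html><title>Title of " + url + "</title></html>");
  }, 10);
}

function getTitle(url, callback) {
  fakeFetchHTML(url, function(body) {
    callback(reg.exec(body)[1]);
  });
}

// The loop becomes recursion: step i+1 starts only when step i has finished.
function getTitlesSequentially(urls, callback) {
  var titles = [];
  function next(i) {
    if (i === urls.length) {
      callback(titles);
      return;
    }
    getTitle(urls[i], function(title) {
      titles.push(title);
      next(i + 1);
    });
  }
  next(0);
}

var sequentialTitles = null;
getTitlesSequentially(["http://zef.me", "http://google.com", "http://yahoo.com"],
  function(titles) {
    sequentialTitles = titles;
    console.log(titles);
  });
```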

Centuries of civilization and decades of programming research, and we’re back to this style of spaghetti programming?

There must be ways around this. Indeed, there are. Let’s look at three of them.

Route #1: streamline.js

Streamline.js is a simple compiler, implemented in Javascript, that enables you to write your code in a synchronous style. The nice thing about streamline.js is that it operates on regular Javascript and does not add any new keywords, so you can keep using your favorite editor and other tools. The only thing streamline.js does is give the _ identifier a new meaning. Before I demonstrate it, let’s refactor our code slightly by creating a generic fetchHTML function:

function fetchHTML(url, callback) {
  var xmlHttp = new XMLHttpRequest();
  xmlHttp.open("GET", url, true);
  xmlHttp.onreadystatechange = function() {
    if(xmlHttp.readyState==4 && xmlHttp.status==200) {
      callback(xmlHttp.responseText);
    }
  };
  xmlHttp.send();
}

Now, streamline.js allows us to write our getTitle function as follows:

function getTitle(url, _) {
   var body = fetchHTML(url, _);
   var match = reg.exec(body);
   return match[1];
}

You’ll notice the _ argument there, which represents the callback function. It tells streamline.js that this is an asynchronous function. The next thing you’ll notice is the call to fetchHTML, which, although asynchronous, is called as if it were a regular synchronous function. The difference? The last argument: _.

Internally, streamline.js transforms this code to something equivalent to this:

function getTitle(url, _) {
   fetchHTML(url, function(body) {
     var match = reg.exec(body);
     return _(match[1]);
   });
}

This transformation is called the continuation-passing style transformation. We can now keep our loop simple as well:

var urls = ["http://zef.me", "http://google.com",
            "http://yahoo.com"];
var titles = [];
for(var i = 0; i < urls.length; i++) {
   titles.push(getTitle(urls[i], _));
}
console.log(titles);

This is basically the same as our original version; the only difference is the additional _ argument to getTitle.

Not bad, huh? Streamline.js also has some nice functions to enable a parallel version of this code.
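In practice, streamline.js is built around node.js’s callback(err, result) convention rather than the single-argument callbacks used above. Writing the continuation-passing style by hand for getTitle under that convention shows what the compiler saves us from: every step after the asynchronous call moves into the continuation, and errors have to be forwarded explicitly. This is a sketch of the technique, not streamline’s actual output; fakeFetchHTML and its error rule are hypothetical.

```javascript
var reg = /<title>(.+)<\/title>/mi;

// Hypothetical fetch with node-style callbacks:
// callback(err) on failure, callback(null, body) on success.
function fakeFetchHTML(url, callback) {
  setTimeout(function() {
    if (url.indexOf("http://") !== 0) {
      callback(new Error("unsupported URL: " + url));
      return;
    }
    callback(null, "<html><title>Title of " + url + "</title></html>");
  }, 10);
}

// Hand-written CPS version of getTitle: the rest of the function
// becomes the continuation, and every error is passed on to _.
function getTitle(url, _) {
  fakeFetchHTML(url, function(err, body) {
    if (err) return _(err);
    var match = reg.exec(body);
    if (!match) return _(new Error("no <title> in " + url));
    return _(null, match[1]);
  });
}

var okResult, errResult;
getTitle("http://zef.me", function(err, title) { okResult = title; });
getTitle("ftp://example.com", function(err) { errResult = err; });
```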

Name: streamline.js, a Javascript preprocessor (integrates nicely with node.js too)
License: MIT

Route #2: mobl

My own project, mobl, is a language to rapidly develop mobile web applications. Although it’s not Javascript, the syntax of its scripting language is similar. Since mobl is typed, it is easy for the compiler to infer whether a function is asynchronous or not, which leads to code that is slightly cleaner than with streamline.js:

function getTitle(url : String) : String {
   var body = fetchHTML(url);
   var match = reg.exec(body);
   return match.get(1);
}

and the loop:

var urls = ["http://zef.me", "http://google.com",
            "http://yahoo.com"];
var titles = Array<String>();
foreach(url in urls) {
   titles.push(getTitle(url));
}
log(titles);

As with streamline.js, the compiler performs a continuation-passing style transformation to produce asynchronous Javascript code.

Mobl is aimed at the mobile web domain, is a whole new language to learn, and doesn’t currently support concurrent execution of asynchronous calls. Nevertheless, unlike streamline.js, there are no special _ variables to pass around.

Name: mobl, new language, browser only.
License: MIT

Route #3: StratifiedJS

The most powerful option is StratifiedJS. It extends the Javascript language with various structured concurrency features using a few new language constructs such as waitfor, and, or and retract. To fully understand its expressive power, it’s a good idea to have a look at these excellent interactive OSCON slides.

Here’s the code for StratifiedJS:

function getTitle(url) {
   var body = fetchHTML(url);
   var match = reg.exec(body);
   return match[1];
}

and the loop:

var urls = ["http://zef.me", "http://google.com",
            "http://yahoo.com"];
var titles = [];
for(var i = 0; i < urls.length; i++) {
   titles.push(getTitle(urls[i]));
}
console.log(titles);

As you can see, this is exactly how you’d want to write the code. Compared to our original version, the only change is the added fetchHTML call, as it should be.

With some effort I was able to capture the Javascript code that this code fragment is translated to. Here’s the code generated for the getTitle function:

function getTitle(url) {
    var body, match;
    return __oni_rt.exseq(arguments, this, 'whatever.js',
      [1, __oni_rt.Scall(3, function (_oniX) {
        return body = _oniX;
    }, __oni_rt.Nb(function (arguments) {
        return fetchHTML(url)
    }, 2)), __oni_rt.Scall(4, function (_oniX) {
        return match = _oniX;
    }, __oni_rt.Nb(function (arguments) {
        return reg.exec(body)
    }, 3)), __oni_rt.Nb(function (arguments) {
        return __oni_rt.CFE('r', match[1]);
    }, 5)])
}

and the loop:

var urls, titles, i;
__oni_rt.exseq(this.arguments, this, 'whatever.js',
 [0, __oni_rt.Nb(function (arguments) {
    urls = ["http://zef.me", "http://google.com",
            "http://yahoo.com"];
    titles = [];
}, 4), __oni_rt.Seq(0, __oni_rt.Nb(function (arguments) {
    i = 0;
}, 8), __oni_rt.Loop(0, __oni_rt.Nb(function (arguments) {
    return i < urls.length
}, 5), __oni_rt.Nb(function (arguments) {
    return i++
}, 5), __oni_rt.Fcall(1, 6, __oni_rt.Scall(6, function(l){
    return [l, 'push'];
}, __oni_rt.Nb(function (arguments) {
    return titles
}, 6)), __oni_rt.Nb(function (arguments) {
    return getTitle(urls[i])
}, 6)))), __oni_rt.Nb(function (arguments) {
    return console.log(titles)
}, 8)])

What worries me somewhat about this generated code is that it seems rather heavy on the number of functions being generated: basically every expression is turned into a function that is passed to another function in the StratifiedJS runtime. This seems rather expensive, but I haven’t done any performance benchmarking, so maybe it’s not as bad as I think.

Of the three, StratifiedJS is definitely the most flexible and allows you to write the cleanest code. The drawback is that it extends the Javascript language (unlike streamline.js), which could break your current tool chain. In addition, the code it produces is likely to be slower than that of the other two solutions.

Name: StratifiedJS, extension of Javascript.
License: MIT (although source code is only available in a minified version at the moment)

Conclusion

So there you go. Three ways to write clean synchronous code and produce efficient asynchronous Javascript code. The fact is that picking any of these requires a compiler of some kind to be added to your tool chain (although StratifiedJS performs this compilation at run-time), which may or may not be a problem.

A drawback of code generation in any shape or form is debugging. If something goes wrong, the code you’ll be debugging is generated Javascript code. StratifiedJS attempts to include original line numbers when exceptions occur, which helps. A fork of streamline.js attempts to maintain the line numbers in generated code.

In the end it’s all a trade-off. A different route would be to use a library like async.js that, while not “fixing the language”, gives you an API that lets you at least write asynchronous code in a more readable manner.
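To give a flavor of that library route, here is a minimal sketch of a helper in the spirit of async.js’s map, which takes a collection, an iterator and a final node-style callback. The helper name parallelMap and its implementation are hypothetical, not async.js’s source; it runs all calls concurrently, keeps results in input order, and reports the first error. The getTitle below is likewise a hypothetical setTimeout-based stand-in.

```javascript
var reg = /<title>(.+)<\/title>/mi;

// Hypothetical helper: apply `iterator` to every item concurrently and
// call callback(err) on the first error or callback(null, results) at the end.
function parallelMap(items, iterator, callback) {
  var results = new Array(items.length);
  var remaining = items.length;
  var finished = false;
  if (remaining === 0) {
    callback(null, results);
    return;
  }
  items.forEach(function(item, index) {
    iterator(item, function(err, result) {
      if (finished) return; // ignore anything after the first error
      if (err) {
        finished = true;
        callback(err);
        return;
      }
      results[index] = result; // keep input order
      remaining--;
      if (remaining === 0) {
        finished = true;
        callback(null, results);
      }
    });
  });
}

// Hypothetical node-style getTitle backed by canned HTML.
function getTitle(url, callback) {
  setTimeout(function() {
    var body = "<html><title>Title of " + url + "</title></html>";
    callback(null, reg.exec(body)[1]);
  }, 10);
}

var output = null;
parallelMap(["http://zef.me", "http://google.com", "http://yahoo.com"], getTitle,
  function(err, titles) {
    if (err) { console.error(err); return; }
    output = titles;
    console.log(titles);
  });
```

The counter bookkeeping from the hand-written parallel loop now lives in one reusable function, so the calling code reads as a single map over the URLs.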

Got something to say?
  1. Anonymous says:

    Nice article!

    A couple of notes on the generated StratifiedJS code:

    Firstly, as you mention in your conclusion, StratifiedJS by default includes debug information in the generated code. There is an ‘optimize’ compilation mode as well which omits this information – it is not ‘officially’ enabled yet (you can access it by passing {mode:'optimize'} into $eval), and it isn’t fully stripping the debug information yet, but at the moment it produces around 10% faster code across our test suite, which in most cases is negligible.

    Secondly, as StratifiedJS is untyped like normal JS, the StratifiedJS transformer and runtime have to make the assumption that (almost) every function might be asynchronous behind the scenes, which leads to some overhead if the function is actually synchronous. In most situations this overhead is perfectly acceptable, but StratifiedJS recently gained a “__js” keyword to optimize those places where performance is critical. “__js” is similar to C++’s “__asm” keyword, and allows you to annotate a statement or expression as being standard non-blocking JS code rather than SJS code. As an example, this loop:

    __js while (a) { copyPixels(++a, ++b) }
    

    gets translated to pretty much the same JS intermediate code (see http://bit.ly/eimcAx ). Without the ‘__js’, we get something a little more complicated (see http://bit.ly/eWnj2o ).

    Thirdly, the generated code might not look very nice or readable, but unlike streamline.js, this is not something we try to optimize for. We view the generated code just as an artifact of StratifiedJS’s just-in-time compilation process. It is not intended to be a faithful substitute for the original SJS source code, and not intended to be directly manipulated or debugged. Ultimately we want to get to a situation where you don’t even realize that you are writing something that goes through an intermediate SJS->JS compilation step. In fact, for node.js we’re looking into the possibility of integrating SJS directly into V8, in which case there wouldn’t be any intermediate code.

    Cheers, Alex

  2. Zef Hemel says:

    Thanks for the background info!

    Just wanted to note that the code I showed for streamline.js was a cleaned-up version of what is actually generated. Personally I don’t mind if generated code is clean or not, I was more worried about the performance impact of generating the kind of code that StratifiedJS generates.

  3. You can’t correctly parse HTML with regular expressions! Your regex fails if you merely add any sort of parameter to the title element, such as “class” or “ref”. Use the browser’s DOM support: it’s easier and actually works. See here for more details: http://stackoverflow.com/questions/590747/using-regular-expressions-to-parse-html-why-not#590789

  4. Robin Card says:

    This article wasn’t actually about parsing HTML by the way; it was just an example.

  5. Hova__0 says:

    Your premise is flawed, XMLHttpRequest does not have to be asynchronous

  6. Stijn Debrouwere says:

    You tuck away async.js at the end like it’s not a real option, but async.js combined with well-factored code for handling IO (e.g. for most CRUD apps something like a RESTService or Database class) hides away almost all complexities associated with async during day-to-day work. Not an option if you have all sorts of different asynchronous code going on in all different sorts of places, but for most web-apps that’s luckily not the case.

  7. Rudiger says:

    Talk about totally missing the point.

  8. Amir Shimoni says:

    There was a long discussion of adding “defer” semantics to coffeescript, but Jeremy decided against it.

    Something similar, named “backcall” was added to coco: https://github.com/satyr/coco/wiki/additions (search for backcall)

    We might get the ability to do something similar in Parsec Coffee-Script using “static metaprogramming”, but it’ll be a while before it’s stable. https://github.com/fab13n/parsec-coffee-script

    To read the discussion of why it wasn’t added to coffeescript, check out: https://github.com/jashkenas/coffee-script/issuesearch?state=closed&q=asynchronous#issue/350

  9. I finally get continuation passing style. All those haskell guys ranting and raving about it and all it took was a simple snippet of javascript with _ to make it all fit into place.

  10. Zef Hemel says:

    You recommend using the synchronous version of XMLHttpRequest? Plus, of course, it’s just an example. Replace it with a call to the geolocation API, database APIs, etc.

  11. Kevin Dangoor says:

    The need for code to be either entirely sync or entirely async is a big drag indeed.

    JavaScript (as implemented in SpiderMonkey/Firefox) supports the yield keyword, which is for a Python-inspired feature called generators. Generators make it possible to pause the stack, rather than completely unwinding it as we do in async code now.

    There’s a chance, I believe, that generators will make it into ECMAScript Harmony. Not much help for us today, but some good hope for the future.

    Given current tooling (something I’m working on), the solutions that generate JS always bother me a bit because they make debugging harder, since the user is not always going to be clear on what code will be generated. That’s one of the reasons I appreciate CoffeeScript’s desire to stick closely to JS semantics.

    Libraries like async.js and Step are good for using straight JS but making your async code a bit clearer. It doesn’t keep the asynchronicity from “infecting” your other code though.

    Taken as a view of the problem and a look at three code generating solutions, I found this to be a great blog post. Thanks!

    Kevin

  12. Christian Sciberras says:

    By the way, the fetchHTML function, you’re passing the URL as variable named “url” but later on use “uri” instead :)

  13. Zef Hemel says:

    Fixed, thanks!

  14. Dave Herman says:

    Thanks for this article and the references! To follow up on Kevin’s comment, I’ve been experimenting lately with demonstrating different ways you can use SpiderMonkey’s generators to implement asynchronous code in a direct style. I’ve started a library called jstask [http://github.com/dherman/jstask], which gives you cooperative multi-threading in JS, without needing an offline compiler or preprocessor. Cooperative threading has the flexibility of Java threads without the programming nightmare of pre-emption (as always in JS, before you explicitly yield control back to the event queue, you know no one else can be running behind your back).

  15. Dave Herman says:

    Forgot to mention: I’ve been promoting generators on the ECMAScript committee, and I think they have a pretty decent chance of being accepted. Watch this space. ;)

  16. pav says:

    I’m new to javascript and all those callbacks are driving me nuts. Thanks for sharing this :)

  17. theburningmonk says:

    Have you tried the Reactive Extension for Javascript (http://msdn.microsoft.com/en-us/devlabs/ee794896)? I’ve been using the .Net version and it works very nicely and looking at this video (http://channel9.msdn.com/blogs/charles/introducing-rxjs-reactive-extensions-for-javascript) it works very similar to the .Net version and looks a very powerful library for doing async work in javascript

  18. Garza says:

    what about using jQuery’s new deferred objects (which is already baked into their ajax)?

  19. hmmm. thnx for letting me realize I write risky js..

  20. Martin says:

    It doesn’t, but when you use a synchronous XMLHttpRequest call, it stops processing any javascript code and waits until the XMLHttpRequest is finished. That’s usually useless.

  21. Tatumizer says:

    For a general solution to the problem of orchestrating async calls, see https://github.com/tatumizer/mesh

    This library is based on completely different idea. Check it out!

Trackbacks for this post

  1. Planning Ahead: the Async Javascript Problem « I am Zef
  2. The Morning Brew - Chris Alcock » The Morning Brew #805
  3. links for 2011-03-09 « pabloidz
  4. A geek with a hat » Discovered a cool javascript property
