Callback-Free Harmonious Node.js

For a long time, JavaScript has been my go-to language: in the browser, of course, but on the server too. When I started with JavaScript I spent a lot of time finding ways to reduce “callback hell”, but eventually I just gave in: code generators and systems that require a patched JavaScript engine are just not the way forward, because they’re not “native” JavaScript solutions.

So, for years I wrote callbacks like everybody else until they became part of my muscle memory. They’re manageable, but you have to do a little more work.

A few weeks ago I did some work with Python, after a long Python break, and wow: Python code is succinct. You can actually do stuff like this:

output = subprocess.check_output(["ls", "-l"])

and output would contain the output of running ls -l. If it failed, it would raise an exception. Insanity! I had gotten so used to the node.js way that it seemed like the only way to do it:

subprocess.exec("ls -l", function(err, output) {
    if(err) {
        return callback(err);
    }
    ...
});

Believe it or not, in Python you can perform IO inside of a regular for-loop, something that callback-based JavaScript simply doesn’t let you do in a straightforward way. Did I suffer from JavaScript callback Stockholm Syndrome all this time?

Then, after seeing some reference to the generator feature in the upcoming ECMAScript 6 (the next version of JavaScript), I found Tim’s post about generators vs fibers. Now, I don’t find fibers all that interesting, but the generators captured my interest, especially since you can already enable support for them in node.js 0.11.x using the --harmony flag. Then I found TJ Holowaychuk’s co module and I knew:

There is a callback-free node.js future in sight.

Perhaps not fully callback-free — and that’s perfectly fine — but with significantly fewer callback functions to write.

For dramatic effect, let’s start with some async JavaScript code “old style” and then rewrite it to use co and generators. The code does three things:

  1. Read a file to fetch a number of URLs
  2. Fetch the contents of the URLs
  3. Concatenate all contents together (for some unknown reason)

Here’s what my first take at it usually looks like:

function readUrlsFetchConcat(path, callback) {
    fs.readFile(path, "utf8", function(err, contents) {
        if (err) {
            return callback(err);
        }
        var urls = contents.trim().split("\n");
        var conc = "";
        async.forEachSeries(urls, function(url, next) {
            request(url, function(err, resp, body) {
                if (err) {
                    return next(err);
                }
                conc += body;
                next();
            });
        }, function(err) {
            if (err) {
                return callback(err);
            }
            // Done!
            callback(null, conc);
        });
    });
}
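For reference, calling it then looks like any other node-style async function (assuming a urls.txt file with one URL per line, as used later on):

readUrlsFetchConcat("urls.txt", function(err, conc) {
    if (err) {
        return console.error("Failed:", err);
    }
    console.log(conc);
});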

Afterwards I sometimes refactor it a bit more into separate functions with quasi-descriptive names:

function readUrlsFetchConcat(path, callback) {
    var conc = "";

    function fetchUrlConcat(url, next) {
        request(url, function(err, resp, body) {
            if (err) {
                return next(err);
            }
            conc += body;
            next();
        });
    }

    function fetchUrlsConcat(err, contents) {
        if (err) {
            return callback(err);
        }
        var urls = contents.trim().split("\n");
        async.forEachSeries(urls, fetchUrlConcat, function(err) {
            if (err) {
                return callback(err);
            }
            // Done!
            callback(null, conc);
        });
    }

    fs.readFile(path, "utf8", fetchUrlsConcat);
}

A cool thing is that you can make this code fetch the URLs in parallel very easily, simply by swapping in async.forEach (the parallel counterpart of async.forEachSeries), without having to do any complicated threading.
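In the refactored version above, for instance, only the iteration call changes (a sketch; note that with parallel fetching the order in which the bodies end up in conc is no longer guaranteed):

// Parallel: async.forEach starts all fetches at once and calls
// the final callback when every one of them has finished
async.forEach(urls, fetchUrlConcat, function(err) {
    if (err) {
        return callback(err);
    }
    callback(null, conc);
});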

Alright, now let’s have a look at how this code could be written using the co, co-fs and co-request modules. I’ll add the require lines to make this clear:

var fs = require('co-fs');
var request = require('co-request');

function* readUrlsFetchConcat(path) {
    var contents = yield fs.readFile(path, "utf8");
    var urls = contents.trim().split("\n");
    var conc = "";
    for(var i = 0; i < urls.length; i++) {
        conc += yield request(urls[i]);
    }
    return conc;
}

Pretty short and sweet, right? And it will be executed “just as asynchronously” as the previous code. The magic is in the function* as opposed to just function, and in the use of the new yield expression.

To execute this function, we have to run it inside of a co context:

co(function*() {
    console.log(yield readUrlsFetchConcat("urls.txt"));
})();
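Since failures surface as exceptions inside the generator (more on that below), you can guard the yield with a regular try-catch; a minimal sketch:

co(function*() {
    try {
        console.log(yield readUrlsFetchConcat("urls.txt"));
    } catch(e) {
        console.error("Something went wrong:", e);
    }
})();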

To make the URL fetching happen in parallel rather than sequentially, we can rewrite our function to:

function* readUrlsFetchConcat(path) {
    var contents = yield fs.readFile(path, "utf8");
    var urls = contents.trim().split("\n");
    var yieldables = [];
    for(var i = 0; i < urls.length; i++) {
        yieldables.push(request(urls[i]));
    }
    var allContents = yield yieldables;
    return allContents.join("");
}

The difference here is where the yielding happens: rather than yielding each request call individually, they’re now all saved up in an array and then the array is yielded in one go. This will execute all requests in parallel and return the responses as elements of a new array. In fact, we can write this slightly more concisely if we want to:

function* readUrlsFetchConcat(path) {
    var contents = yield fs.readFile(path, "utf8");
    var urls = contents.trim().split("\n");
    var allContents = yield urls.map(request);
    return allContents.join("");
}

Why it works

I won’t get into the nitty-gritty details of how co is implemented (it’s not all that complicated), but I will try to give you some intuition for how it works.

The JavaScript feature that this is all based on is generator functions, a new feature in the upcoming ECMAScript 6. JavaScript generator functions are very similar to generator functions in Python. Here’s an example that shows how they operate:

function* interview() {
    var answer;
    console.log("Going to ask question #1");
    answer = yield "What's your name?";
    console.log("Got answer:", answer);
    console.log("Going to ask question #2");
    answer = yield "How old are you?";
    console.log("Age", answer);
    return "Thanks!";
}

var interviewer = interview();
console.log("Starting the interview!");
var result = interviewer.next();
console.log("Question 1:", result.value, "done?", result.done);
result = interviewer.next("Zef");
console.log("Question 2:", result.value, "done?", result.done);
result = interviewer.next(30);
console.log("Final note:", result.value, "done?", result.done);

When run in a Harmony-enabled JavaScript engine (Harmony being the code name for ES6), for example node 0.11.x started with the --harmony flag, this outputs:

Starting the interview!
Going to ask question #1
Question 1: What's your name? done? false
Got answer: Zef
Going to ask question #2
Question 2: How old are you? done? false
Age 30
Final note: Thanks! done? true

Execution of the interview generator function suspends immediately after invocation. To start it, you call the next() method on the object it returns. The function then runs as usual until it hits a yield expression. At that point it suspends execution again and passes the yielded value to the caller of next(). The caller can resume execution by calling next() again, optionally with a value that becomes the result of the yield, and so on until the function returns. Using generators you can set up a nice co-operative interplay between the generator and its consumer, which enables some powerful things.

There are various use cases for generator functions. Some will bring up Fibonacci sequences as an appealing example, but where it gets interesting to me is using them in conjunction with asynchronous execution.
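For completeness, here’s what that classic example looks like: a minimal sketch of an (infinite) Fibonacci generator and a consumer pulling values out of it.

function* fibonacci() {
    var current = 0, next = 1, tmp;
    while(true) {
        yield current;
        tmp = current + next;
        current = next;
        next = tmp;
    }
}

var fib = fibonacci();
console.log(fib.next().value); // 0
console.log(fib.next().value); // 1
console.log(fib.next().value); // 1
console.log(fib.next().value); // 2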

If, hypothetically, we could write a function that drives a generator that every once in a while yields a function value that needs to be executed asynchronously, and that resumes the generator with .next() when the result becomes available, we’d be almost there. As it turns out, this is exactly what the co module does, and a bit more.

But what value should you yield for co to be able to perform this IO for you? You can’t really use standard node.js async APIs directly, because they don’t return a useful value. co supports two options:

  1. Promises
  2. Thunks

A “thunk” is basically a partially evaluated function with just the callback argument left over to be filled in. For instance, here’s how you could wrap node.js’ fs.readFile method to have a thunk API:

function readFile(path) {
    return function(callback) {
        fs.readFile(path, callback);
    };
}

In principle, this changes how you call this function only slightly. Instead of:

readFile(path, function(err, result) { ... });

You now call:

readFile(path)(function(err, result) { ... });

However, to co this makes a world of difference, because this:

var contents = yield readFile(path);

will now yield a function that co can execute asynchronously. Once co receives the result, it calls .next(result) to resume execution, or .throw(err) if an error occurred (resulting in an exception being thrown inside of the generator, which you can catch with try-catch).
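To make this concrete, here’s a minimal sketch of such a generator-driving “runner” for thunks. This is not how co is actually implemented (real co also handles promises, arrays, nested generators and more), but it captures the core idea:

function run(generatorFn) {
    var gen = generatorFn();

    function step(err, result) {
        var state;
        try {
            // Resume the generator: throw into it on error, otherwise hand it the result
            state = err ? gen.throw(err) : gen.next(result);
        } catch(e) {
            // The generator didn't catch the error itself
            return console.error(e);
        }
        if (state.done) {
            return; // the generator function has returned
        }
        // state.value is assumed to be a thunk: call it with a node-style callback
        state.value(step);
    }

    step();
}

// Driving a generator that yields the thunked readFile from above
run(function*() {
    var contents = yield readFile("urls.txt");
    console.log(contents);
});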

So, how do you thunkify node.js APIs? The co wiki lists a bunch of modules that are “co ready”; these include wrapper modules like co-express, co-request and co-fs, but also libraries that support both regular callbacks and thunks. In fact, what I’ve been doing since learning about co is making my own APIs support both calling conventions:

function asyncFn(arg1, arg2, callback) {
    if(typeof callback !== "function") {
        return function(callback) {
            asyncFn(arg1, arg2, callback);
        };
    }
    // proceed as usual
}

or, if you’re using underscore or lodash like me:

function asyncFn(arg1, arg2, callback) {
    if(!_.isFunction(callback)) {
        return _.partial(asyncFn, arg1, arg2);
    }
    // proceed as usual
}

Now your async API can be used in the traditional callback style, or using co. There’s also the thunkify module to “thunkify” any existing traditional callback-style APIs.
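To make the dual convention concrete, here’s how the asyncFn from above could be called both ways (with placeholder arguments):

// Classic node.js callback style
asyncFn("foo", "bar", function(err, result) {
    if (err) {
        return console.error(err);
    }
    console.log(result);
});

// Thunk style, inside a co-driven generator
co(function*() {
    var result = yield asyncFn("foo", "bar");
    console.log(result);
})();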

While you could wait until ECMAScript 6 is done to use this, you don’t have to. Generators are available today in node >= 0.11.3 using the --harmony flag. They make many types of node.js code much nicer to write, and they interoperate fairly well with existing asynchronous code. In my experience, the performance overhead of using generators is negligible. So try it out!

And if you’re authoring npm modules: consider supporting the thunk-based calling convention as well as classic callbacks. It isn’t that hard, and the co users of the future will love you for it.