Streamlined Asynchronous JavaScript, with Bruno Jouhier

Javascript is a language that has come a long way since Brendan Eich implemented it in 10 days in the mid-nineties. Until a few years ago it was mainly used to build drop-down menus on websites, but since the advent of what we now refer to as AJAX, Javascript has started to be taken more seriously as a language. Since then, Javascript codebases have grown rapidly. With the advent of node.js, Javascript is now also used on the server side, because it turns out to be an excellent language for writing efficient asynchronous code. Or is it?

As it turns out, not everybody is satisfied with the verbosity and structure of asynchronous Javascript code.

We talk to Bruno Jouhier, author of streamline.js, a Javascript preprocessor that attempts to streamline asynchronous code written using Javascript (and CoffeeScript as well), removing the need to write all those callbacks by hand.

To start, who are you and what is your background?

I’m a 50-year-old programmer. I studied at Ecole Normale Supérieure in Paris, where I obtained a doctorate in Physics. I also have an engineering degree from Ecole Nationale Supérieure des Télécommunications in Paris.

Before Sage, I worked for two startups. The first one was called Neuron Data. I participated in the development of the company’s first product, an Expert System Shell called Nexpert Object, in the late 80s. I was at the origin of two of the company’s later products: a cross-platform UI toolkit called Open Interface and a business rules engine called Blaze Advisor. I left the company in 1999 and joined a small startup called Ubiquis, where I developed an e-commerce application closely integrated with Sage’s accounting products. We sold the company to Sage in 2000.

My programming experience is mostly with C, Java and C# but I’ve been exposed to all sorts of O-O languages: Smalltalk, Objective-C, C++, Eiffel, etc. I was fluent in emacs-lisp at some point and I also wrote stuff in esoteric AI languages like OPS5. I worked on various systems: Unix and Windows of course but I also did a lot of development on VMS and some on Macintosh.

What does Sage do and what is your role there?

Sage develops business software for small and medium-sized businesses. We have many products because Sage has grown by acquiring local software vendors. About 18 months ago I took the lead on the architecture of our global ERP product, Sage ERP X3.

Why has Sage chosen to use Server-Side JavaScript, and is this an across-the-board decision or made on a project-by-project basis?

Sage is a very decentralized company and product architects have had a lot of autonomy in their technology choices in the past. Things are starting to change though as we are more and more faced with global challenges. Sage is not considering SSJS as a global platform at this time as it is too immature for the type of applications we are building. But, despite its conservative image, Sage is open to innovation and is supportive of the choice that I made to use SSJS for a project around Sage ERP X3. If this project is successful (and I think it will be), SSJS may be considered for other projects.

Why did you select Javascript for the server-side for this project? Code reuse between client and server? Performance?

I initially selected Javascript to be able to reuse code between client and server. In offline mobile scenarios, we need to sync a subset of the data to the client, and the idea of sharing business logic between client and server is attractive.

The benefit is not just in using the same code, it is also in having the same language, even for code that is not shared. This way we can share expertise, best practices, etc.

Another reason is that I discovered jQuery in the summer of 2009 and it completely changed my opinion about dynamic languages and Javascript. It made me rediscover the pleasure of programming, and it reminded me of the pleasure that I had writing emacs-lisp 20 years ago. And retrospectively this LISP code was actually quite solid and elegant.

And behind this, the main motivation was simplicity: Javascript is an incredibly simple language (emacs-lisp too BTW), JSON is simple, REST is simple, Google search is simple, Apple’s products are simple, etc. I had just accumulated too much frustration with C#, XML, SOAP+WS-*, Windows, etc. I thought that we needed a radical departure from all this complexity, in order to build great products. Javascript was an opportunity!

Performance is not what made me choose Javascript, it is what made me switch from servers powered by Rhino (ringo.js and narwhal) to node.js.

What is the problem you identified while using JavaScript? Is the problem specific to Sage’s use of JavaScript, a problem specific to server-side JS programming, or is it a general (fundamental?) problem with JavaScript?

The problem comes from the fact that we are using node.js, which enforces a strictly asynchronous programming model. Javascript is probably better suited than many other languages to deal with asynchronous programming, and it copes rather well with asynchronous APIs in the browser, where we have thin layers of logic. But on the server side, we have thicker layers. In the case of a business application we typically have lots of business rules sitting on top of data access layers and web services. I found it really hard to implement the business logic and the underlying infrastructure with the callback and event models that node.js proposes for asynchronous code. The problem is that callbacks and events somehow turn the code upside down: you cannot use standard flow control statements like if/else, switch or loops to express your logic; instead, you have to write non-trivial code that manipulates callbacks. Some people have developed helper libraries that ease the pain, but even with these libraries, the code that you write contains a lot more “noise” than the code you would have written if the APIs had been synchronous.
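
To make this concrete, here is a minimal sketch (not taken from the interview) of what a simple sequential loop over files ends up looking like with plain node-style callbacks, using node’s fs module:

    var fs = require('fs');

    // Read the files one after the other and return their total length.
    // A plain `for` loop cannot span the asynchronous calls:
    // the loop has to be re-expressed as a recursive chain of callbacks.
    function totalLength(paths, callback) {
      var total = 0;
      function next(i) {
        if (i >= paths.length) return callback(null, total);
        fs.readFile(paths[i], 'utf8', function(err, data) {
          if (err) return callback(err);
          total += data.length;
          next(i + 1); // "loop" by recursing
        });
      }
      next(0);
    }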

The problem is not specific to our use of Javascript and node.js. It is probably more acute in our case because we have a lot of business logic to develop, but I think that it is a general problem that all node.js developers have to solve in one way or another. The proliferation of async helper libraries in the node.js ecosystem and the recurrence of this topic on the forum are signs that there is a real pain point here.

And the problem exists in the browser too, as we start to develop more complex applications and put thicker business logic in the browser to support offline scenarios.

In some sense, the problem is even a fundamental problem with the language itself: Javascript lacks a high-level mechanism to deal with asynchronous code flows. The programmer is left with low-level mechanisms such as callbacks and events. There is actually a hot debate in the Javascript community about the introduction of coroutines or continuations in the language. These mechanisms would solve the problem, but there is strong resistance to their introduction because they enable a limited form of threading (fibers).

How does streamline.js solve this problem?

Streamline.js solves the problem with a weaker mechanism that doesn’t let threads creep in: a preprocessor that transforms the code. Basically, streamline lets you write your code as if the APIs were synchronous and gives you a special placeholder parameter (an underscore) that you pass everywhere a callback is expected.

The preprocessor uses the placeholder to spot all the places where callbacks need to be generated, and it somehow “writes the callbacks for you”. The strength of the preprocessor is that it knows how to deal with all the standard Javascript flow control statements (conditionals, loops, exception handling). So it rearranges the code and generates the callbacks so that your code behaves as you would expect it to behave. The generated code is not very different from the code you would have written if you had to write the callbacks by hand. So you can use it with a debugger.
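
For example, the file-reading loop shown earlier can be written as if the API were synchronous; the underscore marks every spot where a callback is expected and the preprocessor generates the callback plumbing (a minimal sketch of streamline-style source):

    var fs = require('fs');

    // Streamline-style source: `_` stands in for the callback.
    // The preprocessor transforms this into callback-based code.
    function totalLength(paths, _) {
      var total = 0;
      for (var i = 0; i < paths.length; i++) {
        total += fs.readFile(paths[i], 'utf8', _).length;
      }
      return total;
    }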

You have the choice of running the preprocessor before executing your program, in which case your program won’t need to embed the streamline transformation engine, or you can embed the engine and let the node.js “require” infrastructure invoke the transformation as it loads your modules.

From a more theoretical standpoint, streamline performs a CPS (Continuation Passing Style) transform. I am not a specialist in CPS transforms but I think that two things differentiate this one from other CPS transforms. The first is that the transformation is partial: only the code flows that contain asynchronous calls are transformed. The second is that it is based on an algebraic application of patterns (the patterns that I discovered progressively when I was still writing the callbacks by hand). These two characteristics explain why streamline.js somehow “writes the callbacks for you” instead of turning the entire code upside down like some other CPS-based tools do.

Another important point about streamline.js is its compatibility with native node.js APIs and the absence of an additional runtime. You do not need to write wrappers around node.js APIs, you can call them directly. And the same applies in the other direction: you can call functions written with streamline.js from regular Javascript code.

Also, streamline comes with some goodies to facilitate asynchronous programming. For example it provides “futures” to let you parallelize I/O operations.

This underscore argument, is it just there for technical reasons (to detect which functions should be called asynchronously) or does it serve another purpose as well?

The first version that I published on GitHub did not have the underscore argument. Instead, I had used a different marker: an underscore at the end of the function name. I introduced the underscore argument shortly after, when I did the CoffeeScript adaptation. As CoffeeScript only generates anonymous functions, and as I did not want to get dragged into hacking compilers, I changed the syntax and introduced the underscore parameter/argument.

And this was a lucky move! First, this syntax translated a key property of asynchronous functions, i.e. the fact that asynchronism is “contagious”, into a simple scoping rule: if a function calls an asynchronous function, it becomes asynchronous itself (unless you don’t care about the completion of the sub-function). This translates into the fact that you may only pass the underscore argument from a scope where the underscore is defined (as a parameter of the current function). The only exception to this rule is for top-level calls in a script, and there is another gotcha: the underscore parameter must be on the current function, not on one of its ancestors in the scope.
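
In code, the scoping rule looks roughly like this (a sketch; the first form is accepted, the second is not):

    var fs = require('fs');

    // OK: `_` is a parameter of the enclosing function, so it can be passed on.
    // The caller of getLength must in turn provide a `_` of its own.
    function getLength(path, _) {
      return fs.readFile(path, 'utf8', _).length;
    }

    // Not OK: `_` is not a parameter of this function. Calling an asynchronous
    // function makes this function asynchronous too, so it must declare `_` itself.
    function getLengthBroken(path) {
      return fs.readFile(path, 'utf8', _).length;
    }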

But the syntax also had some practical benefits: it allows streamline.js code to call node.js functions in which the callback is not the last parameter; it also makes it easy to design functions that have optional parameters, by putting the callback in first rather than last position. And it also allowed me to introduce a special syntax (_wrapXxx()) for wrapper functions that adapt the callback for APIs that do not adhere to the standard node convention (for example callbacks that don’t take an error parameter).

And more recently, I found a really cute way to leverage this underscore parameter. I was investigating ways to initiate several asynchronous operations in parallel and to join them later, without introducing much extra syntax or heavy libraries. I knew about “promises” and “futures” and I had the idea that asynchronous functions could return a “future” when called without a callback. So if foo(arg1, _) is an asynchronous function, calling f = foo(arg1) returns (synchronously) a future, which can be used later as f(_) to retrieve the result. In CS jargon, this translates into: “futures are obtained by currying the callback away”. So, and this is where I was really lucky with the underscore parameter design, futures came almost for free: “if you omit the underscore argument you get a future”.
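
A small sketch of what that gives in practice (the readText helper is just an example of a function written with streamline):

    var fs = require('fs');

    // An asynchronous function written with streamline.
    function readText(path, _) {
      return fs.readFile(path, 'utf8', _);
    }

    function totalLength(path1, path2, _) {
      // Omitting `_` starts the operation and returns a future immediately.
      var f1 = readText(path1);
      var f2 = readText(path2);
      // Calling a future with `_` waits for and returns its result,
      // so the two reads run in parallel.
      return f1(_).length + f2(_).length;
    }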

There are certainly alternatives to this syntax (special call operators for example) but I went with this design because I wanted the source to remain valid Javascript. I always use the “reformat entire file” command when I write code and I don’t want this feature to be broken because of new syntax (and I have no time to hack text editors any more). This also allowed me to implement the transformation around Narcissus without having to hack it. Same thing with CoffeeScript: I did not have to do anything in the CoffeeScript compiler.

About debugging, you mention that the generated code is roughly what you would have written by hand otherwise, which helps. Still, in order to debug your program, you have to understand what the streamline.js preprocessor does exactly, i.e. you still have to understand how to write code in the continuation-passing style by hand. In that sense it’s a leaky abstraction. Would it be possible to debug at the level of a streamline.js program rather than at the level of the generated Javascript, or would that require changes to the Javascript infrastructure currently in place (e.g. v8)?

Yes, it could be seen as a leaky abstraction, but actually, is this really worse than working with an asynchronous helper library? If you use a helper library and something goes wrong, you often end up stepping into the library, and then you have to understand the internals of the library. With streamline.js, you step through code in which you easily recognize the code that you have written yourself and the callback “decorations” that streamline has added around it. So, stepping through may actually be easier because you are not jumping between your code and a library; you have everything in front of you in a single file.

Also, I would reformulate your statement a bit: to debug streamline code, you do not have to understand “what the streamline preprocessor does exactly”, you have to understand “the code that it generates”. This is much easier.

I’d be careful with the term “leaky abstraction” though. Joel gives a subtle definition but a lot of forum posters abuse it to easily disqualify things that they don’t like. There is an obvious association with memory leaks, leaky engines, etc. In that sense, I do not consider streamline.js to be leaky. It may have emerged from a somewhat pragmatic investigation but it is based on an algebraic application of patterns, and I think that it would be possible to write a formal proof that this transformation does not distort semantics (some of the limitations that I mentioned in the wiki, like the fact that the order of subexpression evaluation is not always preserved, are not too difficult to lift, but lifting them would increase the size of the generated code and I chose not to do it).

I haven’t investigated if debuggers could be modified to operate on streamline source rather than on the generated code. Will Conant has contributed a feature that maps the output lines to the source lines, which might be a first step. But there is probably hard work to get to a really transparent debugging experience.

It may actually be easier to rewrite the transform as an AST transform directly in V8. But I’m not familiar at all with V8’s internals, so I have no idea what this really entails. That said, the transformation is less than 1600 lines of Javascript, so this may not be such a big endeavor after all.

How do streamline.js’s futures and “goodies” compare to the libraries to facilitate async programming already out there?

There aren’t that many goodies actually.

The first set is just an async version of the ECMAScript 5 array methods (forEach, map, filter, every, some, reduce, reduceRight). They are probably equivalent to what you can find in other libraries. The only special thing is that the callback is passed as the first argument rather than the last, which works better in this case because the functions have optional arguments.
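
A sketch of how that reads (the map_ name and exact signature are assumptions used for illustration; the point is that `_` comes first, before the iterator function):

    var fs = require('fs');

    // Hypothetical usage of the async array goodies.
    function readAll(paths, _) {
      return paths.map_(_, function(_, path) {
        return fs.readFile(path, 'utf8', _);
      });
    }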

Then, there is a special function that I really like and that I called “funnel”. It allows you to control the number of concurrent executions over one or several code blocks. It can be used to limit the level of parallelism and avoid exhaustion of system resources (for example, you quickly run out of file descriptors if you blindly parallelize a recursive traversal of directories). It can also be used to set up “critical sections” by setting the limit to 1. I’m also thinking of introducing a variant that would handle exclusion between one writer and multiple readers.
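
The idea looks roughly like this (a sketch only; the require path and exact signature are assumptions):

    var fs = require('fs');
    var flows = require('streamline/lib/util/flows'); // assumed module path

    // Allow at most 10 concurrent executions of the guarded block.
    var diskFunnel = flows.funnel(10);

    function readGuarded(path, _) {
      return diskFunnel(_, function(_) {
        return fs.readFile(path, 'utf8', _);
      });
    }

    // With a limit of 1, the funnel behaves like a critical section.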

The “futures” feature compares to the futures found in promise/future libraries (Kris Zyp and Kris Kowal are experts here). But there is a big difference because streamline only provides futures and it does not provide them as classes with methods. Instead, every function that you write with streamline will return a future if you call it without providing a callback; and the future itself is just an asynchronous function that returns the result via a callback. So, there is no special API to learn.

I just introduced another set of goodies to wrap node.js stream objects. The idea behind these wrappers is to let the consumer of the stream “pull” the data by calling a “read” method instead of having the stream “push” the data by emitting events. There are libraries that ease the work with node’s streams but I haven’t seen any that takes the radical approach of “inverting the flow” completely and exposing “read” methods instead of events. I think that this API style blends really well with streamline.
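
In pull style, consuming a stream looks roughly like this (a sketch only; the wrapper name and require path are assumptions, the point being the read(_) call replacing 'data' events):

    var streams = require('streamline/lib/streams/server/streams'); // assumed path

    // Wrap a node readable stream and pull chunks from it.
    function byteCount(nodeReadable, _) {
      var stream = new streams.ReadableStream(nodeReadable); // assumed wrapper name
      var chunk, total = 0;
      while ((chunk = stream.read(_)) != null) {
        total += chunk.length;
      }
      return total;
    }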

One thing that streamline does not provide is libraries to chain asynchronous calls. There are a lot of them around but they simply don’t apply to streamline because the chaining problem is solved differently, by a CPS transform.

Overall, I try to avoid introducing large APIs. I try to keep the APIs minimalist and very thin. Also, I have an unfair advantage over people who implement other helper libraries because I write them for streamline source. So I don’t have to constantly fight against callbacks; the CPS transform moves them out of the way which removes a big thorn. For example it becomes very easy to design functions that chain with each other.

How is streamline.js received by the Javascript community?

There are a few fans who praise it and some who have contributed but I also got a lot of negative feedback. For example, the first response I got when I announced it on the node forum was something like “callbacks are just fine, we don’t need any extravagant framework like this one” and it was followed by a series of “+1” posts. On one hand, this was a bit of a surprise to me because I was expecting that people were actually having a hard time with callbacks and that they would be happy to have an alternative but this was not the case in general. On the other hand, I’m an old programmer and I’ve seen many ugly religious debates. Programmers will always be programmers!

I don’t know about the Javascript community at large. So far, I’ve only posted about streamline on the node forum because I feel that it is the focal point for “asynchronous Javascript”.

I’m nevertheless a bit concerned about this negative feedback that I got from the node community. I’m not concerned because of streamline itself but rather because of node.js. I really think that node.js is a great technology: simple, based on simple innovative principles, and very fast, but I think that the entry ticket is way too high today. There is a lot of buzz around node.js today but I would not be too surprised to hear some discordant voices soon. What concerns me is that node.js has the ambition of being the next PHP or the next RoR but it lacks the “basic spirit” of the former and the “structuring nature” of the latter. If we want this to happen, we need to work on the “entry ticket” problem. So, to me, people who stick to the “programmers just have to learn how to program with callbacks to deserve node.js” mantra do a great disservice to node.js. Node.js cannot become a mainstream platform without lowering its “entry ticket”. Maybe streamline.js is not the answer but maybe there are some good ideas to pick from it.

How do you see the future of streamline.js, can you imagine it to be a built-in feature of Javascript at some point in the future?

What I know for sure is that we (Sage) will use it, unless something better comes along. And I hope that we will build great products with it. Beyond that I don’t know, but I’d be thrilled to discuss it with the Javascript gurus, even if it is just to exchange ideas.

There are a few things that I like a lot about streamline. One of them is the economy of means: a simple syntax trick (reserving the underscore parameter) solves a big problem and even enables unexpected features (futures). Another one is that it preserves the single-threaded nature of Javascript and it actually makes the async points “explicit” (but very discreet, just an underscore). This is not the case with fibers, and actually this is not even the case with the classical callback approach (after all, a callback is just a callback, there are both “synchronous” and “asynchronous” callbacks, and there is really nothing in the syntax that differentiates between them).

Links