Author Archives


8
Mar 10

Pubsubhub-frickin’-bub

pubsubhubbub — I’m not kidding — is an actual existing thing. Yeah, I too would have killed to be a fly on the wall when they decided to call it that, but there you go. So let’s dissect this piece of art of a name:

  • pubsub, i.e. pub/sub, i.e. publish/subscribe, which is a common pattern in, well, the world, where multiple parties subscribe to one or more publishers, to receive notifications on updates.
  • hub, is a center of communication, often used for efficiency purposes, like an ethernet hub.
  • bub, you tell me. I haven’t a clue.

So what is it, really? It’s a protocol to enable near real-time update notifications among pubsubhubbub-supporting parties. Whereas RSS and Atom feeds require polling to receive updates, pubsubhubbub pushes information to parties. Currently various Google properties have pubsubhubbub support, such as Google Reader and Google Buzz. So, when you install a pubsubhubbub plug-in for a WordPress blog, like I have, and I push the publish button on a post, it will almost instantaneously appear in all of your Google Readers and your Google Buzz, whereas before, it may have taken a few minutes or hours to appear.
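To make the push idea a bit more concrete, here is a minimal in-memory sketch of the publish/subscribe pattern. It is not the actual pubsubhubbub protocol (which works over HTTP between separate servers); createHub, subscribe, publish and the "blog-feed" topic are names made up for this illustration.

```javascript
// Hypothetical in-memory hub; names are illustrative, not part of the real protocol.
function createHub() {
  var subscribers = {}; // topic -> array of callback functions

  return {
    // Register a callback to be notified when `topic` is updated.
    subscribe: function(topic, callback) {
      if (!subscribers.hasOwnProperty(topic)) {
        subscribers[topic] = [];
      }
      subscribers[topic].push(callback);
    },
    // Push an update to every subscriber of `topic`; nobody has to poll.
    publish: function(topic, update) {
      var list = subscribers.hasOwnProperty(topic) ? subscribers[topic] : [];
      for (var i = 0; i < list.length; i++) {
        list[i](update);
      }
      return list.length; // number of subscribers notified
    }
  };
}

var hub = createHub();
var received = [];
hub.subscribe("blog-feed", function(update) { received.push("reader: " + update); });
hub.subscribe("blog-feed", function(update) { received.push("buzz: " + update); });
var notified = hub.publish("blog-feed", "New post published");
console.log(received);
```

The publisher only talks to the hub, and the hub fans the update out to whoever subscribed, which is why subscribers see the update right away instead of at their next poll.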

Want to learn more about the details? Check out these slides, or the website:


4
Mar 10

10 Things You Hadn’t Expected HTML/Javascript Would Do

About 14-15 years ago my uncle took me to the university at which he was studying at the time. He had something to show me. He sat me behind a computer in the computer room and started a program called “Netscape”. He typed in an internet address ending with .au. I saw my first website and it came all the way from the other side of the world. It looked like crap and loaded incredibly slowly, but it was cool.

I could never have guessed that HTTP, HTML, CSS and Javascript would one day not only become the main way to access information, but also replace a lot of desktop applications. The abilities of the new HTML5 and other web technologies like SVG never cease to amaze me.

Here are 10 things I had not expected these open web technologies would be able to do, but can in 2010:

  1. Interactively render physics of a cloth
  2. Live motion tracking
  3. Play YouTube videos without Flash
  4. Collaboratively edit source code in a browser IDE
  5. Do weird interactive stuff like this
  6. Animate simple 3D landscapes
  7. Read books in a mobile browser, while disconnected from the internet
  8. Play Wolfenstein 3D
  9. Play MarioKart
  10. Render flash files using Javascript/SVG

2
Mar 10

Javascript: OOP Style Performance

I have been watching parts of Douglas Crockford’s talks on the history and future of Javascript. In his third talk Douglas talks about functions. If you are somewhat familiar with Javascript you know that functions are, somewhat oddly, used to create new objects. The original “intended” way (which Douglas calls the pseudo-classical way) of doing this is as follows:

function Person(name, age) {
  this.name = name;
  this.age = age;
}

Person.prototype.getNameAndAge = function() {
  return "Name: " + this.name + " and age: " + this.age;
};

var p = new Person("Zef Hemel", 26);
console.log(p.getNameAndAge());

What we do here is define a constructor function Person, which starts with a capital P to indicate that it should be used in conjunction with the new operator, as demonstrated in the second-to-last line. The second statement adds a method to the Person prototype, which in practice makes that method available to all Person objects (including ones already created). The last two statements instantiate a Person object and call the getNameAndAge method.
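The remark about already-created objects is easy to check. A small sketch, in which describe is a hypothetical method name chosen for this example:

```javascript
function Person(name, age) {
  this.name = name;
  this.age = age;
}

var early = new Person("Zef Hemel", 26); // created before the method exists

// Added afterwards; `describe` is a hypothetical method name.
Person.prototype.describe = function() {
  return this.name + " (" + this.age + ")";
};

console.log(early.describe());                 // "Zef Hemel (26)"
console.log(early.hasOwnProperty("describe")); // false: it lives on the prototype
```

The object itself never receives a copy of the method; the lookup simply falls through to the prototype when the call is made.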

Using this mechanism you can also implement inheritance. Let’s introduce a LivingBeing “super class”, which has an age property:

function LivingBeing(age) {
  this.age = age;
}

LivingBeing.prototype.getAge = function() {
  return this.age;
};

Alright, now we’ll define the Person constructor again, but assign a new instance of LivingBeing to its prototype, which will make all fields and methods available in LivingBeing available to all Person objects created from then on (objects created before the prototype was replaced keep their old prototype). Then, we add an additional method to Person objects: getNameAndAge:

function Person(name, age) {
  this.name = name;
  this.age = age;
}

Person.prototype = new LivingBeing();

Person.prototype.getNameAndAge = function() {
  return "Name: " + this.name + " and age: "
         + this.getAge();
};

var p = new Person("Zef Hemel", 26);
console.log(p.getNameAndAge());
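To see where getAge actually lives after this set-up, you can inspect the chain directly. This sketch repeats the two constructors so that it runs on its own:

```javascript
function LivingBeing(age) { this.age = age; }
LivingBeing.prototype.getAge = function() { return this.age; };

function Person(name, age) { this.name = name; this.age = age; }
Person.prototype = new LivingBeing();

var p = new Person("Zef Hemel", 26);

// getAge is found two steps up the chain:
// p -> Person.prototype -> LivingBeing.prototype
console.log(p.hasOwnProperty("getAge"));                     // false
console.log(Person.prototype.hasOwnProperty("getAge"));      // false
console.log(LivingBeing.prototype.hasOwnProperty("getAge")); // true
console.log(p instanceof LivingBeing);                       // true
console.log(p.getAge());                                     // 26
```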

For more information on how inheritance and prototypes work, read this excellent page in the Mozilla documentation.

Now, Crockford suggests that there is a nicer, cleaner and more natural way to do object-oriented programming and inheritance in Javascript, which he calls functional inheritance. The functional-inheritance style version of the above program looks as follows:

function livingBeing(age) {
  return {
    getAge: function() {
      return age;
    }
  };
}

function person(name, age) {
  var that = livingBeing(age);
  that.getNameAndAge = function() {
    return "Name: " + name + " and age: " + that.getAge();
  };
  return that;
}

var p = person("Zef Hemel", 26);
console.log(p.getNameAndAge());

Note that the invocation style of the constructors changed here too: no new keyword should be used. I agree that this is a nice style, although it makes extending existing objects with additional methods/fields hard, but one could argue that is a bad idea anyway. Although clean, it also seems more expensive to execute, because you’re basically composing an object from scratch every time (you start out with an empty object and then push in all its methods and fields), while with the pseudo-classical approach you create one prototype object with all the methods, and then simply point to that object. Theoretically, calling a method in the pseudo-classical style would then be more expensive, because at invocation time the prototype hierarchy has to be traversed. But maybe Javascript engines have a clever solution to all of this and in practice it doesn’t matter. I decided to investigate.
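The "composing from scratch" point can be made visible by comparing the method objects themselves. A minimal sketch, with both styles trimmed down to a single getName method for brevity:

```javascript
// Pseudo-classical: one function object, shared via the prototype.
function Person(name) { this.name = name; }
Person.prototype.getName = function() { return this.name; };

// Functional: a new closure is created on every call.
function person(name) {
  return {
    getName: function() { return name; }
  };
}

var a = new Person("a"), b = new Person("b");
var c = person("c"), d = person("d");

console.log(a.getName === b.getName); // true: same function object
console.log(c.getName === d.getName); // false: one function object per instance
```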

I benchmarked two things:

  1. The performance of object creation, by creating 10x 1,000,000 objects
  2. The performance of method calling on a single object, by invoking a method 10x 1,000,000 times

I did each 10 times, so that I can take an average time on each. I executed this benchmark on four browsers running on my Mac (Macbook Pro):

  • Firefox 3.5
  • Chrome 5 (dev)
  • Safari 4
  • Opera 10.50b

Disclaimer 1: The reason I executed these on these four browsers is not to compare their performance, this is not a good way to compare browser performance, but mainly to check that the results in different browsers do not diverge too much.

Disclaimer 2: What I’m testing is not representative of real programs. If one approach is going to be twice as fast as the other, this does not imply that your programs are going to be twice as fast, it means that object creation is twice as fast, or method invocation is twice as fast. Whether that matters to you depends on the number of objects you create or methods you call.
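For reference, a timing harness along these lines could look as follows. This is a sketch and not the benchmark script used for the results below; averageTime is a hypothetical helper and COUNT is deliberately smaller than the 1,000,000 used in the post.

```javascript
// Hypothetical helper: runs `fn` `runs` times and returns the average duration in ms.
function averageTime(fn, runs) {
  var total = 0;
  for (var r = 0; r < runs; r++) {
    var start = Date.now();
    fn();
    total += Date.now() - start;
  }
  return total / runs;
}

function Person(name, age) { this.name = name; this.age = age; }
Person.prototype.getNameAndAge = function() {
  return "Name: " + this.name + " and age: " + this.age;
};

function person(name, age) {
  return {
    getNameAndAge: function() {
      return "Name: " + name + " and age: " + age;
    }
  };
}

var COUNT = 100000; // smaller than in the post, to keep the sketch quick

var classicalMs = averageTime(function() {
  for (var i = 0; i < COUNT; i++) { new Person("Zef Hemel", 26); }
}, 10);

var functionalMs = averageTime(function() {
  for (var i = 0; i < COUNT; i++) { person("Zef Hemel", 26); }
}, 10);

console.log("pseudo-classical: " + classicalMs + " ms, functional: " + functionalMs + " ms");
```

Modern engines may optimize away objects that are never used, so numbers from a loop like this should be read with the same caution as the disclaimers above.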

Benchmark 1: Object creation

This benchmark creates 1,000,000 objects. First using the pseudo-classical style and then through the functional style. The code to this benchmark can be found here. The times reported are in milliseconds for 1,000,000 objects being created. As mentioned, every test is performed 10 times of which I took the average:

Update: The previous version of the benchmark script contained a major flaw that increased the execution time of the functional style considerably. This has now been adjusted in the graph. Thanks Adrian for noticing this.

This chart clearly shows that creating objects using the pseudo-classical style is cheaper in all browsers. This varies from about 35% cheaper (Chrome) up to around three times as cheap (Firefox). If you create huge amounts of objects in your code, you may want to take this into consideration.

Benchmark 2: Method invocation

This benchmark creates one object and then invokes the same method on it 1,000,000 times. First using the pseudo-classical style and then through the functional style. The code to this benchmark can be found here. The results are as follows:

I would have expected that method invocation would have been cheaper in the functional style of object creation, but it turns out it’s not. The differences here are almost negligible, so I’d say that in practice it doesn’t really matter what style you choose if your application is heavy on method calls.

Still, I felt that the functional style must be cheaper especially if you use inheritance. So I adapted my benchmark script to introduce a level of inheritance (like the LivingBeing “super class”). However, method calls using the pseudo-classical style are still cheaper, although less so. Perhaps, if you use 4+ levels in your object hierarchy, the functional style method calls may start to become cheaper (although I expect creating those objects will be much more expensive).

It seems that method invocation on objects created using the functional style is roughly as expensive as using the pseudo-classical style.

Conclusion

There are two properties to take into account when deciding what object-creation style you’re going to use in your code.

  1. Which style produces clearer code? This depends on the taste of the developers of the code but, if you’re developing a library for others, possibly also on the expectations of your audience. Scanning through some Javascript libraries, it seems the pseudo-classical approach is much more popular. Consequently, your audience is likely going to expect and be more comfortable with this approach. That’s something to take into account.
  2. Is the code CPU/memory intensive, does it create lots of objects? If so, the pseudo-classical approach is superior.

27
Feb 10

Markdown in Wordpress

Alright, after using various WYSIWYG editors in WordPress (which I use to write this blog), I got fed up and decided to switch to Markdown. There are a couple of plug-ins to make this happen. In fact I’m using three; together they make using Markdown pretty nice in WordPress.

  • “Markdown for WordPress and bbPress”: this plug-in (just search for it through the WordPress plugin manager) contains the PHP Markdown parser, which supports the whole of Markdown, plus a few extras such as footnotes [1]. It parses every post as Markdown, including the old ones, which wasn’t a problem for me because the HTML of previous posts is also valid Markdown and seems to look OK.
  • “WP MarkItUp”, this plug-in contains a reasonable editor for editing Markdown, it doesn’t do much, but has some buttons for common Markdown markup (hahaha), such as headings, bold, italic and so on. The plugin is based on MarkItUp.
  • “highlight.js”, is a pure-javascript syntax highlighter that automatically detects the language of code blocks, which is exactly what you need in Markdown, because there’s no way to mark a code block with a language name. You cannot install this plugin through the plugin manager, as far as I’m aware, so you have to download it and install it yourself (see the README of the download). Example of syntax highlighting (of Javascript):
for(var i = 0; i < ar.length; i++) {
  console.log(i);
}

Now let’s see how this works in practice.


  1. Which look like this, in case you were wondering. 


26
Feb 10

Waddup Buzz?

Google seems to be working hard at Buzz. Increasingly it seems to group buzz updates together, like so:

On twitter we long had this problem with the same links and retweets showing up in our streams every so often. It would have been nice if those had automatically been grouped together like in the Buzz timeline, but they weren’t. Twitter fixed this problem by implementing native retweet functionality, which sort of solves the issue. So, the question is: why did Buzz group these particular updates together? Let’s step into the algorithmic brain of the Google and try to figure this out. Let’s ungroup the updates:

Those with an eye for detail will notice the number of updates here. How many were there when they were grouped? 1, 2, 3, 4, 5. Right? And when expanded? 1, 2, 3, 4… huh?

Anyway.

Let’s see if we can discover the reason behind the grouping. Is it content-based aggregation? Probably not, unless Google aggregates programming related content here, which I doubt. Did it summarize the four updates into one? Not at all, the javascript framework update has absolutely nothing to do with the other updates. So what is it then, that made Google feel the other updates were somehow less important than the Javascript one? And why would this be useful to me? The only answer I can come up with is that they were all posted by me around the same time. Except they’re really not posted all at 10:47am, they were posted in the course of about an hour, but Google only polls twitter every few hours or so (it seems). Is time-based aggregation really a useful thing, though?

Google, I think, or at least hope, you can do better than this.

 


24
Feb 10

An Intro to Distributed Version Control

There are a multitude of reasons why distributed version control systems (like Mercurial and Git) are potentially preferable to centralized systems such as CVS and Subversion. One is that branching is cheaper and merging works much better. I only have anecdotal evidence of this. Frankly, I use Git only for personal projects (in the sense that I’m the only one working on them). For WebDSL (which I work on with a couple of other people) we use subversion and there we hardly ever branch, because "merging works so badly". Which, I suppose is true, but I’m hardly a subversion expert.

Quite out of the blue, Joel Spolsky (of Joel on Software and StackOverflow fame) has published a Mercurial tutorial online. And even if you don’t give a rat’s ass about Mercurial, I suggest you do read at least the first "chapter": Subversion re-education, which points out the differences between the subversion and mercurial/git mindset:

 

Want to know something funny? Almost every Subversion team I’ve spoken to has told me some variation on the very same story. This story is so common I should just name it “Subversion Story #1.” The story is this: at some point, they tried to branch their code, usually so that the shipping version which they gave their customers can be branched off separately from the version that the developers are playing with. And every team has told me that when they tried this, it worked fine, until they had to merge, and then it was a nightmare. What should have been a five minute process ended up with six programmers around a single computer working for two weeks trying to manually reapply every single bug fix from the stable build back into the development build.

And almost every Subversion team told me that they vowed “never again,” and they swore off branches. And now what they do is this: each new feature is in a big #ifdef block. So they can work in one single trunk, while customers never see the new code until it’s debugged, and frankly, that’s ridiculous.

Keeping stable and dev code separate is precisely what source code control is supposed to let you do.

 


24
Feb 10

Javascript: A Language in Search of a Standard Library and Module System

  1. Array
  2. Boolean
  3. Date
  4. Error
  5. EvalError
  6. Function
  7. Math
  8. Number
  9. Object
  10. RangeError
  11. ReferenceError
  12. RegExp
  13. String
  14. SyntaxError
  15. TypeError
  16. URIError

Recognize that? Yes indeed, it’s the complete list of standard Javascript objects of Javascript 1.5. 16 objects, of which 7 are error objects. Of course, this is just the Javascript language by itself. In practice Javascript executes inside a browser, which gives it access to additional objects like the DOM, and all kinds of fancy HTML5 and non-standard browser-specific features.

When Javascript was just a little language that people would use to do simple mouse-over events, this was not a problem. However, now, more and more applications are written in Javascript in the browser and Javascript is used in more and more places other than the browser. Better yet I dare to bet you that in a few years time Javascript is going to be one of the most used programming languages in a growing number of domains. The growing body of Javascript code also means Javascript needs more than just those 16 objects. It needs a larger standard API and it needs a standardized way to modularize code.

At the very least it needs a de-facto standard way of doing object-oriented programming; having the choice between classical, prototypal, or, sure, why not, lazy inheritance is not doing it for me. Different libraries use different styles. Worse yet, different libraries develop their own utility objects to produce classes. If only they’d at least agree on a common way of doing that, but no, there are a few dozen different ways of implementing exactly the kind of inheritance you like. I’m not the biggest fan of Java, but at least they made a decision on this at the language level.

But I’m getting side tracked. The main two things Javascript needs to be a proper language that we can apply "grown-up" software engineering principles to:

  1. A standard library
  2. A module system

First, the standard library. Those 16 objects are no longer sufficient. So, what happens is that programmers step in to build a larger, more extensive library of objects. It would be great if they all teamed up to come up with the one framework that everybody uses, but of course they didn’t, so we now have Prototype, MooTools, jQuery, MochiKit, ExtJS, YUI, Google Closure, and I’m probably forgetting about a dozen. This divides the Javascript world into different camps. Either you’re a jQuery guy or a Prototype guy, an ExtJS guy or a MochiKit guy. There’s a YUI-y way of doing things and a Prototype way of doing things. There’s a jQuery calendar, an ExtJS calendar, and a Prototype calendar. Of course, once you picked a framework you stick to it, because if I want to use a Prototype calendar from my jQuery application, I have to pull in x kilobytes of additional code that essentially duplicates jQuery functionality. A lot of effort is wasted because of this.

What Javascript’s standard library should contain is a pretty tough thing to determine. The first thing that comes to mind is a set of standard data structures. Of course Javascript has arrays and objects. Objects are typically also used as maps (although the keys can only be strings, and not even all strings) and sometimes even as sets (where the property names represent values).
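A few lines are enough to show why plain objects make a shaky map. The variable names in this sketch are made up for illustration:

```javascript
var map = {};

// Keys are converted to strings, so 1 and "1" collide.
map[1] = "number key";
map["1"] = "string key";
console.log(map[1]); // "string key"

// Object keys are stringified too, so distinct objects collide.
var k1 = {}, k2 = {};
map[k1] = "first";
map[k2] = "second";
console.log(map[k1]); // "second"

// Inherited properties make some strings look present in an empty map.
var empty = {};
console.log("toString" in empty);              // true
console.log(empty.hasOwnProperty("toString")); // false
```

The last two lines are the "not even all strings" part: names inherited from Object.prototype get in the way unless you consistently check with hasOwnProperty.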

A better library of data structures is required, including a proper map and set implementation. Additionally, it needs APIs for other common tasks, such as:

  • JSON parsing and serialization
  • Testing
  • Cryptography
  • Date handling
  • DOM querying (I think most libraries agree that CSS selectors are a good way, right?)

Some stuff like DOM traversals, querying and widgets will be hard to agree on probably, but would be nice to have.

Because Javascript is increasingly being used as a non-browser language, for instance on servers, it will also need non-browser stuff like IO, inter-process communication, sockets etc. The CommonJS initiative is working on these. CommonJS is an initiative, mostly among Javascript server vendors, to agree upon certain standard interfaces to, e.g. IO, threads, sockets etc. Its main contribution to date, however, is its module system.

Javascript does not have namespaces, but you can use objects for this purpose. That’s fine. In the past, some frameworks have built their own module system around this. Dojo and Google Closure offer a remarkably similar API to export and load modules:

dojo.require('dijit.widget.Editor');
dojo.require('myproj.Something');
dojo.provide('myproj.MyObj');

myproj.MyObj = function() { };
myproj.MyObj.prototype.initialize = function() { ... };

Replace ‘dojo’ by ‘goog’ and you basically have the Google Closure version. Quite nice and reasonably clean. However, there’s also JSAN’s module system. CommonJS’s module system is really nice, however according to some reports cannot be implemented in the browser properly. There’s an asynchronous version of the CommonJS module system called RequireJS, but, well, it’s not the CommonJS standard. There’s a proposal to standardize the RequireJS system as part of CommonJS, but it’s not entirely clear what the status is of that proposal.
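The objects-as-namespaces idea behind a provide call can be sketched in a few lines. This is a hypothetical helper written for this post, not Dojo’s or Closure’s actual implementation, and it hangs everything off a local root object instead of the global object:

```javascript
// Hypothetical helper, loosely modelled on the provide() idea.
var root = {};

// Make sure every object along a dotted path exists, and return the last one.
function provide(path) {
  var parts = path.split(".");
  var current = root;
  for (var i = 0; i < parts.length; i++) {
    if (!current.hasOwnProperty(parts[i])) {
      current[parts[i]] = {};
    }
    current = current[parts[i]];
  }
  return current;
}

provide("myproj.util");
root.myproj.util.trim = function(s) {
  return s.replace(/^\s+|\s+$/g, "");
};

console.log(root.myproj.util.trim("  hello  ")); // "hello"
```

Loading a module on demand (the require half) is the harder part, because in a browser it involves fetching a script, which is where the synchronous versus asynchronous debate comes from.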

I feel these issues have to be resolved, and the good thing is that this does not require any changes to the Javascript language itself, nor to its browser support. If only the different framework vendors would agree on a single base library that they all use (because, let’s face it, everybody needs a function to trim strings, a proper set implementation and a clean way of doing inheritance), plus a module system to go with that. Be it dojo/Google’s system or a CommonJS variant, I don’t care.

It would be oh-so-nice to have a de-facto standard library for this stuff.


16
Feb 10

Javascript: The Scope Pitfall

For the past few weeks I’ve been programming almost exclusively in Javascript. And to be quite honest it’s not as bad as you may think. In fact, Javascript is quite a nice language, as long as you are aware of its quirks. One quirk that bit me in the ass a few times already is its lack of block scopes. To make the problem clear, let’s look at a code fragment:

var elements = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
for(var i = 0; i < elements.length; i++) {
  var el = elements[i];
  console.log(el);
}

What does this code fragment print? Of course, it prints each of the elements in the array, so first 0, 1, 2, 3 etc. Easy enough. Now let’s adapt that a little bit:

var elements = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
var fns = [];

for(var i = 0; i < elements.length; i++) {
  var el = elements[i];
  fns.push(function() {
      console.log(el);
    });
}
fns[2]();

This code fragment uses a powerful feature of Javascript: first-class functions. Functions are values that can be put in a variable, or in an array, as this code demonstrates. In this fragment, the for-loop builds up an array called fns with functions that print an element of the list. Later on we can then pick one of the functions from that array and invoke it. The last line demonstrates this: it executes the 3rd function in the array. So what will that print?

If you’re a C/Java/C# programmer you’ll probably be guessing that it would print 2. In every iteration a variable el is declared that contains the current element from the array. The function that is generated on the fly contains a reference to el, which will therefore be part of its closure. Logically, this means that the first function captures the value 0 for el, the second function the value 1, etc. Sadly, this is not the case. Javascript only has function scopes, not block scopes (for constructs like the for-loop). In effect, what you’re actually executing here is the following:

var elements = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
var fns = [];
var el;
for(var i = 0; i < elements.length; i++) {
  el = elements[i];
  fns.push(function() {
      console.log(el);
    });
}
fns[2]();

Alright, fine. Why is that a problem? You have to realize that the closures of the generated functions contain pointers to variables, not a snapshot of the values at the point of definition, meaning that if variables change, these changes are visible from within the function. This is a very powerful feature, but also a potentially very confusing one. Since there is only one variable el, to which a new value is assigned every iteration, and each function points to this same variable, every function in the array will produce the same value: 9 (the last value assigned to el in the for-loop).
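You can verify this by calling every function in the array. In this sketch the functions return el instead of logging it, purely so the results can be collected:

```javascript
var elements = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
var fns = [];

for (var i = 0; i < elements.length; i++) {
  var el = elements[i];
  fns.push(function() { return el; }); // every function shares the same `el`
}

var results = [];
for (var j = 0; j < fns.length; j++) {
  results.push(fns[j]());
}
console.log(results.join(",")); // 9,9,9,9,9,9,9,9,9,9
```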

If you have problems getting your head around that, don’t feel bad, it took me a while too.

So, how can we fix this? Well, we can artificially introduce scopes by creating new functions and immediately invoking them: 

var elements = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
var fns = [];

for(var i = 0; i < elements.length; i++) {
  (function() {
    var el = elements[i];
    fns.push(function() {
        console.log(el);
      });
  }());
}

fns[2]();

So what’s new here is the (function() { bit and the }()); at the end. What this simple trick does is define an anonymous function and immediately invoke it. The difference from before is that a new scope is used for each iteration of the for loop, containing a fresh variable el. The functions that are generated now each refer to a different el. Thus, the fns[2]() call will now produce 2, as you would expect.
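If the immediately invoked function looks too cryptic, a named helper does the same job, since every call to a function creates a new scope. In this sketch makeGetter is a hypothetical name, and the functions return the value rather than logging it:

```javascript
var elements = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
var fns = [];

// Hypothetical helper: each call creates a new scope with its own `el`.
function makeGetter(el) {
  return function() { return el; };
}

for (var i = 0; i < elements.length; i++) {
  fns.push(makeGetter(elements[i]));
}

console.log(fns[2]()); // 2
console.log(fns[9]()); // 9
```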

15
Feb 10

On Buzz

I’m not sure what to think about Google Buzz. On one hand it’s yet another place to post stuff (links, pictures, thoughts etc.), on the other hand it is quite a bit richer than Twitter, which I currently use for this purpose. Sure, it doesn’t have what I think is one of twitter’s strengths, the 140 character limit, but it does have other stuff like a photo upload, location, a commenting system and the ability to "like" stuff, which provides the system with valuable information that can be used to filter, or at least prioritize, future buzzes (or whatever they’re called).

Incidentally, if you’re not already following me on Buzz, do so now!

What Buzz is definitely not good for right now is following popular people; people that get loads of comments on every buzz, meaning these buzzes jump to the top all the time. But I’m sure that will be fixed.

Buzz is likely to go a bit more mainstream than Twitter, which will mean that my wife, mom and dad are likely to start using it at some point. Should they really be following me, though? I’m likely to push out stuff that they really do not care about (like work related things). Currently this is why I use both facebook and twitter. Facebook is for personal things. Twitter for work/hobby. What should I use Buzz for, if anything? Time will tell.

Anyway, my point is: follow me on buzz.


10
Feb 10

Who needs native phone apps anyway?