On Java Generics and Dynamic Typing

Again? Yes, again. I just found a series of posts on Bruce Eckel’s “weblog”:http://www.mindview.net/WebLog (which I assumed I was subscribed to, but apparently wasn’t) about Java’s erasure-based implementation of generics.

Eckel’s posts:
* “Puzzling Through Erasure”:http://mindview.net/WebLog/log-0057
* “Puzzling Through Erasure II”:http://mindview.net/WebLog/log-0058
* “Puzzling Through Erasure III”:http://mindview.net/WebLog/log-0059
* “Puzzling Through Erasure IV”:http://mindview.net/WebLog/log-0060

A month or two ago I wrote about generics in Java (which I assumed would be called 1.5) in “Why Java Sucks”:http://www.zefhemel.com/archives/2004/08/16/why-java-sucks:

bq. The problem with this is that you can only understand why this stuff doesn’t work by understanding how they implemented those features, which is a typical example of a “leaky abstraction”:http://www.joelonsoftware.com/articles/LeakyAbstractions.html.

Note that what I meant by “wiping” is the same thing Bruce Eckel calls erasure. Bruce Eckel says the following about it:

bq. Erasure will not make sense to programmers first learning Java Generics unless they have the context in which it arose. Otherwise it will just seem like a strange and annoying limitation, odd and arbitrary behavior by the compiler that forces them to remember that information doesn’t exist when it seems like it ought to.

So we seem to agree. He also gives some examples of things you’d want to do with Java generics but can’t because of the magic of erasure:

```java
class AClass<T> {
    private final int SIZE = 100;
    public void aMethod(Object arg) {        // instance method, so T is in scope
        if (arg instanceof T) {}             // Error
        T var = new T();                     // Error
        T[] array = new T[SIZE];             // Error
        T[] array2 = (T[]) new Object[SIZE]; // Unchecked warning
    }
}
```

I’m not going to explain why this is, because 1) I don’t want to know; it should just work, and 2) Bruce Eckel already explains it “here”:http://mindview.net/WebLog/log-0058.
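For anyone who needs those three operations anyway, the usual workaround is to hand the class an explicit @Class<T>@ object (a “type token”) so the runtime type that erasure throws away is supplied by the caller instead. The sketch below is only an illustration: the class names @TypeTokenDemo@ and @Holder@ are hypothetical, and @create()@ assumes @T@ has a public no-argument constructor.

```java
import java.lang.reflect.Array;

// Hypothetical names; a minimal type-token sketch, not a library API.
public class TypeTokenDemo {
    static class Holder<T> {
        private final Class<T> type; // runtime stand-in for the erased T

        Holder(Class<T> type) {
            this.type = type;
        }

        // Replaces the illegal "arg instanceof T"
        boolean accepts(Object arg) {
            return type.isInstance(arg);
        }

        // Replaces the illegal "new T()"; assumes a public no-arg constructor
        T create() throws Exception {
            return type.getDeclaredConstructor().newInstance();
        }

        // Replaces the illegal "new T[size]"; the array's component type really is T
        @SuppressWarnings("unchecked")
        T[] newArray(int size) {
            return (T[]) Array.newInstance(type, size);
        }
    }

    public static void main(String[] args) throws Exception {
        Holder<StringBuilder> h = new Holder<StringBuilder>(StringBuilder.class);
        System.out.println(h.accepts(new StringBuilder())); // true
        System.out.println(h.accepts("a string"));          // false
        StringBuilder sb = h.create();
        StringBuilder[] arr = h.newArray(3);
        System.out.println(sb.length() + " " + arr.length); // 0 3
    }
}
```

The price is that every caller has to pass @StringBuilder.class@ alongside @<StringBuilder>@, which is exactly the redundancy that erasure forces on you.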

*And this hell started because…?*
I already talked “about this”:http://www.zefhemel.com/archives/2004/09/03/a-better-java, but I seriously wonder why generics in Java 5 have been implemented using erasure.

Bruce Eckel:

bq. There is lots of published misinformation about this fact. For example, a fair number of people have firmly asserted that erasure allows Java 5.0 code to run under JDK 1.4. Neal Gafter has stated that this is incorrect.

bq. And misinformation goes the other way, too. You can find Sun Java designers confidently stating that C#/.NET 2.0 will break code from C# 1.0, because .NET 2.0 doesn’t use erasure. I checked this out with Anders Hejlsberg (the lead designer of C#) and he said:

bq. I think everyone is pretty much in agreement that this is what backwards compatibility means. But C#/.NET 2.0 doesn’t use erasure – it preserves the type information at runtime, so you can know at runtime the type of T, and you can even know in the above example that you have a @List<Integer>@.
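The contrast with .NET is easiest to see from the Java side. In the small program below (the class and method names are illustrative), an @ArrayList<String>@ and an @ArrayList<Integer>@ turn out to be the same runtime class, so Java code has no way to ask whether it is holding a @List<Integer>@.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name; demonstrates erasure of type arguments at runtime.
public class ErasureDemo {

    // True when two lists share a runtime class, whatever their type arguments were.
    static boolean sameRuntimeClass(List<?> a, List<?> b) {
        return a.getClass() == b.getClass();
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<String>();
        List<Integer> integers = new ArrayList<Integer>();

        System.out.println(sameRuntimeClass(strings, integers)); // true
        System.out.println(integers.getClass().getName());       // java.util.ArrayList
        // The class only remembers the name of its type variable, not the argument.
        System.out.println(integers.getClass().getTypeParameters()[0].getName()); // E
    }
}
```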

*But it gets worse*
Apparently there’s only one guy (Neal Gafter) working on the Java compiler, and the same guy works on the Java libraries too…

bq. In this response to my article Puzzling Through Erasure, Neal Gafter points out that he was lazy when rewriting the Java libraries (he doesn’t have enough to do writing the compiler, he has to write the libraries too?), and that we should not do what he did.

And what is it we shouldn’t do?

bq. So, first lesson: even though something appears in the Java library sources, that’s not necessarily the right way to do it. This is disappointing, since “Java from Sun” has usually been held up as the reference implementation. Now when I find something coded in the libraries, I’ll have to question whether this is the good way to do it, or if it was just expedience.

Well, that’s just great.
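The quotes don’t say which library code Gafter had in mind. Assuming it is the generic-array idiom from Eckel’s example (casting an @Object[]@ to @T[]@, which compiles with an unchecked warning but produces an array that only pretends to be a @T[]@), one commonly recommended alternative is to keep the storage as a plain @Object[]@ and cast each element on the way out. The stack below is a hypothetical minimal sketch of that pattern, not code from the JDK.

```java
import java.util.Arrays;

// Hypothetical container: storage is declared as Object[] so the array never
// claims to be a T[] it is not; each read performs an unchecked cast instead.
public class ObjectArrayStack<T> {
    private Object[] elements = new Object[4];
    private int size = 0;

    public void push(T item) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, size * 2); // grow the backing array
        }
        elements[size++] = item;
    }

    @SuppressWarnings("unchecked") // safe: push() only ever stores T values
    public T pop() {
        if (size == 0) {
            throw new IllegalStateException("empty stack");
        }
        T item = (T) elements[--size];
        elements[size] = null; // drop the reference so it can be collected
        return item;
    }

    public int size() {
        return size;
    }

    public static void main(String[] args) {
        ObjectArrayStack<String> stack = new ObjectArrayStack<String>();
        for (String s : new String[] {"a", "b", "c", "d", "e"}) {
            stack.push(s);
        }
        System.out.println(stack.pop() + stack.pop() + " " + stack.size()); // ed 3
    }
}
```

Because no @T[]@ ever escapes the class, a caller can never receive an @Object[]@ where it expected, say, a @String[]@, which is where the cast-to-@T[]@ version fails with a @ClassCastException@.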

*Enough of that*
I agree. Let’s talk about something more fun: dynamic typing. Bruce Eckel is a big Python advocate. Two days ago “I talked about dynamically typed languages”:http://www.zefhemel.com/archives/2004/10/04/dynamically-typed-languages and the problems I see with them. Because I think highly of Bruce, I’m still waiting for him to give the definitive argument for why dynamically typed languages are better than statically typed ones. Here is what he writes on the subject:

bq. I am well aware of the value of static typing. Going from pre-ANSI C to C++ made me think that enough static type checking might be able to guarantee the proper execution of all programs.

bq. But eventually I saw that static type checking is just one form of testing. And testing is what your program needs. It’s great if the compiler can perform those tests for you, and it can make things a lot easier when it does. But static testing is only one part of the picture, and has its limits. At some point you must also have dynamic testing, as you try to get your coverage more and more complete.

bq. Ironically (in this discussion) dynamic testing is one of the things that Java brought to the table. C++ is actually the language that can be called (almost) purely statically-typed. The runtime model in C++ is that of C, which is to say, that of assembly, which is to say “effectively none.” C++ must do all of its checking at compile-time; it has no choice. The Java designers understood that some tests, array-bounds checking and incorrect casts, to name two, must occur at runtime. So Java has sophisticated runtime support, and this allows powerful features such as reflection that C++ could not hope to duplicate.

bq. So Java (and C#) has both compile-time testing and runtime testing. Of course, there’s another boundary – the language system can only know so much, and at some point you have to start writing your own tests, that know the specifics of your particular program.

bq. I do not mean to argue against static type checking. Discovering errors is the goal, and if they can be discovered (A) at the earliest point in the development process and (B) automatically, by the compiler, that seems like a good thing (however, the XP folks have cast some doubts on the conventional wisdom that the cost of an error goes up exponentially the later it is found). Even Python has pychecker, a tool to perform static checks.

bq. The issue is trickier than that. What I’m trying to point out is the tradeoff between when the errors are found, and how much it’s costing to find them. And as much as we like answers that are hard and fast, saying “static type checking is universally good and always the best solution” is a recipe for eventual disaster, because even the most avid static type checking fan will grow tired of arguing with the compiler when enough rules have been heaped upon the language. The desire for expression over constraint will eventually win out.

bq. Java Generics are a good example of this. They prevent ClassCastExceptions. How big of a problem is this, versus the overhead of (1) learning and (2) maintaining the code of the new syntax of the feature? I’ve made a similar argument about checked exceptions – the type of error they prevent is not worth the cost. In both cases the errors can still be found, by automatic language mechanisms, at runtime. So it’s still an improvement over a purely-statically-typed language like C++ (yes, I’m aware of runtime exception checking in C++, but I’m trying to speak in general terms).

bq. So to summarize, I maintain:

# It’s about testing the correctness of your code and your system.
# Testing is a spectrum: compile-time, runtime, hand-written.
# Whenever you choose one part of the spectrum over another, it’s a tradeoff.

bq. Always choosing the static checking approach as the only viable option will ultimately produce a programming system that is intractable.
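The three-part spectrum in the summary above (compile-time, runtime, hand-written) can be made concrete in a few lines of Java. The sketch below is illustrative only: the class and method names are hypothetical, and the bug in @average()@ is deliberate, included because it is the kind of error that neither the compiler nor the runtime can see.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo of the testing spectrum; average() contains a DELIBERATE bug.
public class TestingSpectrum {

    // Deliberate bug: integer division truncates before the result is widened.
    // A correct version would be: return (double) sum / values.length;
    static double average(int[] values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    @SuppressWarnings({"rawtypes", "unchecked"})
    static String runtimeChecks() {
        StringBuilder log = new StringBuilder();

        // 1. Compile time: generics reject the mistake before the program runs.
        List<String> typed = new ArrayList<String>();
        typed.add("42");
        // typed.add(42); // would not compile, so it stays commented out

        // 2. Runtime: a raw list lets the mistake through until the cast.
        List raw = new ArrayList();
        raw.add("42");
        try {
            Integer n = (Integer) raw.get(0);
        } catch (ClassCastException e) {
            log.append("ClassCastException ");
        }

        // Array bounds are likewise checked at runtime.
        int[] small = new int[2];
        try {
            small[2] = 1;
        } catch (ArrayIndexOutOfBoundsException e) {
            log.append("ArrayIndexOutOfBoundsException");
        }
        return log.toString();
    }

    // 3. Hand-written test: only knowledge of the intended result catches the bug.
    static boolean averageIsCorrect() {
        return average(new int[] {1, 2}) == 1.5;
    }

    public static void main(String[] args) {
        System.out.println(runtimeChecks());
        System.out.println("average test passes: " + averageIsCorrect());
    }
}
```

The buggy @average()@ compiles cleanly and never throws; it simply returns @1.0@ instead of @1.5@, and only the hand-written check notices.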

This is about what I figured out.