On Language Design: My Problem With ClojureQL

Every programming language comes with a certain syntax, a sense of what counts as native use of that syntax, and the semantics behind it. Escaping into, or mixing with, a language that feels completely different is generally not a good idea. My favorite example of this is Objective-C, which is a really strange mixture of C and Smalltalk. C, as you will know, is a curly-brace language with its own way of doing things. It’s a low-level systems programming language. Smalltalk is a high-level programming language that feels very different and looks very different.

Objective-C is C with some Smalltalk bolted on to it, which gives it a strange feel:

MyObject* o = [[MyObject alloc] initWithNum: 20 andString: @"Hello world!"];

In C, a function call has the syntax function_name(arg1, arg2). However, when moving into Objective-C object land, a method call looks like: [object aMethodCall: arg1 andArg: arg2]. Alien, if you ask me.

In Lisp land, an example of this is the Common Lisp loop macro:

(loop for x in '(a b c d e)
      for y from 1
      if (> y 1) do
        (format t ", ~A" x)
      else do
        (format t "~A" x))

If you’ve ever written any Lisp code, you’ll see that although this is very readable and concise (as is the Objective-C example, by the way), it feels completely out of place in a Lisp-style language.

I have a similar problem with ClojureQL, a query language for Clojure. Queries expressed in ClojureQL change the meaning of Clojure in a way that I feel is bad language design, because it breaks assumptions that hold true for the rest of Clojure.

Consider the following snippet of code:

(let [first-name "zef"]
   ...
   (= first-name "zef") ...)

This piece of code binds the value "zef" to the symbol first-name. The programmer’s expectation is that when the first-name symbol is used anywhere within the let, its value will be "zef", unless it is rebound to something else with another let. However, this assumption breaks when using ClojureQL:

(let [first-name "zef"]
  (query users * (= fname first-name)))

This is legal in ClojureQL, although it is a bit unclear where fname comes from. It comes from the * there. We can make this more explicit:

(let [first-name "zef"]
  (query users [fname lname] (= fname first-name)))

This is perfectly valid ClojureQL code, except it doesn’t do what you would expect it to do. It does not find all users with first name "zef". Instead, it throws an SQL exception saying that the table users does not have a field "first-name". Huh?

It turns out that when we use the query macro, we step into a different world, a world where we have to let our previous assumptions go. When first-name is used, it no longer refers to the value bound to it before; instead, it’s simply a name referring to a column in a table. It is still possible to get back to "normal" Clojure semantics by escaping into the Clojure world with a ~ prefix:

(let [first-name "zef"]
  (query users [fname lname] (= fname ~first-name)))
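To see why the ~ matters, here is a toy sketch of how a macro can tell the two cases apart. It is not ClojureQL's implementation: where-clause, compile-term and the :column map are hypothetical names made up for illustration. The only fact it relies on is that the Clojure reader turns ~x into the list (clojure.core/unquote x), which a macro receives unevaluated.

```clojure
;; Toy sketch, NOT ClojureQL's code. All names here are hypothetical.

(defn unquote-form?
  "True if form is what the reader produces for ~x, i.e. (clojure.core/unquote x)."
  [form]
  (and (seq? form) (= 'clojure.core/unquote (first form))))

(defn compile-term
  "Runs at macro-expansion time on one unevaluated operand."
  [term]
  (cond
    (unquote-form? term) (second term)          ; ~x: emit x as ordinary Clojure code
    (symbol? term)       {:column (name term)}  ; bare symbol: treated as a column name
    :else                term))                 ; literals pass through unchanged

(defmacro where-clause
  "Turns a binary predicate form such as (= fname ~first-name) into plain data."
  [[op lhs rhs]]
  `{:op    '~op
    :left  ~(compile-term lhs)
    :right ~(compile-term rhs)})

(let [first-name "zef"]
  (where-clause (= fname ~first-name)))
;; => {:op =, :left {:column "fname"}, :right "zef"}

(let [first-name "zef"]
  (where-clause (= fname first-name)))
;; => {:op =, :left {:column "fname"}, :right {:column "first-name"}}
```

In the second call the let binding is silently ignored: the macro only ever sees the symbol first-name, never its value, which is the surprise described above.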

I’m not very fond of this type of language design. It would probably be better if the ~ were not necessary. In that case you could read the query as a kind of for loop where each result row is destructured and bound to [fname lname], which are then used in the body expression. Even then, under this interpretation fname and lname should intuitively not refer to column names in the users table; they should only be local bindings for the first and second column in the result set. Still confusing.
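The for-loop reading can be written down in plain Clojure. This is only a sketch over an in-memory vector: user-rows and users-named are hypothetical names and the data is made up. It shows that with ordinary destructuring, fname and lname are positional bindings and first-name keeps its lexical meaning without any ~.

```clojure
;; Hypothetical stand-in for a result set: each row is [first-name last-name].
(def user-rows
  [["zef" "smith"]
   ["ann" "jones"]
   ["zef" "brown"]])

(defn users-named
  "Returns the rows whose first column equals first-name."
  [first-name]
  (for [[fname lname] user-rows        ; destructure each row by position
        :when (= fname first-name)]    ; first-name is the ordinary argument
    [fname lname]))

(users-named "zef")
;; => (["zef" "smith"] ["zef" "brown"])
```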

A syntax that is more Clojuresque, if you will, albeit more verbose, would be:

(let [first-name "zef"]
  (query [u users] (= (:fname u) first-name)))

Intuitively, the query iterates over all users, binding each user to u and filtering on the value of the :fname key of each user entry. I’m still not comfortable with the use of users there, which seems to be some type of magic symbol, but I suppose that could be fixed too. Maybe by having a (deftable users) statement somewhere else in the code, or by replacing it with (table :users), which, again, would make it slightly more verbose:

(let [first-name "zef"]
  (query [u (table :users)] (= (:fname u) first-name)))
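One way to check the intuition behind this syntax is to write the same thing against in-memory data. In the sketch below, tables, table and find-by-fname are hypothetical helpers (nothing here is ClojureQL API) and the data is made up. A real query would be translated to SQL rather than filtered in memory, but the reading is the same: u is bound to each row, :fname is an ordinary keyword lookup, and first-name is the ordinary let-style binding.

```clojure
;; Hypothetical in-memory "database": table name -> sequence of row maps.
(def tables
  {:users [{:fname "zef" :lname "smith"}
           {:fname "ann" :lname "jones"}]})

(defn table
  "Looks up a table by keyword; stands in for a real table reference."
  [table-name]
  (get tables table-name))

(defn find-by-fname
  "Returns every user row whose :fname equals first-name."
  [first-name]
  (for [u (table :users)
        :when (= (:fname u) first-name)]
    u))

(find-by-fname "zef")
;; => ({:fname "zef", :lname "smith"})
```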

The point is that with great power comes great responsibility. The macro facilities of Lisps give you enormous power to create your own language extensions, which is great. It makes experimenting with languages very easy. However, it turns out that language design is very difficult. The language syntax is the user interface of your language. Whereas typical languages like Java and C# evolve very slowly and are designed by experienced language designers, in a Lisp anybody can do it, which can result in very confusing abstractions.

Abstractions like these have to be designed very, very carefully.


  • kotarak
    Language design is difficult. I tried on several small DSLs and failed miserably. Maybe ClojureQL is just another one in this chain of failures.

    However, one has to keep in mind that ClojureQL is a DSL for SQL! That means in particular you cannot leave out SQL when speaking about intuition. In SQL "SELECT * FROM .." and "SELECT a,b FROM .." might make a difference! So if ClojureQL cannot model that, it has failed in its technical goal.

    The reason we chose quasiquotation was that we found

    (query users [a b c] (< d ~foo))

    more readable than

    (query :users [:a :b :c] (:< :d foo))

    BTW: did you know that the second is legal ClojureQL, apart from the missing ~ in front of foo?

    SQL is not Clojure and with query you leave Clojure land. This has to be kept in mind. Just as you leave Clojure land when you use c.test/is.

    With (hypothetical) destructuring (which is maybe intuitive from a clojure point of view) the above may be written as

    (query [[_ _ _ _ _ _ _ d] :users] (:< d foo))

    < is still not the clojure function of that name, so it must be a keyword. But how do I now specify, that I want the columns a, b and c in my output? Just adding them to destructuring doesn't help if I don't want d in the output. Also sequence destructuring is not a good fit for a table with named columns. So maybe map destructuring?

    (query [{:columns [d] :output [a b c]} :users] (:< d foo))

    Ah. The awkward symbols again. So we need

    (query [{:columns [d] :output [:a :b :c]} :users] (:< d foo))

    And now :columns and :output have to be treated differently, which is another thing to keep in mind, increasing the complexity of the API.

    I don't claim ClojureQL is perfect. I'm always open for improvement suggestions. There is Lau and me, you can reach both of us by mail or the tracker at Lighthouse. Let us know your issues with ClojureQL and your suggestions for improvement.

    And as always: in the end YMMV.

    Edit: As for the "experienced language designers" and Java: see Joshua Bloch's presentation on API design and why it matters for some examples that an "experienced language designer" is not a guarantee for good design.
  • Hi, sorry for my late response.

    About the SQL/Clojure land distinction, that's a choice: do you want to simply provide a Lispy syntax for SQL or do you want to offer a syntax that feels and behaves like Clojure, but under the hood translates to SQL? LINQ in .NET is not SQL, it looks a bit like it but abides by the rules of C# and VB, under the hood it translates the queries to SQL, it is not simply a more C#-y syntax for SQL.
  • kotarak
    I reviewed the syntax. I mulled over it and wrote up a proposal, which will now be discussed internally in the project. Here you can see a draft of what the revised syntax could look like. I'm open to any feedback.
  • That looks a lot better! I like it. The only issue I have with it is that you are changing the meaning of keywords, because they refer to fields in a table now, rather than to keyword values. But I suppose that's a trade-off. It's shorter than the (:fname u) syntax I suggested.
  • kotarak
    At some point idealism stops and pragmatism kicks in. Is it worth the trouble?

    One could use symbols "bound" to the selected tables. But that would require information from the from-table call moving us again more back to macros.

    Then: Does this blow up simple queries too much? Or is it tolerable? Honest question.

    One could view the keywords as functions of queries (as for maps) and the predicates in the WHERE part to be combinators of such functions. Just as I can write (map (juxt :foo :bar) seq-of-maps).
  • I would say it's tolerable and probably good for brevity's sake.
  • I agree that keywords would be better than symbols for things like table and column names. That's what clojure.contrib.sql uses. Unquoted symbols being used as strings is awkward.