Declaring Reality: An Introduction to Datalog, part II

In the [previous installment](http://zef.me/4955/declaring-reality-an-introduction-to-datalog-part-i), I took you on a little trip to a different world. I asked you to leave all your existing ideas on this ancient concept named "programming" at the door. We don't do programming in this new world, at least not as you're used to. In this new world _you_ set the rules. _You_ declare reality. We concluded by defining an _entity_ named `person`. It was truly thrilling. Therefore, you'll be happy to learn that implementing entities in Datalog is what this second installment is mostly about. Specifically, we're going to discover how to implement the data model of our new [FocusBlox](https://bitbucket.org/zefhemel/focusblox/src) application.

To get started, let's do a little refresh of the mind. It has been two weeks, after all.

The language of this new world is called [Datalog](http://en.wikipedia.org/wiki/Datalog). We talked about a few core Datalog concepts:

* _predicates_: the core building blocks of your Datalog application; you can compare them to database tables.
* _clauses_: statements of truth that define your world. The two most important kinds of clauses we talked about are:
    1. _constraints_: clauses that _constrain_ predicates, e.g. by defining (restricting) the types of a predicate's arguments.
    2. _rules_: clauses that declare how new facts can be derived from old ones.

As teased in the last post's cliff-hanger ending, in this series we're going to rebuild a version of [OmniFocus](http://www.omnigroup.com/products/omnifocus/), the excellent getting-things-done application from the OmniGroup. The shameless rip-off name of this project will be [FocusBlox](https://bitbucket.org/zefhemel/focusblox).

FocusBlox

From the [OmniFocus website](http://www.omnigroup.com/products/omnifocus/):

> When you're ready to get serious about personal productivity OmniFocus is designed to quickly capture your thoughts and allow you to store, manage, and process them into actionable to-do items. Perfect for the Getting Things Done® system, but flexible enough for any task management style, OmniFocus helps you work smarter by giving you powerful tools for staying on top of all the things you need to do.

FocusBlox's data model is fairly simple. It has three entities:

* _actions_: these are tasks to be performed. Actions have a:
    * name
    * creation date
    * completion date (optional; not set until the action is completed)
    * start date (optional; if set, the action is only available for completion after this date)
    * due date (optional; if set, the priority of the action increases as this date gets closer)
    * project (optional; if set, the action belongs to this project)
    * context (optional; if set, the action belongs to this context)
* _projects_: these are the projects that actions may be part of. Projects can form hierarchies (projects can have sub-projects, which can have sub-projects, etc.). Projects have a:
    * name
    * parent (optional; if set, this project is a sub-project of its parent)
* _contexts_: these are contexts (e.g. locations, people) related to actions to be performed. Just like projects, contexts are hierarchical. Contexts have a:
    * name
    * parent (optional, just like a project's parent)

So, how do we encode this data model in Datalog? Let's start with the most complicated one: actions.

    action(a), action_id(a:id) -> uint[64](id).
    action_name[a] = name -> action(a), string(name).
    action_created_date[a] = date -> action(a), datetime(date).
    action_completion_date[a] = date -> action(a), datetime(date).
    action_start_date[a] = date -> action(a), datetime(date).
    action_due_date[a] = date -> action(a), datetime(date).
    action_project[a] = p -> action(a), project(p).
    action_context[a] = c -> action(a), context(c).

You should recognize the first line from the `person` example before. Here we say there's an `action` entity, with an `action_id` as _refmode_ (primary key) which is an unsigned 64-bit integer. Then, we declare a few other predicates with a notation we haven't seen before. This is the Datalog way of defining functional predicates, i.e. predicates that for a certain key have at most one value. In the `action_name` predicate, the key is just one variable (`a`) of type `action` (as declared behind the arrow), and its value is a `name` (of type `string`).

You can think of this as declaring a map (or dictionary) between actions and name strings. Like in functional programming languages, objects in Datalog don't have properties. The way we encode properties is with predicates that take an object as an argument. These predicates can be named whatever you like; prefixing them with the entity name, as in this example, is purely a convention.
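For readers who think in mainstream languages, the map analogy can be made concrete. The sketch below is Python, not Datalog, and only mirrors the shape of the declarations: the entity is a set of ids, and each functional predicate is a dictionary keyed by one of those ids. The variable names copy the predicate names above; the sample values are made up for illustration.

```python
from datetime import datetime

# The entities: sets of ids (the refmode values).
action = {1, 2}
project = {10}

# Functional predicates: one dict per "property", keyed by action id.
# A dict holds at most one value per key, just like action_name[a] = name.
action_name = {1: "Do dishes", 2: "Buy milk"}
action_created_date = {
    1: datetime(2011, 3, 1, 9, 0),
    2: datetime(2011, 3, 2, 18, 30),
}

# Optional "properties" simply have no entry for some keys.
action_project = {2: 10}  # action 1 belongs to no project

# Reading a "property" is a lookup in the predicate, not on an object.
print(action_name[1])         # Do dishes
print(action_project.get(1))  # None
```

Note that nothing is stored "inside" an action: delete the `action_name` dictionary and the actions themselves are untouched, which is roughly how predicates relate to entities.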

Alright, so now how do we create instances of these entities and set some of their "properties"?

    +action(a),
    +action_id[a] = 1,
    +action_name[a] = "Do dishes",
    +action_created_date[a] = datetime:now[].

In Datalog you read the commas as "and". So, in this code you're saying: "There exists an action `a` which has id 1, the name "Do dishes", and its creation date is now." This is slightly verbose, so we can rephrase it in the slightly more natural:

    +action(a) {
      +action_id[] = 1,
      +action_name[] = "Do dishes",
      +action_created_date[] = datetime:now[]
    }.

This is called the (LogicBlox Datalog-specific) _hierarchical_ syntax. What it does is insert the argument `a` as the first parameter of every predicate between the curly braces where it fits (based on the predicate's type signature). Since the name `a` is now no longer referenced anywhere else, we can replace it with the wildcard name `_`. In addition, we can declare `action_id` as an auto-numbered predicate, so that an id is assigned automatically:

    lang:autoNumbered(`action_id).

Clauses that talk about predicates, like this one, are typically referred to as _pragmas_.

We can now rewrite the action creation delta as:

    +action(_) {
      +action_name[] = "Do dishes",
      +action_created_date[] = datetime:now[]
    }.

Not half bad.
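What auto-numbering buys us can be pictured with a small Python analogy (again, not LogicBlox code): a counter hands out the next free id whenever a new entity is created, so the caller only supplies the interesting values. The helper `add_action` and the counter's starting value are assumptions made for this sketch; how LogicBlox actually picks ids is not covered here.

```python
from datetime import datetime
from itertools import count

action = set()
action_name = {}
action_created_date = {}

# Hypothetical id source; starting at 1 is an arbitrary choice for the sketch.
_next_action_id = count(1)

def add_action(name):
    """Create an action with a fresh id and fill in its name and created date."""
    a = next(_next_action_id)
    action.add(a)
    action_name[a] = name
    action_created_date[a] = datetime.now()
    return a

first = add_action("Do dishes")
second = add_action("Buy milk")
print(first, second)  # 1 2
```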

Let's define the predicates for our other data model entities. No surprises there:

    project(p), project_id(p:id) -> uint[64](id).
    project_parent[p] = parent -> project(p), project(parent).
    project_name[p] = name -> project(p), string(name).

    context(c), context_id(c:id) -> uint[64](id).
    context_parent[c] = parent -> context(c), context(parent).
    context_name[c] = name -> context(c), string(name).
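To see how a single `project_parent` predicate is enough to encode a whole hierarchy, here is a rough Python analogy (not Datalog). The project names and ids are invented, and `project_path` is a hypothetical helper that just follows parent links upwards until it reaches a project without a parent.

```python
# Functional predicates as dicts; ids and names are made up.
project_name = {1: "Home", 2: "Kitchen", 3: "Repaint cabinets"}
project_parent = {2: 1, 3: 2}  # top-level project 1 has no entry

def project_path(p):
    """Return the names from the top-level project down to p."""
    names = []
    while p is not None:
        names.append(project_name[p])
        p = project_parent.get(p)  # None once we reach a top-level project
    return " / ".join(reversed(names))

print(project_path(3))  # Home / Kitchen / Repaint cabinets
```

Contexts would work the same way with `context_name` and `context_parent`.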

Alright. We'll now add a few extra constraints, just for shits and giggles:

    action(a) -> action_name[a] = _, action_created_date[a] = _.

    project(p) -> project_name[p] = _.

    context(c) -> context_name[c] = _.

The first clause says that for every action entity `a`, both `action_name` and `action_created_date` must contain a fact with `a` as its key, or, more colloquially: every action must have a name and created date, _ladies_. Similarly, projects and contexts must have names. If any of these constraints is violated, you'll get an error and your transaction will be rolled back.
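The "check, then roll back" behaviour can be imitated in a few lines of Python. This is only an analogy under simplifying assumptions: the database is a plain dict, a "transaction" is a function applied to a copy of it, and `violations` and `transact` are hypothetical helpers that say nothing about how LogicBlox implements this. Dates are placeholder strings.

```python
import copy

db = {"action": set(), "action_name": {}, "action_created_date": {}}

def violations(state):
    """Actions that break the rule: every action has a name and a created date."""
    return [a for a in sorted(state["action"])
            if a not in state["action_name"]
            or a not in state["action_created_date"]]

def transact(db, change):
    """Apply `change` to a copy; keep the result only if no constraint is violated."""
    candidate = copy.deepcopy(db)
    change(candidate)
    bad = violations(candidate)
    if bad:
        raise ValueError(f"constraint violated for actions {bad}; rolled back")
    db.clear()
    db.update(candidate)

def good(state):
    state["action"].add(1)
    state["action_name"][1] = "Do dishes"
    state["action_created_date"][1] = "2011-03-01"

def nameless(state):  # forgets to set a name
    state["action"].add(2)
    state["action_created_date"][2] = "2011-03-02"

transact(db, good)
try:
    transact(db, nameless)
except ValueError as err:
    print(err)  # constraint violated for actions [2]; rolled back
print(sorted(db["action"]))  # [1]
```

The important bit is that the failed transaction leaves no trace: action 2 and its created date are both gone, not just the part that broke the rule.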

We have now completed the core of the data model that we'll need for our project: we defined predicates that can hold all the data we need. However, it's not super convenient to query them yet. For instance, how would I get a list of uncompleted actions?

Currently what you'd have to do is write a query like this:

    _(a) <- action(a), !action_completion_date[a] = _.

That is: return all actions `a` where `a` does not have a value for its completion date predicate, because we presume that only performed actions have one.

However, this will just give us back numeric `action_id`s -- not very fancy lookin', so let's change the query to something more pleasing to the eye:

    _(name) <- action(a), action_name[a] = name, !action_completion_date[a] = _.

Now we just get back a list of action names which have no completion date, i.e. actions that have not been completed yet.
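If the negation reads oddly, it may help to see the same question asked of plain dictionaries. The Python below is an analogy with invented sample data, not a description of how the query is evaluated: each comma in the query becomes a condition, and the `!` becomes `not in`.

```python
action = {1, 2, 3}
action_name = {1: "Do dishes", 2: "Buy milk", 3: "Call mom"}
action_completion_date = {2: "2011-03-02"}  # only action 2 is done

# "Names of all actions that have no completion date."
uncompleted = sorted(
    action_name[a]
    for a in action
    if a in action_name and a not in action_completion_date
)
print(uncompleted)  # ['Call mom', 'Do dishes']
```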

Of course this is one way of formulating what we're interested in, but the intention is not very clearly expressed. When you read it, specifically the last condition, you have to think hard about what we're really after.

Wouldn't it be simpler if we just had an `action_completed` predicate, so we could write the following?

    _(name) <- action(a), action_name[a] = name, !action_completed(a).

Or, using the hierarchical syntax:

    _(name) <- action(_) { action_name[] = name, !action_completed() }.

Or: give me all the names of actions that have not been completed! (yes, you always have to yell out clauses that contain a `!`.)

Well, you'll be happy to learn: yes we can! Let's define a new predicate and a _rule_ to automatically derive its values:

    action_completed(a) -> action(a).
    action_completed(a) <- action_completion_date[a] = _.

Dun, dun, done! One way of thinking about what we just did is defining an automatically derived property on our entity.
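In Python terms, a derived predicate behaves a bit like a value that is always recomputed from the underlying facts rather than stored and updated by hand. The sketch below is an analogy only, and it assumes the intended meaning that an action counts as completed exactly when it has a completion date; the sample data is invented, and the function call merely stands in for the derivation Datalog does for you.

```python
action = {1, 2, 3}
action_name = {1: "Do dishes", 2: "Buy milk", 3: "Call mom"}
action_completion_date = {2: "2011-03-02"}

def action_completed():
    """Derived 'predicate': the actions that currently have a completion date."""
    return {a for a in action if a in action_completion_date}

print(sorted(action_completed()))  # [2]

# Change a base fact; the derived set follows without any extra bookkeeping.
action_completion_date[1] = "2011-03-05"
print(sorted(action_completed()))  # [1, 2]

# The "uncompleted" query now states its intent directly.
names = sorted(action_name[a] for a in action - action_completed())
print(names)  # ['Call mom']
```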

And that's all for this installment. In the next part we'll look at more interesting examples of deriving useful information from existing facts using rules.