~ / track C / clojure ecosystem

Datalog queries

Advanced

Datalog is a declarative query language descended from Prolog and used in Datomic, DataScript, XTDB, and several other Clojure data systems. You describe the shape of the data you want as a set of clauses with logic variables, and the engine finds every combination of facts that fits. Queries become data: a Datalog query is just a Clojure map or vector.

Note: examples below use the DataScript / Datomic flavor of EDN Datalog. They won't run in the embedded SCI REPL but are exactly what you'd evaluate in a real Clojure project that depends on DataScript.

The data model: EAV triples

Datalog systems store data as entity / attribute / value triples:

[100 :person/name "ana"]
[100 :person/age  31]
[101 :person/name "bob"]
[101 :person/age  42]
[100 :person/friend 101]

Every fact is a triple; entities are referenced by id. Queries select tuples of triples that match a pattern.

Minimal example

A Datalog query has a :find clause (what to return) and a :where clause (the pattern). Symbols starting with ? are logic variables.

(require '[datascript.core :as d])

(def db
  (d/db-with (d/empty-db)
    [{:db/id 100 :person/name "ana" :person/age 31}
     {:db/id 101 :person/name "bob" :person/age 42}]))

(d/q '[:find ?name
       :where [?e :person/name ?name]]
     db)
;; => #{["ana"] ["bob"]}

The pattern [?e :person/name ?name] matches every triple whose middle slot is :person/name. ?e and ?name are bound by matching; only ?name is returned.

Joins for free

Want each person with their age? Just add another clause that shares the entity variable ?e:

(d/q '[:find ?name ?age
       :where
       [?e :person/name ?name]
       [?e :person/age  ?age]]
     db)
;; => #{["ana" 31] ["bob" 42]}

No explicit join — the variable ?e does the work. Datalog joins by unification; this is what makes it elegant for graph-shaped data.

Predicates and constraints

:find-clause expressions and predicate clauses filter the matches:

(d/q '[:find ?name
       :where
       [?e :person/name ?name]
       [?e :person/age  ?age]
       [(> ?age 35)]]
     db)
;; => #{["bob"]}

The form [(> ?age 35)] is a function-call clause; the binding succeeds only when the call returns truthy.

Parameters: queries are reusable data

A query is a Clojure data structure. Parameters go through :in:

(d/q '[:find ?name
       :in    $ ?min
       :where
       [?e :person/name ?name]
       [?e :person/age  ?age]
       [(> ?age ?min)]]
     db 30)

$ is the source database; ?min is bound to the second argument. You can build queries from data (e.g. compose :where clauses) and reuse them.

Recursion: graphs are natural

Datalog supports rules, which let recursive concepts like "ancestor" or "reachable" be defined relationally. Following a :person/friend chain:

'[[(connected ?a ?b)  [?a :person/friend ?b]]
  [(connected ?a ?b)  [?a :person/friend ?x]
                       (connected ?x ?b)]]

You can then :in $ % and use (connected ?x ?y) inside :where. Recursive graph queries with three lines of clauses — the relational view paying off.

Check yourself

? quiz

In `[?e :person/name ?name]` and `[?e :person/age ?age]`, why does the engine join on `?e` for free?

Exercise

Given the same db as above, write a Datalog query that returns the names of people who have at least one friend. Hint: an additional [?e :person/friend ?_] clause is enough.

 status: new