~ / track C / clojure ecosystem
Datalog queries
AdvancedDatalog is a declarative query language descended from Prolog and used in Datomic, DataScript, XTDB, and several other Clojure data systems. You describe the shape of the data you want as a set of clauses with logic variables, and the engine finds every combination of facts that fits. Queries become data: a Datalog query is just a Clojure map or vector.
Note: examples below use the DataScript / Datomic flavor of EDN Datalog. They won't run in the embedded SCI REPL but are exactly what you'd evaluate in a real Clojure project that depends on DataScript.
The data model: EAV triples
Datalog systems store data as entity / attribute / value triples:
[100 :person/name "ana"]
[100 :person/age 31]
[101 :person/name "bob"]
[101 :person/age 42]
[100 :person/friend 101]
Every fact is a triple; entities are referenced by id. Queries select tuples of triples that match a pattern.
Minimal example
A Datalog query has a :find clause (what to return) and a :where clause
(the pattern). Symbols starting with ? are logic variables.
(require '[datascript.core :as d])
(def db
(d/db-with (d/empty-db)
[{:db/id 100 :person/name "ana" :person/age 31}
{:db/id 101 :person/name "bob" :person/age 42}]))
(d/q '[:find ?name
:where [?e :person/name ?name]]
db)
;; => #{["ana"] ["bob"]}
The pattern [?e :person/name ?name] matches every triple whose middle slot
is :person/name. ?e and ?name are bound by matching; only ?name is
returned.
Joins for free
Want each person with their age? Just add another clause that shares the
entity variable ?e:
(d/q '[:find ?name ?age
:where
[?e :person/name ?name]
[?e :person/age ?age]]
db)
;; => #{["ana" 31] ["bob" 42]}
No explicit join — the variable ?e does the work. Datalog joins by
unification; this is what makes it elegant for graph-shaped data.
Predicates and constraints
:find-clause expressions and predicate clauses filter the matches:
(d/q '[:find ?name
:where
[?e :person/name ?name]
[?e :person/age ?age]
[(> ?age 35)]]
db)
;; => #{["bob"]}
The form [(> ?age 35)] is a function-call clause; the binding succeeds
only when the call returns truthy.
Parameters: queries are reusable data
A query is a Clojure data structure. Parameters go through :in:
(d/q '[:find ?name
:in $ ?min
:where
[?e :person/name ?name]
[?e :person/age ?age]
[(> ?age ?min)]]
db 30)
$ is the source database; ?min is bound to the second argument. You can
build queries from data (e.g. compose :where clauses) and reuse them.
Recursion: graphs are natural
Datalog supports rules, which let recursive concepts like "ancestor" or
"reachable" be defined relationally. Following a :person/friend chain:
'[[(connected ?a ?b) [?a :person/friend ?b]]
[(connected ?a ?b) [?a :person/friend ?x]
(connected ?x ?b)]]
You can then :in $ % and use (connected ?x ?y) inside :where. Recursive
graph queries with three lines of clauses — the relational view paying off.
Check yourself
? quiz
In `[?e :person/name ?name]` and `[?e :person/age ?age]`, why does the engine join on `?e` for free?
Exercise
Given the same db as above, write a Datalog query that returns the names of
people who have at least one friend. Hint: an additional [?e :person/friend ?_]
clause is enough.