Cascalog at Intent Media

Monday, April 28, 2014

While I was at Intent Media, I led the data engineering team in rebuilding and extending the company's data platform. To structure and simplify queries we relied on Cascalog, a Clojure DSL built on the Cascading library, which in turn runs on Apache Hadoop.

Cascalog is inspired by Datalog and uses logic programming to simplify query expression. In that respect it resembles the Datalog-style query languages of Datomic for Clojure and the more recent DataScript for ClojureScript. The logic-programming style keeps queries simple and concise. For example, to compute the average age per country:

(?<- (stdout) [?country ?avg] 
   (location ?person ?country _ _) (age ?person ?age)
   (c/count ?count) (c/sum ?age :> ?sum)
   (div ?sum ?count :> ?avg))
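
To make the semantics of the query above concrete, here is a rough plain-Clojure sketch of the same computation over in-memory data. It is not Cascalog and not how Cascalog executes the query on Hadoop. The sample rows, the function name `avg-age-per-country`, and the tuple layouts (`[person country state city]` for `location`, `[person age]` for `age`) are assumptions made for illustration, and the sketch assumes each person has at most one age row.

```clojure
;; Hypothetical in-memory stand-ins for the `location` and `age` sources.
;; Assumed layouts: [person country state city] and [person age].
(def location
  [["alice" "usa" "ny" "nyc"]
   ["bob"   "usa" "ca" "sf"]
   ["chloe" "fr"  nil  "paris"]])

(def age
  [["alice" 30]
   ["bob"   40]
   ["chloe" 25]])

(defn avg-age-per-country
  "Joins location and age on person, groups by country, then divides the
  summed ages by the row count, mirroring the steps of the query."
  [location age]
  (let [age-by-person (into {} age)   ; person -> age lookup (one age per person)
        ;; Sharing ?person between the two predicates acts as an inner join;
        ;; the two `_` fields of location are simply ignored here.
        joined (for [[person country] location
                     :let [a (age-by-person person)]
                     :when a]
                 [country a])]
    (into {}
          ;; ?country is the only non-aggregated output variable,
          ;; so it serves as the grouping key.
          (for [[country rows] (group-by first joined)
                :let [ages (map second rows)]]
            [country (double (/ (reduce + ages) (count ages)))]))))

(avg-age-per-country location age)
;; => {"usa" 35.0, "fr" 25.0}
```

The point of the comparison is that the Cascalog version states only the relationships between variables, and leaves the join, the grouping, and the aggregation plumbing implicit.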

Jon Sondag, a data scientist at Intent Media, recently gave a presentation at the NYC Clojure Meetup about Cascalog in production. His slides are embedded below.

It is great to see Cascalog being used in production data platforms.