Calculating Surprise Values in ClojureScript

Saturday, April 01, 2017

Earlier this year I rewrote Ben Birnbaum’s outlier-detect Python code in ClojureScript and combined it with some very simple React glue code to build an on-demand s-Value calculator. s-Values represent the amount of surprise in a grouped value relative to other data in a larger dataset. A higher s-Value means the data is less expected.
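The exact scoring rule lives in Birnbaum's outlier-detect code and dissertation. The Python sketch below only illustrates the general idea. It is an assumption, not a port of that code.

The sketch takes a list of maps, the same input shape the web interface uses. It scores each group (for example, each interviewer) on one column in two steps:

  • It takes the L1 distance between that group's answer distribution and the distribution of everyone else's answers.
  • It divides by the median distance across groups, so a score of 1.0 is typical and larger scores are more surprising.

The names `answer_distribution`, `l1_distance`, and `s_values` are hypothetical.

```python
from collections import Counter
from statistics import median


def answer_distribution(values, categories):
    """Fraction of values falling in each category (hypothetical helper)."""
    counts = Counter(values)
    total = len(values)
    return {c: counts[c] / total for c in categories}


def l1_distance(p, q):
    """Sum of absolute differences between two distributions over the same keys."""
    return sum(abs(p[k] - q[k]) for k in p)


def s_values(rows, group_key, column):
    """Score each group's answers to `column` against all other groups.

    Assumption: s-value = L1 distance to the rest of the data, divided by the
    median of those distances, so 1.0 is typical and larger is more surprising.
    """
    groups = {r[group_key] for r in rows}
    if len(groups) < 2:
        raise ValueError("need at least two groups to compare")
    categories = {r[column] for r in rows}
    distances = {}
    for g in groups:
        inside = [r[column] for r in rows if r[group_key] == g]
        outside = [r[column] for r in rows if r[group_key] != g]
        distances[g] = l1_distance(
            answer_distribution(inside, categories),
            answer_distribution(outside, categories),
        )
    typical = median(distances.values())
    # If every group looks identical, nothing is surprising.
    return {g: (d / typical if typical else 0.0) for g, d in distances.items()}


rows = [
    {"interviewer": "a", "answer": "yes"},
    {"interviewer": "a", "answer": "yes"},
    {"interviewer": "b", "answer": "yes"},
    {"interviewer": "b", "answer": "no"},
    {"interviewer": "c", "answer": "yes"},
    {"interviewer": "c", "answer": "no"},
]
print(sorted(s_values(rows, "interviewer", "answer").items()))
# [('a', 2.0), ('b', 1.0), ('c', 1.0)]
```

Interviewer "a", who only ever answers "yes", ends up with twice the typical score. That matches the idea that a higher s-Value means the data is less expected.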

The output is based on a screenshot of a prototype that Birnbaum presented in his dissertation, Algorithmic approaches to detecting interviewer fabrication in surveys. In that version the color of the text changed depending on the s-Value. In my version the text color is constant, and each cell's background is a shade of red, with darker shades indicating an s-Value that is more standard deviations from the mean for that row.
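Here is a hedged Python sketch of one way to turn a row of s-Values into shades of red, following the coloring rule above. It works in four steps:

  • Measure how many population standard deviations each value sits from the row's mean.
  • Cap that at an assumed maximum of three.
  • Interpolate from white toward dark red.
  • Leave a row with no spread white.

The cap, the endpoint colors, and the name `row_shades` are illustrative assumptions, not the calculator's actual settings.

```python
from statistics import mean, pstdev


def row_shades(row, max_sd=3.0):
    """Return one hex color per value; darker red = further from the row mean.

    Assumptions: distance is measured in population standard deviations,
    capped at `max_sd`, and interpolated from white to dark red (#8b0000).
    """
    mu = mean(row)
    sd = pstdev(row)
    colors = []
    for v in row:
        # A constant row has no spread, so every cell stays white.
        z = abs(v - mu) / sd if sd else 0.0
        t = min(z, max_sd) / max_sd  # 0.0 = white, 1.0 = darkest shade
        red = round(255 + t * (0x8B - 255))
        green_blue = round(255 * (1 - t))
        colors.append(f"#{red:02x}{green_blue:02x}{green_blue:02x}")
    return colors


print(row_shades([1.0, 1.0, 1.0]))  # ['#ffffff', '#ffffff', '#ffffff']
```

Capping the distance keeps a single extreme value from washing out the shading of the rest of the row.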


You can use the web interface to generate s-Values from a JSON file, given its URL, that contains a list of maps. Alternatively, you can use your credentials to load data from your account. This is very much a work in progress; please get in touch or open an issue with any feedback or ideas you have.

Tech at Ona: What We Built in 2016

Friday, December 30, 2016

This is a repost of an article for the Ona blog. It’s been a big year for the Ona tech team! In this post, we look at what we built in 2016.

Ona platform tech in 2016

In 2016 we added more new features to the Ona platform than in the previous two years combined. Here’s a run-down of select features we added to Ona in 2016:

  • CSV uploads – Upload any CSV into Ona and we’ll automatically build an XLSForm from the CSV’s columns, with data types guessed from the data. E.g. if a column has only dates we’ll assume it should be formatted as a date type, but give you the option to adjust that.
  • Photo gallery – View only the images from your dataset in a grid-based gallery or full-screen slideshow.
  • Dynamic form linking – Use the data in one form to populate questions in another. For example, you could use a school registration form to collect the list of all the schools in your district, and then in a school performance form you could have a drop-down menu where users choose one of the schools from the registration form and then add additional performance data about that school.
  • RapidPro integration – Forward incoming data from Ona to RapidPro and trigger flows based on that data. E.g. send out a text message to a number submitted in an Ona form with a message based on that submission’s data.
  • Google Sheets integration – Connect your dataset to Google Sheets and as you submit new data, or edit existing data, Ona will update your spreadsheet. You can use this to create lightweight dashboards with realtime data collected using Ona.
  • HXL support – Tag your dataset columns with HXL codes for easy integration into the Humanitarian Data Exchange and other existing datasets or repositories.
  • Save charts to a dashboard and chart group by – Create charts with one column grouped by another column and save any charts you create to a dashboard.
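The CSV uploads feature above guesses each column's data type from its contents. The Python sketch below shows that idea in minimal form; it is not Ona's implementation. It rests on three assumptions:

  • Dates are written as YYYY-MM-DD.
  • Only the XLSForm types `date`, `integer`, `decimal`, and `text` are considered.
  • The function names are hypothetical.

```python
import csv
import io
from datetime import datetime


def _parses(parse, value):
    """True if `parse(value)` succeeds without raising ValueError."""
    try:
        parse(value)
        return True
    except ValueError:
        return False


# Checked in order; the first type that fits every non-empty value wins.
# Assumption: dates use the ISO YYYY-MM-DD layout.
TYPE_CHECKS = (
    ("date", lambda v: datetime.strptime(v, "%Y-%m-%d")),
    ("integer", int),
    ("decimal", float),
)


def guess_xlsform_type(values):
    """Guess an XLSForm question type for one CSV column."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return "text"
    for type_name, parse in TYPE_CHECKS:
        if all(_parses(parse, v) for v in non_empty):
            return type_name
    return "text"


def csv_to_survey_rows(csv_text):
    """Return (type, name) pairs, one per CSV column, for an XLSForm survey sheet."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return [(guess_xlsform_type([r[c] for r in rows]), c) for c in reader.fieldnames]


print(csv_to_survey_rows("name,visited,age,score\nA,2016-01-05,12,3.5\nB,2016-02-10,9,4\n"))
# [('text', 'name'), ('date', 'visited'), ('integer', 'age'), ('decimal', 'score')]
```

The order of the checks matters. A column that is entirely whole numbers is reported as `integer` rather than `decimal`, and anything that fits no stricter type falls back to `text`, which the user can then adjust.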

In addition to the new features above, we improved performance to handle the 4.5 million new submissions we received. This was a jump from an average of about 6,500 submissions a day in 2015 to 12,500 per day in 2016. Next year we’ll be putting even more focus on performance and fixing anything that might be slowing you down.

OpenSRP tech in 2016

We’ve made significant improvements to the OpenSRP platform. Our biggest task this year, as the technical lead on OpenSRP, was transitioning the server and client to an Event/Client data model. This helped us support more efficient client-server data synchronization.

We’ll continue to be busy with OpenSRP next year. We’re about to roll out a number of new implementations, including a generic vaccination register. And we’re also very excited that the UNICEF Innovation Fund invested in OpenSRP as one of their inaugural five technology investments.

Free Open Source Software at Ona in 2016

We’re still improving the documentation and cleaning up, but in late 2016 we published an updated version of our core data collection application, onadata (Github). This release does three things:

  • It fixes some serious issues encountered when running at scale.
  • It introduces a more robust permissions model based around projects.
  • It stores all data in a PostgreSQL + PostGIS database.

All new development will take place on the master branch of this repository, with stable releases (Github) tagged.

We’ve continuously updated milia (Github), our Clojure/Script Ona Client API library. We’ve added libraries to interact with more API endpoints and improved overall stability. Also in the Clojure world, we’ve been incrementally adding functionality to our Clojure/Script utilities library chimera (Github) and our data viewer library hatti (Github).

This year we released an Ona to R integration, which lets you load realtime datasets directly into your R scripts. Forest Carbon used Ona.R to write an R Shiny web application that automated analysis and feedback. The source code for ona.R (Github) is freely available; we’re looking forward to your patches and extensions!

Finally, we’ve released a public version of the STEPS app developed for the World Health Organization’s STEPwise approach to noncommunicable disease risk factor surveillance. If you’re interested, check out the code for the Android steps-app (Github).

We’re excitedly looking forward to an even bigger year in 2017. Happy new year from the Ona technology team!

Ghetto by Mitchell Duneier

Tuesday, April 12, 2016

About 5 years ago I was a researcher at Princeton and worked with Mitchell Duneier. We used historical text analysis to evaluate his thesis, and it’s great to see his book in print. For part of the analysis I used Google’s n-gram text data and a mix of Python, awk, bash, and R scripts.
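Some background on the n-gram data for readers who haven't worked with it. The public Google Books Ngram files are tab-separated, with the n-gram, the year, and the match count as the first three fields.

The Python sketch below sums one term's matches per year, the kind of aggregation step this data calls for. It is an illustration, not the scripts we actually used:

  • `yearly_counts` is a hypothetical name.
  • Ignoring case is an assumption.
  • The sample lines are made up.

```python
from collections import defaultdict


def yearly_counts(lines, term):
    """Sum match counts per year for one n-gram, ignoring case.

    Assumes each line is tab-separated with the n-gram, year, and match
    count as its first three fields; any later fields are ignored.
    """
    totals = defaultdict(int)
    target = term.lower()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        ngram, year, match_count = fields[0], int(fields[1]), int(fields[2])
        if ngram.lower() == target:
            totals[year] += match_count
    return dict(totals)


# Made-up sample lines in the n-gram file layout.
sample = [
    "ghetto\t1900\t5\t3\n",
    "Ghetto\t1900\t2\t1\n",
    "ghetto\t1950\t10\t4\n",
    "city\t1900\t99\t9\n",
]
print(yearly_counts(sample, "ghetto"))  # {1900: 7, 1950: 10}
```

Because `yearly_counts` accepts any iterable of lines, you can pass it an open file object and stream through a large n-gram file without loading it into memory.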


Here’s a New York Times review with more details on his work.

Map Your World and Ona at Geo for Good 2015

Wednesday, November 25, 2015

This past October I spoke on behalf of Map Your World at the 2015 Geo for Good User Summit.

Map Your World empowers youth to explore issues and ideas that matter, like clean drinking water or food justice, and then write surveys, collect data, and create maps to make change in their communities.

Map Your World is powered by the Ona API.

Below is a video of the talk I gave, which includes a clip from the film The Revolutionary Optimists.

Writing Python Code to Decide an Election

Friday, October 03, 2014

Yesterday I spoke at PyConZA 2014 about Ona’s work building the vote tallying system for the Libyan Constitutional Assembly Election last February.

The slides from my talk are below:

Here is the abstract:

Earlier this year Ona was given three weeks to write the software that will tally votes in the Libyan elections and decide who wins and who loses. This is not something we could get wrong. We combined agile development with best practices in testing and QA to build an open source tally system that was well tested, accurate, and easy to use. We will describe a success story of iterative behavior/test-driven-development under extreme conditions. Did the structure of the data change the day before the election? Yes. Did we have the tests to ensure that our implementation changes would not compromise the system’s integrity? Yes, and they didn’t.

This talk provides a narrative to both Software Engineers and Tech/Product Managers describing why best practices are essential for any organization and any project of any size. We will provide the audience with:

  • Real-world examples they can implement in their own workflows and organizations
  • Insight into what succeeded (quick iteration with prioritization) and what was challenging (nothing being static)
  • Anecdotes and coherent arguments they can take back to their organizations to advocate for best practices
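A toy example shows the abstract's point about tests guarding the tally. The Python sketch below is hypothetical and far simpler than the real system. It does three things, with tests written alongside the code:

  • It sums per-candidate votes from a list of result forms.
  • It rejects negative counts.
  • It refuses to pick a winner when the last seat is tied.

The form structure, the `votes` key, and the tie rule are assumptions for illustration, not the rules used in the Libyan election.

```python
def tally(result_forms):
    """Sum votes per candidate across result forms (hypothetical form layout)."""
    totals = {}
    for form in result_forms:
        for candidate, votes in form["votes"].items():
            if votes < 0:
                raise ValueError(f"negative vote count for {candidate}")
            totals[candidate] = totals.get(candidate, 0) + votes
    return totals


def winners(totals, seats):
    """Return the top `seats` candidates, refusing to break a tie for the last seat."""
    if seats < 1:
        raise ValueError("seats must be at least 1")
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) > seats and ranked[seats - 1][1] == ranked[seats][1]:
        raise ValueError("tie for the last seat; resolve manually")
    return [candidate for candidate, _ in ranked[:seats]]


def test_tally_sums_across_forms():
    forms = [{"votes": {"A": 10, "B": 5}}, {"votes": {"A": 1, "B": 7}}]
    assert tally(forms) == {"A": 11, "B": 12}
    assert winners(tally(forms), seats=1) == ["B"]


def test_tie_for_last_seat_is_rejected():
    try:
        winners({"A": 5, "B": 5}, seats=1)
    except ValueError:
        pass
    else:
        raise AssertionError("expected a tie to be rejected")


if __name__ == "__main__":
    test_tally_sums_across_forms()
    test_tie_for_last_seat_is_rejected()
    print("all tests passed")
```

Refusing to break a tie is a deliberately conservative choice for a sketch: for software that cannot get the result wrong, surfacing an ambiguous case is safer than silently resolving it.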

Below is the full video of my presentation: