Thursday, May 23, 2013
We have open sourced the categorization
library that powers the fast dynamic
labels and clusters on the Helioid site. This
library is built to prioritize performance over accuracy. It takes
label quality into account by first generating a set of labels and then assigning
documents to those labels. We have found that this increases the likelihood of producing
meaningful labels.
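To illustrate the label-first idea in isolation, here is a toy sketch. It is not the library's algorithm: the helper name label_first_clusters, the frequency-based label selection, and the max_labels parameter are all assumptions made for this illustration. It only shows the general shape of "pick labels first, then assign documents to them".

```ruby
# Toy illustration of label-first clustering. This is NOT the categorize
# gem's implementation; the method name, the max_labels parameter, and the
# frequency heuristic are assumptions made for this sketch.
def label_first_clusters(query, documents, max_labels: 3)
  query_terms = query.downcase.split

  # Tokenize each document and drop the query terms themselves.
  tokenized = documents.map { |doc| doc.downcase.split - query_terms }

  # Step 1: generate candidate labels. Here a label is simply a term,
  # ranked by how many documents contain it (ties broken alphabetically).
  counts = Hash.new(0)
  tokenized.each { |terms| terms.uniq.each { |term| counts[term] += 1 } }
  labels = counts.sort_by { |term, count| [-count, term] }
                 .first(max_labels)
                 .map(&:first)

  # Step 2: assign each document to the highest-ranked label it contains.
  # Documents that match no label are left unassigned.
  clusters = Hash.new { |hash, key| hash[key] = [] }
  tokenized.each_with_index do |terms, index|
    label = labels.find { |candidate| terms.include?(candidate) }
    clusters[label] << index if label
  end
  clusters
end

sample_documents = [
  'lorem ipsum dolor',
  'sed perspiciatis unde',
  'vero eos accusamus',
  'vero eos accusamus iusto odio'
]

p label_first_clusters('lorem', sample_documents, max_labels: 5)
```

Because labels are fixed before any document is assigned, every cluster is guaranteed to have a label chosen on its own merits, at the cost of possibly leaving some documents unclustered.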
The example below shows how to create a set of labeled clusters from documents with the
library. First, include the categorize library.
require 'categorize'
include Categorize
Then define your set of documents.
documents = [
'lorem ipsum dolor',
'sed perspiciatis unde',
'vero eos accusamus',
'vero eos accusamus iusto odio'
]
Now make a model based on an additional query term, 'lorem' in this case.
Model.make_model('lorem', documents)
# => {
#   'ipsum' => [0],
#   'sed perspiciatis' => [1],
#   'vero' => [2, 3]
# }
The model output is a map of cluster labels to documents within those clusters.
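Since the values in that map are indices into the original documents array, a small helper can turn them back into document text. The sketch below copies the output shown above into a literal so it runs without the gem; resolve_clusters is a hypothetical helper name and not part of the library.

```ruby
documents = [
  'lorem ipsum dolor',
  'sed perspiciatis unde',
  'vero eos accusamus',
  'vero eos accusamus iusto odio'
]

# The model output from the post, copied as a literal so this sketch
# does not depend on the gem being installed.
model = {
  'ipsum' => [0],
  'sed perspiciatis' => [1],
  'vero' => [2, 3]
}

# Hypothetical helper (not part of the categorize gem): replace each
# document index with the corresponding document text.
def resolve_clusters(model, documents)
  model.transform_values do |indices|
    indices.map { |index| documents.fetch(index) }
  end
end

# Print each cluster label with its size, followed by its documents.
resolve_clusters(model, documents).each do |label, docs|
  puts "#{label} (#{docs.size})"
  docs.each { |doc| puts "  - #{doc}" }
end
```

Using fetch rather than [] makes an out-of-range index raise an error instead of silently producing nil.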
Install the gem and try it out.
Sunday, January 13, 2013
Prabhas Pokharel presented our paper, "Improving Data Collection and Monitoring
through Real-time Data Analysis", on Friday at ACM DEV 2013 in Bangalore.
The poster is below:
The paper was coauthored with Prabhas Pokharel, Mark Johnston, and Vijay
Modi. The abstract is below:
Feedback based on real-time data is
increasingly important for ICT-based interventions in the developing world.
Applications such as facility inventories, summarization of patient data
from community health workers, etc. need processes for analyzing and
aggregating datasets that update over time. In order to facilitate such
processes, we have created a modular web service for real-time data
analysis: bamboo.
If you are interested in using bamboo, please see the bamboo service website, the Python library pybamboo, and the
JavaScript library bamboo.js.
Sunday, November 11, 2012
We now have a reasonable alpha version of bamboo online. From the docs:
Bamboo provides an interface for merging, aggregating and adding algebraic calculations to dynamic datasets. Clients can interact with Bamboo through a REST web interface and through Python.
bamboo includes JavaScript and Python libraries, and many
operations to choose from:
Sunday, July 22, 2012
On July 23rd and 24th, Alex Dorey and I will be presenting formhub at the DataDev workshop at the IEEE Mobile Data Management (MDM 2012) conference.
Here is a blog post discussing our presentation at DataDev.
The formhub poster is below:

Friday, June 01, 2012
Today Kenneth Hamilton and I presented at the Society for Scholarly Publishing
(SSP) 2012 conference. Below are the slides, which are also available on the
Helioid blog. Additionally, here is a brief post on the SSP Startup Panel and
our co-presenters.