Distributed Classification with ADMM

Wednesday, October 09, 2013

Today Jon Sondag and I presented our paper on ADMM for Hadoop at the IEEE BigData 2013 conference.

The paper describes our implementation of Boyd's ADMM algorithm in Hadoop Map Reduce. We talk about the statistical details of implementing ADMM as well as the nuances of storing state on Hadoop.

In our presentation we present background on the data pipeline we have built at Intent Media and motivate why a Hadoop Map Reduce job is the appropriate run-time for us to use. We mention the alternatives for building distributed logistic regression models, such as sampling the data, Apache Mahout, Vowpal Wabbit, and Spark.

We also discuss alternatives specifically designed for iterative computation on Hadoop, such as HaLoop and Twister.

Our presentation is below:

You may also read the full paper Practical Distributed Classification using the Alternating Direction Method of Multipliers Algorithm.

The paper describes our open source Hadoop based implementation of the ADMM algorithm and how to use it to compute a distributed logistic regression model.