Distributed Machine Learning with Confidence
Contemporary machine learning methods generally output point predictions, with predictive accuracy measured on an external test set or by cross-validation. This unavoidably leads to discussion of a model's "applicability domain", a fuzzy concept that lacks a stringent definition.
In this project we develop methods for distributed machine learning based on Conformal Prediction theory, which deliver prediction intervals instead of point predictions, and implement them in the Big Data framework Apache Spark. We apply the methods primarily to drug discovery data in collaboration with partners at AstraZeneca R&D and SweTox.
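To illustrate how a point prediction can be turned into a prediction interval, the following is a minimal sketch of split (inductive) conformal regression. It is an illustration under stated assumptions, not the project's Spark implementation: it uses plain NumPy on synthetic data, takes absolute residuals as the nonconformity score, and the names `conformal_interval` and `predict` are hypothetical.

```python
import numpy as np


def conformal_interval(cal_true, cal_pred, test_pred, confidence=0.9):
    """Split conformal regression using absolute residuals as nonconformity scores.

    Hypothetical helper for illustration; returns (lower, upper) interval bounds.
    """
    scores = np.sort(np.abs(np.asarray(cal_true, dtype=float)
                            - np.asarray(cal_pred, dtype=float)))
    n = len(scores)
    # Rank of the calibration score that yields the requested marginal coverage.
    k = int(np.ceil((n + 1) * confidence))
    if k > n:
        raise ValueError("too few calibration examples for this confidence level")
    half_width = scores[k - 1]
    test_pred = np.asarray(test_pred, dtype=float)
    return test_pred - half_width, test_pred + half_width


def predict(values, slope, intercept):
    """Underlying point-prediction model (here a simple fitted line)."""
    return slope * values + intercept


# Synthetic data (an assumption for the example): y = 2x + Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 600)
y = 2.0 * x + rng.normal(0, 1, 600)

# Disjoint training, calibration, and test sets.
x_train, y_train = x[:200], y[:200]
x_cal, y_cal = x[200:400], y[200:400]
x_test, y_test = x[400:], y[400:]

# Fit the point-prediction model on the training set only.
slope, intercept = np.polyfit(x_train, y_train, 1)

# Calibrate on held-out data, then attach intervals to the test predictions.
lower, upper = conformal_interval(
    y_cal,
    predict(x_cal, slope, intercept),
    predict(x_test, slope, intercept),
    confidence=0.9,
)

# Fraction of test targets that fall inside their interval.
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"empirical coverage at 90% confidence: {coverage:.3f}")
```

With exchangeable calibration and test data, the intervals are expected to contain the true value in at least the requested fraction of cases on average, whatever underlying model is used. In this sketch every interval has the same width; normalized nonconformity scores would be one way to make the width vary per example.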
Ola Spjuth (Assistant Professor, PI)
Marco Capuccini (PhD Student)
Spjuth research group: http://www.farmbio.uu.se/research/researchgroups/pb/Data-intensive/
M. Capuccini, L. Carlsson, U. Norinder and O. Spjuth. Conformal Prediction in Spark: Large-Scale Machine Learning with Confidence. 2015 IEEE/ACM 2nd International Symposium on Big Data Computing (BDC), Limassol, 2015, pp. 61-67.
E. Ahlberg, O. Spjuth, C. Hasselgren, and L. Carlsson. Interpretation of Conformal Prediction Classification Models. In Statistical Learning and Data Sciences, vol. 9047 of Lecture Notes in Computer Science. Springer International Publishing, 2015, pp. 323-334.
B. T. Moghadam, J. Alvarsson, M. Holm, M. Eklund, L. Carlsson, and O. Spjuth. Scaling predictive modeling in drug development with cloud computing. J. Chem. Inf. Model., 2015, 55 (1), pp. 19-25.