Emerging software applications are increasingly large, complex, distributed, and data-intensive; that is, they are big data applications. Monitoring such applications is challenging because standards and techniques for modeling and analyzing the execution data (i.e., logs) they produce are lacking. A further challenge is that this execution data itself has high volume, velocity, and variety, and requires high veracity and value. In this paper, we present a monitoring solution that performs real-time fault detection in big data applications. Our solution is two-fold. First, we prescribe a standard model for structuring execution logs. Second, we prescribe an analysis solution based on Bayesian classification that is MapReduce-compliant, distributed, parallel, single-pass, and incremental. These properties allow the proposed solution to be deployed and executed on cloud computing platforms to process logs produced by big data applications. We have carried out complexity, scalability, and usability analyses of the proposed solution to assess how efficiently and effectively it performs fault detection in big data applications.
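
To make the idea of a MapReduce-compliant, single-pass, incremental Bayesian classifier more concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes a naive Bayes model over categorical log fields with Laplace smoothing. The record layout (a dictionary of fields plus a label), the labels "fault" and "normal", and all function names are hypothetical. The map step emits counts for one log record, and the reduce step merges partial count tables, so training needs only one pass and an existing model can be updated with new records.

```python
import math
from collections import Counter
from functools import reduce


def map_record(record):
    """Map step: emit count keys for one labelled log record.

    A record is assumed to be (fields, label), where fields is a dict of
    categorical log attributes. This layout is hypothetical.
    """
    fields, label = record
    counts = Counter()
    counts[("class", label)] += 1
    for name, value in fields.items():
        counts[("field", label, name, value)] += 1
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial count tables (associative, commutative)."""
    merged = Counter(left)
    merged.update(right)
    return merged


def train(records, model=None):
    """Single pass over records; pass an earlier model to update it incrementally."""
    return reduce(reduce_counts, map(map_record, records), model or Counter())


def classify(model, fields, alpha=1.0):
    """Return the label with the highest smoothed log-posterior score."""
    labels = [key[1] for key in model if key[0] == "class"]
    total = sum(model[("class", label)] for label in labels)
    best_label, best_score = None, -math.inf
    for label in labels:
        score = math.log(model[("class", label)] / total)
        for name, value in fields.items():
            # Distinct values observed for this field across all classes.
            seen = {key[3] for key in model if key[0] == "field" and key[2] == name}
            numerator = model[("field", label, name, value)] + alpha
            denominator = model[("class", label)] + alpha * max(len(seen), 1)
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Because the model is only a table of counts, partial models built on different nodes can be combined with `reduce_counts`, and calling `train(new_records, model=old_model)` folds in new log records without revisiting earlier ones.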

Additional Metadata
Keywords Applications, Bayesian Classification, Big Data, Fault detection, MapReduce
Persistent URL dx.doi.org/10.1109/ICMLA.2017.00-89
Conference 16th IEEE International Conference on Machine Learning and Applications, ICMLA 2017
Citation
Shafiq, M.O., Fekri, M. (Maryam), & Ibrahim, R. (Rami). (2018). MapReduce based classification for fault detection in big data applications. In Proceedings - 16th IEEE International Conference on Machine Learning and Applications, ICMLA 2017 (pp. 637–642). doi:10.1109/ICMLA.2017.00-89