In the emerging digital age, massive amounts of data are produced, actively or passively, as applications, sensor devices, and other sources collect data from users and the environment. This makes it crucial to be able to process big data efficiently and to use it effectively. The challenge in processing big data lies in its high volume, velocity, and variety, as well as its veracity and value. In this paper, we present a survey of related work and give our recommendations for building Bayesian classification for big data environments. The proposed solution is based on MapReduce and is distributed, parallel, single-pass, and incremental, which makes it feasible to deploy and execute on a cloud computing platform. We also carry out a scalability analysis of the proposed solution, showing that it can train a Bayesian classifier to perform predictive analytics on big data with large volume, velocity, and variety.
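
To illustrate why a Bayesian classifier suits a single-pass, incremental MapReduce design, here is a minimal sketch. It is not the algorithm from the paper. It assumes a naive Bayes model over categorical features with Laplace smoothing, and it runs the map and reduce steps in one process. The names `map_partition`, `merge`, `predict`, and `n_values`, and the toy weather data, are hypothetical. The key point is that the model consists only of counts, which are additive: each partition is scanned once, partial counts merge in any order, and a new batch updates the model without revisiting old data.

```python
from collections import Counter
from functools import reduce
import math


def map_partition(records):
    """Map step: scan one partition once and emit only counts."""
    class_counts = Counter()
    feature_counts = Counter()  # keyed by (label, feature_index, value)
    for features, label in records:
        class_counts[label] += 1
        for i, value in enumerate(features):
            feature_counts[(label, i, value)] += 1
    return class_counts, feature_counts


def merge(a, b):
    """Reduce step: counts add up, so merging is associative and commutative."""
    return a[0] + b[0], a[1] + b[1]


def predict(model, features, n_values, alpha=1.0):
    """Return the most probable label using Laplace-smoothed log-probabilities.

    n_values[i] is the assumed number of distinct values of feature i.
    """
    class_counts, feature_counts = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)
        for i, value in enumerate(features):
            count = feature_counts[(label, i, value)]
            score += math.log((count + alpha) / (n + alpha * n_values[i]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data: (outlook, temperature) -> play? Split into two partitions.
partition_1 = [(("sunny", "hot"), "no"), (("sunny", "mild"), "no"),
               (("rainy", "mild"), "yes")]
partition_2 = [(("rainy", "cool"), "yes"), (("overcast", "cool"), "yes"),
               (("sunny", "hot"), "no")]

# Distributed training: map each partition independently, then reduce.
model = reduce(merge, map(map_partition, [partition_1, partition_2]))

# Incremental update: fold in a new batch without rescanning old data.
new_batch = [(("rainy", "hot"), "yes")]
updated_model = merge(model, map_partition(new_batch))

prediction = predict(updated_model, ("sunny", "hot"), n_values=[3, 3])
```

In a real deployment the map and reduce functions would run as tasks on a cluster framework. The sketch only shows that the merged counts are identical to those from one sequential pass over all the data.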

Additional Metadata
Keywords Bayesian, Big-Data, Classification, Distributed, Parallel, Single-pass, Incremental
Persistent URL dx.doi.org/10.1109/HPCCWS.2017.00013
Conference 2017 IEEE International Conference on High Performance Computing and Communications Workshops, HPCCWS 2017 and 8th Multicore and Multithreaded Architectures and Algorithms, M2A2 2017
Citation
Shafiq, M.O., Yang, Y. (Yibing), & Fekri, M. (Maryam). (2018). A Survey and Recommendations for Distributed, Parallel, Single Pass, Incremental Bayesian Classification Based on MapReduce for Big Data. In Proceedings - 2017 IEEE International Conference on High Performance Computing and Communications Workshops, HPCCWS 2017 and Multicore and Multithreaded Architectures and Algorithms, M2A2 2017 (pp. 42–49). doi:10.1109/HPCCWS.2017.00013