Event segmentation is an important step in monitoring and management applications: it groups events into distinct segments. It matters most when the applications being monitored and managed are large-scale, comprehensive and data-intensive. Segmentation is based on data clustering, one of the key data mining methods in current use. Several well-established algorithms and techniques exist for clustering small to medium-scale data. In the era of Big Data, however, large-scale and data-intensive applications produce log events with significantly greater volume, variety and velocity. This makes clustering such large amounts of data more challenging and limits the applicability of existing techniques. This paper proposes an effective and efficient approach to event segmentation in logs. The approach is based on parallel k-means clustering built on the MapReduce paradigm. It has been tested and evaluated on large-scale log data derived from a real-life case study. The evaluation measures the efficiency and effectiveness of the proposed solution, covering its usability on log data with large volume, variety and velocity as well as its applicability to large-scale applications.
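As an illustration of the general idea only, the sketch below shows how a k-means iteration can be split into a map step and a reduce step. It is not the paper's implementation: it runs in a single process, assumes each log event has already been converted to a numeric feature tuple, and the names map_phase, reduce_phase and kmeans are hypothetical.

```python
from collections import defaultdict
from math import dist


def map_phase(points, centroids):
    """Map step: emit (index of the nearest centroid, point) for each point."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield nearest, p


def reduce_phase(pairs, centroids):
    """Reduce step: replace each centroid by the mean of its assigned points."""
    groups = defaultdict(list)
    for key, p in pairs:
        groups[key].append(p)
    updated = list(centroids)  # centroids with no assigned points stay unchanged
    for key, members in groups.items():
        updated[key] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return updated


def kmeans(points, centroids, max_iter=20):
    """Repeat map and reduce until the centroids stop changing."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


# Hypothetical two-dimensional event features and starting centroids.
events = [(0, 0), (0, 2), (10, 10), (10, 12)]
segments = kmeans(events, [(0, 0), (10, 10)])
```

In a real MapReduce deployment the map step would run in parallel over partitions of the log data and the reduce step would run once per centroid key; here both are ordinary Python functions so that the data flow is easy to follow.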

Additional Metadata
Keywords: Clustering, Event segmentation, MapReduce, Parallel
Persistent URL: dx.doi.org/10.1109/BigData.2016.7840804
Conference: 4th IEEE International Conference on Big Data, Big Data 2016
Shafiq, M.O. (2016). Event segmentation using MapReduce based big data clustering. In Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016 (pp. 1857–1866). doi:10.1109/BigData.2016.7840804