Data clustering is a fundamental problem arising in many practical applications. In this paper, we present new geometric approximation and exact algorithms for the density-based data clustering problem in d-dimensional space ℝ d (for any constant integer d ≥ 2). Previously known algorithms for this problem are efficient only when the specified range around each input point, called the δ-neighborhood, contains on average a constant number of input points. Different distributions of the input data points have significant impact on the efficiency of these algorithms. In the worst case when the data points are highly clustered, these algorithms run in quadratic time, although such situations might not occur very frequently on real data. By using computational geometry techniques, we develop faster approximation and exact algorithms for the density-based data clustering problem in ℝ d. In particular, our approximation algorithm based on the ε-fuzzy distance function takes O(n log n) time for any given fixed value ε > 0, and our exact algorithms take sub-quadratic time. The running times and output quality of our algorithms do not depend on any particular data distribution. We believe that our fast approximation algorithm is of considerable practical importance, while our sub-quadratic exact algorithms are more of theoretical interest. We implemented our approximation algorithm and the experimental results show that our approximation algorithm is efficient on arbitrary input point sets.

Additional Metadata
Keywords Approximation algorithms, BBD trees, Density-based clustering, Geometric algorithms, Graph traversal, Nearest neighbor search, Range search
Persistent URL dx.doi.org/10.1142/S0218195905001683
Journal International Journal of Computational Geometry and Applications
Citation
Chen, D.Z. (Danny Z.), Smid, M, & Xu, B. (Bin). (2005). Geometric algorithms for density-based data clustering. International Journal of Computational Geometry and Applications, 15(3), 239–260. doi:10.1142/S0218195905001683