The cgmCUBE project: Optimizing parallel data cube generation for ROLAP
On-line Analytical Processing (OLAP) has become one of the most powerful and prominent technologies for knowledge discovery in VLDB (Very Large Database) environments. Central to the OLAP paradigm is the data cube, a multi-dimensional hierarchy of aggregate values that provides a rich analytical model for decision support. Various sequential algorithms for the efficient generation of the data cube have appeared in the literature. However, given the size of contemporary data warehousing repositories, multi-processor solutions are crucial for the massive computational demands of current and future OLAP systems. In this paper we discuss the cgmCUBE Project, a multi-year effort to design and implement a multi-processor platform for data cube generation that targets the relational database model (ROLAP). More specifically, we discuss new algorithmic and system optimizations relating to (1) a thorough optimization of the underlying sequential cube construction method and (2) a detailed and carefully engineered cost model for improved parallel load balancing and faster sequential cube construction. These optimizations were key in allowing us to build a prototype that is able to produce data cube output at a rate of over one TeraByte per hour.
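To make the notion of a data cube concrete, the following is a minimal illustrative sketch, not the cgmCUBE method: it naively computes every group-by (view) over a set of dimensions by rescanning the input once per view, with none of the sequential or parallel optimizations the paper discusses. The function name `naive_cube`, the sample rows, and the choice of SUM as the aggregate are assumptions made purely for illustration.

```python
from collections import defaultdict
from itertools import combinations


def naive_cube(rows, dims, measure):
    """Hypothetical helper: aggregate `measure` (SUM) for every subset of `dims`.

    Returns a dict mapping each view (a tuple of dimension names) to a dict
    of {key tuple: summed measure}. A cube over d dimensions has 2**d views.
    """
    cube = {}
    # Walk from the most detailed view (all dimensions) down to the grand total.
    for k in range(len(dims), -1, -1):
        for view in combinations(dims, k):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in view)
                totals[key] += row[measure]
            cube[view] = dict(totals)
    return cube


# Illustrative fact table (made-up values).
rows = [
    {"store": "A", "product": "x", "sales": 10},
    {"store": "A", "product": "y", "sales": 5},
    {"store": "B", "product": "x", "sales": 7},
]

cube = naive_cube(rows, ("store", "product"), "sales")
```

With two dimensions this yields four views: (store, product), (store), (product), and the empty view holding the grand total. Because the number of views doubles with each added dimension, rescanning the raw data for every view quickly becomes impractical, which is the kind of cost that optimized sequential and parallel cube algorithms aim to reduce.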
Keywords: Data cube, Parallel processing, ROLAP
Journal: Distributed and Parallel Databases
Dehne, F., Eavis, T., & Rau-Chaplin, A. (2006). The cgmCUBE project: Optimizing parallel data cube generation for ROLAP. Distributed and Parallel Databases, 19(1), 29–62. doi:10.1007/s10619-006-6575-6