This paper presents an improved parallel method for generating ROLAP data cubes on a shared-nothing multiprocessor, based on a novel optimized data partitioning technique. Since no shared disk is required, our method can be used for highly scalable processor clusters consisting of standard PCs with local disks only, connected via a data switch. Experiments show that our improved parallel method provides optimal, linear speedup for at least 32 processors. The approach taken, which uses a ROLAP representation of the data cube, is well suited for large data warehouses and high-dimensional data, and supports the generation of both fully materialized and partially materialized data cubes.

Additional Metadata
Keywords: data cube, parallel computing, ROLAP
Persistent URL: dx.doi.org/10.4018/jdwm.2006010101
Journal: International Journal of Data Warehousing and Mining
Citation
Chen, Y. (Ying), Dehne, F., Eavis, T. (Todd), & Rau-Chaplin, A. (2006). Improved Data Partitioning for Building Large ROLAP Data Cubes in Parallel. International Journal of Data Warehousing and Mining, 2(1), 1–26. doi:10.4018/jdwm.2006010101