We present "Pipe 'n Prune" (PnP), a new hybrid method for iceberg-cube query computation. The novelty of our method is that it achieves a tight integration of top-down piping for data aggregation with bottom-up a priori data pruning. A particular strength of PnP is that it is efficient for all of the following scenarios: (1) Sequential iceberg-cube queries, (2) External memory iceberg-cube queries, and (3) Parallel iceberg-cube queries on shared-nothing PC clusters with multiple disks. We performed an extensive performance analysis of PnP for the above scenarios with the following main results: In the first scenario PnP performs very well for both dense and sparse data sets, providing an interesting alternative to BUC and Star-Cubing. In the second scenario PnP shows a surprisingly efficient handling of disk I/O, with an external memory running time that is less than twice the running time for full in-memory computation of the same iceberg-cube query. In the third scenario PnP scales very well, providing near linear speedup for a larger number of processors and thereby solving the scalability problem observed for the parallel iceberg-cubes proposed by Ng et al.

Data cube, Iceberg cube query, OLAP, Parallel computing, Pipe 'n Prune (PnP)
Distributed and Parallel Databases
School of Computer Science

Chen, Y. (Ying), Dehne, F, Eavis, T. (Todd), & Rau-Chaplin, A. (Andrew). (2008). PnP: Sequential, external memory, and parallel iceberg cube computation. Distributed and Parallel Databases, 23(2), 99–126. doi:10.1007/s10619-007-7023-y