We present "Pipe 'n Prune" (PnP), a new hybrid method for iceberg-cube query computation. The novelty of our method is that it achieves a tight integration of top-down piping for data aggregation with bottom-up a priori data pruning. A particular strength of PnP is that it is efficient for all of the following scenarios: (1) Sequential iceberg-cube queries, (2) External memory iceberg-cube queries, and (3) Parallel iceberg-cube queries on shared-nothing PC clusters with multiple disks. We performed an extensive performance analysis of PnP for the above scenarios with the following main results: In the first scenario PnP performs very well for both dense and sparse data sets, providing an interesting alternative to BUC and Star-Cubing. In the second scenario PnP shows a surprisingly efficient handling of disk I/O, with an external memory running time that is less than twice the running time for full in-memory computation of the same iceberg-cube query. In the third scenario PnP scales very well, providing near linear speedup for a larger number of processors and thereby solving the scalability problem observed for the parallel iceberg-cubes proposed by Ng et al.

Additional Metadata
Keywords Data cube, Iceberg cube query, OLAP, Parallel computing, Pipe 'n Prune (PnP)
Persistent URL dx.doi.org/10.1007/s10619-007-7023-y
Journal Distributed and Parallel Databases
Citation
Chen, Y. (Ying), Dehne, F, Eavis, T. (Todd), & Rau-Chaplin, A. (Andrew). (2008). PnP: Sequential, external memory, and parallel iceberg cube computation. Distributed and Parallel Databases, 23(2), 99–126. doi:10.1007/s10619-007-7023-y