A distributed tree data structure for real-time OLAP on cloud architectures
In contrast to queries for on-line transaction processing (OLTP) systems, which typically access only a small portion of a database, OLAP queries may need to aggregate large portions of a database, which often leads to performance issues. In this paper we introduce CR-OLAP, a Cloud-based Real-time OLAP system built on a new distributed index structure for OLAP, the distributed PDCR tree, which uses a cloud infrastructure consisting of (m + 1) multi-core processors. As the database grows, CR-OLAP dynamically increases m to maintain performance. The distributed PDCR tree supports multiple dimension hierarchies and efficient query processing on the elaborate dimension hierarchies that are central to OLAP systems. It is particularly efficient for complex OLAP queries that need to aggregate large portions of the data warehouse, such as 'report the total sales in all stores located in California and New York during the months February-May of all years'. We evaluated CR-OLAP on the Amazon EC2 cloud using the TPC-DS benchmark data set. The tests demonstrate that CR-OLAP scales well with an increasing number of processors, even for complex queries. For example, on an Amazon EC2 cloud instance with eight processors, for a TPC-DS OLAP query stream on a data warehouse with 80 million tuples where every OLAP query aggregates more than 50% of the database, CR-OLAP achieved a query latency of 0.3 seconds, which can be considered a real-time response.
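To make the example query concrete, the following is a minimal sketch of how such a hierarchical aggregate could be evaluated over data partitioned across m workers, with partial results merged at one coordinator. It is a naive scan for illustration only and is not the PDCR tree; the `Sale` record, the round-robin sharding, and all function names are assumptions that do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sale:
    """One fact row with two hypothetical dimension hierarchies."""
    state: str    # Store dimension, coarse level
    city: str     # Store dimension, fine level
    year: int     # Date dimension, coarse level
    month: int    # Date dimension, fine level (1-12)
    amount: float


def matches(sale, states, months):
    """True if the sale falls in the given states and inclusive month range."""
    first, last = months
    return sale.state in states and first <= sale.month <= last


def partial_total(shard, states, months):
    """Aggregate computed by one worker over its own shard."""
    return sum(s.amount for s in shard if matches(s, states, months))


def total_sales(facts, states, months, m=2):
    """Split the facts round-robin into m shards and merge the partial sums."""
    shards = [facts[i::m] for i in range(m)]
    return sum(partial_total(shard, states, months) for shard in shards)


FACTS = [
    Sale("California", "San Diego", 2011, 3, 100.0),
    Sale("California", "Fresno", 2012, 7, 40.0),
    Sale("New York", "Albany", 2012, 2, 25.0),
    Sale("Texas", "Austin", 2012, 4, 60.0),
    Sale("New York", "Buffalo", 2013, 5, 10.0),
]

# Total sales in California and New York, February-May, over all years.
result = total_sales(FACTS, {"California", "New York"}, (2, 5), m=2)
print(result)
```

Because the sum is associative, the merged result does not depend on m in this sketch; a real index would avoid scanning every row by storing aggregates at internal tree nodes.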
Conference: 2013 IEEE International Conference on Big Data, Big Data 2013
Dehne, F., Kong, Q., Rau-Chaplin, A., Zaboli, H., & Zhou, R. (2013). A distributed tree data structure for real-time OLAP on cloud architectures. Presented at the 2013 IEEE International Conference on Big Data, Big Data 2013. doi:10.1109/BigData.2013.6691613