Scalable real-time OLAP on cloud architectures
In contrast to queries for on-line transaction processing (OLTP) systems, which typically access only a small portion of a database, OLAP queries may need to aggregate large portions of a database, which often leads to performance issues. In this paper we introduce CR-OLAP, a scalable Cloud-based Real-time OLAP system built on a new distributed index structure for OLAP, the distributed PDCR tree. CR-OLAP utilizes a scalable cloud infrastructure consisting of multiple commodity servers (processors): as the database grows, CR-OLAP dynamically increases the number of processors to maintain performance. Our distributed PDCR tree data structure supports multiple dimension hierarchies and efficient query processing on the elaborate dimension hierarchies that are so central to OLAP systems. It is particularly efficient for complex OLAP queries that need to aggregate large portions of the data warehouse, such as "report the total sales in all stores located in California and New York during the months February-May of all years". We evaluated CR-OLAP on the Amazon EC2 cloud, using the TPC-DS benchmark data set. The tests demonstrate that CR-OLAP scales well with an increasing number of processors, even for complex queries. For example, for an Amazon EC2 cloud instance with 16 processors, a data warehouse with 160 million tuples, and a TPC-DS OLAP query stream where each query aggregates between 60% and 95% of the database, CR-OLAP achieved a query latency of below 0.3 s, which can be considered a real-time response.
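To make the example query in the abstract concrete, the following minimal sketch evaluates "total sales in all stores located in California and New York during the months February-May of all years" over a tiny in-memory fact table. It is only an illustration of the query semantics: it uses a plain linear scan, not the distributed PDCR tree, and the `Sale` record, the `total_sales` function, and the sample data are all hypothetical names and values introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sale:
    # Hypothetical fact tuple with two dimension hierarchies:
    # store (state > city) and date (year > month).
    state: str
    city: str
    year: int
    month: int
    amount: float


def total_sales(facts, states, months):
    """Sum amounts for facts in the given states and months, across all years.

    This is a naive linear scan used only to show what the query means;
    it is not the index-based evaluation described in the paper.
    """
    return sum(
        f.amount for f in facts if f.state in states and f.month in months
    )


# Made-up sample data for illustration only.
FACTS = [
    Sale("California", "Los Angeles", 2012, 2, 100.0),
    Sale("California", "San Diego", 2013, 6, 40.0),  # June: outside Feb-May
    Sale("New York", "Albany", 2012, 5, 60.0),
    Sale("Texas", "Austin", 2013, 3, 75.0),          # state not selected
    Sale("New York", "Buffalo", 2014, 4, 25.0),
]

# Months February (2) through May (5), with no restriction on the year.
result = total_sales(FACTS, {"California", "New York"}, range(2, 6))
print(result)
```

The query fixes values at one level of each hierarchy (state, month) while leaving the other levels (city, year) unconstrained, which is why it can cover a large share of the fact table.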
|Keywords||Cloud architecture, Real-time OLAP, Scalability, TPC-DS benchmark|
|Journal||Journal of Parallel and Distributed Computing|
Dehne, F., Kong, Q., Rau-Chaplin, A., Zaboli, H., & Zhou, R. (2015). Scalable real-time OLAP on cloud architectures. Journal of Parallel and Distributed Computing, 79-80, 31–41. doi:10.1016/j.jpdc.2014.08.006