Fast aggregation of data with many dimensions is a key component of many applications. The R-tree is the traditional data structure for indexing multi-dimensional data, but even the best R-tree variants suffer from performance degradation as the number of dimensions increases. The DC-tree addressed this issue by replacing Minimum Bounding Rectangle (MBR) keys with Minimum Describing Subsets (MDSs), which are less susceptible to overlap. This technique dramatically improves query performance with many dimensions, but at the cost of reduced insertion performance. Like most R-tree variants, this insertion overhead comes from expensive geometric comparisons while selecting the best child for insertion, or splitting over-full nodes. DC-trees, including the parallel PDCtree, suffer even more from this overhead since MDSs are typically much more expensive to compare and manipulate than MBRs. This paper introduces the Hilbert PDC-tree, a parallel index structure for many-dimensional data that supports high-velocity data ingestion. This is achieved by avoiding geometric comparisons during insertion by instead inserting records based on the Hilbert index of their keys. This approach is similar to that of the Hilbert R-tree, but with special considerations for efficiently supporting many hierarchical dimensions. Additionally, a new node splitting algorithm significantly reduces overlap and improves query performance. Experiments show that the Hilbert PDC-tree scales well to a high number of dimensions, while supporting a much higher rate of ingestion and better query performance than the PDC-tree.

Additional Metadata
Persistent URL
Conference 20th International Database Engineering and Applications Symposium, IDEAS 2016
Robillard, D. (David), Dehne, F, Rau-Chaplin, A. (Andrew), & Burke, N. (Neil). (2016). The hilbert PDC-tree: A high-velocity structure for many-dimensional data. In ACM International Conference Proceeding Series (pp. 164–172). doi:10.1145/2938503.2938549