This paper introduces a new graph-based indexing (GBI) technique for big data systems. It uses a directed graph structure that effectively captures the simultaneous occurrence of multiple keywords in the same document. The objective is to use the relationship between the search keywords captured in the graph structure to effectively retrieve all results of Boolean AND queries at once. The performance of the proposed technique is compared with the conventional inverted index-based technique. This paper highlights that, irrespective of the intersection algorithm used to evaluate Boolean AND queries, GBI always returns Boolean AND search results faster than the inverted index. This is due to the fact that GBI always performs a smaller number of intersection operations and avoids intersection if search keywords do not have a common document. A preliminary performance analysis is performed through prototyping and measurement on a system subjected to a synthetic workload. The analysis shows that GBI improves search latency when executing Boolean AND queries by an average of 69% to 99.9% in comparison to the inverted index.

Additional Metadata
Keywords Boolean queries, graph-based index, search, text indexing
Persistent URL dx.doi.org/10.1109/CCGrid49817.2020.00-24
Conference 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGRID 2020
Citation
Mohideen, A.K. (Abdulla Kalandar), Majumdar, S, St-Hilaire, M, & El-Haraki, A. (A.). (2020). A Graph-Based Indexing Technique to Enhance the Performance of Boolean and Queries in Big Data Systems. In Proceedings - 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGRID 2020 (pp. 677–680). doi:10.1109/CCGrid49817.2020.00-24