An XML publish/subscribe system is based on filtering XML message streams for a large number of subscriptions expressed in XPath. A major issue on an XML-based publish/subscribe system is its performance. As the number of XML documents and XPath-based subscriptions increases in the system, to provide XML filtering efficiently becomes a challenging problem. Hence, there is an urgent need for optimization techniques to meet this challenge. There are many existing approaches on designing efficient XML filtering engine. Most existing research efforts focus on efficient filtering algorithms for achieving a high system performance or supporting more complex XPath syntax. Each proposed scheme has its advantages and limitations. Not much research, however, has considered using caching in the context of XML filtering. In this paper, we propose two caching schemes to be used in conjunction with an XML filtering engine. First, we present a complete message caching algorithm that is a strict caching policy to reduce the computation cost that accrues from multiple filtering of the same messages, by reusing results of previously processed messages. Second, we investigate a structure-based caching method that is an approximate caching policy for messages sharing the same structure. Performance evaluation for synthetic data and real data both show that complete message caching and structure-based caching schemes are able to achieve significantly better filtering performance (up to 80% for both caching schemes for the message streams experimented with).

Additional Metadata
Keywords Caching, Performance evaluation, Publish/subscribe, XML
Persistent URL
Conference 2009 IEEE 28th International Performance Computing and Communications Conference, IPCCC 2009
Cao, Y. (Yang), Majumdar, S, & Lung, C.H. (2009). Caching techniques for XML message filtering. Presented at the 2009 IEEE 28th International Performance Computing and Communications Conference, IPCCC 2009. doi:10.1109/PCCC.2009.5403839