Matching Dependencies (MDs) are a recent proposal for declarative entity resolution. They are rules that specify, given the similarities satisfied by values in a database, what values should be considered duplicates, and have to be matched. On the basis of a chase-like procedure for MD enforcement, we can obtain clean (duplicate-free) instances; actually possibly several of them. The clean answers to queries (which we call the resolved answers) are invariant under the resulting class of instances. In this paper, we investigate a query rewriting approach to obtaining the resolved answers (for certain classes of queries and MDs). The rewritten queries are specified in stratified Datalog not,s with aggregation. In addition to the rewriting algorithm, we discuss the semantics of the rewritten queries, and how they could be implemented by means of a DBMS.

Additional Metadata
Persistent URL dx.doi.org/10.1007/978-3-642-32925-8_10
Citation
Gardezi, J. (Jaffer), & Bertossi, L. (2012). Query rewriting using datalog for duplicate resolution. doi:10.1007/978-3-642-32925-8_10