Matching dependencies (MDs) are used to declaratively specify the identification (or matching) of certain attribute values in pairs of database tuples when some similarity conditions on other values are satisfied. Their enforcement can be seen as a natural generalization of entity resolution. In what we call the pure case of MD enforcement, an arbitrary value from the underlying data domain can be chosen as the common value used in a matching. However, the overall number of attribute-value changes is expected to be kept to a minimum. We investigate this case in terms of its semantics and the properties of data cleaning through the enforcement of MDs. We characterize the intended clean instances, and also the clean answers to queries, as those that are invariant under the cleaning process. We also investigate the complexity of computing clean instances and of clean query answering. Tractable and intractable cases, depending on the MDs involved, are identified and characterized.
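
To make the pure case concrete, here is a minimal Python sketch for one simple situation. It is not the authors' algorithm. It assumes a single MD stating that if two tuples have similar values for attribute `name`, their `phone` values must be matched (made identical), and that `name` itself is never changed by the cleaning. The relation, the attribute names, and the similarity predicate (Levenshtein distance at most 1) are all hypothetical. Under these assumptions, matching forces one common `phone` value across every connected component of the similarity graph, including tuples that are not directly similar to each other. Because any domain value may be used, choosing the most frequent value in each component minimizes the number of changes.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def similar(a: str, b: str) -> bool:
    """Hypothetical similarity predicate: at most one edit apart (not transitive)."""
    return levenshtein(a, b) <= 1


def enforce_md(tuples, lhs, rhs, sim):
    """Enforce one MD 'sim(t1[lhs], t2[lhs]) implies t1[rhs] and t2[rhs] are matched',
    assuming lhs values are never modified. Returns (cleaned tuples, number of changes)."""
    n = len(tuples)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Link every pair of tuples whose lhs values are similar.
    for i in range(n):
        for j in range(i + 1, n):
            if sim(tuples[i][lhs], tuples[j][lhs]):
                parent[find(i)] = find(j)

    # Group tuple indices by connected component.
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    cleaned = [dict(t) for t in tuples]  # copy, so the input is left untouched
    changes = 0
    for members in groups.values():
        # Most frequent rhs value in the component; ties go to the first one seen.
        value, freq = Counter(tuples[i][rhs] for i in members).most_common(1)[0]
        changes += len(members) - freq
        for i in members:
            cleaned[i][rhs] = value
    return cleaned, changes


tuples = [
    {"name": "smith", "phone": "555-1234"},
    {"name": "smyth", "phone": "555-1234"},
    {"name": "smithe", "phone": "555-9999"},  # similar to "smith", not to "smyth"
    {"name": "jones", "phone": "555-0000"},
]
cleaned, changes = enforce_md(tuples, "name", "phone", similar)
print(changes)
print([t["phone"] for t in cleaned])
```

Here "smyth" and "smithe" are not similar to each other, yet both are linked through "smith", so all three tuples end up sharing one phone value at the cost of a single change. When two values tie for most frequent within a component, each choice gives a different clean instance with the same number of changes; the sketch keeps whichever value `Counter.most_common` lists first (the first one encountered). This is one simple way in which more than one clean instance can arise.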

Additional Metadata
Keywords: data cleaning, databases, duplicate and entity resolution, integrity constraints, matching dependencies
Persistent URL: dx.doi.org/10.1007/s11704-012-2007-0
Journal: Frontiers of Computer Science
Citation
Gardezi, J. (Jaffer), Bertossi, L., & Kiringa, I. (Iluju). (2012). Matching dependencies: Semantics and query answering. Frontiers of Computer Science, 6(3), 278–292. doi:10.1007/s11704-012-2007-0