Pattern recognition of strings containing traditional and generalized transposition errors
We study the problem of recognizing a string Y which is the noisy version of some unknown string X* chosen from a finite dictionary, H. The traditional case, which has been extensively studied in the literature, is the one in which Y contains substitution, insertion and deletion (SID) errors. Although some work has been done to extend the traditional set of edit operations to include the straightforward transposition of adjacent characters [LW75], the problem remains unsolved when the transposed characters are themselves subsequently substituted, as is typical in cursive and typewritten script, in molecular biology and in noisy chain-coded boundaries. In this paper we present the first reported solution to the analytic problem of editing one string, X, to another, Y, using these four edit operations (substitution, insertion, deletion and generalized transposition). We also give a scheme for obtaining the optimal sequence of edit operations. Both these solutions are optimal for the infinite alphabet case. Using these algorithms we present a syntactic pattern recognition scheme which corrects noisy text containing all these types of errors. The paper includes experimental results involving subdictionaries of the most common English words which demonstrate the superiority of our system over existing methods.
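The editing problem described above can be illustrated with a small dynamic program. The following is only a sketch under stated assumptions, not the algorithm as given in the paper: the function names `gt_edit_distance` and `recognize`, the constant costs, and the way a generalized transposition is priced (a transposition charge plus the cost of substituting each swapped character) are all hypothetical choices made for illustration.

```python
def gt_edit_distance(x, y, sub_cost=1.0, ins_cost=1.0, del_cost=1.0, trans_cost=1.0):
    """Distance between x and y under substitution, insertion, deletion
    and generalized transposition. All costs are illustrative assumptions."""
    n, m = len(x), len(y)

    def sub(a, b):
        # Assumed substitution cost: free when the characters match.
        return 0.0 if a == b else sub_cost

    # d[i][j] holds the cost of editing x[:i] into y[:j].
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + del_cost
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + ins_cost

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(
                d[i - 1][j - 1] + sub(x[i - 1], y[j - 1]),  # substitute or match
                d[i - 1][j] + del_cost,                     # delete x[i-1]
                d[i][j - 1] + ins_cost,                     # insert y[j-1]
            )
            if i >= 2 and j >= 2:
                # Generalized transposition: swap the last two characters of
                # x[:i], then allow each swapped character to be substituted.
                gt = trans_cost + sub(x[i - 1], y[j - 2]) + sub(x[i - 2], y[j - 1])
                best = min(best, d[i - 2][j - 2] + gt)
            d[i][j] = best
    return d[n][m]


def recognize(noisy, dictionary, **costs):
    """Return the dictionary word with the smallest distance to `noisy`
    (ties go to the earliest word). A simple nearest-word rule, assumed here."""
    return min(dictionary, key=lambda word: gt_edit_distance(word, noisy, **costs))


if __name__ == "__main__":
    print(gt_edit_distance("ab", "ba"))
    print(gt_edit_distance("ab", "bc", trans_cost=0.5))
    print(recognize("hte", ["the", "hat", "ten"]))
```

With unit costs, a generalized transposition priced this way never beats two plain substitutions unless the swap is exact, so the extra operation only changes the result when `trans_cost` is set below the substitution cost. How the operations are actually weighted is a modelling decision that this sketch does not take from the paper.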
Conference: Proceedings of the 1995 IEEE International Conference on Systems, Man and Cybernetics. Part 2 (of 5)
Oommen, J., & Loke, R. K. S. (1995). Pattern recognition of strings containing traditional and generalized transposition errors. In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (pp. 1154–1159).