We study the problem of recognizing a string Y which is the noisy version of some unknown string X* chosen from a finite dictionary, H. The traditional case which has been extensively studied in the literature is the one in which Y contains substitution, insertion and deletion (SID) errors. Although some work has been done to extend the traditional set of edit operations to include the straightforward transposition of adjacent characters 2 [see J. Assoc. Comput. Mach. 22, 177-183 (1975)] the problem is unsolved when the transposed characters are themselves subsequently substituted, as is typical in cursive and typewritten script, in molecular biology and in noisy chain-coded boundaries. In this paper we present the first reported solution to the analytic problem of editing one string X to another, Y, using these four edit operations. A scheme for obtaining the optimal edit operations has also been given. Both these solutions are optimal for the infinite alphabet case. Using these algorithms we present a syntactic pattern recognition scheme which corrects noisy text containing all these types of errors. The paper includes experimental results involving sub-dictionaries of the most common English words which demonstrate the superiority of our system over existing methods.

Additional Metadata
Keywords String correction, Syntactic pattern recognition, Traditional and generalized transposition errors
Journal Pattern Recognition
Citation
Oommen, J, & Loke, R. (Richard). (1997). Pattern recognition of strings with substitutions, insertions, deletions and generalized transpositions. Pattern Recognition, 30(5), 789–800.