Noisy subsequence recognition using constrained string editing involving substitutions, insertions, deletions and generalized transpositions
We consider a problem whose solution can greatly enhance cursive script recognition and the recognition of printed character sequences. The problem involves recognizing words/strings by processing their noisy subsequences. Let X* be any unknown word from a finite dictionary H. Let U be any arbitrary subsequence of X*. We study the problem of estimating X* by processing Y, a noisy version of U. Y contains substitution, insertion, deletion and generalized transposition errors, the latter occurring when transposed characters are themselves subsequently substituted. We solve the noisy subsequence recognition problem by defining and using the constrained edit distance between X ∈ H and Y, subject to any arbitrary edit constraint involving the number and type of edit operations to be performed. An algorithm to compute this constrained edit distance is presented. Using this algorithm, we present a syntactic Pattern Recognition (PR) scheme which corrects noisy text containing all these types of errors. Experimental results involving strings of lengths between 40 and 80, with an average of 30.24 deleted characters and an overall average noise of 68.69%, demonstrate the superiority of our system over existing methods.
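To make the four edit operations concrete, the following is a minimal sketch of an unconstrained edit distance that allows substitutions, insertions, deletions and generalized transpositions, together with a nearest-word lookup over a dictionary. It is not the constrained algorithm of the paper: it assumes unit costs, it charges a generalized transposition as one swap plus one unit for each swapped character that is then substituted, and it imposes no constraint on the number or type of operations. The names `edit_distance_gt` and `nearest_word` are hypothetical and chosen for illustration only.

```python
from typing import Iterable


def edit_distance_gt(x: str, y: str) -> int:
    """Unit-cost edit distance between x and y.

    Allowed operations: substitution, insertion, deletion, and a
    generalized transposition (two adjacent characters are swapped and
    either swapped character may then also be substituted).
    Assumption: a generalized transposition costs 1 for the swap plus 1
    for each swapped character that does not match afterwards.
    """
    n, m = len(x), len(y)
    # d[i][j] = cost of editing the prefix x[:i] into the prefix y[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i  # delete all i characters of x[:i]
    for j in range(1, m + 1):
        d[0][j] = j  # insert all j characters of y[:j]

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(
                d[i - 1][j] + 1,                            # deletion
                d[i][j - 1] + 1,                            # insertion
                d[i - 1][j - 1] + (x[i - 1] != y[j - 1]),   # substitution or match
            )
            if i > 1 and j > 1:
                # Swap x[i-2], x[i-1]; then compare the swapped pair with
                # y[j-2], y[j-1] and pay for any remaining mismatch.
                gt_cost = 1 + (x[i - 1] != y[j - 2]) + (x[i - 2] != y[j - 1])
                best = min(best, d[i - 2][j - 2] + gt_cost)
            d[i][j] = best
    return d[n][m]


def nearest_word(dictionary: Iterable[str], y: str) -> str:
    """Return the dictionary word with the smallest distance to y."""
    return min(dictionary, key=lambda w: edit_distance_gt(w, y))
```

For example, `edit_distance_gt("ab", "ba")` evaluates to 1, because a single transposition suffices. A constrained variant would additionally track how many operations of each type have been used, which this sketch deliberately leaves out.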
Series: Lecture Notes in Computer Science
Oommen, J., & Loke, R. K. S. (1995). Noisy subsequence recognition using constrained string editing involving substitutions, insertions, deletions and generalized transpositions. In Lecture Notes in Computer Science. doi:10.1007/3-540-60697-1_94