An effective algorithm for string correction using generalized edit distances-I. Description of the algorithm and its optimality
This paper deals with the problem of estimating a transmitted string Xs from the corresponding received string Y, which is a noisy version of Xs. We assume that Y contains any number of substitution, insertion, and deletion errors, and that no two consecutive symbols of Xs were deleted in transmission. We have shown that for channels which cause independent errors, and whose error probabilities exceed those of noisy strings studied in the literature , at least 99.5% of the erroneous strings will not contain two consecutive deletion errors. The best estimate X+ of Xs is defined as that element of H which minimizes the generalized Levenshtein distance D( X Y) between X and Y. Using dynamic programming principles, an algorithm is presented which yields X+ without computing individually the distances between every word of H and Y. Though this algorithm requires more memory, it can be shown that it is, in general, computationally less complex than all other existing algorithms which perform the same task.
Kashyap, R.L. (R. L.), & Oommen, J. (1981). An effective algorithm for string correction using generalized edit distances-I. Description of the algorithm and its optimality. Information Sciences, 23(2), 123–142. doi:10.1016/0020-0255(81)90052-9