A probabilistic procedure is suggested for the automatic correction of spelling and typing errors in printed English texts. The heart of the procedure is a probabilistic model for the generation of the garbled word from the correct word. The garbler can delete or insert symbols in the word or substitute one or more symbols by other symbols. An expression is derived for P(Y {curly logical or} X), the probability of generating a garbled word Y from a correct word X. The model is probabilistically consistent. Using the expression for P(Y {curly logical or} X), we can derive an estimate of the correct word from the garbled word Y so as to minimize the average probability of error in the decision. One of the important features of the expression P(Y {curly logical or} X) is that it can be computed recursively. Experiments conducted using the dictionary of 1025 most common English words indicate that the accuracy of correction by this scheme is substantially greater than that which can be obtained by other algorithms especially while dealing with garbled words derived from relatively short words of length less than 6.

Additional Metadata
Keywords Levenshtein distance, minimum probability of error string correction, modeling noisy channels, probabilistic string correction, String correction, string editing techniques, text editing, word processor
Persistent URL dx.doi.org/10.1016/0167-8655(84)90038-2
Journal Pattern Recognition Letters
Citation
Kashyap, R.L. (R. L.), & Oommen, J. (1984). Spelling correction using probabilistic methods. Pattern Recognition Letters, 2(3), 147–154. doi:10.1016/0167-8655(84)90038-2