Let us assume that we have a number of independent sources where each source generates random strings of fixed length M, composed of symbols drawn from an alphabet R. Each source generates these random strings according to its own distribution. The problem we consider in this correspondence is one of identifying the source given a sequence of random strings. Two modes of random string generation are analyzed. In the first mode, arbitrary strings are generated in which the individual symbols can occur many times in the strings. In the second mode the individual symbols occur exactly once in each random string. The latter case corresponds to the situation in which the sources generate random permutations. In both these cases, the best match to the distribution being used by each source can be obtained by maintaining an exponential number of statistics. This being infeasible, we propose a simple parametrization of the distributions. For arbitrary strings, the simple unigram based model (U-model) has been proposed. For the case of permutations, we have proposed a new model called the S-model and employed it to analyze and/or approximate unknown distributions of permutations. The relevant estimation procedures together with the applications to source recognition have been presented. Considering the fact that the symbolic data is processed, and statistically analyzed, our method clearly presents a unique blend of syntactic and statistical pattern recognition.

Additional Metadata
Keywords Approximation and modeling, Bayesian decision, closeness of approximation, parameter estimation, parametric approximation, pattern recognition, probability distributions, random permutation generation
Persistent URL dx.doi.org/10.1109/34.88575
Journal IEEE Transactions on Pattern Analysis and Machine Intelligence
Citation
Valiveti, R.S. (R. S.), & Oommen, J. (1991). Recognizing Sources of Random Strings. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(4), 386–394. doi:10.1109/34.88575