This paper proposes a security-monitoring instrument that can locate different sounds in a room and classify their nature. The instrument is reliable and robust even in the presence of reverberation and under low signal-to-noise ratio conditions. The paper also proposes a new algorithm that first classifies an audio segment as speech or non-speech and then assigns each non-speech segment to its specific audio type. The algorithm divides an audio segment into frames, estimates the presence of pitch in each frame, and calculates a pitch ratio parameter, which is then used to classify the segment. The threshold used in calculating this parameter is adapted to accommodate different environments. Non-speech segments are further classified into their specific types using a time-delay neural network. The performance of the proposed algorithm is evaluated at different signal-to-noise ratios using a library of audio segments that includes speech segments and non-speech segments such as breaking windows and footsteps. Using 0.4-second segments, the proposed algorithm achieves an average correct-decision rate of 94.5% on the reverberant library and 95.1% on the non-reverberant library.
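The pitch-ratio step described above can be illustrated with a minimal sketch. The abstract does not specify the pitch detector, the frame length, the adaptation rule for the threshold, or the exact decision rule, so everything here is an assumption: each frame is tested for pitch with a normalized-autocorrelation peak search over an assumed 60 to 400 Hz pitch range, the pitch ratio is taken as the fraction of frames with detected pitch, and a segment is labelled speech when that ratio reaches an assumed `ratio_threshold`. The function names (`frame_has_pitch`, `pitch_ratio`, `classify_segment`) are hypothetical, and the environment-adapted threshold is simply passed in as `threshold` rather than adapted.

```python
import math


def frame_has_pitch(frame, fs, threshold, fmin=60.0, fmax=400.0):
    """Assumed voicing test: True if the normalized autocorrelation peak
    within the pitch-lag range [fs/fmax, fs/fmin] reaches `threshold`."""
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]            # remove DC offset
    energy = sum(s * s for s in x)           # autocorrelation at lag 0
    if energy == 0.0:
        return False                         # silent frame: no pitch
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), n - 1)
    best = 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        best = max(best, r / energy)
    return best >= threshold


def pitch_ratio(segment, fs, frame_len, threshold):
    """Fraction of non-overlapping frames in which pitch is detected."""
    frames = [segment[i:i + frame_len]
              for i in range(0, len(segment) - frame_len + 1, frame_len)]
    if not frames:
        return 0.0
    voiced = sum(frame_has_pitch(f, fs, threshold) for f in frames)
    return voiced / len(frames)


def classify_segment(segment, fs, frame_len, threshold=0.5, ratio_threshold=0.5):
    """Hypothetical decision rule: speech if the pitch ratio reaches
    `ratio_threshold`. In the described system `threshold` would be
    adapted to the environment; here it is a fixed input."""
    ratio = pitch_ratio(segment, fs, frame_len, threshold)
    return "speech" if ratio >= ratio_threshold else "non-speech"
```

For a 0.4-second segment sampled at 8 kHz (3200 samples) with assumed 40 ms frames (`frame_len=320`), a steady 200 Hz tone yields a pitch ratio of 1.0, while white noise yields a ratio near 0. Real speech alternates voiced and unvoiced frames, which is why the threshold choices above matter and why the described system adapts them.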

Additional Metadata
Keywords: Audio classification, Beam-forming, Feature extraction, Microphone arrays, Security-monitoring, Speech processing
Conference: IMTC'05 - Proceedings of the IEEE Instrumentation and Measurement Technology Conference
Citation
Abu-El-Quran, A.R., & Goubran, R. (2005). Security-monitoring using microphone arrays and audio classification. In Conference Record - IEEE Instrumentation and Measurement Technology Conference (pp. 1144–1148).