Spectral characteristics are observed in order to characterize different types of sound. Soundscaping has applications in areas such as music, health care and noise pollution. Mel frequency filter banks are widely used to differentiate a particular type of sound from other sounds. Mel Frequency Cepstral Coefficients (MFCC) [reference 4] are commonly used as features in speech recognition systems and are also used for audio similarity measures. For example, in road traffic conditions [references 1, 2, 3], MFCC are used to differentiate the horn sound from other traffic sounds. This is done to reduce the probability of road accidents by correctly identifying the horn sound.
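By way of illustration only, the derivation of MFCC features from a mel filter bank may be sketched as follows. This is a minimal sketch and not the method of any cited reference. It assumes the commonly used mapping mel = 2595 log10(1 + f/700), triangular filters, a Hamming window and a type-II DCT. The function names, the 16 kHz sample rate, the 512-sample frame, the 26 filters and the 13 coefficients are illustrative assumptions.

```python
import numpy as np

def hz_to_mel(f_hz):
    # Common mel mapping (assumed here; other variants exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    mel_edges = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2)
    edges_hz = mel_to_hz(mel_edges)
    bank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank, edges_hz

def mfcc_frame(frame, bank, n_coeffs):
    """MFCC of one frame: power spectrum, mel energies, log, DCT-II."""
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    log_energy = np.log(bank @ power + 1e-10)  # small offset avoids log(0)
    n = len(log_energy)
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    dct_basis = np.cos(np.pi * k * (m + 0.5) / n)  # unnormalized DCT-II
    return dct_basis @ log_energy

# Illustrative parameters (assumptions, not taken from the references).
sample_rate = 16000
n_fft = 512
bank, edges_hz = mel_filter_bank(26, n_fft, sample_rate)

t = np.arange(n_fft) / sample_rate
frame = np.sin(2.0 * np.pi * 440.0 * t)  # synthetic test tone
coeffs = mfcc_frame(frame, bank, 13)

# Width in Hz of each triangular filter, from lower edge to upper edge.
bandwidths = edges_hz[2:] - edges_hz[:-2]
```

In this sketch the filter widths in `bandwidths` grow with frequency, so the narrowest filters, and hence the finest resolution, lie at the low end of the spectrum.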
Many solutions have been proposed to detect and track a particular type of sound by using mel filter banks. MFCC are largely used for the classification of sounds, and in existing sound detection systems feature selection is mainly based on MFCC. Further, good results are observed by employing a GMM (Gaussian Mixture Model) [reference 7], or any other model, for classification. The existing mel filter bank structures are more suitable for speech, as they effectively capture the formant information of speech owing to their high resolution at lower frequencies. However, all such systems remain silent on the use of the spectral characteristics of the sound in the design of the filter bank, and do not consider those characteristics while selecting features, although doing so may provide better results. Modifying the mel filter bank on the basis of the observed spectral characteristics may provide better classification of a particular type of sound. Threshold-based methods are also used to detect a particular sound by observing the spectrum, but these methods do not work in all cases where the frequency spectrum varies.
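The limitation of threshold-based detection noted above may be illustrated by the following minimal sketch. It is not taken from any cited system. The detector, its function names, the 2800 to 3200 Hz band, the 0.5 threshold and the two synthetic tones are all hypothetical values chosen only to show the effect of a shift in the frequency spectrum.

```python
import numpy as np

def band_energy_ratio(frame, sample_rate, band_hz):
    """Fraction of the frame's spectral energy that falls inside band_hz."""
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    in_band = (freqs >= band_hz[0]) & (freqs <= band_hz[1])
    return power[in_band].sum() / (power.sum() + 1e-12)

def threshold_detect(frame, sample_rate, band_hz, threshold):
    """Report a detection when the in-band energy ratio exceeds the threshold."""
    return bool(band_energy_ratio(frame, sample_rate, band_hz) > threshold)

# Hypothetical settings: a fixed band and threshold tuned for one source.
sample_rate = 16000
band_hz = (2800.0, 3200.0)
threshold = 0.5

t = np.arange(1024) / sample_rate
tone_in_band = np.sin(2.0 * np.pi * 3000.0 * t)  # matches the tuned band
tone_shifted = np.sin(2.0 * np.pi * 3600.0 * t)  # same kind of tone, shifted

detected_in_band = threshold_detect(tone_in_band, sample_rate, band_hz, threshold)
detected_shifted = threshold_detect(tone_shifted, sample_rate, band_hz, threshold)
```

Under these assumptions the tone inside the tuned band is detected, while the shifted tone falls outside the fixed band and is missed, which is the kind of spectral variation a fixed threshold does not accommodate.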
A large number of prior art documents also teach sound recognition systems and processes. EP0907258 discloses audio signal compression, speech signal compression and speech recognition. CN101226743 discloses a method for recognizing a speaker based on conversion of neutral and affection sound. EP2028647 provides a method and device for speaker classification. WO1999022364 teaches a system and method for automatically classifying the affective content of speech. CN1897109 discloses single audio frequency signal discrimination based on MFCC. WO02010066008 discloses multi-parametric analysis of snore sounds for the community screening of sleep apnea with a non-Gaussianity index. However, all of these prior art documents remain silent on considering the varying frequency distribution in the sound energy spectrum in order to provide an improved classification.
Therefore, there is a need for a system and method capable of detecting a particular type of sound by considering the spectral characteristics of the sound when designing the filter bank structure. The system and method should also be capable of detecting the sound with reduced complexity.