Referring to FIG. 1, a conventional sound classification system 8 is used for receiving a sound that has yet to be classified, and recognizing the sound so as to determine whether the sound belongs to a sound type that has a particular significance for the user.
The conventional sound classification system can pre-classify a plurality of sound types that have particular significance and store the same in the sound classification system 8. For instance, siren sounds, telephone ringing sounds, and sounds of breaking glass can be three main pre-classified sound types. When a sound of the aforementioned types is present in the environment, the sound classification system 8 can receive and classify the sound, and can notify the user of the occurrence of a certain type of sound. For instance, when the sound classification system 8 determines there are telephone ringing sounds, it can notify the user to answer the telephone. Alternatively, when the sound classification system 8 determines there are sounds of breaking glass, it notifies the user of the possibility of a thief breaking in through a window.
The conventional sound classification system 8 includes a sound receiver 81, a feature extractor 82, a classifier 83, a database 84, and a classification recorder 85. The sound receiver 81 is any piece of equipment capable of receiving sounds, such as a microphone. The feature extractor 82 receives the sound signal from the sound receiver 81 and extracts the feature of the sound signal. The database 84 stores features of a plurality of sound signals.
The feature extractor 82 derives a feature vector from the sound signal using Mel-scale Frequency Cepstral Coefficients (MFCC), and uses the feature vector as the feature of the sound signal. For the MFCC scheme, reference can be made to “Fundamentals of Speech Recognition” by L. Rabiner and B.-H. Juang, Prentice Hall, 1993. According to the MFCC scheme, the sound signal is transformed from a time domain signal to a frequency domain signal using a Fourier transform, the frequency domain signal representing the energy of the sound at each frequency. A plurality of triangular band-pass filters is then applied to the frequency domain signal. Each triangular band-pass filter covers a different frequency range that is perceivable by human auditory organs and assigns a different weight to each frequency within that range. The energy value for each frequency range is obtained by multiplying the energy value at each frequency in the range by the corresponding filter weight. Thus, a plurality of feature values equal in number to the triangular band-pass filters can be obtained. The feature values can be used as a feature vector that represents the sound.
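By way of illustration only, the filter-bank stage described above can be sketched as follows. This is a minimal sketch and not the implementation of the feature extractor 82: the function names, the naive discrete Fourier transform, the common 2595·log10(1 + f/700) mel formula, and the default of eight filters are all assumptions made for the example. The sketch stops at the weighted filter energies described in the text and omits the logarithm and discrete cosine transform steps of a full MFCC computation.

```python
import cmath
import math


def power_spectrum(frame):
    """Energy at each frequency bin 0..N/2, via a naive O(N^2) DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def hz_to_mel(hz):
    # Assumed mel mapping (a widely used formula, not taken from the text).
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def triangular_filters(num_filters, n_fft, sample_rate):
    """One list of per-bin weights per filter; centers evenly spaced in mel."""
    num_bins = n_fft // 2 + 1
    top_mel = hz_to_mel(sample_rate / 2.0)
    # num_filters + 2 edge frequencies: each filter spans three adjacent edges.
    edges = [mel_to_hz(top_mel * i / (num_filters + 1))
             for i in range(num_filters + 2)]
    bin_hz = sample_rate / n_fft
    filters = []
    for m in range(1, num_filters + 1):
        low, center, high = edges[m - 1], edges[m], edges[m + 1]
        weights = []
        for b in range(num_bins):
            f = b * bin_hz
            if low < f <= center:
                weights.append((f - low) / (center - low))    # rising slope
            elif center < f < high:
                weights.append((high - f) / (high - center))  # falling slope
            else:
                weights.append(0.0)
        filters.append(weights)
    return filters


def filterbank_features(frame, sample_rate, num_filters=8):
    """Feature vector with one weighted energy value per triangular filter."""
    spectrum = power_spectrum(frame)
    filters = triangular_filters(num_filters, len(frame), sample_rate)
    return [sum(w * e for w, e in zip(weights, spectrum))
            for weights in filters]
```

For example, calling filterbank_features on a 64-sample frame recorded at 8000 Hz returns eight feature values, one per triangular band-pass filter.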
The database 84 pre-stores the features of many sound types, e.g., siren sound, telephone ringing sound, sound of breaking glass, and door opening sound, where each sound type in general includes a plurality of sounds. For example, the door opening sound type includes a plurality of pre-recorded door opening sounds.
The classifier 83 compares the feature analyzed by the feature extractor 82 with the pre-stored features in the database 84. When the feature analyzed by the feature extractor 82 matches or is similar to the feature of one of the sound types in the database 84, the sound received by the sound receiver 81 will be regarded as an instance of said one of the sound types. The classification recorder 85 stores the classification result of each inputted sound classified by the classifier 83, as well as the feature thereof.
The classification scheme adopted by the classifier 83 can be the Mahalanobis Distance scheme described in “Pattern Recognition” written by S.-T. Bow and published by Jwang Yuan in 1984. The scheme is mainly used to calculate the distance between the feature vector of the sound signal received by the sound receiver 81 and the feature vectors of each of the sound types stored in the database 84. The sound type yielding the smallest distance is regarded as the sound type that the sound signal received by the sound receiver 81 matches.
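By way of illustration only, a smallest-distance classification of this kind can be sketched as follows. This is a minimal sketch under stated assumptions and not the implementation of the classifier 83: the text does not specify how the statistics of each sound type are obtained, so the sketch assumes that a mean vector and a covariance matrix are estimated from the stored samples of each sound type. All function names, the small ridge term added to the covariance diagonal, and the sample values in the usage example are hypothetical.

```python
import math


def mean_vector(samples):
    """Per-dimension mean of a list of feature vectors."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def covariance_matrix(samples, mean, ridge=1e-6):
    """Sample covariance; the ridge (an assumption) keeps it invertible."""
    n, d = len(samples), len(mean)
    cov = [[0.0] * d for _ in range(d)]
    for sample in samples:
        diff = [x - m for x, m in zip(sample, mean)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / (n - 1)
    for i in range(d):
        cov[i][i] += ridge
    return cov


def solve(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def mahalanobis(x, mean, cov):
    """sqrt((x - mean)^T * cov^-1 * (x - mean))."""
    diff = [a - m for a, m in zip(x, mean)]
    weighted = solve(cov, diff)  # cov^-1 * diff without an explicit inverse
    return math.sqrt(max(0.0, sum(d * w for d, w in zip(diff, weighted))))


def classify(x, database):
    """Return (sound_type, distance) for the type with the smallest distance.

    database maps each sound type name to its stored feature vectors.
    """
    best = None
    for name, samples in database.items():
        mean = mean_vector(samples)
        cov = covariance_matrix(samples, mean)
        dist = mahalanobis(x, mean, cov)
        if best is None or dist < best[1]:
            best = (name, dist)
    return best


# Hypothetical two-dimensional features for two pre-stored sound types.
database = {
    "telephone": [[1.0, 5.0], [1.2, 5.1], [0.9, 4.8], [1.1, 5.3]],
    "glass": [[6.0, 1.0], [6.3, 0.8], [5.8, 1.2], [6.1, 1.1]],
}
label, distance = classify([1.05, 5.0], database)
```

Because the sketch always returns the nearest stored sound type, a sound of a type absent from the database would still be assigned to one of the stored types unless a distance threshold is added.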
However, there are numerous sound types in real life. It is impossible for the conventional pre-constructed database 84 to contain all the sounds that may occur. Moreover, the conventional sound classification system 8 cannot process sound types that are not stored in the database 84. Therefore, if the user were allowed to add sound types to the database 84, the practicality of the sound classification system 8 could be effectively enhanced.
Furthermore, the same sound may exhibit different features in different environments. For instance, the same door opening sound may produce a relatively loud echo in a relatively spacious environment, but may have a completely different feature in an environment that better absorbs its sound energy. The environments in which the different types of sounds were recorded for the database 84 are oftentimes different from the environment the user is in. If the user cannot add or correct samples of the sound types pre-constructed in the database 84 to suit the environment of use, the sound classification system 8 may classify sounds erroneously in a new environment, or may even be unable to perform the classification.