1. Field
The present disclosure relates to a sound monitoring method, and more particularly, to a sound detection method for classifying various kinds of mixed sounds in a real environment, determining whether a user is exposed to a dangerous situation, and recognizing a hazardous situation.
2. Background
Closed-circuit television (CCTV) generally refers to a system that transmits video information to a particular user for a particular purpose and is configured so that no one other than that user can connect to the system, by wire or wirelessly, and receive the video. CCTV systems are mainly used for surveillance of crowded places, such as large discount stores, banks, apartments, schools, hotels, public offices, and subway stations, and of places that require constant monitoring, such as unmanned base stations, unmanned substations, and police stations. They play a major role in acquiring clues from crime scenes.
The market for CCTV cameras and Internet protocol (IP) cameras used as security cameras has grown drastically since 2010, and the Korean security camera market reached about 420 billion Korean won in 2013. This growth indicates that security systems for preventing crime are attracting considerable attention.
However, despite the rapid proliferation of security cameras such as CCTVs, blind spots remain and the crime rate has not decreased. When one camera is used to monitor several directions, even if a guard repeatedly repositions the camera, the surveillance area may not be monitored continuously because of guard inattention or a shortage of guards, and the surveillance system may fail to fulfill its role.
Also, when a plurality of security cameras is installed to minimize blind spots, the number of screens to be monitored increases, and more security workers are required to watch them. Although blind spots are reduced and the probability that a crime scene will be recorded increases, the probability that the crime will be handled in real time decreases and equipment costs rise. Therefore, adding cameras is not an efficient approach to crime prevention.
Consequently, to respond rapidly to a dangerous situation such as a crime, it is necessary to determine quickly whether a dangerous situation has actually occurred for a user by detecting and classifying not only the video images captured by a surveillance camera but also the acoustic events contained in those video images.
In the related art, sounds are classified by a system that identifies three types of sounds, namely explosions, gunshots, and screams, in two stages: a Gaussian mixture model (GMM) classifier first detects a particular event sound, such as a gunshot or a scream, and a hidden Markov model (HMM) classifier based on Mel-frequency cepstral coefficient (MFCC) features then identifies the event. However, this approach has two problems: the accuracy of sound detection is not ensured at a low signal-to-noise ratio (SNR), and the HMM classifier has difficulty distinguishing between ambient noise and event sounds.
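For illustration only, the first (detection) stage of such a related-art system can be sketched as a likelihood-ratio test between an event GMM and a background GMM scored over per-frame feature vectors. This is a minimal sketch under stated assumptions, not the related-art implementation: the diagonal covariances, the pre-trained parameters, the names `DiagonalGMM` and `detect_event`, and the averaged log-likelihood-ratio threshold are all hypothetical. MFCC extraction and the second (HMM) identification stage are not shown; the input frames are assumed to already be feature vectors.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DiagonalGMM:
    """Hypothetical GMM with diagonal covariances; parameters assumed pre-trained."""
    weights: List[float]          # mixture weights, assumed to sum to 1
    means: List[List[float]]      # one mean vector per component
    variances: List[List[float]]  # one diagonal variance vector per component

    def frame_log_likelihood(self, x: Sequence[float]) -> float:
        """log p(x) = log sum_k w_k N(x; mu_k, diag(var_k)), via log-sum-exp."""
        comp_logs = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            # Log density of a diagonal Gaussian, summed over dimensions.
            log_n = -0.5 * sum(
                math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                for xi, m, v in zip(x, mu, var)
            )
            comp_logs.append(math.log(w) + log_n)
        top = max(comp_logs)
        return top + math.log(sum(math.exp(c - top) for c in comp_logs))

    def log_likelihood(self, frames: Sequence[Sequence[float]]) -> float:
        """Total log-likelihood, treating frames as independent."""
        return sum(self.frame_log_likelihood(f) for f in frames)


def detect_event(frames: Sequence[Sequence[float]],
                 event_gmm: DiagonalGMM,
                 background_gmm: DiagonalGMM,
                 threshold: float = 0.0) -> bool:
    """Flag a segment as an event sound if the mean per-frame
    log-likelihood ratio (event vs. background) exceeds a hypothetical threshold."""
    if not frames:
        raise ValueError("at least one frame is required")
    llr = (event_gmm.log_likelihood(frames)
           - background_gmm.log_likelihood(frames)) / len(frames)
    return llr > threshold


if __name__ == "__main__":
    # Toy 2-D "features": the event model sits near (3, 3), the background near the origin.
    event = DiagonalGMM([1.0], [[3.0, 3.0]], [[1.0, 1.0]])
    background = DiagonalGMM([0.5, 0.5], [[0.0, 0.0], [-1.0, 1.0]],
                             [[1.0, 1.0], [1.0, 1.0]])
    print(detect_event([[3.0, 3.0], [2.8, 3.1]], event, background))
    print(detect_event([[0.0, 0.0], [0.1, -0.2]], event, background))
```

In a sketch like this, a fixed threshold on the likelihood ratio is one plausible way the detection accuracy could degrade at low SNR, since additive noise shifts the feature frames toward the background model; that interpretation is offered as an illustration, not as a statement about the related-art system.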