Recordable audio interactions typically comprise two or more audio channels. Such audio channels are associated with one or more specific audio input devices, such as a microphone device, used for voice input by one or more participants in an audio interaction. To achieve optimal performance, presently available content-based audio extraction and analysis systems typically assume that the input audio is separated such that each audio signal contains the recording of a single audio channel only. However, to achieve storage efficiency, audio recording systems typically sum the audio signals generated by the separate channels constituting the audio interaction and compress them into an integrated recording.
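The following Python sketch illustrates why a summed recording is problematic for analysis systems that expect one channel per signal. It is a minimal illustration under stated assumptions: samples are signed 16-bit PCM values held in plain lists, the helper name sum_channels is hypothetical, and only the summing step is modeled (compression is omitted). Two different pairs of channels yield the same summed signal, so the per-channel content cannot, in general, be uniquely recovered from the sum alone.

```python
# Illustrative sketch only: the 16-bit PCM range, the sample values, and the
# helper name sum_channels are assumptions, not taken from any real system.

INT16_MIN, INT16_MAX = -32768, 32767


def sum_channels(*channels: list[int]) -> list[int]:
    """Sum time-synchronized PCM channels sample by sample into one stream,
    clipping each summed sample to the signed 16-bit range."""
    if not channels:
        raise ValueError("at least one channel is required")
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("channels must be time-synchronized (equal length)")
    mixed = []
    for samples in zip(*channels):
        total = sum(samples)
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed


# Two different pairs of participant channels...
agent_a, customer_a = [100, 200, -50, 0], [0, -100, 50, 300]
agent_b, customer_b = [50, 50, 0, 150], [50, 50, 0, 150]

# ...produce an identical summed recording, so the individual channels
# cannot be told apart from the sum alone.
print(sum_channels(agent_a, customer_a))  # [100, 100, 0, 300]
print(sum_channels(agent_b, customer_b))  # [100, 100, 0, 300]
```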
As a result, recording systems that provide content analysis components typically use an architecture that includes an additional logging device for separately recording the two or more audio signals received via the separate input channels of each audio interaction. The recorded interactions are then saved in a temporary storage space. Subsequently, a computer program, typically residing on a server, retrieves the separate audio signals of each recorded interaction from the storage unit and extracts audio-based content by successively running a required set of Automatic Speech Recognition (ASR) programs. The ASR programs analyze speech in order to recognize specific speech elements and to identify particular characteristics of a speaker, such as age, gender, emotional state, and the like. The content-based output is subsequently stored in a database for retrieval and for specific data-mining applications.
FIG. 1 illustrates an audio content analysis apparatus 10 known in the art. Two or more separate but time-synchronized audio channels 12 constituting an audio interaction are fed into an audio summing device 16, which is typically a Digital Signal Processor (DSP) device. The DSP device 16 sums the separate audio channels 12 into an integrated summed audio stream 20. The summed audio stream 20 is transferred via a specific signal transport path to an audio storage device 22. The device 22, which is typically a high-capacity hard disk, stores the audio stream 20 as a summed audio file 24. The same audio channels 12 constituting the audio interaction are also fed into a dedicated temporary logging device 14. The logging device 14 is a hardware device having temporary audio storage capabilities. The logging device 14 includes an audio recorder device 25 that separately records the two or more audio channels 12 and stores the separately recorded channels as a separated audio file 26. A content analysis server 34 polls the logging device 14 in accordance with pre-defined rules, retrieves the separated audio file 26 via a signal transport path 18, and processes the separated audio channels by executing one or more specific audio content analysis routines. The results 32 of the audio content analysis processing are stored in a content analysis database 30 and are made available for data-mining applications. Once the analysis is complete, the audio may be deleted from the logging device to conserve storage space.
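The dual-path arrangement of FIG. 1 can be modeled roughly as in the following Python sketch. It is a hypothetical, simplified illustration rather than a description of any actual product: the names dsp_sum, LoggingDevice, ContentAnalysisServer, and handle_interaction, the "process every pending interaction" polling rule, and the toy peak and energy routines (stand-ins for real ASR-based analysis) are all assumptions. The sketch shows that the same channels travel down two paths, summed into storage and kept separately on the logging device, and that only the separated copy is analyzed before being deleted.

```python
# Hypothetical sketch of the prior-art dual-path arrangement of FIG. 1.
# All names, the polling rule, and the toy analysis routines are assumptions;
# numerals in comments refer to the elements of FIG. 1.
from dataclasses import dataclass, field
from typing import Callable

Channel = list[int]


def dsp_sum(channels: list[Channel]) -> list[int]:
    """Stand-in for DSP summing device 16: sample-wise sum of synchronized channels."""
    return [sum(samples) for samples in zip(*channels)]


@dataclass
class LoggingDevice:
    """Stand-in for logging device 14 with recorder 25: keeps channels separately."""
    separated_files: dict[str, list[Channel]] = field(default_factory=dict)

    def record(self, interaction_id: str, channels: list[Channel]) -> None:
        self.separated_files[interaction_id] = [list(ch) for ch in channels]

    def delete(self, interaction_id: str) -> None:
        del self.separated_files[interaction_id]


@dataclass
class ContentAnalysisServer:
    """Stand-in for server 34: runs every routine on every separated channel."""
    routines: dict[str, Callable[[Channel], object]]
    database: dict[str, dict[str, list[object]]] = field(default_factory=dict)

    def poll(self, logger: LoggingDevice) -> None:
        # Assumed polling rule: analyze every pending interaction, store the
        # results (database 30), then delete the separated file from the logger.
        for interaction_id in list(logger.separated_files):
            channels = logger.separated_files[interaction_id]
            self.database[interaction_id] = {
                name: [routine(ch) for ch in channels]
                for name, routine in self.routines.items()
            }
            logger.delete(interaction_id)


def handle_interaction(interaction_id: str, channels: list[Channel],
                       storage: dict[str, list[int]], logger: LoggingDevice) -> None:
    # Path 1: summed stream 20 -> audio storage device 22 as summed file 24.
    storage[interaction_id] = dsp_sum(channels)
    # Path 2: the same channels -> logging device 14 as separated file 26.
    logger.record(interaction_id, channels)


# Usage: one two-channel interaction passes through both paths.
storage: dict[str, list[int]] = {}
logger = LoggingDevice()
server = ContentAnalysisServer(routines={
    "peak": lambda ch: max(abs(s) for s in ch),   # toy stand-in for an ASR routine
    "energy": lambda ch: sum(s * s for s in ch),  # toy stand-in for an ASR routine
})
handle_interaction("call-1", [[1, -2, 3], [0, 4, -1]], storage, logger)
server.poll(logger)
print(storage["call-1"])          # [1, 2, 2]
print(server.database["call-1"])  # {'peak': [3, 4], 'energy': [14, 17]}
print(logger.separated_files)     # {}
```

Note that the analysis runs only when the server polls, which mirrors the limitation discussed next: results are not produced in real time as the interaction is recorded.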
The above-described solution has several disadvantages. The additional logging device is typically implemented as a hardware unit. Thus, the logging device adds cost and complexity to the installation, upkeep, and upgrade of the system. Furthermore, separately storing the data received from the individual input devices, such as the microphones, increases storage space requirements. In addition, in the logging-device-based configuration, the content analysis executed by the content analysis server typically does not provide for real-time alarm activation or for pre-defined responsive actions following the identification of pre-defined events.
Therefore, one of ordinary skill in the art will readily appreciate that there is a need for a new and advanced method and apparatus that would provide for the content analysis of recorded, summed, and compressed audio data. The new method and apparatus should preferably provide for full integration of all non-audio content into the summed signal and should support enhanced filtering of interactions for further analysis of selected calls.