Conference calls are commonly used to connect a group of participants through voice and/or video data in real time. The quality of the conference call data is important for accurately capturing the discussions held during the conference call. For example, a conference call may be recorded and the voice recording may be used to generate a textual transcription of the words spoken by the participants during the conference call.
Conventionally, a transcriptionist may be present during the call to hear and type the words spoken by the participants. Alternatively, the call may be recorded and a software application may be used to convert the recorded speech into text. Voice recognition software programs may be used to convert human speech into text so that the words spoken by the conference call participants may be later referenced by others to review the content of the conference call.
Currently, a conference call is captured as a single recording, which is stored in a single recorded file. A transcription of the recording may then be produced by a speech-to-text software application that identifies the words spoken by the conference call participants and converts the words into text. Another option may be to use a human transcriptionist, or a combination of a human transcriptionist and a software application, to convert the speech into text.
Current software applications are limited to a single recording of a conference call and have limited ability to distinguish between different speakers. Such applications have low accuracy due to these limited capabilities. For example, the software application may not be capable of distinguishing the different voices from one another. Also, the recorded voices may overlap, and some voices may be louder than others. These shortcomings of a single recording may cause interference when attempting to transcribe the audio content of the voices of the conference call participants.
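By way of illustration only, the overlap and loudness problems of a single recording may be sketched as follows. The sketch assumes each participant's voice is available as a list of amplitude samples before mixing; the function names `mix_to_single_track` and `overlap_ratio`, the activity threshold, and the sample values are hypothetical and are not taken from any particular conferencing or transcription product.

```python
def mix_to_single_track(tracks):
    """Sum per-speaker sample lists into one mixed recording (hypothetical helper)."""
    length = max(len(t) for t in tracks)
    return [sum(t[i] for t in tracks if i < len(t)) for i in range(length)]


def overlap_ratio(tracks, threshold=0.01):
    """Return the fraction of sample positions where more than one speaker is active.

    A speaker counts as active when the absolute amplitude exceeds `threshold`
    (an assumed value chosen for illustration).
    """
    length = max(len(t) for t in tracks)
    overlapping = 0
    for i in range(length):
        active = sum(1 for t in tracks if i < len(t) and abs(t[i]) > threshold)
        if active > 1:
            overlapping += 1
    return overlapping / length


# Illustrative sample values only: one loud and one quiet participant.
loud_speaker = [0.8, -0.8, 0.8, -0.8, 0.0, 0.0]
quiet_speaker = [0.0, 0.0, 0.1, -0.1, 0.1, -0.1]

# After mixing, only the summed samples remain; which participant
# contributed which part of each sample is no longer stored.
mixed = mix_to_single_track([loud_speaker, quiet_speaker])
shared = overlap_ratio([loud_speaker, quiet_speaker])
```

In this sketch the two participants speak simultaneously for part of the recording, and in the mixed samples the quiet participant's contribution is small relative to the loud participant's. A transcription application that receives only `mixed` has no per-speaker information to draw on.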