Transcription in the linguistic sense is a systematic representation of language in written form. The source of a transcription can either be utterances (e.g., speech or sign language) or preexisting text in another writing system.
In the academic discipline of linguistics, transcription is an essential part of the methodologies of phonetics, conversation analysis, dialectology and sociolinguistics. It also plays an important role in several subfields of speech technology. Common uses of transcription outside of academia include recording the proceedings of a court hearing, such as a criminal trial (by a court reporter), transcribing a physician's recorded voice notes (medical transcription), providing aids for hearing-impaired persons, and the like.
Recently, transcription services have become commonly available to interested users through various online services. Examples of such services include rev.com, transcribeMe®, and similar services where users upload audio files, which are then distributed via a marketplace to a plurality of individuals, who are either freelancers or employed by the service operator, to transcribe the audio file. However, it can be difficult to properly analyze an audio file in an automated fashion. These audio files are heterogeneous by nature with regard to speaker type, accent, background noise within the file, context, and subject matter of the audio. It is often desirable to split audio files into multiple segments, for example based on the current speaker, general voice activity, the subject matter being discussed, and the like, in order to more easily analyze, manage, and transcribe the recorded content. Splitting a file into optimal segments often requires human involvement, which can be time-consuming, inefficient and costly.
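As a rough illustration of the segmentation by general voice activity mentioned above, the following Python sketch splits a sequence of audio samples into segments wherever short-term energy exceeds a threshold. It is a simplified example under stated assumptions, not a description of any particular service's method: the function names `frame_energy` and `split_on_voice_activity`, the frame size, and the threshold value are all hypothetical, and a fixed energy threshold of this kind would likely perform poorly on the noisy, heterogeneous recordings described above.

```python
from typing import List, Sequence, Tuple


def frame_energy(frame: Sequence[float]) -> float:
    """Return the mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def split_on_voice_activity(
    samples: Sequence[float],
    frame_size: int = 4,       # hypothetical value; real systems use ~10-30 ms frames
    threshold: float = 0.01,   # hypothetical value; would need tuning per recording
) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges whose frames meet the energy threshold.

    End indices are exclusive. Consecutive active frames are merged into
    a single segment.
    """
    if frame_size <= 0:
        raise ValueError("frame_size must be positive")

    segments: List[Tuple[int, int]] = []
    start = None  # start index of the segment currently being built, if any
    for offset in range(0, len(samples), frame_size):
        frame = samples[offset:offset + frame_size]
        active = frame_energy(frame) >= threshold
        if active and start is None:
            start = offset                     # a new active segment begins
        elif not active and start is not None:
            segments.append((start, offset))   # the active segment ends here
            start = None
    if start is not None:
        segments.append((start, len(samples)))  # close a segment that runs to the end
    return segments


# Toy signal: silence, two frames of "speech", silence, one frame of "speech".
silence = [0.0] * 8
speech = [0.5, -0.5, 0.4, -0.4]
signal = silence + speech + speech + silence + speech
segments = split_on_voice_activity(signal)
print(segments)
```

Each returned pair marks a span that could be handed to a transcriber separately. Segmenting by current speaker or by subject matter, as the text also mentions, requires considerably more than an energy threshold and is not attempted here.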
It would therefore be advantageous to provide a solution that would overcome the challenges noted above.