Technical Field
The technical field relates generally to transcription of content and, more particularly, to systems and methods that provide an electronic transcription market.
Background Discussion
Transcription of internet-hosted media (video and audio) is increasingly in demand. Consumers of such media often prefer having captions available to them, and in some cases (e.g., for the hearing impaired), they require them. New federal and state regulations mandating accessibility of online media are also driving this demand. Additionally, transcription of online content makes possible applications that are difficult, if not impossible, without transcriptions. For example, television and radio post-production usually requires transcriptions of all recorded material so that the producer can easily select segments for the final product. Market research firms use transcriptions of focus group video to search for sections of the videos in which consumers are discussing a certain product. Similar search applications are attractive to institutions with large video archives, such as universities and governments.
However, transcription is expensive. Typically, services for high-quality caption creation can cost several hundred dollars per hour of content. For some types of content, special expertise is needed to create accurate transcriptions. For example, a university mathematics lecture may include many specialized terms, and transcribing them accurately requires at least some familiarity with the material. This limits the workforce that can execute certain transcriptions, increasing the expense. Some customers require transcriptions to be done on very short deadlines, and, in combination with the difficulty of the transcription task, this requirement can further increase the cost. Further complicating matters is the fact that media files of very long duration can be onerous for a single editor to transcribe, especially in light of tight deadlines.
Recently, computer software systems have been developed to address this demand and these challenges. Some of these systems accept media files and use automatic speech recognition to create draft transcripts of them. The draft transcripts, in general, have many errors. The errors can occur in the actual words that are recognized, in the timing of the words, in the formatting of the words, and in other elements that may be required in the final transcript. For example, multi-speaker media files usually require that speaker turns be labeled, and automatic speech recognition performs this task poorly. Additionally, the quality of the transcriptions derived from automatic speech recognition output is extremely variable, due to a large number of factors, including acoustic environment, recording equipment, number of speakers, speaker characteristics, speaking style, complexity of content, and digital encoding algorithms. These same factors also affect the difficulty of the human transcription task.