The present invention relates generally to increasing the speed of sound files, and more particularly to increasing the speed of spoken content without compromising the quality of the content.
In pedagogic videos such as Massive Open Online Courses (MOOCs) or instructional videos, a professor teaches a class or an instructor demonstrates a process. The main feature of these videos is that the instructors talk, explaining a discipline or demonstrating a skill. The instructors often speak at a slow rate with extended pauses between words or sentences. The reasons for this slow delivery are the following.
First, the instructor has to speak continuously for an extended time, so the instructor needs to catch his or her breath and overcome fatigue and exertion. Next, the speed of instruction delivery is maintained at a slow pace to align with the pace of cognition of average or slow learners. However, a pace chosen for these learners cannot also keep up with the cognition of fast learners. Finally, many demonstrations utilized by teachers and instructors require extra time to complete. For example, a teacher reciting a formula and simultaneously writing the formula on a blackboard requires additional time. Writing takes much longer than talking, so the teacher or instructor usually adjusts his or her talking speed to synchronize with his or her writing speed.
When learners watch these videos, they often play them back at higher speeds, such as 1.25×, 1.4×, and 1.5×, to save time and to keep pace with their cognition. Even though this often saves time for the learner, there are a few words or some particular sentences that a speaker says quickly. Such words cannot be generalized, as they depend on, for example, the speaker's pronunciation, the speaker's accent, situations occurring at the time of recording, and changes in the emotions of the speaker. Therefore, the learner fails to understand such words or sentences when they are played back at higher speeds. So, the learner has to slow down the playback speed, rewind, and listen to the recording again. This wastes time and is annoying to the learner. To avoid this annoyance, the learner has to choose a slower playback speed. For example, if the learner can go through 90% of the lecture at 1.5× speed, but the remaining 10% of the lecture is only comprehensible at 1×, then the learner is forced to play back the complete lecture at 1× speed. Thus, even though a lecture of 60 minutes duration could be played back at 1.5× speed in 40 minutes, the learner still has to spend all 60 minutes at the original playback speed. Even for very slow talkers, a playback speed higher than 1.6× to 1.75× becomes difficult to comprehend due to a much lower sampling rate. Therefore, there is an upper limit to the playback speed before clarity and human comprehension are negatively impacted.
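By way of illustration only, the arithmetic in the example above can be checked with a short sketch. The function name `playback_minutes` and the representation of a lecture as (duration, speed) pairs are assumptions made for this illustration and are not part of any described system.

```python
def playback_minutes(segments):
    """Return total listening time for (duration_minutes, speed) pairs.

    Each segment of the original recording lasts `duration_minutes`
    and is played back at the given speed multiplier.
    """
    return sum(duration / speed for duration, speed in segments)


lecture = 60.0  # original lecture length in minutes

# Whole lecture at the original speed, as the learner is forced to do.
uniform_slow = playback_minutes([(lecture, 1.0)])

# Whole lecture at 1.5x, which is only possible if all of it stays comprehensible.
uniform_fast = playback_minutes([(lecture, 1.5)])

# Hypothetical per-segment speeds: 90% of the lecture at 1.5x, 10% at 1x.
mixed = playback_minutes([(0.9 * lecture, 1.5), (0.1 * lecture, 1.0)])

print(uniform_slow, uniform_fast, mixed)
```

Under these assumptions, the mixed schedule takes about 42 minutes (36 minutes for the fast portion plus 6 minutes for the slow portion), compared with 60 minutes at a single uniform 1× speed. The difference of roughly 18 minutes is the time lost when one playback speed must be chosen for the entire lecture.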
Some conventional speech playback systems include “truncate silence” functionality. Truncate silence automatically reduces the duration of passages where the volume is below a specified level. Silences to be truncated are detected if they remain below a specified volume level for at least a predetermined time duration. Detected silences are truncated by deleting a section from the middle of the silent region. In some conventional audio playback systems, white space between or preceding audio clips will be regarded as absolute silence and will also be truncated.
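The truncate silence behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any particular playback system: the audio is assumed to be a list of amplitude samples, and the parameter names `threshold`, `min_silence`, and `keep` are hypothetical.

```python
def truncate_silence(samples, threshold, min_silence, keep):
    """Shorten long quiet passages by deleting samples from their middle.

    samples      -- list of amplitude values (assumed representation)
    threshold    -- a sample is "quiet" if abs(sample) < threshold
    min_silence  -- a quiet run is truncated only if it has at least
                    this many samples
    keep         -- number of samples left in each truncated run
    """
    out = []
    i = 0
    n = len(samples)
    while i < n:
        if abs(samples[i]) < threshold:
            # Find the end of the current quiet run.
            j = i
            while j < n and abs(samples[j]) < threshold:
                j += 1
            run = samples[i:j]
            if len(run) >= min_silence and len(run) > keep:
                # Keep the start and end of the run; drop the middle.
                head = (keep + 1) // 2
                tail = keep // 2
                run = run[:head] + run[len(run) - tail:]
            out.extend(run)
            i = j
        else:
            out.append(samples[i])
            i += 1
    return out


# A six-sample quiet run (truncated) and a two-sample run of absolute
# silence that is too short to qualify (left unchanged).
audio = [0.5, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.4, 0.0, 0.0, 0.3]
result = truncate_silence(audio, threshold=0.1, min_silence=4, keep=2)
print(result)
```

In this sketch, a real system would express `min_silence` and `keep` as time durations and convert them to sample counts using the sampling rate. Samples of exactly zero, such as white space between audio clips, fall below any positive threshold and are therefore treated in the same way as other quiet passages.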