1. Field of the Invention
The present invention relates to the field of audio playback technology. More specifically, the present invention relates to audio playback technology in a computer-controlled environment running on a software-driven platform.
2. Prior Art
Audio data is increasingly being used with and incorporated into the desktop computer environment, giving computer users more flexibility in data management. Audio data, whether in the form of analog information signals stored on a flexible tape or in digital form stored in a computer's memory or hard drive, can be retrieved from these storage media by the computer system and played through an internal computer speaker to an end user. Software control routines and programs residing on a typical desktop computer control, through a user interface, the interaction between the user and the audio data selected for playback. Special menus and display formats let the user readily access previously stored audio data, e.g., with a mouse and display screen.
Audio voice data is currently used in desktop computer systems in a variety of ways and for a variety of functions. For example, audio voice data can be used to record dialog sessions, such as instructions given to a secretary. Voice data associated with displayable "tags" can be placed within a text document on a display screen to give personalized instructions on how to amend that document when the tag is activated, for instance with a mouse or other user input device. Voice data is also used for dictation, where a document is spoken into a dictation device for a typist or data entry secretary. Voice data can also be used to record scratch notes for future reference or reminders, which the user can later access through the user interface software of the desktop computer. Voice data can be used to record meetings, interview sessions, and class instructions for later playback. Also, voice data is used effectively over a computer system as a new form of electronic mail, in which messages are sent by voice instead of text.
Computer systems are a natural platform for interfacing with recorded voice data because they offer a virtually unlimited number of ways to access previously recorded data. By contrast, a regular tape cassette player records voice data on a continuous tape, usually with two sides, A and B. To play back a certain portion of voice data, the cassette must cycle through all of the preceding tape segments before the target portion is reached, creating a long access delay for the target portion and generating a good deal of wasted playback of unwanted voice segments. Further, if a particular voice segment was not located or identified when it was recorded, one must play through the entire tape to find it because of the serial nature of the tape medium. This is because most tape storage media do not allow tape portions to be easily marked so that playback can begin at those marked selections.
A computer system is well suited to handling these problems. A computer system can "tag" selected portions of voice data and remember where in the storage medium they have been placed, so that they can be played back easily and immediately. A computer system is not limited to a tape storage device and can place voice data in a memory unit such as on-board RAM or within a disk drive storage unit. Both of these storage devices provide quick and easy access to any audio segment, without the wasted or excessive access time of a conventional cassette tape.
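The tagging idea above can be illustrated with a small in-memory index. This is a minimal sketch, assuming the voice data is already held in RAM as a list of samples; the class and method names (`Tag`, `TaggedAudioStore`, `add_tag`, `fetch`) are hypothetical and are not taken from the present invention. Each tag records only a start and end sample index, so a tagged portion is retrieved by jumping directly to it rather than by passing over every earlier sample as a tape must.

```python
from dataclasses import dataclass


@dataclass
class Tag:
    """Hypothetical tag record: a label and the sample range it marks."""
    label: str
    start: int  # index of the first sample in the tagged portion
    end: int    # index one past the last sample


class TaggedAudioStore:
    """Sketch of random-access playback of tagged voice data held in memory."""

    def __init__(self, samples):
        self.samples = samples  # stored voice data (assumed already in RAM)
        self.tags = {}          # label -> Tag

    def add_tag(self, label, start, end):
        """Remember where a selected portion lives in the stored data."""
        if not 0 <= start < end <= len(self.samples):
            raise ValueError("tag range lies outside the stored audio")
        self.tags[label] = Tag(label, start, end)

    def fetch(self, label):
        """Jump straight to a tagged portion; no earlier samples are read."""
        tag = self.tags[label]
        return self.samples[tag.start:tag.end]


# Usage: tag 500 samples near the end of a 10,000-sample recording.
store = TaggedAudioStore(list(range(10_000)))
store.add_tag("memo", 8_000, 8_500)
clip = store.fetch("memo")
```

A serial, tape-style medium would have to pass over the 8,000 preceding samples before reaching the same clip; the index makes the access cost independent of where the clip sits.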
Audio and voice data also complement the computer system's use as an information processing tool. Voice data, along with graphics and text, provides more information to a user in a "user-friendly" or "personalized" environment. Thus, instead of receiving written tasks or lists of things "to do," a user might hear a familiar voice giving instructions that were pre-recorded by another person. Also, computer-driven "voice-mail" creates a more efficient and personalized way to transmit and receive office memos or other communications between users of interconnected computer systems.
Currently, audio or voice data can be stored directly in a computer memory storage unit in digital form. This provides an easy method for playback; however, it does not allow for liberal voice storage capacity, because 25 milliseconds of voice can consume up to 500 bytes of data, depending on the storage format, the sample rate, and the sample size. Voice data can also be stored on a specialized tape or cassette player that interfaces to the computer system. The computer system then controls the access scheme and playback rate of the cassette player, and the player feeds the voice data into the computer for processing and, if needed, conversion into digital form. Using at least these two storage and playback methods, voice or audio data can be conveniently incorporated into a computer system and used to advantage by a computer user.
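The storage figure above follows from simple arithmetic: uncompressed PCM storage equals sample rate × bytes per sample × channels × duration. The sketch below assumes one format, 20 kHz, 8-bit mono, that happens to reproduce the 500-byte figure for 25 ms; the document does not say which format it had in mind, and the function name `pcm_bytes` is illustrative.

```python
def pcm_bytes(duration_ms, sample_rate_hz, bytes_per_sample, channels=1):
    """Bytes needed to store uncompressed PCM audio of the given duration.

    Durations are given in whole milliseconds so the arithmetic stays integral.
    """
    return duration_ms * sample_rate_hz * bytes_per_sample * channels // 1000


# 25 ms of 20 kHz, 8-bit mono voice: 500 samples x 1 byte each.
print(pcm_bytes(25, 20_000, 1))      # 500
# The same 25 ms at 8 kHz, 16-bit mono.
print(pcm_bytes(25, 8_000, 2))       # 400
# One minute at 20 kHz, 8-bit mono.
print(pcm_bytes(60_000, 20_000, 1))  # 1200000
```

At the assumed format, a single minute of voice needs about 1.2 MB, which illustrates why direct digital storage limits voice capacity.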
Therefore, it is clear that voice and audio data will become one of the next information forms heavily used by modern computers. Devices and techniques that can effectively manage and process computer-driven audio data will be inherently advantageous to these computer systems. The present invention is drawn to an apparatus and method that provide better access to prerecorded audio and voice data through a computer system. The present invention allows users to access previously stored audio data more efficiently.
Even within computer systems that integrate audio data and user interfaces for playback, some inefficiencies exist in the way audio data is selected. For instance, once a particular segment of audio data is reached, e.g., because it was previously tagged with a special locator, a user may want to listen only to a particular phrase or data packet within that segment, or may want to increase the playback rate of the audio data. Without a faster playback mode, the user must play back the entire segment while waiting for the desired phrase or data. In this case the user is "scanning" the audio segment for the desired portion. It is therefore desirable to provide a method and apparatus for speeding up the playback of unwanted audio data while still producing intelligible audio, so that the user can quickly identify the desired phrase. The present invention provides such a method and apparatus.
Some prior art systems allow users to listen to messages at double speed. This is accomplished by modifying the previously stored audio data. As a result, undesirable clicks and noises appear at the points where modifications occur, which may be separated by only 20 to 25 milliseconds. This creates unacceptable background noise and "hissing" sounds that reduce playback quality. Musicians also use analog and digital sound processing units to change the pitch of an audio signal in real time without changing its duration; this is called Harmonizing. The hardware and software complexity required for Harmonizing makes it undesirable for desktop computer applications. Lastly, Time Domain Scaling can transform a sampled sound whose speed has been changed into a sampled sound that has the pitch of the original but a different duration. Although the sound quality of these systems is high, they do not process the sound for playback in real time and are therefore not well suited to desktop computer systems. The present invention processes the selected audio file for playback in real time.
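The click problem described above can be reproduced with a short experiment. This is a minimal sketch under stated assumptions: it models the prior-art modification as keeping every other 20 ms segment of an 8 kHz signal (a hypothetical reading of modifications "separated by only 20 to 25 milliseconds"), uses a 220 Hz tone in place of speech, and compares hard splices with a short linear crossfade at each splice. The crossfade is a generic remedy named here only for illustration, not the method of the present invention, and all function names are hypothetical.

```python
import math


def sine(freq_hz, rate_hz, n):
    """Test tone standing in for voice data."""
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(n)]


def keep_alternate_segments(samples, seg_len, fade_len=0):
    """Roughly 2x speed-up: keep every other seg_len-sample segment.

    fade_len == 0: kept segments are butted together (hard splice).
    fade_len > 0:  the last fade_len output samples are linearly cross-faded
                   into the first fade_len samples of the next kept segment,
                   which also shortens the output by fade_len per splice.
    """
    if not 0 <= fade_len < seg_len:
        raise ValueError("fade_len must be in [0, seg_len)")
    out = []
    for start in range(0, len(samples) - seg_len + 1, 2 * seg_len):
        seg = samples[start:start + seg_len]
        if out and fade_len:
            for k in range(fade_len):
                w = (k + 1) / (fade_len + 1)  # weight of the incoming segment
                i = len(out) - fade_len + k
                out[i] = (1 - w) * out[i] + w * seg[k]
            out.extend(seg[fade_len:])
        else:
            out.extend(seg)
    return out


def max_step(samples):
    """Largest sample-to-sample jump; a large jump is heard as a click."""
    return max(abs(b - a) for a, b in zip(samples, samples[1:]))


RATE = 8_000                     # assumed sample rate, Hz
tone = sine(220, RATE, RATE)     # 1 s of a 220 Hz tone
hard = keep_alternate_segments(tone, seg_len=160)                # 20 ms segments
faded = keep_alternate_segments(tone, seg_len=160, fade_len=32)  # 4 ms fades
print(round(max_step(tone), 3), round(max_step(hard), 3), round(max_step(faded), 3))
```

The hard-spliced output contains sample-to-sample jumps far larger than any in the original tone, which a listener hears as a click at every splice; the crossfaded output keeps its largest jump close to the tone's own.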
In some prior art systems that manipulate audio data, the playback speed of the stored audio data changes, which causes perceptual problems, and the audio data may not be understood by a listener. In many cases, playback speed is changed by doubling the rate at which the audio information is presented to the user. These manipulations alter the duration of the playback sound. A side effect of this kind of manipulation is a pitch change in the resulting playback sound: presenting the samples at twice the original rate doubles every frequency in the signal, raising the pitch by an octave. This pitch change is often called the "chipmunk" effect because of the high-pitched voices that result when audio is played back at high speeds. The playback data loses affect and gender information and is generally less intelligible than the original recording. This is a problem because playback audio that cannot be understood is useless. To preserve this information during playback, what is needed is a system that scales the resulting sound back to its original pitch while still allowing rapid playback rates for scanning. The present invention provides such functions.
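The pitch shift described above can be checked numerically. This sketch assumes an 8 kHz sample rate and a 200 Hz test tone in place of a voice, and estimates pitch crudely by counting upward zero crossings; all names are hypothetical. Doubling the presentation rate halves the duration and doubles the measured pitch, while keeping alternate 20 ms segments halves the duration but leaves the pitch near 200 Hz. Segment dropping is included only to show that duration and pitch can be separated; it is not presented as the method of the present invention.

```python
import math

RATE = 8_000  # assumed playback sample rate, Hz


def sine(freq_hz, n, rate=RATE):
    """Test tone standing in for a voice with a steady pitch."""
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


def estimate_pitch(samples, rate=RATE):
    """Rough pitch estimate: upward zero crossings per second of playback."""
    ups = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return ups * rate / len(samples)


def double_rate(samples):
    """Naive 2x playback: drop every other sample, play at the same rate."""
    return samples[::2]


def drop_alternate_segments(samples, seg_len):
    """2x playback by keeping alternate seg_len-sample segments (no crossfade)."""
    out = []
    for start in range(0, len(samples), 2 * seg_len):
        out.extend(samples[start:start + seg_len])
    return out


voice = sine(200, RATE)                        # 1 s at 200 Hz
fast = double_rate(voice)                      # 0.5 s, pitch doubled
scanned = drop_alternate_segments(voice, 160)  # 0.5 s, 20 ms segments
for name, sig in [("original", voice), ("double rate", fast), ("segments", scanned)]:
    print(name, len(sig) / RATE, "s", round(estimate_pitch(sig)), "Hz")
```

Here each 20 ms segment happens to hold a whole number of 200 Hz periods, so the splices are seamless; with real speech they are not, which is the source of the clicks described in the preceding paragraph.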
Therefore, it is an object of the present invention to provide an efficient apparatus and method to speed up and slow down the playback rate of previously recorded audio data in a computer system environment without altering the playback pitch of that data. It is also an object of the present invention to provide an efficient apparatus and method to speed up and slow down the playback rate of previously recorded audio data in a computer system environment without reducing the intelligibility of the playback data and while eliminating undesirable "clicks and pops" in the playback. It is another object of the present invention to provide an efficient apparatus and method to speed up and slow down the playback rate of previously recorded audio data in real time.
It is an object of the present invention to provide these functions on a desktop computer system without the need for specialized hardware. It is a further object of the present invention to provide such functionality through an easy-to-use or "user-friendly" interface of the computer system. These objects, and others not expressly stated, will become clear as the present invention is set forth in the detailed description.
3. Related U.S. Patent Application
The present application relates to a co-pending application concurrently filed with the present application and entitled, "Recording Method and Apparatus and Audio Data User Interface" invented by Leo Degen, S. Joy Mountford, and Richard Mander, Ser. No. 07/951,579, filed on Sep. 25, 1992, and assigned to the assignee of the present application. The above referenced patent application is herein incorporated by reference.