With the advent of compact disk (CD), compact disk/read only memory (CD/ROM) and video disk (VD) digital storage technologies, computer systems are rapidly becoming capable of interacting with users through a variety of different media. These technologies facilitate the presentation to users of high quality sound and visual images at access speeds which are acceptable to the user and pedagogically effective. Software developers need no longer rely solely on textual information, or even on computer graphics and animation generated by the computer's central processing unit (CPU), to describe or simulate real-world events. Their arsenal has recently expanded to include high quality reproductions of actual speech and music, as well as live video.
Despite the promise of these technologies, practical multimedia applications have been somewhat slow in coming. One reason for this is the current "read-only" nature of the technology. Because erasable CD technology is not yet commercially available, today's users do not yet have the ability to modify speech, music or data stored on a CD/ROM, or video and audio stored on a VD, through a random access mechanism similar to that used for hard disk magnetic media.
Yet, a wealth of applications still exists even for purely read-only technologies. For example, large databases of unchanging information (such as encyclopedias, historical records or legal case law) can be maintained utilizing the vast storage capacity of CD/ROM drives. The full text of such information can then be searched, giving the user access to tremendous amounts of information.
In the field of education, students can benefit significantly from rapid access to audio and visual information, in addition to the traditional textbook. Presenting the same information multiple times and in varying formats enhances the learning process. Interactive applications, in particular, are seen by many educators as an excellent supplement to the traditional lecture with crude visual aids, because interaction provides the learner with choice and control, key motivators to the learning process.
Although some multimedia learning tools have been developed recently (for example, by connecting a VD to a computer, and enabling users to switch from their computer application to a related video with the touch of a button), these tools have met with limited success. The ideal of the multimedia information database through which users can choose how they wish to navigate is far from being realized.
A primary reason for the limited success of multimedia applications thus far is the current state of VD technology with respect to the simultaneous use of audio and video. Interactive computer programs have thus far been unable to access large amounts of high quality audio and video simultaneously, much less add computer graphics to the mix.
For educational applications in particular, certain thresholds must be met if the learner (user) is to interact with the system in a meaningful way. Delays of many seconds often cannot be tolerated. Sufficiently large amounts of high-quality audio and video information must be accessible in very short periods of time. For such an interactive system to be feasible, a minimum of 60 minutes of continuous or discrete sounds or utterances (FM-quality or better), and 30 minutes of high-quality video, should be accessible within a maximum access time of 1 1/2 seconds.
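The thresholds stated above can be expressed as a simple feasibility check. The following is only an illustrative sketch: the three numeric limits come from the preceding paragraph, while the `MediaConfig` type, its field names, and the function name are hypothetical and introduced solely for this example.

```python
from dataclasses import dataclass

# Thresholds taken from the preceding paragraph.
MIN_AUDIO_MINUTES = 60.0    # FM-quality or better
MIN_VIDEO_MINUTES = 30.0    # high-quality video
MAX_ACCESS_SECONDS = 1.5    # maximum acceptable access time


@dataclass
class MediaConfig:
    """Hypothetical summary of what a system can deliver (illustrative only)."""
    audio_minutes: float
    video_minutes: float
    worst_case_access_seconds: float


def meets_interactive_thresholds(cfg: MediaConfig) -> bool:
    """Return True only if all three stated thresholds are satisfied."""
    return (
        cfg.audio_minutes >= MIN_AUDIO_MINUTES
        and cfg.video_minutes >= MIN_VIDEO_MINUTES
        and cfg.worst_case_access_seconds <= MAX_ACCESS_SECONDS
    )
```

Under these assumptions, a configuration offering ample audio and video but a worst-case access time of several seconds would still fail the check, which reflects the point that capacity alone is not sufficient.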
The audio portion of video disks is currently limited to two audio channels per side, each of approximately thirty minutes in duration. More significant than the limited duration is the necessity of synchronizing both audio tracks with the individual visual frames of the video track. In other words, the audio portion of each track must "line up" with the frames of the video track.
The necessity of synchronizing audio and video on a VD in this manner has not generally been problematic in the case of movies or other traditional videos, because there is a direct one-to-one correspondence between the video images and the soundtrack which is meant to be played at the same time such images are displayed. Interactive computer systems, however, are hindered significantly by the necessity of synchronizing the audio and video tracks in this manner.
For example, consider a simple interactive computer system in which the user sees the video images of a person giving a speech, but can select whether the accompanying audio is in English or in a translated language such as Japanese. Such a system might be useful for teaching English to Japanese-speaking persons. The English voice can be synchronized with the individual video images on one of the two VD audio tracks. The Japanese translation, however, cannot easily be synchronized with those same video images, because the Japanese audio portion is of a longer duration than the corresponding English portion. This presents a problem, which occurs in many applications requiring the synchronization of video to multiple audio segments.
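The mismatch can be made concrete with a small calculation. This sketch assumes a frame rate of 30 frames per second, which is not stated in the text, and uses hypothetical function names; it merely counts how many additional video frames a longer audio segment would need if it had to remain frame-synchronized.

```python
import math

# Assumption: 30 frames per second. The text does not specify a frame rate.
FRAMES_PER_SECOND = 30


def frames_spanned(seconds: float, fps: int = FRAMES_PER_SECOND) -> int:
    """Number of video frames covered by a segment of the given duration."""
    return math.ceil(seconds * fps)


def frame_shortfall(video_seconds: float, audio_seconds: float,
                    fps: int = FRAMES_PER_SECOND) -> int:
    """Frames the audio would need beyond those the video segment provides.

    Returns 0 when the audio fits within the existing video segment.
    """
    needed = frames_spanned(audio_seconds, fps)
    available = frames_spanned(video_seconds, fps)
    return max(0, needed - available)
```

For instance, with hypothetical durations of 10 seconds for the English speech (and its video) and 13 seconds for the Japanese translation, `frame_shortfall(10.0, 13.0)` reports the number of frames for which no corresponding video exists on the disk.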
One previous solution to such synchronization problems has been to duplicate the video images in several places on the VD, synchronizing each corresponding audio portion to a unique set of video images. This "solution," however, is extremely wasteful of the limited space on a VD, and is thus not feasible for non-trivial applications.
Another seemingly logical solution to this problem is to freeze individual video frames while the audio portion plays. On a VD, however, this technique (known as "Still Frame Audio") results in extremely poor sound quality, approximating that of AM radio, which is inadequate for applications requiring more realistic FM-quality or even CD-quality sound. Moreover, the process of packing and aligning audio into a single video frame is quite complex and significantly more expensive than simply recording audio onto a CD.
It is therefore not surprising that interactive computer systems, by relying primarily on VD technology with its inherent synchronization of audio and video, have thus far been unsuccessful in providing users with access to large amounts of high quality audio and video simultaneously.
Until recently, high quality audio from a CD has only been accessible by "track" (approximately 100 per CD), enabling only limited access, for example, to each song of an album. Yet, with the advent of recent developments in CD technology, it is now possible to access individual sounds or utterances on a CD within a single track, via random access techniques similar to those employed for hard disk magnetic media, and with an access time that is acceptable to the user. No interactive computer system known to the inventors has yet exploited this technology, perhaps because it is so new, or perhaps because the VD is currently the only medium which provides synchronization of high quality audio and video.
Moreover, additional problems remain with today's interactive computer systems, particularly with respect to the goal of giving the user the appearance that the system possesses intelligence in the form of a vast array of information which the user can access at will (requiring that large amounts of high quality audio and video be available at an effective access time of no more than 1 1/2 seconds).
One such problem relates to the practical requirement that such systems enable the user to concentrate on the information which he or she desires to access. The system should, at the very least, "know" where the information resides, insulating the user from the source of that information (whether CD, VD or the computer's CPU) and from the process of retrieving that information from these various sources. It should do so without requiring the user to wait beyond an acceptable threshold of time.
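One way to picture this insulation is a catalog that maps each item of information to its source, so that the user requests an item by name only. The sketch below is purely illustrative: the `Source`, `Location`, and `MediaCatalog` names, the integer `address` field, and the sample entries are hypothetical and are not drawn from any described system.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class Source(Enum):
    """The three sources of information named in the text."""
    CD = "CD"
    VD = "VD"
    CPU = "CPU"


@dataclass(frozen=True)
class Location:
    source: Source
    address: int  # hypothetical device-specific position (illustrative only)


class MediaCatalog:
    """Hypothetical lookup table that hides where each item is stored."""

    def __init__(self) -> None:
        self._entries: Dict[str, Location] = {}

    def register(self, name: str, location: Location) -> None:
        """Record where a named item resides."""
        self._entries[name] = location

    def locate(self, name: str) -> Location:
        """Return the stored location; raises KeyError for unknown items."""
        return self._entries[name]


# The user-facing layer asks for items by name and never sees the source.
catalog = MediaCatalog()
catalog.register("speech_english_audio", Location(Source.CD, 1200))
catalog.register("speech_video", Location(Source.VD, 4500))
```

In such an arrangement, only the catalog would need to change if an item moved from one medium to another; the user's request would remain the same.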
In addition, such systems should also be capable of interfacing with users of differing levels of experience and expertise. In the past, interactive computer systems have simply provided an "average" level of expertise, or multiple levels which users can reach by progressing through the lower levels. Even "hypertext" systems, which enable users to access information linked in a non-sequential manner, have thus far not provided the depth of choice necessary to give the system the appearance of intelligence. It is important that users not only have a choice of "where to go next," but also that users are presented with a sufficient variety of information and choices, reflecting various levels of expertise.
Thus, current interactive computer systems have been unable to provide users with access to large amounts of high quality audio and video simultaneously, and have also been unable to provide the appearance of sufficient intelligence to allow the user to focus on the information itself, and remain insulated from the technology and the limited choices of information to which the user has access.