1. Field of the Invention
The apparatus and methods of this invention relate to the following classes: voice controlled television, electric amusement devices, motion picture and sound synchronizing, videodisc retrieval, digital generating of animated cartoons, and branching motion pictures.
2. Description of the Prior Art
Since the beginning of the motion picture industry, movies have generally been constrained to a predetermined sequence of predetermined scenes. Although a vicarious sense of involvement is often felt by each viewer, the immutability of the sequence of scenes limits the viewer's actual participation to a few primitive options such as cheering, commenting, and selecting what to watch. This limitation in prior-art movies has not changed substantially with the advent of television, video games, and audience-response systems.
Although the prior art includes devices capable of providing viewer participation, such devices do not provide all of the following features in one entertainment medium:
(1) vivid motion picture imagery;
(2) lip-synchronized sound;
(3) story lines (plots) which branch (have alternative sequences);
(4) elaborately developed story lines as in motion picture drama;
(5) scene changes responsive to inputs from each individual viewer;
(6) seamless transitions between shots;
(7) many hours of non-repetitive entertainment.
Furthermore, no prior-art device can conduct a voice dialog with each viewer in which the screen actors respond to the viewer's voice in a natural conversational manner.
Prior-art video game devices enable players to control video images via buttons, knobs, and control sticks. But in these devices the images are limited to one stereotyped scene such as a battlefield, an automobile race, a gun fight, or a surface on which a ball is moved around. Such game devices generate simple moving figures on a television screen, but the elaborate plot, dialog, characterization, and most of the cinematic art are absent.
Another problem faced by the present invention is providing many hours of interactive entertainment without obvious repetition. Prior-art video games can be played for many hours only because they involve ritualistic cycles in their mechanism of play. Such cycles lack the variety, suspense, and realism of conventional movies.
The use of microcomputer-controlled videodiscs for interactive instruction has been discussed in the literature (for instance see "Special Purpose Applications of the Optical Videodisc System", by George C. Kenney, IEEE Transactions on Consumer Electronics, November 1976, pages 327-338). Such computer-assisted instructional devices present conventional movie portions and still frames with narration in response to information entered by the student via push-buttons. But this prior art does not teach how to synchronize multiple alternative motion picture sequences with multiple alternative audio tracks so that spoken words from any of the audio tracks are realistically synchronized with the moving lips of the human actors in the video image. Nor does the prior art teach a method for automatically inserting spoken names of the players into a prerecorded spoken dialog so that lip-synchronization (lip-sync) is maintained. Nor does the prior art teach a method for making a television movie responsive to spoken words from the viewers/players so that an illusion of personal viewer participation results.
Prior-art systems for recognizing voice inputs and generating voice responses, such as those described in U.S. Pat. No. 4,016,540, do not present a motion picture and therefore cannot simulate a face-to-face conversation.
Prior-art voice-controlled systems, such as those described in U.S. Pat. No. 3,601,530, provide control of transmitted TV images of live people, but cannot provide a dialog with pre-recorded images.
Prior-art systems have been used with educational television in which the apparatus switches between two or more channels or picture quadrants depending on the student's answers to questions. Such systems cannot provide the rapid response, precise timing, and smooth transitions which the present invention achieves, because the multi-channel broadcast proceeds in a rigid sequence regardless of the student's choices.
The prior art also includes two-way "participatory television" which enables each subscriber of a cable-TV system to communicate via push-buttons with the broadcaster's central computer so that statistics may be gathered on the aggregate responses of the viewers to broadcast questions and performances. Similar systems use telephone lines to communicate viewers' preferences to the broadcaster's computer. Although the central computer can record each viewer's response, it is not possible for the computer to customize the subsequent picture and sound for every individual viewer. The individual's response is averaged with the responses from many other subscribers. Although such systems permit each person to participate, the participation is not "individualized" in the sense used herein, because the system cannot give each individual a response that is adapted to him alone.
The prior art for synchronizing audio with motion pictures is largely concerned with film and video tape editing. Devices such as those described in U.S. Pat. No. 3,721,757 are based on the presumption that most of the editing decisions as to which frames will be synchronized with which portions of the audio have been made prior to the "final cut" or broadcast. If multiple audio tracks are to be mixed and synchronized with a motion picture, such editing typically takes many hours more than the show itself. It is not humanly possible to make the editing decisions for frame-by-frame fine-cut editing and precise lip-sync dubbing during the show. For this reason, prior-art editing and synchronizing apparatus (whether preprogrammed or not) cannot provide each individual player with an individualized dialog and story line, and is therefore not suitable for interactive participatory movies and simulated voice conversations which are automatically edited and synchronized by the apparatus during the show.
Another problem not addressed in the prior art is the automatic selection of a portion of audio (from several alternative portions) which may be automatically inserted into predetermined points in the audio signal by the apparatus during the show. An example is the insertion of the names of the players, selected from a catalog of thousands of common names, into a dialog so that the actors not only respond to the players but call them by name. Recording a separate audio track for each of the thousands of names would require an impractically large amount of disc space. But using a catalog of names requires that each name be inserted at several points in the dialog, whenever an actor speaks the name of the then-current player. The task of synchronizing audio insertion so that the dialog flows smoothly without gaps or broken rhythm at the splice is one heretofore performed by skilled editors who know in advance of the editing procedure which frames and audio tracks are to be assembled and mixed. In the present apparatus this fine-cut editing cannot be done until after the show has started, because no human editor can know in advance the name of each player and the sequence of the dialog, which will change from performance to performance. The present invention solves these editing and synchronizing problems.
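The name-insertion problem described above can be illustrated with a minimal sketch. It is not the apparatus of the invention: it only shows how one prerecorded dialog with marked insertion points, plus a catalog holding a single clip per name, can be assembled into a continuous audio signal once the player's name is known. The names `NAME_SLOT`, `splice_names`, `catalog`, and `dialog` are hypothetical, audio is represented as plain lists of integer samples, and lip-sync timing is deliberately left out.

```python
from typing import Dict, List

# Hypothetical marker meaning "insert the current player's name here".
NAME_SLOT = object()


def splice_names(dialog: list, name_catalog: Dict[str, List[int]],
                 player_name: str) -> List[int]:
    """Assemble one continuous sample list, placing the player's name
    clip at every insertion point so that no gap is left at a splice."""
    clip = name_catalog[player_name]  # raises KeyError for an uncataloged name
    out: List[int] = []
    for segment in dialog:
        if segment is NAME_SLOT:
            out.extend(clip)       # same stored clip reused at each slot
        else:
            out.extend(segment)    # prerecorded dialog, copied unchanged
    return out


# Assumed toy data: two cataloged names and a dialog with two name slots.
catalog = {"ANN": [7, 7], "ROBERT": [9, 9, 9, 9]}
dialog = [[1, 2, 3], NAME_SLOT, [4, 5], NAME_SLOT, [6]]
assembled = splice_names(dialog, catalog, "ANN")
```

In this sketch each name is stored once and reused at every slot, so storage grows with the number of names rather than with the number of names multiplied by the number of dialog lines. Because clips for different names differ in length, the assembled signal also differs in length from player to player, which is why the splice timing cannot be fixed before the show.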
While watching a prior-art branching movie as described in U.S. Pat. No. 3,960,380, a viewer cannot talk with the screen actors and have them reply responsively. Applying prior-art speech-recognition techniques to control such branching movies would not provide a realistic conversational dialog because of the following problem: if the number of words which a viewer of any age and sex can speak and be understood by the apparatus is sufficiently large to permit a realistic conversation, then prior-art speech-recognition techniques are unreliable. But if the vocabulary is restricted to only a few words to make speech recognition reliable, then a realistic conversation would not result. This problem is resolved in the present invention.