1. Field of the Invention
The present and related inventions generally concern (i) the machine-automated distribution, processing and network communication of streaming digital video/hypervideo, particularly upon digital networks having network content providers (nominally an "Internet Content Provider", or "ICP"), network service providers (nominally an "Internet Service Provider", or "ISP"), and network client subscribers/users/viewers ("client SUVs"). The present and related inventions also generally concern the provision of diverse sophisticated responses--including branching, storage, playback/replay, subscriber/user-specific responses, and contests--to SUV "click-throughs" on hyperlinks embedded within streaming digital hypervideo.
The present invention itself generally concerns the receipt of, the client subscriber-user-viewer ("client SUV") interaction with, and the machine processing of, streaming digital video and hypervideo.
The present invention particularly concerns receiving (upon a digital communications network), decompressing, and playing back interactive video, also known as hypervideo, in real time, including by making manifest to the Subscriber/User/Viewer ("SUV") all available embedded hypervideo links.
The present invention further particularly concerns following in real time any and all hyperlinks acted upon--normally by "clicking through" with a computer mouse--by the SUV so as to (i) make responses and/or (ii) retrieve further information, which further information may include the receipt, decompression and playing of further streaming digital video, and hypervideo.
The present invention still further particularly concerns (i) caching of digital video and hypervideo including hyperlinks, (ii) detecting scene changes, (iii) generating scene "keyframes", or thumbnail images, (iv) displaying detected scene changes, and (v) retrospectively initiating the recording of, and/or initiating, potentially retrospectively, (vi) the playing back of, and/or (vii) hyperlinking from, and/or (viii) recording of, digital video/hypervideo, either from a current playback position or from the start of any stored scene.
The present invention still further particularly concerns recording and archiving streaming digital video and hypervideo.
2. Description of the Prior Art
2.1. Introduction to the Theory of Hypervideo
There is no requirement to read the present section 2.1--which section is based on the early investigations and research into hypervideo of Sawhney, et al., as transpired at MIT (reference cited below)--in order to understand the function, and, at a crude level, the purpose(s) of the present invention. However, hypervideo is, as of the present time (1998), very new, and few people have experienced it. The present section may accordingly beneficially be read in order to gain a "feel" for hypervideo.
More fundamentally, the present section discusses the considerable power of hypervideo, and ends with a discussion of the empowerment that hypervideo provides to a subscriber/user/viewer. The present and related inventions, although they can be narrowly thought of as mere systems and methods for delivering lowly commercials in the hypervideo environment, are really totally consistent with the more profound, and the more ennobling, purposes of hypervideo. Therefore the present section may also beneficially be read to understand to what purposes--both good and ill--hypervideo may be put, and as background to how the present and related inventions serve these purposes.
In recent years Sawhney, et al., at MIT (reference cited below) have developed an experimental hypermedia prototype called "HyperCafe" as an illustration of a general hypervideo system. This program places the user in a virtual cafe, composed primarily of digital video clips of actors involved in fictional conversations in the cafe; HyperCafe allows the user to follow different conversations, and offers dynamic opportunities of interaction via temporal, spatio-temporal and textual links to present alternative narratives. Textual elements are also present in the form of explanatory text, contradictory subtitles, and intruding narratives. Based on their work with HyperCafe, Sawhney, et al. have been leaders in discussing the necessary components and a framework for hypervideo structures, along with the underlying aesthetic considerations. The following discussion is drawn entirely from their work.
"Hypervideo" can be defined as "digital video and hypertext, offering to its user and author the richness of multiple narratives, even multiple means of structuring narrative (or non-narrative), combining digital video with a polyvocal, linked text." Hypervideo brings the hypertext link to digital video. See Sawhney, Nitin, David Balcom, Ian Smith "HyperCafe: Narrative and Aesthetic Properties of Hypervideo." Proceedings of the Seventh ACM Conference on Hypertext. New York: Association for Computing Machinery, 1996.
An even earlier approach to hypermedia was proposed by George Landow, in which he offered rules for hypermedia authors, rules that took into account hypermedia's derivation from print media and technologies of writing. Landow proposed that hypermedia "authors" learn which aspects of writing applied to the emerging hypermedium, and which traits or characteristics needed redefinition and rethinking. He noted: "To communicate effectively, hypermedia authors must make use of a range of techniques, suited to their medium, that will enable the reader to process the information presented by this new technology." See Landow, George P. "The Rhetoric of Hypermedia: Some Rules for Authors." Journal of Computing in Higher Education, 1 (1989), pp. 39-64; reprinted in Hypermedia and Literary Studies, ed. by Paul Delany and George P. Landow, Cambridge, Mass.: MIT Press, 1991.
Hypervideo has its roots in both hypertext and film. As a result, hypervideo embodies properties of each field, but wholly can be placed in neither, for hypervideo is not strictly linear motion picture, nor is it strictly hypertext. This convergence known as hypervideo comments on each discipline, on their similarities, and on their differences. Hypervideo is potentially nonlinear, like hypertext, but displays moving images, like film. Hypervideo can signify through montage, like film, and can generate multiple dictions, like hypertext. Properties of each medium are present in hypervideo. These properties take on new forms and practices in hypervideo.
Hypervideo relocates narrative film and video from a linear, fixed environment to one of multivocality; narrative sequences (video clips followed by other video clips) need not subscribe to linearity. Instead of creating a passive viewing subject, hypervideo asks its user to be an agent actively involved in creation of text through choice and interaction. Hypervideo can potentially change the viewing subject from a passive consumer of the text to an active agent who participates in the text, and indeed, is engaged in constructing the text.
Just as hypertext necessitated a re-reading of the act of reading and writing, hypervideo asks for a re-viewing of narrative film and film making and practices of viewing a film. Hypervideo redefines the viewing subject by breaking the frame of the passive screen. Hypervideo users are participants in the creation of text, as hypertext readers are.
Research is presently (circa 1997) projected to determine just how users of hypervideo systems navigate, interact with, and experience hypervideo-texts. Just as J. Yellowlees Douglas has exhaustively researched hypertext readers and the act of hypertext reading, similar projects are expected to be undertaken by hypervideo researchers. See Douglas, J. Yellowlees. "Understanding the Act of Reading: the WOE Beginner's Guide to Dissection." Writing on the Edge, 2.2. University of California at Davis, Spring 1991, pp. 112-125. See also Douglas, J. Yellowlees. "`How Do I Stop This Thing?`: Closure and Indeterminacy in Interactive Narratives." Hyper/Text/Theory, ed. by George P. Landow. Baltimore: The Johns Hopkins University Press, 1994.
Hypervideo is related to film. Hypervideo has the potential to reveal important associations present in a film, and the constructedness of linear filmic narratives, and to this end, would be a beneficial tool for use with film studies education and research. Hypervideo can make available, by way of link opportunities, the different associations and allusions present in a filmic work. These associations are made manifest with hypervideo, made available for the student (or teacher) to see and explore. Relationships between different films can then be tracked, linked, commented on, revealed.
Hypervideo engages the same idea of "processing" that hypertext writing does: in writing hypertext, one makes available the process of writing, representing it visually (in the form of the web the writer builds), rhetorically (in the linking structure of the work, the points of arrival and departure present in the text)--and so one makes apparent the tensions and lines of force present in the act of writing, and the creation or reification of narrative. "Writing" hypervideo does the same for image-making--that is, makes clear the notion of constructing images and narrative. In the case of hypervideo, "narrative" refers to narrative film making. Just as hypertext has within it the potential to reveal the constructedness of linear writing, and to challenge that structure, hypervideo does the same for narrative film making--while also offering the possibilities for creating rich hypervideo texts, or videotexts.
How does narrative film function in hypervideo? Narrative film is necessarily re-contextualized as part of a network of visual elements, rather than a stand-alone filmic device. Because narrative segments can be encountered out of sequence and (original) context, even strictly linear video clips are given nonlinear properties.
Sergei Eisenstein pioneered the concept and use of montage in film. Hypervideo reveals and foregrounds this use. Eisenstein proposed that a juxtaposition of disparate images through editing formed an idea in the viewer's head. It was Eisenstein's belief that an idea-image, or thesis, when juxtaposed through editing, with another, disparate image, or antithesis, produced a synthesis in the viewing subject's mind. In other words, synthesis existed not on film as idea-image, but was a literal product of the juxtaposed images, which formed a separate image-idea that existed solely for the viewer.
"Eisenstein deliberately opposed himself to continuity editing, seeking out and exploiting what Hollywood would call "discontinuities." He staged, shot, and cut his films for the maximum collision from shot to shot, sequence to sequence, since he believed that only through being forced to synthesize such conflicts does the viewer participate in a dialectical process. Eisenstein sought to make the collisions and conflicts not only perceptual but also emotional and intellectual." See Bordwell, David and Kristin Thompson. Film Art: An Introduction. Fourth Edition. New York: McGraw-Hill, Inc., 1993.
Hypervideo potentially reveals this thesis/antithesis dialectic, by allowing the user to choose an image-idea (in this case, a video clip), and juxtaposing it with another image-idea (another video clip). Hypervideo allows the user to act on discontinuities and collisions, to engage with colliding subtexts and threads.
The user selects a video clip from a black canvas of three or four clips. Each clip lies motionless on the canvas. The user drags a clip onto another one, and they both start to play. Voices emerge and collide, and once-separate image-ideas now play concurrently, with one image extending the frame of the other. The user is left to determine the relationship between the two (or three or four) video clips.
Such video intersections recall Jim Rosenberg's notion of simultaneities, or the "literal layering on top of one another of language elements." See Rosenberg, Jim. "Navigating Nowhere/Hypertext Infrawhere." ECHT 94, ACM SIGLINK Newsletter. December 1994, pp. 6-19. Instead of language elements, video intersections represent the layering of visual elements, or more specifically, visual elements in motion. This is not to say that words, in the case of Rosenberg's Intergrams, are not visual elements; on the contrary, they are. In fact, their image-ness is conveyed with much more clarity (and even urgency) than are non-simultaneous words, or words without an apparent visual significance (save the "transparent" practice of seeing "through" letter-images into words into sentences into concepts). Once the word-images have to contend with their neighbor-layers for foreground screen space, their role in both the practice of signification (where meaning is contingent on what neighborly 0's and 1's are NOT), and as elements of a user interface (words that yield to the touch or click or wave of the mouse) become immediate and obvious. Nor is this to say that video clips aren't "language elements"; on the contrary, they are. The hypervideo clip is caught, as are words and letters, in the act of signification and relational meaning-making (. . . what neighborly 0's and 1's are not . . . ), mutable to the very touch of user, to the layers above and below.
The hypervideo author can structure these video intersections in such a way that only X and Y clips can be seen together, or X and Z if Y has already been seen (like Storyspace's guard fields), and so on, and the author can decide if a third video should appear upon the juxtaposition of X and Y. For example, Video X is dragged onto Video Y and they both start to play. The author can make a choice to then show video Z as a product, or synthesis, of the juxtaposition of Videos X and Y, that reflects or reveals the relationship between Videos X and Y. This literal revealing of Eisenstein's synthesis is made possible with hypervideo. Of course, no synthesis need be literally revealed; that can be left up to the viewer. While the interactions are structured by the hypervideo author or authors (as Eisenstein structured the placement and editing of thesis and antithesis idea-images), the meaning-making is left up to the hypervideo user. His or her choice reveals meaning to him with each video intersection; meaning in the system is neither fixed nor pre-determined. This empowering principle of hypertext is also a property of hypervideo.
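The authoring rules just described (X with Y always permitted, X with Z only once Y has been seen, and an optional third clip shown as the synthesis) can be illustrated by a minimal sketch. The names `Intersection`, `RULES` and `drop_clip` are hypothetical and are not drawn from HyperCafe or Storyspace; the sketch assumes only that each rule carries a guard listing the clips that must already have been viewed.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set


@dataclass(frozen=True)
class Intersection:
    """One author-defined rule for playing two clips together (hypothetical)."""
    pair: FrozenSet[str]                         # the two clips that may be juxtaposed
    requires_seen: FrozenSet[str] = frozenset()  # guard: clips that must already be viewed
    synthesis: Optional[str] = None              # optional third clip revealed as "synthesis"


# Assumed example rules mirroring the text: X+Y is always allowed and reveals Z;
# X+Z is allowed only after Y has been seen.
RULES = [
    Intersection(frozenset({"X", "Y"}), synthesis="Z"),
    Intersection(frozenset({"X", "Z"}), requires_seen=frozenset({"Y"})),
]


def drop_clip(dragged: str, target: str, seen: Set[str]) -> Optional[Intersection]:
    """Return the matching rule if the juxtaposition is permitted, else None."""
    pair = frozenset({dragged, target})
    for rule in RULES:
        if rule.pair == pair and rule.requires_seen <= seen:
            seen.update(pair)  # both clips now count as viewed
            return rule
    return None
```

In this sketch the author fixes only which intersections are possible; what the juxtaposition means is still left to the user, and a rule with no `synthesis` clip reveals nothing further.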
2.2. MPEG Standards
2.2.1. Overview
The present invention will be seen to involve computer systems and computer processes dealing with compressed video, and hypervideo, digital data. The video digital data compression may be accomplished in accordance with many known techniques and standards, and is only optionally in accordance with the MPEG family of standards. One short part of the explanation of the invention within this specification will show the operation of the system of the invention in the recording of video that is, by way of example, MPEG-compressed. Accordingly, some slight background as to the MPEG standard is useful, and is provided in this and the following three sections.
The Moving Picture Experts Group--MPEG--is a joint committee of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). The first MPEG Standard, known as MPEG-1, was introduced by this committee in 1991. Both video and audio standards were set, with the video standard built around the Standard Image Format (SIF) of 352×240 at 30 frames per second. MPEG data rates are variable, although MPEG-1 was designed to provide VHS video quality at a data rate of 1.2 megabits per second, or 150 KB/sec.
The MPEG-2 standard, adopted in the Spring of 1994, is a broadcast standard specifying 720×480 playback at 60 fields per second at data rates ranging from 500 KB/sec to over 2 Megabytes (MB) per second.
The expanded name of the MPEG-1 standard is "Coding of Moving Pictures and Associated Audio for Digital Storage Media". The standard covers compression of moving pictures and synchronized audio signals for storage on, and real-time delivery from, CD-ROM.
The sponsoring body is ISO/IEC JTC1/SC29 WG11 (also known as the Moving Picture Experts Group). The standard is set forth in ISO/IEC 11172:1993 Information technology--Coding of moving pictures and associated audio for digital storage media up to about 1.5 Mbit/s.
The characteristics and description of the MPEG-1 standard are as follows. A typical interlaced (PAL) TV image has 576 by 720 pixels of picture information, a picture speed of 25 frames per second and requires data to be delivered at around 140 Mbit/s. Computer systems typically use even higher quality images, up to 640 by 800 pixels, each with up to 24 bits of color information, and so require up to 12 Mbits per frame, or over 300 Mbit/s. CDs, and other optical storage devices, can only be guaranteed to deliver data at speeds of around 1.5 Mbit/s so high compression ratios are required to store full screen moving images on optical devices.
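The figures quoted for computer images can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the 25 frames-per-second rate applied to the 640 by 800 image is an assumption (the text gives only the per-frame and per-second totals, with which 25 frames per second is consistent).

```python
def mbits_per_frame(width, height, bits_per_pixel):
    """Uncompressed size of one frame, in megabits (10**6 bits)."""
    return width * height * bits_per_pixel / 1e6


def mbits_per_second(width, height, bits_per_pixel, frames_per_second):
    """Uncompressed data rate of a frame sequence, in Mbit/s."""
    return mbits_per_frame(width, height, bits_per_pixel) * frames_per_second


def compression_ratio(source_mbit_s, channel_mbit_s):
    """Factor by which the source must shrink to fit the channel."""
    return source_mbit_s / channel_mbit_s


frame = mbits_per_frame(640, 800, 24)         # about 12.3 Mbit per frame
stream = mbits_per_second(640, 800, 24, 25)   # about 307 Mbit/s (25 frames/s assumed)
ratio = compression_ratio(stream, 1.5)        # about 205:1 to fit a 1.5 Mbit/s device

# The MPEG-1 target of 1.2 megabits per second, expressed in kilobytes per second.
kbytes_per_second = 1.2e6 / 8 / 1000          # 150 KB/sec
```

On these assumptions a full-screen computer image sequence would need to be reduced by a factor on the order of two hundred to be played from a 1.5 Mbit/s optical device.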
The MPEG-1 standard is intended to allow data from non-interlaced video formats having approximately 288 by 352 pixels and picture rates of between 24 and 30 Hz to be displayed directly from a CD-ROM or similar optical storage device, or from a magnetic storage medium, including tape. It is designed to provide a digital equivalent of the popular VHS video tape recording format.
High compression rates are not achievable using standard intraframe compression algorithms alone. MPEG-1 utilizes block-based motion compensation techniques to provide interframe compression. This involves the use of three types of frame encoding: 1) intra coded I-Pictures are coded without reference to other pictures; 2) predictive coded P-Pictures are coded using motion compensation prediction based on preceding I-Pictures or P-Pictures; and 3) bidirectionally-predictive coded B-Pictures use both past and future I-Pictures and P-Pictures as their reference points for motion compensation.
While B-Pictures provide the highest level of compression they cannot be interpreted until the next I-Picture or P-Picture has been processed to provide the required reference points. This means that frame buffering is required for intermediate B-Pictures. The amount of frame buffering likely to be available at the receiver, the speed at which the intermediate frames can be processed, and the degree of motion within the picture therefore control the level of compression that can be achieved.
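The buffering requirement can be made concrete by a minimal sketch of the reordering it implies: because a B-Picture cannot be interpreted before the following I-Picture or P-Picture, that following picture must be processed first. The function name `decode_order` and the string encoding of picture types are hypothetical, and the handling of trailing B-Pictures is a simplification.

```python
def decode_order(display_sequence):
    """Reorder pictures from display order into the order they can be decoded.

    `display_sequence` is a string of picture types in display order,
    e.g. "IBBPBBP". Returns a list of (display_index, type) tuples.
    """
    ordered = []
    held_b = []  # B-Pictures buffered until their future reference arrives
    for index, kind in enumerate(display_sequence):
        if kind == "B":
            held_b.append((index, kind))
        else:
            # An I- or P-Picture: emit it first, then the B-Pictures that
            # were waiting for it as their future reference point.
            ordered.append((index, kind))
            ordered.extend(held_b)
            held_b = []
    # Simplification: trailing B-Pictures would really wait for the next
    # group's I-Picture; this sketch simply emits them last.
    ordered.extend(held_b)
    return ordered
```

For the display sequence "IBBPBBP" the sketch yields the pictures in the order 0, 3, 1, 2, 6, 4, 5, so two B-Pictures are held in the buffer at any one time.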
MPEG-1 uses a block-based discrete cosine transform (DCT) method with visually weighted quantization and run length encoding for video compression. MPEG-1 audio signals can be encoded in single channel, dual channel (two independent signals), stereo or joint stereo formats using pulse code modulation (PCM) signals sampled at 32, 44.1 or 48 kHz. A psychoacoustic model is used to control audio signals sent for quantization and coding.
2.2.2. How MPEG Works
Like most video compression schemes, MPEG uses both interframe and intraframe compression to achieve its target data rate. Interframe compression is compression achieved between frames, through, essentially, eliminating redundant interframe information. The classic case is the "talking head" shot such as with a news anchor, where the background remains stable and movement primarily relates to minor face and shoulder movements. Interframe compression techniques store the background information once, and then retain only the data required to describe the minor changes--facial movements, for example--occurring between the frames.
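The "talking head" idea of storing the background once and retaining only what changes can be sketched as simple block differencing. This is an illustration of the principle only and is not the MPEG algorithm, which uses motion compensation; the function names and the representation of a frame as a list of pixel rows are assumptions.

```python
def changed_blocks(previous, current, block=2):
    """Return {(row, col): pixels} for each block of `current` that differs
    from the same block of `previous`. Frames are equal-sized lists of rows."""
    changes = {}
    for r in range(0, len(current), block):
        for c in range(0, len(current[0]), block):
            new = [row[c:c + block] for row in current[r:r + block]]
            old = [row[c:c + block] for row in previous[r:r + block]]
            if new != old:
                changes[(r, c)] = new  # only the changed region is retained
    return changes


def apply_changes(previous, changes):
    """Rebuild the current frame from the previous frame plus changed blocks."""
    frame = [row[:] for row in previous]
    for (r, c), pixels in changes.items():
        for offset, row in enumerate(pixels):
            frame[r + offset][c:c + len(row)] = row
    return frame
```

When only a small part of the picture moves, the dictionary of changed blocks is far smaller than the full frame, which is the saving that interframe compression exploits.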
Intraframe compression is compression achieved by eliminating redundant information from within a frame, without reference to other video frames. MPEG uses the Discrete Cosine Transform algorithm, or DCT, as its intraframe compression engine. By and large, however, most of MPEG's power comes from interframe, rather than intraframe, compression.
MPEG uses three kinds of frames during the compression process: 1) Intra, or I frames; 2) Predicted, or P frames; and 3) Bi-directional interpolated, or B frames. Most MPEG encoding schemes use a twelve- to fifteen-frame sequence called a group of pictures, or GOP.
I frames start every GOP, and serve as a reference for the first two B frames and first P frame. Since the quality of the entire GOP depends upon the quality of its initial I frame, compression is usually very limited in the I frame.
P frames refer back to the immediately preceding P or I frame, whichever is closer. For example, P frame 4 could refer back to I frame 1, and P frame 7 could refer back to P frame 4. During the encoding process, frame 4 is compared against frame 1 for redundancies, and the redundant data are essentially discarded. Regions in frame 4 that have changed since frame 1--called "change regions"--are compressed using MPEG's intraframe compression engine, DCT. This combination of interframe and intraframe compression typically generates a higher degree of compression than that achieved with I frames.
MPEG uses yet another compression strategy: B frames refer backwards and forwards to the immediately preceding or succeeding P or I frame. For example, for frame 11, a B frame, the compression scheme would search for redundant information in P frame 10 and the next I frame; once again, redundant information is discarded and change regions are compressed using DCT. The double-dose of interframe compression typically generates the highest compression of the three frame types.
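The reference relationships among I, P and B frames described above can be tabulated by a short sketch. The twelve-frame pattern used in the usage note is an assumption chosen to agree with the examples in the text (P frame 4 referring to I frame 1, P frame 7 to P frame 4, and B frame 11 to P frame 10 and the next I frame); the function name is hypothetical, and the pattern is assumed to begin with an I frame.

```python
def reference_frames(pattern):
    """Map each 1-based frame number in a GOP to the frames it refers to.

    `pattern` is a string of frame types in display order that starts
    with "I", e.g. "IBBPBBPBBPBB".
    """
    anchors = [i + 1 for i, kind in enumerate(pattern) if kind in "IP"]
    next_gop_i = len(pattern) + 1  # the I frame that starts the following GOP
    refs = {}
    for i, kind in enumerate(pattern):
        number = i + 1
        if kind == "I":
            refs[number] = []  # coded without reference to other frames
        elif kind == "P":
            # refers back to the closest preceding I or P frame
            refs[number] = [max(a for a in anchors if a < number)]
        else:
            # B frame: nearest anchor behind and nearest anchor ahead
            before = max(a for a in anchors if a < number)
            after = min((a for a in anchors if a > number), default=next_gop_i)
            refs[number] = [before, after]
    return refs
```

Calling `reference_frames("IBBPBBPBBPBB")` gives `[1]` for frame 4, `[4]` for frame 7 and `[10, 13]` for frame 11, where frame 13 stands for the I frame of the next group of pictures.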
All three types of encoders use the same basic GOP scheme defined in the MPEG specification. From a pure compression standpoint, the schemes differ in two key ways: their relative ability to identify interframe redundancies and whether they can modify GOP placement and order to maximize compressed video quality.
2.3 Practical Problems With Hypervideo
The concept is clear that, once hyperlinks can be inserted into streaming digital video (as they efficiently are by the related inventions) so as to make hypervideo, then a Subscriber/User/Viewer ("SUV") of the streaming digital hypervideo can, by volitionally exercising the hyperlinks--normally by "clicking through" on a link with a computer mouse--control, to some extent, the progress of (streaming digital hypervideo) viewing. However, a great number of questions and problems are immediately presented.
How long should the hyperlinks "last", meaning that they are susceptible of being acted upon? If the hyperlinks persist for a long time, or indefinitely, then they will likely offer insufficiently flexible, and diverse, branching capability in video presentations that are, as is common, fast-moving and diverse. Consider, for example, a newscast where, within only a few seconds or tens of seconds devoted to a single story, the SUV might beneficially be offered the capability to "click through" to diverse further information (ranging from commercials to education). If, however, the hyperlinks are as transitory as the accompanying (hyper)video, the SUV would have to make up his or her mind with astounding rapidity, and then "pounce like a cat" in order to reasonably "click through" on hyperlinks of interest. The present invention will shortly be seen to deal with this dilemma by holding past hyperlinks available, in the full context of their initial presentation, for a generously long time. It will be possible, for example, to exercise hyperlinks associated with (hyper)video scenes that have already previously gone by some minutes ago.
Next, what should happen upon a SUV "clicking through" on a hyperlink? Should the SUV's computer be branched to a new, linked, (hyper)video "feed", only? If so, then how can the SUV return to where he/she left off, or even--a momentary diversion having been exercised--to the "present" progress (whatever the word "present" means in hyperspace, and it doesn't mean much) of the (hyper)video "feed" that the SUV previously left off from? Or, alternatively, should a parallel separate screen--equivalent to the present capability of networked computers to institute multiple running copies of a network browser--be instituted? And if so, then how many times does this go on? Until computer memory no longer permits (as is the case with internet browsers)? Or until the video bandwidth is so fragmented that not all, or even no, "screens" update smoothly and properly?
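One way of holding past hyperlinks available together with their context is a time-limited history of scenes, sketched below. This is an illustrative assumption and not the mechanism of the invention, which is described later in the specification; the names `Scene` and `SceneHistory`, the keyframe identifier, and the retention period are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class Scene:
    start: float       # seconds since playback began
    keyframe: str      # identifier of the scene's thumbnail image (hypothetical)
    links: List[str]   # hyperlink targets that were embedded in the scene


class SceneHistory:
    """Keeps recently viewed scenes, and their hyperlinks, for a fixed period."""

    def __init__(self, retention_seconds: float):
        self.retention = retention_seconds
        self.scenes: Deque[Scene] = deque()

    def add_scene(self, scene: Scene) -> None:
        self.scenes.append(scene)

    def available(self, now: float) -> List[Scene]:
        """Drop scenes older than the retention period; return the rest."""
        while self.scenes and now - self.scenes[0].start > self.retention:
            self.scenes.popleft()
        return list(self.scenes)
```

With a retention period of, say, 300 seconds, a hyperlink from a scene that went by some minutes ago remains selectable alongside the thumbnail of that scene, so the SUV need not react while the scene is still playing.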
The present invention will shortly be seen to permit the SUV to keep visible two simultaneous (hyper)video feeds--differently displayed--not primarily because of hardware limitations, but mostly because the "richness" of hypervideo (as compared to, for example, the linear linking of URL's with an internet browser) is likely to cause the SUV to "overload", and become confused as to where and when he/she is in viewing when indefinitely many scenes and options are simultaneously presented.
Next, what, if any, features--other than such simple branching as is analogous to branching between web sites with an internet network browser--should be accorded a SUV for "clicking through" on a hyperlink?
The present invention will shortly be seen to contemplate, and to fully enable, amazingly diverse, and versatile, responses to a SUV click on an appropriate hyperlink. If the SUV simply wants to see or hear something on the network, or somewhere in cyberspace, then "clicking through" on a hyperlink may be considered a "natural" response. However, suppose the SUV wants to make/register a response--anything from providing the simplest feedback by clicking a mouse button to selectively moving the cursor and clicking the mouse button to answer the hardest question to typing a response--for any of diverse purposes ranging from (i) a simple request for printed/mailed/e-mailed information to (ii) the ordering of goods to (iii) the entry of contests and lotteries?
It will shortly be seen that, in accordance with the present invention, each separate SUV (of potential thousands, and tens of thousands under a single Internet Service Provider, or ISP) will enjoy (i) unique hyperlinks (ii) uniquely resolved (called "dynamic hyperlink resolution"). In short, the hyperlinks offered each SUV are absolutely anything that the offeror--normally (i) an advertiser or (ii) a service agency or (iii) a network store or (iv) a contest administrator variously located (in accordance with the related invention) at many points along a long network path to the user--says that they are! If some entity gives an SUV a hyperlink to click upon to (i) see a car or (ii) receive information on a car or (iii) buy a car or (iv) win a car, then the SUV need only exercise the hyperlink to (i) see, (ii) learn about, (iii) buy, or (iv) win, the car.
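The notion that the same hyperlink may be resolved differently for each SUV can be illustrated by a minimal lookup sketch. The data layout, the function name `resolve_link`, the subscriber identifier and the example.com addresses are hypothetical placeholders; the sketch assumes only that an offeror may register a subscriber-specific target that overrides a default.

```python
from typing import Dict, Tuple

# Default target for each hyperlink identifier (placeholder addresses).
DEFAULT_TARGETS: Dict[str, str] = {
    "car": "http://example.com/car/info",
}

# Subscriber-specific targets registered by an offeror, keyed by
# (subscriber identifier, hyperlink identifier). Entirely hypothetical.
SUV_TARGETS: Dict[Tuple[str, str], str] = {
    ("suv-001", "car"): "http://example.com/car/contest",
}


def resolve_link(suv_id: str, link_id: str) -> str:
    """Resolve a hyperlink at click time for one particular subscriber."""
    return SUV_TARGETS.get((suv_id, link_id), DEFAULT_TARGETS[link_id])
```

Because the lookup happens when the link is exercised rather than when the video is authored, two SUVs watching the same scene and clicking the same hyperlink may be taken to different destinations.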