In recent years, the technologies of video data compression, storage, and interactive accessing have converged with network communications technologies to present exciting prospects for users who seek access to remotely stored multimedia information.
In the area of network communications technologies, particularly exciting has been the recent prominence of the Internet and its progeny, the World Wide Web. The Internet and the Web have captured the public imagination as the so-called "information superhighway." Accessing information through the Web has become known by the metaphorical term "surfing the Web."
The Internet is not a single network, nor does it have any single owner or controller. Rather, the Internet is an unruly network of networks, a confederation of many different networks, public and private, big and small, whose human operators have agreed to connect to one another. The composite network represented by these networks relies on no single transmission medium. Bi-directional communication can occur via satellite links, fiber-optic trunk lines, phone lines, cable TV wires and local radio links.
To this point the World Wide Web (Web) provided by the Internet has been used in industry predominantly as a means of communication, advertisement, and placement of orders. The Web facilitates user access to information resources by letting people jump from one server to another simply by selecting a highlighted word, picture or icon (a program object representation) about which they want more information, a maneuver known as a "hyperlink." In order to explore the Web today, the user loads a special navigation program, called a "Web browser," onto his computer.
There are a number of browsers presently in existence and in use. Common examples are Netscape, Mosaic and IBM's Web Explorer. Browsers allow a user of a client to access servers located throughout the world for information which is stored therein. The server then provides the information by sending files or data packets from its storage resources to the requesting client.
Part of the functionality of a browser is to present image or video data. Still image or video information can be provided to a user on a client machine through a suitably designed Web page or interface. Still images can also be used as hypertext-type links, selectable by the user, for invoking other functions. For instance, a user may run a video clip by selecting a still image.
However, video data objects are very large, or, to put it more precisely, the quantity of data per unit time in a real-time viewing of a video data object is large. As a consequence, access by a user to a desired video data object is subject to data throughput constraints. The present state of the art makes it impracticable to provide more than a few tens of seconds of real-time video over the Internet with a response time that will be satisfactory to a user.
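The throughput constraint can be illustrated with a back-of-envelope calculation. The following is a minimal sketch only: the frame size, color depth, frame rate, compression ratio, and link speed are illustrative assumptions rather than figures taken from this description, and the function names are hypothetical.

```python
# Back-of-envelope sketch. Every numeric value below is an illustrative
# assumption, not a figure stated in the text.

def video_bits_per_second(width, height, bits_per_pixel,
                          frames_per_second, compression_ratio):
    """Data rate of a video stream after compression, in bits per second."""
    raw_rate = width * height * bits_per_pixel * frames_per_second
    return raw_rate / compression_ratio


def transfer_seconds(clip_seconds, video_bps, link_bps):
    """Time needed to move a clip of the given duration over a link."""
    return clip_seconds * video_bps / link_bps


# Assumed: 320x240 pixels, 24-bit color, 30 frames/s, 50:1 compression.
rate = video_bits_per_second(320, 240, 24, 30, 50)

# Assumed: a 30-second clip sent over a 28,800 bit/s dial-up link.
wait = transfer_seconds(30, rate, 28_800)

print(f"compressed rate: {rate:,.0f} bit/s")
print(f"time to deliver a 30 s clip: {wait:,.0f} s")
```

Under these assumed values, the link carries far fewer bits per second than the clip consumes in real time, so even a short clip takes many times its own duration to deliver.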
Therefore, multimedia and communication systems that give users access to video data objects, for browsing, searching, etc., must grapple with the problem of making the best use of the available throughput, so that video data is provided in the form most useful to the user.
With this design objective in mind, let us now consider the state of the art in the technologies of video data compression, storage, and interactive accessing. Recent work has been done to make video material more available and usable over the Web. For instance, an article in the August 1995 issue of ADVANCED IMAGING, by Amy T. Incremona, titled "Automatically Transcribing and Condensing Video: New Technology is Born", describes a method for providing video having an accompanying textual index, such as audio narration or closed caption text. Still images are presented, along with a transcription of audio text that accompanies the images (illustration on page 60). This information is provided in HTML format. Thus, a user can take advantage of the temporal correspondence between video shots and narration or closed caption text. To find a desired point in the video corresponding with a known point in the text, the user performs a key word search for the known point in the text. The result of this key word search is that the desired point in the video is reached.
Additionally, Shahraray et al., "Automatic Generation of Pictorial Transcripts of Video Programs", SPIE Vol. 2417, pp. 512-518, describes an automatic authoring system for the generation of pictorial transcripts of video programs which are accompanied by closed caption information. The system employs a table having a series of rows, each row containing a pointer to a location of an image, and another pointer to the beginning of a text segment related to the image. A viewing window for a GUI display is shown in FIG. 4 of Shahraray et al., and reproduced herein in simplified form as FIG. 1 of the present patent application. FIG. 1 shows a video image 2, a closed caption text subtitle area 4, and a basic user control area 6. The basic user controls include a "Seek" slider 8.
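A table of this kind, together with the key word search described above, might be sketched as follows. This is an illustration under stated assumptions, not the implementation of Shahraray et al.: the field names, the file names, the sample caption text, and the use of character offsets as text pointers are all hypothetical, and the rows are assumed to be sorted by text offset.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptRow:
    image_location: str  # pointer to the stored image (hypothetical file name)
    text_offset: int     # offset at which the related text segment begins


def find_row_for_keyword(rows, caption_text, keyword):
    """Return the row whose text segment holds the first match, or None.

    Assumes `rows` is sorted by `text_offset` in ascending order.
    """
    position = caption_text.lower().find(keyword.lower())
    if position < 0:
        return None
    offsets = [row.text_offset for row in rows]
    # Last row whose segment starts at or before the match position.
    index = bisect_right(offsets, position) - 1
    return rows[index] if index >= 0 else None


# Hypothetical caption text and table, for illustration only.
caption_text = ("Good evening. Tonight the storm moves east. "
                "Markets closed higher today.")
rows = [
    TranscriptRow("frame_0001.jpg", 0),
    TranscriptRow("frame_0042.jpg", 14),
    TranscriptRow("frame_0107.jpg", 44),
]

match = find_row_for_keyword(rows, caption_text, "storm")
print(match.image_location if match else "no match")
```

The sketch makes the dependency plain: an image can be reached only through the text segment paired with it, so video without narration or caption text offers nothing to search.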
Accordingly, the state of the art allows for user access to video information based on associated text. However, a more general method for accessing video, not provided by the prior art, would sever the tie between video images and accompanying audio narration or closed caption text.