1. Field of the Invention
This invention relates in general to video transmission systems performed by computers, and in particular, to marking scene changes in video streams.
2. Description of Related Art
Various applications process video streams or video feeds which are presented to users for different applications. For example, a newsroom may receive video streams from a satellite or other transmission device relating to a recent news event or an Internet user may request a video stream or story board of a sports event. These video streams or files must be processed and presented to the user as quickly as possible since they relate to recent news events and/or must be transmitted over a network quickly to minimize the waiting time assumed by the user.
Before a video stream or file can be presented to a user, however, it is processed such that certain frames that represent the most relevant information of the video stream are selected. These frames are often termed “scene change” frames, in contrast to other frames which may portray negligible content differences from previous frames. In this context, a scene change occurs when the content of a first frame of the video stream changes sufficiently in a second frame of the video stream such that the second frame triggers a new view relative to the first frame. In order to generate the requested video streams or files, the video streams are processed and analyzed to identify and select scene change frames such that the frames ultimately presented to the user contain the most relevant information.
Examples of applications using scene change analysis to select frames include newsroom videos and Internet files. In the context of newsrooms or news editors and producers, video streams relating to recent news stories may be received from a satellite, a live feed, or a video tape in analog or digital video format. These video streams are analyzed to identify the scene change frames, and these frames are selected and compiled into, for example, a video clip. As these streams may relate to recent news items, this processing and selection must be completed as quickly as possible to ensure that the resulting video file is played while the news story is still significant.
Similarly, in the context of the Internet, users may request video files or a storyboard which invokes an extraction tool to select frames of a video file. A storyboard is a collection of images or a collection of thumbnails (i.e., smaller images representing scenes from a video file). An extraction tool, such as a thumbnail extraction tool, may be used to create the storyboard.
More specifically, the Internet is a collection of computer networks that exchanges information via Transmission Control Protocol/Internet Protocol (“TCP/IP”). The Internet computer network consists of many Internet networks, each of which is a single network that uses the TCP/IP protocol. Via its networks, the Internet computer network enables many users in different locations to access information (e.g., video streams) stored in data sources in different locations.
The World Wide Web (i.e., the “WWW” or the “Web”) is a hypertext information and communication system used on the Internet computer network with data communications operating according to a client/server model. Typically, a Web client computer will request data stored in data sources from a Web server computer, at which Web server software resides. The Web server software interacts with an interface connected to, for example, a Database Management System (“DBMS”), which is connected to the data sources. These computer programs residing at the Web server computer will retrieve and transmit the data, including video data, to the client computer.
Many video streams are transformed into video files that follow digital video compression standards and file formats developed by the Moving Picture Experts Group (MPEG). These are referred to as MPEG files and are typically files corresponding to movies. Furthermore, there are various video file formats, including MPEG-1, MPEG-2, and MPEG-4, which produce video files at different resolutions.
Some users request storyboards that are composed of frames of a video file. When a storyboard is to be generated from a video stream or video clip, an application calls a thumbnail extraction tool to conduct scene change analysis and determine which frames of an MPEG file should be selected as part of the storyboard, i.e., which frames were selected from a video stream as scene change frames. Scene change analysis in this context involves comparing a first frame of an MPEG file to a second frame of the MPEG file, and so on for each successive pair of frames. Frames representing scene changes are selected by the thumbnail extraction tool based on different factors (e.g., the degree of pan, scan, zoom, etc.) and these selected video stream frames are compiled into a video file. Each of these frames may be an image or “thumbnail” in the storyboard. An MPEG file may include thousands or tens of thousands of frames. Since the storyboard frames must be selected and presented quickly such that the user can select the frames shortly after choosing to generate a storyboard, the analysis and selection of the frames to include within the storyboard must be done as quickly as possible.
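The pairwise comparison described above may be sketched as follows. This is only an illustrative example and not the analysis performed by any particular extraction tool: the frame representation (a flat list of 8-bit luma samples), the mean-absolute-difference measure, the function names, and the threshold value are all assumptions introduced here. Factors such as pan, scan, and zoom are not modeled.

```python
from typing import List, Sequence

# Hypothetical frame representation: a flat sequence of 8-bit luma samples.
Frame = Sequence[int]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute per-sample difference between two equally sized frames."""
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def find_scene_changes(frames: List[Frame], threshold: float = 30.0) -> List[int]:
    """Return the index of each frame whose content differs from the
    preceding frame by more than the (assumed) threshold."""
    changes = []
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            changes.append(i)
    return changes


# Four tiny 4-sample frames: the third differs sharply from the second.
frames = [
    [10, 10, 10, 10],
    [12, 11, 10, 9],
    [200, 190, 210, 205],
    [201, 191, 209, 204],
]
print(find_scene_changes(frames))  # [2]
```

In this sketch, every adjacent pair of frames is compared once, so the work grows with the number of frames in the file.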
Thus, it is clear that video streams must be processed as quickly as possible whether the video stream will ultimately become a news clip, part of an MPEG file or storyboard, or used within some other application or file. In processing video streams or video files, conventional systems process frames twice to determine which frames of the video stream to include within a particular video file. First, when the video stream is initially encoded, frames are processed, for example, to add closed captioning. Second, when an application requests, for example, a storyboard, an extraction tool processes the frames to determine which frames will be selected to create the storyboard. The extra time required to process the frames a second time is assumed by the user. The problem is even more troublesome when there are multiple requests for a video file to create different storyboards. For example, if five different applications request storyboards based on different criteria, a thumbnail extraction tool must perform the scene change analysis five separate times to determine which frames to include within the storyboard for each application.
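The cost of repeating the analysis for each request can be illustrated with simple arithmetic. The sketch below assumes, purely for illustration, that one analysis pass compares each adjacent pair of frames once and that the file contains 10,000 frames; the function name and both figures are hypothetical.

```python
def comparisons_for_requests(num_frames: int, num_requests: int) -> int:
    """Total frame-pair comparisons when every request repeats the full
    scene change analysis (one comparison per adjacent pair of frames)."""
    pairs_per_pass = max(num_frames - 1, 0)
    return pairs_per_pass * num_requests


# Five storyboard requests against an assumed 10,000-frame file.
print(comparisons_for_requests(10_000, 5))  # 49995
```

Under these assumptions, five requests incur five times the comparisons of a single pass, even though the underlying frames are identical each time.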
As illustrated by these simple examples, conventional systems do not access scene change data in real time or near real time, and thus, are inefficient. Consequently, the cost to process video streams is substantially increased. These shortcomings are amplified when scene change analysis is performed manually or by a slower, more complicated system. In addition, if a system is configured to recognize finer changes between scenes, substantially more time may be required to perform the scene change analysis since these more detailed analyses may involve more complicated calculations.
Thus, there is a need in the art for providing scene change analysis to extraction tools in real time or near real time for different video stream applications.