Multimedia, namely, the combination of text, animated graphics, video, and sound, presents information in a way that is more interesting and easier to grasp than text alone. It has been used for education at all levels, job training, and games by the entertainment industry. It is becoming more readily available as the price of personal computers and their accessories declines.
Digital technology's exponential decline in price and increase in capacity have enabled it to overtake analogue technology. The Internet is the breeding ground for multimedia ideas and the delivery vehicle of multimedia objects to a huge audience. The World Wide Web is the Internet's multimedia information retrieval system. In the Web environment, client machines communicate with Web servers using the Hypertext Transfer Protocol (HTTP). The Web servers provide users with access to files such as text, graphics, images, sound, and video, using a standard page description language known as Hypertext Markup Language (HTML). HTML provides basic document formatting and allows the developer to specify connections, known as hyperlinks, to other servers and files. In the Internet paradigm, a network path to a server is identified by a Uniform Resource Locator (URL) having a special syntax for defining a network connection. So-called Web browsers, for example, Netscape Navigator (Netscape Navigator is a registered trademark of Netscape Communications Corporation) or Microsoft Internet Explorer (Internet Explorer is a trademark of Microsoft Corporation), which are applications running on a client machine, enable users to access information by specifying a link via the URL and to navigate between different HTML pages.
Multimedia systems need a delivery system to get the multimedia objects to the user. Magnetic and optical disks were the first media for distribution. The Internet, as well as the Transmission Control Protocol/Internet Protocol (TCP/IP) protocol suite or NetBIOS on isolated or campus LANs, became the next vehicles for distribution. The rich text and graphics capabilities of the World Wide Web browsers are being augmented with animations, video, and sound. Internet distribution will be augmented by distribution via satellite, wireless, and cable systems.
Nowadays, multimedia generally indicates a rich sensory interface between humans and computers or computer-like devices; an interface that in most cases gives the user control over the pace and sequence of the information. An example of a multimedia application is movies on demand (also known as video on demand (VOD)), in which a viewer can make selections from a large library of videos and then play, stop, or reposition the tape or change the speed. In more detail, movies on demand is a service that provides movies on an individual basis to television sets in people's homes. The movies are stored on a central server (termed a content server in this description) and transmitted through a communication network. A set-top box (STB) connected to the communication network converts the digital information to analogue and inputs it to the TV set. The viewer uses a remote control device to select a movie and manipulate play through start, stop, rewind, and visual fast forward buttons. The capabilities are very similar to renting a video at a store and playing it on a videocassette recorder (VCR). The service can provide indices to the movies by title, genre, actors, and director. VOD differs from pay per view by providing any of the movies at any time, instead of requiring that all purchasers of a movie watch its broadcast at the same time. However, watching the movie on a TV set attached to a VCR, with the same abilities to manipulate the play, is not considered multimedia.
Initially, the type of information distributed was primarily in the form of text and graphics. Later, images and stored audio and video files emerged. Typically, these audio and video files are downloaded from a server and stored at the client machine. A “player” then renders the files before they are “played” on the client machine. Advantageously, downloading allows a user to view files of any format. However, downloading can take time if a user's network connection is “slow”, and once downloaded, the files may take up space on the user's hard drive (on the client machine).
As an alternative, streamed audio and video (whereby a stream comprises a single type of data—e.g. audio) have become available from both stored and live sources on the Web. Audio and video streaming enables client machines to select and receive audio and video content from servers across the network and to begin hearing and seeing the content as soon as the first few bytes of the stream arrive at the client machine. Therefore, the actual content of the media files remains on the server and the client machine receives the content of the files as “streams” of data. Although streaming solves some of the problems associated with downloading, playback quality becomes dependent on the network connections.
Streaming media requires that data be transmitted from a content server to a client machine at a sustained bit rate that is high enough to maintain continuous and smooth playback at the receiving client machine. Typically, a client machine requests multimedia data and hypermedia (the combination of hypertext and multimedia) data from a content server. The content server is responsible for streaming the data to the client machine for rendering/playing.
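By way of illustration only, the sustained bit rate requirement can be sketched as a simple comparison between the rate a stream needs and the rate a connection can offer. The function names, the uncompressed PCM audio example, and the 80% headroom factor below are assumptions introduced for this sketch and do not form part of any particular streaming system.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bit rate (bits per second) of uncompressed PCM audio."""
    return sample_rate_hz * bits_per_sample * channels


def can_sustain(stream_bps: int, link_bps: int, headroom: float = 0.8) -> bool:
    """Return True if the link can carry the stream continuously.

    `headroom` is a hypothetical safety margin: only that fraction of the
    nominal link rate is assumed to be usable for the stream.
    """
    return stream_bps <= link_bps * headroom


# Example: 44.1 kHz, 16-bit, stereo audio needs 1,411,200 bits per second.
cd_audio_bps = pcm_bit_rate(44_100, 16, 2)
print(cd_audio_bps)
print(can_sustain(cd_audio_bps, link_bps=56_000))      # dial-up modem
print(can_sustain(cd_audio_bps, link_bps=10_000_000))  # 10 Mbit/s LAN
```

In this sketch the stream can be played smoothly only over the second connection; over the first, the client machine would repeatedly run out of data to play.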
In order to manage the streaming and rendering, it is important that the requesting client machine (specifically a “decoder” application, which converts the incoming stream for rendering on the client machine) understands the characteristics of the stream it will receive and play, such as the frame rate or the sample rate. This is often achieved during a client-server “handshake” communication, whereby client machines and server machines communicate over a communications network. For example, if a client machine does not have the capability to handle an incoming stream (e.g. because it cannot render the required number of frames per second or cannot render an audio stream above a certain sample rate), then there is no need to progress the client machine's request.
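The capability check made during such a handshake can be sketched as follows. This is a minimal sketch only: the class names, field names, and the rule that an unknown stream type is rejected are all assumptions made for illustration, and no particular handshake protocol is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamProperties:
    """Hypothetical characteristics a server advertises for one stream."""
    kind: str                    # "audio" or "video"
    sample_rate_hz: int = 0      # meaningful for audio streams
    frame_rate_fps: float = 0.0  # meaningful for video streams


@dataclass(frozen=True)
class ClientCapabilities:
    """Hypothetical limits of the decoder on the client machine."""
    max_sample_rate_hz: int
    max_frame_rate_fps: float


def handshake(stream: StreamProperties, caps: ClientCapabilities) -> bool:
    """Return True if the request should progress, False otherwise."""
    if stream.kind == "audio":
        return stream.sample_rate_hz <= caps.max_sample_rate_hz
    if stream.kind == "video":
        return stream.frame_rate_fps <= caps.max_frame_rate_fps
    return False  # unknown stream type: assumed not renderable


caps = ClientCapabilities(max_sample_rate_hz=22_050, max_frame_rate_fps=15.0)
print(handshake(StreamProperties("audio", sample_rate_hz=44_100), caps))
print(handshake(StreamProperties("video", frame_rate_fps=12.0), caps))
```

Here the audio request is refused because the stream's sample rate exceeds what the client can render, while the video request is allowed to progress.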
Currently, existing multimedia content has widely differing characteristics and, in addition, those characteristics are defined in different ways by different file formats. This results in a vast amount of varying multimedia content and no common method for indexing it. Furthermore, if characteristics are not readily available to a decoder, it may have to do some analysis to “guess” at them (for example, by analyzing file extensions such as “.avi” or “.mov”), or it may have to do some processing to find the characteristics if they are situated in a location other than the one it expects.
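The “guessing” described above can be sketched as a lookup on the file extension. The mapping table and function name below are hypothetical and deliberately small; the sketch is intended to show how little such a guess reveals, since it yields at most a container format and nothing about frame rate or sample rate.

```python
import os
from typing import Optional

# Hypothetical, deliberately incomplete table of extensions to format names.
EXTENSION_TO_FORMAT = {
    ".avi": "AVI",
    ".mov": "QuickTime",
    ".mp3": "MP3",
    ".asf": "ASF",
}


def guess_format(filename: str) -> Optional[str]:
    """Guess a media format from the file extension alone.

    Returns None when the extension is missing or unknown. The file
    contents are never inspected, so a misnamed file is misclassified.
    """
    _, extension = os.path.splitext(filename)
    return EXTENSION_TO_FORMAT.get(extension.lower())


print(guess_format("trailer.AVI"))
print(guess_format("lecture.mov"))
print(guess_format("unknown.xyz"))
```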
In one prior art solution from Microsoft Corporation, the source (which can be in any of a variety of formats) is compressed and encoded in the ASF (Advanced Streaming Format) file format, which is based on objects. More information can be found in the book “Inside Windows Media” by Microsoft Corporation, 1999, ISBN 0-7897-2225-9.
Three types of ASF objects exist, representing:
    A Header object: defines the characteristics of the stream, or sub-streams
    A Data object: comprises the digitized data packets of the media streams or sub-streams
    An Index object: defines index entries which point to the data packets, in order to synchronize streams to a common timeline, so that, for example, video and audio streams are not out of sync.
The ASF header object comprises a “stream properties” object, defining the properties of each stream. The ASF header can be used by the client machine separately from the other two objects, so that the requesting client machine can prepare to process ASF files, for example, to “play” them.
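The relationship between the header and index objects described above can be modelled conceptually as follows. This is an illustrative data model only: it does not reproduce the actual ASF binary layout, and every class name, field name, and method below is an assumption made for the sketch.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class StreamProps:
    """Hypothetical "stream properties" entry for one stream."""
    stream_id: int
    kind: str                    # "audio" or "video"
    sample_rate_hz: int = 0
    frame_rate_fps: float = 0.0


@dataclass
class HeaderObject:
    """Describes every stream; usable without the data or index objects."""
    streams: list[StreamProps]

    def describe(self) -> list[str]:
        return [f"stream {s.stream_id}: {s.kind}" for s in self.streams]


@dataclass
class IndexObject:
    """Non-empty list of (time in ms, packet number) pairs, sorted by time."""
    entries: list[tuple[int, int]]

    def packet_at(self, time_ms: int) -> int:
        """Packet number of the last index entry at or before time_ms."""
        times = [t for t, _ in self.entries]
        position = bisect_right(times, time_ms) - 1
        return self.entries[max(position, 0)][1]


header = HeaderObject([StreamProps(1, "audio", sample_rate_hz=22_050),
                       StreamProps(2, "video", frame_rate_fps=15.0)])
index = IndexObject([(0, 0), (1000, 12), (2000, 25)])

print(header.describe())      # the client can prepare from the header alone
print(index.packet_at(1500))  # packet to resume from at 1.5 seconds
```

In this model, the client machine inspects only the header to decide how to configure its decoder, and consults the index only when it needs to locate data packets for a given point on the common timeline.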
Whilst the ASF format partially solves the problem, many media files exist only in other commonly used formats such as MP3, MPEG, MPEG4, and MHEG. Because converting such files can be time-consuming, content owners have little desire to perform media conversions. Furthermore, content owners frequently need to define various characteristics of the stream delivery, for example, whether the streams are to be protected or flowed “in-the-clear” (that is, unprotected, for example not encrypted).
Thus, there is a need for less time-consuming maintenance for an administrator, especially if the properties of a stream were to change frequently, and for analysis and preparation of pre-authored multimedia content of any kind.