As used herein, the term multimedia space refers to a dimension set that defines a domain navigable by a user. The multimedia space may have a single dimension (for example, a program timeline running from time start to time end), or many dimensions (for example, a three dimensional map). The multimedia space may be real-world, as in the case of an aircraft flight simulator, or it may be logical, as in the case of an n-dimensional virtual space representation.
An essential aspect of multimedia systems is to provide the user with the ability to traverse a virtual space that is represented with text, still or short-sequence graphics, and some time-linear media (e.g. longer video or audio sequences).
The multimedia space to be navigated is typically data found in one or more media servers, for example a hard disk holding video, text, and/or audio information. Using a suitable interface such as a mouse, joystick or other mechanism, a user can navigate or traverse multimedia space by requesting displays on a video monitor. In a virtual reality application, moving a joystick to the right or left changes the displayed image to scenery on the user's right or left, respectively. The joystick movement in essence requests display of the new scenery. Instead of using a joystick, the user could wear a motion-sensing apparatus such as a glove.
Within a typical multimedia space, there are many dimensions for traversal or navigation, some of which map to time-linear media, and some of which do not. For a multimedia encyclopedia, for example, there is a linear dimension in the alphabetical ordering of items, and there is an X-Y space in the display of any given page. However, these dimensions have no mapping to the time-linear nature of a video or graphical sequence that might be displayed in association with a body of text about a given subject in the encyclopedia. Understandably, a user viewing a multimedia space wants the freedom to navigate along any direction and at any rate he or she chooses. Such freedom of navigation should not burden the user with real-world constraints of the media servers, for example, by having to wait excessively long for the pictures to be displayed.
It must be recognized that the media servers containing the representations of the multimedia space have certain ballistic and linear constraints. Ballistic constraints concern how quickly data associated with a desired image can be moved from the media servers (hard disks, for example) to the system display engine. Video tape recorders, and many video servers, can provide quality video only within a small speed range around normal playback speed. Linearity constraints reflect the inability of a media server to truly accomplish random access. For example, one cannot randomly traverse different segments recorded on a video tape, and even a computer hard disk used for video playback and recording has limitations as to random access. Further, a user's request for certain recombinations of video material can cause additional delays while the system, for example, makes video copies for use in special effects or pre-renders video sequences.
Prior art multimedia representation mechanisms perform reasonably well, but only for multimedia spaces in which there exists a mapping of a time-linear media to a navigable dimension. However, in the more generalized case where such mapping does not exist, prior art products and mechanisms perform poorly because they force the user interface to be locked synchronously to the media servers.
In practice, these prior art mechanisms lock the sequence of user interface events synchronously to the display of the associated media. Because the display is considered to be a single timeline, it is navigable only at the rate of the slowest mechanism in the display path, typically the video server. A user requests displays (for example by moving a joystick), but once a request enters the request queue, it must be displayed, and a given user request cannot be processed until all preceding requests in the queue are completed.
Because prior art multimedia representation mechanisms employ a serialized, non-cancelable queue, system throughput is degraded if incorrect requests are queued. What would be desirable, for example, in a virtual reality simulator application, is for intelligent software to decide to pre-queue images representing adjacent pages of display, perhaps on the assumption that the user may next wish to see adjacent scenery. Unfortunately because prior art mechanisms do not permit cancelling uncompleted requests, the user would be forced to view such pre-queued scenery, even if the user had "walked" out of the area depicted in the pre-queued scenes. Thus, any efficiency that might be gained by pre-queuing is foregone because it would create an unalterable queue.
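The pre-queuing and cancellation behavior described above can be sketched as follows. This is a minimal illustrative sketch, not an implementation from the disclosure; the class and method names (`CancelableRequestQueue`, `enqueue`, `cancel_if`) are invented for illustration.

```python
from collections import deque

class CancelableRequestQueue:
    """Display-request queue whose pending entries can be withdrawn
    before the media server processes them (hypothetical sketch)."""

    def __init__(self):
        self._pending = deque()

    def enqueue(self, request_id, region):
        # Pre-queue a speculative display request, e.g. adjacent scenery.
        self._pending.append({"id": request_id, "region": region})

    def cancel_if(self, predicate):
        # Drop any queued request that has become obsolete, e.g. because
        # the user has "walked" out of the depicted area.
        kept = [r for r in self._pending if not predicate(r)]
        dropped = len(self._pending) - len(kept)
        self._pending = deque(kept)
        return dropped

    def next_request(self):
        return self._pending.popleft() if self._pending else None

q = CancelableRequestQueue()
q.enqueue(1, "room-A")      # scene the user is currently in
q.enqueue(2, "room-B")      # speculative pre-queue of adjacent scenery
# User "walks" away from room-B before it is displayed, so the
# pre-queued request is cancelled rather than forcibly shown:
q.cancel_if(lambda r: r["region"] == "room-B")
```

A serialized, non-cancelable queue of the kind the prior art employs would lack the `cancel_if` step, forcing every queued request to be displayed.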
Video editors constitute another area of application for multimedia display mechanisms. Video editors are used in video tape productions to combine selected video scenes into a desired sequence, such as special effects seen on television. To produce the sequence, the video editor must accurately frame-synchronize video tape recorders and peripheral devices. A user controls the editor from a keyboard while watching a monitor that displays information from the video editor.
"Off-line" video editing systems are used to review source tapes, and to create simple editing effects such as "cut" and "dissolve". Such editors generate an intermediate work tape whose frames are marked according to an accompanying edit decision list ("EDL") that documents what future video changes are desired.
By contrast, "on-line" editing systems are quite sophisticated, and control a variety of devices to produce complicated video effects. On-line editing systems are used to make post-production changes, including changes based upon the work tape and EDL from an off-line editor. The output from an on-line editing system is a final video master tape and an EDL documenting, at a minimum, the most recent generation of changes made to the master tape.
A conventional EDL is a complex collection of timecode numbers and cryptic designations for keying, dissolve and other video effect operations. The timecode numbers give the precise time and video frame numbers where events occur on the finished tape, and also provide "in" and "out" times at which a given video source was transferred onto the finished tape. However, at best, such conventional EDLs are a one-dimensional historical time record of the most recently issued commands that resulted in changes appearing on the finished video tape.
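The in/out timecode bookkeeping of such an EDL event can be illustrated as follows. This is a hypothetical sketch assuming 30 frames per second; the field names and the `timecode` helper are invented for illustration and do not reflect any particular commercial EDL format.

```python
# Hypothetical sketch of one conventional EDL event record: a record
# timecode on the finished tape plus source "in"/"out" timecodes.

def timecode(h, m, s, f, fps=30):
    """Convert hours:minutes:seconds:frames to an absolute frame number."""
    return ((h * 60 + m) * 60 + s) * fps + f

edl_event = {
    "event": 1,
    "source": "TAPE_04",
    "effect": "DISSOLVE",               # cryptic effect designation
    "src_in": timecode(0, 12, 3, 15),   # where the source transfer begins
    "src_out": timecode(0, 12, 9, 0),   # where the source transfer ends
    "rec_in": timecode(1, 0, 0, 0),     # position on the finished tape
}
# The segment occupies the same number of frames on the finished tape
# as it does on the source tape:
edl_event["rec_out"] = edl_event["rec_in"] + (
    edl_event["src_out"] - edl_event["src_in"])
```

Note that such a record captures only where events land on the finished tape; it remains a one-dimensional historical record, as the text observes.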
In video (and audio) editing, the primary space being traversed is the program timeline that is being constructed, namely the EDL, which conventionally is shown graphically or textually as a timeline of edit-events on graphical editing systems. The media of primary interest is the time-linear program being constructed, whose description is the EDL. This is a special case within the general multimedia domain, because there is a direct mapping between the time-linear dimension of the media, and one dimension of the space being traversed, namely the EDL description for the program segment.
Among video editing systems, the interaction with ballistically and linearly limited media servers distinguishes linear from non-linear edit controllers. As noted, the linear and ballistic characteristics of video tape transports greatly reduce the speed with which a user can move about within the video representation of a program. As a result, application software rarely links the virtual program representation and the media. This constraint interferes substantially with the creative process.
More flexible are non-linear systems, wherein the program timeline is typically locked to the timeline position of a textual or graphical representation of the program. Some commercially available editing products allow a user to navigate a timeline in a limited fashion to produce limited quality video images. Such mechanisms are relatively limited, especially with respect to the quality of the video provided.
Because textual representations of EDLs are difficult to work with, manufacturers of off-line and on-line video editors have sought to embed video within the EDL. One such product is the AVID Media Composer, manufactured by Avid Technology, Inc. of Burlington, Mass. This AVID product is an off-line video editor that embeds low quality video in the editor system.
FIG. 1 depicts the display 1 of an AVID system, wherein a first portion 2 of the screen is a current source window/monitor, a second portion 3 is the current program monitor, and a third portion 4 presents a number of still video images that map directly to a displayed edit timeline 6 having discrete points 8. The images in regions 2 and 3 are sequences of images, whereas portion 4 presents various still video images. For a timeframe of given length, a fixed number of still images 8 are displayed, interspersed uniformly along the timeframe. By means of a computer mouse, a user can click on a timeline point and display a corresponding video frame.
In the conventional fashion, the AVID system binds the time linearity of the display media to the navigable dimension, namely the program timeline 6. Essentially such systems require that the display media be formatted in the same manner as the navigable dimension, thus creating arbitrary media events (timeline points) for the convenience of the system. In a broad sense, these two parallel timelines are then synchronously locked together.
While the synchronous lock prevents navigation from proceeding faster than the display media, doing so can limit navigation. In practice, when the user moves along the virtual dimension at speeds in excess of the media server capability, the AVID system temporarily unlocks the two timelines and simply samples the media timeline, skipping video frames as needed. The sampling occurs as often as the server can handle, or only when the user stops at some point on the virtual timeline.
In AVID-type systems, sampling provides a workable fallback mechanism only because dimension mapping exists. The application software must issue but one request at a time, and issue the next request only after the first request has completed. When it is time to issue a request, the multimedia dimension reference point is mapped to the equivalent media timeline, and this mapping provides the sampling mechanism. Understandably, if no mapping existed, the application software could not pick a "next" picture without possibly leaving certain pictures blank, an unacceptable result when a set of still pictures is to be displayed. No mapping exists for still pictures because such pictures are merely events, not sequences. Sampling works for the window mechanisms depicted in FIG. 1 as regions 2 and 3, although sampling can produce a somewhat jerky sequence because some stills in the sequence are omitted from display. For the stills in region 4, each individual picture must be displayed because there is no way to sample.
That the limited AVID system works at all is due to the direct mapping between the time-linear media dimension and the navigable dimension of the multimedia space, i.e., the described program timeline. As noted, but for that mapping, the AVID system would not know how to sample the media timeline.
This capability exists only because the AVID system has the direct mapping noted above. Because the AVID system maps one timeline position to one media position, the availability of the media position suffices. If the timeline position does not correspond to such media position, then by definition the requested timeline position is obsolete and may safely be discarded. Thus AVID-type systems can readily determine what requests are obsolete.
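The sampling and obsolete-request behavior that the direct mapping enables can be sketched as follows. This is an illustrative sketch only; the function names and the unit-to-frame ratio are assumptions, not details from any AVID product.

```python
# Sketch (with assumed names) of how a direct timeline-to-media mapping
# enables both sampling and obsolete-request detection: because one
# timeline position maps to one media frame, only the newest positions
# matter and older requested positions may safely be discarded.

def timeline_to_frame(timeline_pos, frames_per_unit=30):
    # Direct mapping: one timeline position -> one media frame.
    return int(timeline_pos * frames_per_unit)

def sample(positions, server_capacity):
    """Service at most `server_capacity` requests from a burst of
    navigation events, always keeping the newest positions."""
    recent = positions[-server_capacity:]   # older positions are obsolete
    return [timeline_to_frame(p) for p in recent]

# The user scrubs quickly through six timeline positions, but the media
# server can display only two frames in that interval; the four older
# positions are simply skipped:
shown = sample([0.0, 0.5, 1.0, 1.5, 2.0, 2.5], server_capacity=2)
```

Without such a mapping, as the text explains, there is no principled way to decide which requested pictures may be skipped.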
Since the timeline can represent more data events than can be conveniently presented in a single display window, the AVID system displays one window that may be user scrolled left and right. However, because the AVID model must first make a request for a still picture and then wait for that request to be completed, a user scrolling to the next page of timeline must first wait until all outstanding still video requests have been displayed. While this time delay may only be a few seconds, professional users scroll hundreds of times per day and often prefer to simply disable this feature to save time.
Unfortunately, for much multimedia there is no such mapping, and prior art systems such as AVID that rely upon the mapping break down completely. Assume for example that the user simply wants to display "head" or "tail" frames for events on the timeline, e.g., the first or last frames for a segment. If the user navigates the timeline faster than the media server can retrieve and display head or tail frames, no straightforward way exists to sample the media. The user wants to stop navigation at any point and expects to see all the pictures for the currently displayed region of program timeline. Because there is no mapping, sampling would omit pictures along the way, and leave blank pictures in portions of the display. Furthermore, since there is no linear dimension to sample, it would be difficult to develop a general algorithm that would never overburden the media server.
Commercial products including the AVID system that are available for multimedia display, manipulation, and coordination address this problem by forcing traversal of the multimedia space to be tied synchronously to the media servers. This forced synchronization makes such systems unbearably slow, especially for head frame display of a timeline, head and tail frame display of media segment libraries, and the like.
As noted, in practice such systems are so slow that users prefer to turn off the media servers for these applications, sacrificing the additional information that a display might provide to obtain speed of navigation. While this may be an alternative for certain video editing systems, this alternative defeats the whole purpose of a true multimedia system by reducing the system to a mono-media representation of the virtual space.
As an alternative, some prior art multimedia applications require the user to explicitly request the media within the display. Such applications can undesirably require more user interaction than is justified in many applications. Navigation can be slowed because the user is forced to make too many specific decisions. For example, the user may be forced to click on multiple icons before a preliminary decision can be made whether a displayed area is even an area of interest to the user.
Further, because prior art multimedia display mechanisms synchronously couple the media server to the user interface, they lack the ability to pre-queue requests and then cancel queued requests, or to prioritize requests. For example, in a virtual reality display, as the user "walks" slowly past a building it may be desirable to render the building image with detail. However as the user begins to "walk" faster, it would be sufficient to render the building with less detail, and as the user "runs" to render the building with still less detail.
Understandably, rendering the building as a wire frame model useful for navigation requires less system resources (for example, video or graphics memory, computer resources or hard disk access, and thus time) than rendering the building in full detail. Because prior art systems cannot cancel a request, it is not feasible for such systems to issue both fast and slow requests for an object, and then (if desired) cancel whichever request has not completed before it is obsolete. By way of example, this constraint precludes requesting a wire frame image of a building and a fully rendered image for the same building, and then cancelling the fully rendered image if the user navigates past the building before the slower completing image displays. The request to display a wireframe image could be in response to a user joystick motion, or application software could detect that a "walking" velocity had carried the user past the building. Similarly, the cancel request could be in response to the user's walking past the building, thus rendering obsolete all requests to display images of the building.
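The dual-fidelity request pattern described above can be sketched as follows. This is an illustrative sketch only; the `RenderRequest` class, its fields, and the cost figures are invented for illustration.

```python
# Illustrative sketch (invented names) of issuing both a fast wire-frame
# request and a slow fully rendered request for the same object, then
# cancelling whichever is still outstanding once it becomes obsolete.

class RenderRequest:
    def __init__(self, obj, detail, cost):
        self.obj, self.detail, self.cost = obj, detail, cost
        self.cancelled = False
        self.completed = False

    def cancel(self):
        # Only requests that have not yet completed can be withdrawn.
        if not self.completed:
            self.cancelled = True

outstanding = [
    RenderRequest("building", "wireframe", cost=1),   # fast request
    RenderRequest("building", "full", cost=50),       # slow request
]

# The cheap wire-frame request completes almost immediately:
outstanding[0].completed = True

# The user "walks" past the building, making its images obsolete; the
# still-pending full render is cancelled, freeing system resources:
for req in outstanding:
    if req.obj == "building":
        req.cancel()
```

In a prior art system lacking the `cancel` operation, the expensive full render would have to run to completion even though its result would never be useful.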
Clearly the inability to cancel such requests commits system resources longer than required in prior art systems. The result is considerably slowed navigation, or the requirement that all display requests be shifted to low quality. The flexibility to cancel outstanding requests and thus make better use of system resources and to expedite navigation is just not available in the prior art.
In summary, there is a need for an asynchronous media server mechanism that allows the user to navigate a modelled space as quickly as input events are handled. The mechanism should recognize media events or instances as defined by the user, rather than imposing such definitions upon the user. The mechanism should permit cancellation of outstanding requests that have become obsolete as the user navigates out of an area for which the video was needed. Further, the mechanism should permit navigation that is independent of the ballistic and linear constraints of the media servers.
Such mechanism should de-serialize the request queue such that media server performance is adaptively optimized to match the users' actions. The mechanism should be applicable to any media not having a time-linear nature mapping directly onto some dimension of the modelled space displayed to the user. If applied to cases where such mapping does exist, the mechanism should provide a smooth sequence of high quality images, as contrasted with what is offered by prior art mechanisms. Finally, such mechanism should permit prioritizing requests according to system resources required to complete the request, and application-related priorities.
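Taken together, the requirements above suggest a de-serialized queue that orders requests by priority and resource cost and skips obsolete entries. The following is a minimal sketch under those assumptions; the class and method names are invented for illustration and do not describe the disclosed mechanism itself.

```python
import heapq

# Sketch (hypothetical names) of a de-serialized, prioritized request
# queue: requests carry an application priority and an estimated
# resource cost, and obsolete entries are skipped rather than forcing
# their display.

class AsyncRequestQueue:
    def __init__(self):
        self._heap = []
        self._seq = 0          # tie-breaker preserving submission order
        self._obsolete = set()

    def submit(self, request_id, priority, cost):
        # Lower (priority, cost) pairs are served first.
        heapq.heappush(self._heap, (priority, cost, self._seq, request_id))
        self._seq += 1

    def mark_obsolete(self, request_id):
        # Cancellation: the entry stays in the heap but is never served.
        self._obsolete.add(request_id)

    def next(self):
        while self._heap:
            _, _, _, rid = heapq.heappop(self._heap)
            if rid not in self._obsolete:
                return rid
        return None

q = AsyncRequestQueue()
q.submit("head-frame-7", priority=0, cost=5)
q.submit("tail-frame-3", priority=1, cost=5)
q.submit("head-frame-9", priority=0, cost=2)
q.mark_obsolete("head-frame-7")   # user navigated away from this event
```

The lazy-deletion approach shown here lets cancellation cost nothing at submit time while still guaranteeing that obsolete requests never consume media server capacity.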
The present invention discloses such a mechanism.