1. Field of the Invention
The present invention relates to a system, method, and data model enabling a number of improved operations involving time-based media. These improvements concern uploading, storing, shared viewing, editing, and otherwise manipulating time-based media. More specifically, the present invention relates to a data model of operational parameters for operating a comprehensive data system for shared networked video, audio, animated graphics and other time-based media.
2. Description of the Related Art
Consumers are shooting more and more personal video using camera phones, webcams, digital cameras, camcorders and other devices. However, consumers are typically not skilled videographers, nor are they able or willing to learn complex, traditional video editing and processing tools like Apple iMovie or Windows Movie Maker. Nor are most users willing to watch most video “VCR-style”, that is, as a steady stream of unedited, undirected, unlabeled video.
Thus consumers face a problem that will be exacerbated as both the number of videos shot and the length of those videos grow (supported by increased processing speeds, memory and bandwidth in end-user devices such as cell phones and digital cameras) while the usability of editing tools lags behind. The result will be more and longer video files whose usability will continue to be limited by the inability to locate and access granular segments of interest within the larger videos in an overall library of videos.
Those skilled in the art should recognize the more generic term “time-based media”, which encompasses not only video with synchronized audio but also audio alone, as well as a range of animated graphical media forms, from sequences of still images to what are commonly called ‘cartoons’. All of these forms are addressed herein. The terms video, time-based media, and digitally encoded video with synchronized audio (DEVSA) are used as terms of convenience within this application and are intended to encompass all such forms of time-based media.
A further detriment to the consumer is that video processing uses a lot of computer power and special hardware often not found on personal computers. Video processing also requires careful hardware and software configuration by the consumer. Consumers need ways to edit video without having to learn new skills, buy new software or hardware, become expert systems administrators or dedicate their computers to video processing for great lengths of time.
Consumers have been limited to editing and sharing video that they can actually transfer onto their own computers. This requires the right kind of hardware to handle their own video and, if they wish to use video shot by another person or taken from stock libraries, it also requires physically moving and re-encoding that media.
When coupled with the special complexities of digitally encoded video with synchronized audio, the requirement for special hardware and the difficult processing and storage demands combine to reverse the common notion of using “free desktop MIPS and GBs” to relieve central servers. Unfortunately, for video review and editing, the desktop is simply not enough for most users. The cell phone is certainly not enough, nor is the Personal Digital Assistant (PDA). There is, therefore, a need for an improved method and system for shared viewing and editing of time-based media.
Those with skill in the conventional arts will readily understand that the terms “video” and “time-based media” as used herein are terms of convenience and should be interpreted generally to mean DEVSA, including content in which the original material is graphical.
This application addresses a unique consumer and data model and other systems that involve manipulation of time-based media. As introduced above, those of skill in the art reviewing this application will understand that the detailed discussion below addresses novel methods of receiving, managing, storing, manipulating, and delivering digitally encoded video with synchronized audio (referred to herein as DEVSA and, more broadly, as time-based media).
To understand the concepts provided by the present and related inventions, those of skill in the art should understand that DEVSA data is fundamentally distinct from, and much more complex than, the data types more commonly known to the public and the broad data processing community and conventionally processed by computers, such as basic text, numbers, or even photographs. As a result, DEVSA requires novel techniques and solutions to achieve commercially viable goals (as will be discussed more fully below).
Techniques (editing, revising, compaction, etc.) previously applied to these other data types cannot reasonably be extended to DEVSA because of the complexity of DEVSA data, and if commonly known techniques are forcibly extended, they would:
a. be ineffective in meeting users' objectives; and/or
b. be economically infeasible for non-professional users; and/or
c. make the resulting DEVSA data effectively inoperable in a commercially realistic manner.
Therefore a person skilled in the art of text or photo processing cannot easily extend the techniques that person knows to DEVSA.
The present invention proposes a new system and method for managing, storing, manipulating, editing, operating with and delivering DEVSA data. As will be discussed herein, the demonstrated state of the art in DEVSA processing suffers from a variety of fundamental detriments associated with known DEVSA data operations. The differences between DEVSA and other data types, and the consequences thereof, are discussed in the following paragraphs.
This application does not specifically address new techniques for digitally encoding video and/or audio or for decoding DEVSA. There is substantial related art in this area, well known to those of skill in the electronic arts, that provides a basic understanding of the subject. Those of skill in the art will understand, however, that more efficient encoding/decoding to save storage space and to reduce transmission costs only greatly exacerbates the problems of operating on DEVSA and of having to re-save revised DEVSA data at each step of an operation, since more efficient encoding is generally more complex and processor intensive.
One distinguishing point about video, and by extension stored DEVSA, is that it represents an object with four dimensions: X, Y, A (audio), and T (time). A photo, by contrast, has only two dimensions (X, Y) and can be thought of as a single object with two spatial dimensions but no time dimension. The problems of two-dimensional photo technology are therefore so fundamentally different as to have no bearing on the present discussion (and solutions from the text arts have even less).
Another distinguishing point about stored DEVSA that illustrates its unique difficulty in editing operations is that it extends through time. For example, synchronized (time-based) comments are not easily addressed or edited by subsequent users.
An obvious example of the detriments presented by this time dependence should be apparent to those with skill in the art. It is common for Internet users to post comments on Web sites about specific news items, text messages, photos or other objects that appear on Web sites. The techniques for doing so are well known to those with skill in the art and are commonly used today. They are straightforward because the comment is a fixed, single data object and the object commented upon is also a fixed, single data object. However, the corollaries in the realm of time-based media are not well known and are not supported within the current art.
As an illustrative example, consider a video that extends for five minutes and encompasses 7 distinct scenes addressing 7 distinct subjects. If an individual wishes to comment upon scene 5/subject 5, that comment would make no sense if it were tied to the video as a whole. It must be tied only to scene 5, which happens to occur from 3 minutes 22 seconds until 4 minutes 2 seconds into the video.
Since the video is a time-based data object, the comment must also become a time-based data object and be linked within the time space of the specific video to the segment in question. Such time-based comments and such time-dependent linkages are not known or supported within the related arts but are supported within this model.
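The scene 5 example above can be sketched as a small data structure. This is a minimal illustration only, assuming a comment is stored with a video identifier and a start/end time in seconds; the names `TimeBasedComment`, `comments_at`, and `"vid-001"` are hypothetical and are not taken from the invention's data model. It shows the key idea: the comment is itself time-based and is retrieved only when playback falls within the linked segment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeBasedComment:
    """A comment linked to one time segment of a video (hypothetical model)."""
    video_id: str
    start_s: float  # segment start, in seconds from the beginning of the video
    end_s: float    # segment end, in seconds (exclusive)
    text: str

    def covers(self, t: float) -> bool:
        """True if playback time t (seconds) falls inside this comment's segment."""
        return self.start_s <= t < self.end_s


def comments_at(comments, video_id, t):
    """Return the comments on video_id whose linked segment contains time t."""
    return [c for c in comments if c.video_id == video_id and c.covers(t)]


# Scene 5 runs from 3:22 (202 s) to 4:02 (242 s) of a five-minute video.
comments = [
    TimeBasedComment("vid-001", 202.0, 242.0, "Comment on scene 5"),
    TimeBasedComment("vid-001", 0.0, 35.0, "Comment on scene 1"),
]

print([c.text for c in comments_at(comments, "vid-001", 210.0)])  # ['Comment on scene 5']
print([c.text for c in comments_at(comments, "vid-001", 100.0)])  # []
```

Using half-open intervals (start inclusive, end exclusive) is one reasonable choice; it keeps back-to-back scenes from both claiming the boundary instant.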
A stored DEVSA represents an object with four dimensions, X, Y, A, and T: large numbers of pixels arranged in a fixed X-Y plane that vary smoothly with T (time), plus A (audio amplitude over time), which also varies smoothly in time in synchrony with the video. For convenience, this is often described as a sequence of “frames” (such as 24 frames per second). However, this is a fundamentally arbitrary choice (both the number of “frames” and the use of “frame” language) and is a settable parameter at encoding time. In reality, the rate at which pixels can change with time is limited only by the speed of the semiconductors that sense the light.
Before going further, it is also important for those of skill in the art to understand the scale of these DEVSA data elements, which sets them apart from text or photo data elements. As a first example, a 10-minute video at 24 “frames” per second contains 10 × 60 × 24 = 14,400 frames. At 600×800 pixel resolution (480,000 pixels per frame), that amounts to 14,400 × 480,000 ≈ 6.9 billion pixel representations, approaching 7 billion.
When one adds the fact that each pixel needs 10 to 20 bits to describe it, and the need to simultaneously describe the audio track, there is a clear and pressing need for an invention that addresses both the complexity of the data and the fact that DEVSA represents not a fixed, single object but rather a continuous stream of varying objects spread over time, whose characteristics can change multiple times within a single video. To date, no viable solutions accessible to the typical consumer have been provided, other than very basic functions such as storing pre-encoded video files and manipulating these as fixed files.
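The scale estimate above can be checked with a few lines of arithmetic. This sketch covers raw, uncompressed pixel data only: it ignores the audio track and any compression, applies the 10 to 20 bits per pixel stated above, and reports decimal gigabytes (10^9 bytes).

```python
# Raw pixel volume of the 10-minute example (video only; audio is ignored).
minutes = 10
fps = 24
width, height = 800, 600

frames = minutes * 60 * fps               # 14,400 frames
pixels_per_frame = width * height         # 480,000 pixels
total_pixels = frames * pixels_per_frame  # 6,912,000,000 pixel representations

for bits_per_pixel in (10, 20):
    gigabytes = total_pixels * bits_per_pixel / 8 / 1e9  # bits -> bytes -> GB
    print(f"{bits_per_pixel} bits/pixel -> {gigabytes:.2f} GB")
```

This prints roughly 8.64 GB at 10 bits per pixel and 17.28 GB at 20 bits per pixel, before the audio track is added.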
While one might have imagined that photos and video offer similar technical challenges, the preceding discussion again makes it clear that the detriments in dealing with two-dimensional photos, which are fixed in time, are so fundamentally different and less challenging as to have no bearing on the present discussion.
Some additional facts about DEVSA should be well understood by those of skill in the art. These include:
a. Current decoding technology allows one to select any instant in time within a video and resolve a “snapshot” of that instant, in effect rendering a photo of that instant, and to save that rendering in a separate file. As has been shown, for example in surveillance applications, this is a highly valuable adjunctive technology, but it completely fails to address the present needs.
b. It is not possible to take a “snapshot” of audio as it is perceived by a person. Those of skill in the electronic and audio-electronic arts recognize that audio data is a one-dimensional data type (amplitude versus time). Audio is perceivable by a person only as its amplitude changes with time. Electronic equipment can measure that amplitude if desired for special reasons.
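The contrast between items a and b can be made concrete with a short sketch, assuming a constant frame rate and a fixed audio sample rate (both are illustrative parameters, and the function names are hypothetical). Resolving a video “snapshot” amounts to choosing the frame displayed at time t, whereas the audio value at a single instant is just one amplitude sample, so anything a person could hear requires a window of samples around t. Real decoders typically must seek to a preceding keyframe and decode forward to reconstruct a frame; this sketch ignores that step.

```python
import math


def frame_index_at(t_s: float, fps: float) -> int:
    """Index of the frame on screen at time t_s (seconds), assuming a constant
    frame rate; this is the frame a video 'snapshot' would resolve."""
    return math.floor(t_s * fps)


def audio_sample_range(t_s: float, sample_rate: int, window_ms: int) -> range:
    """Sample indices of an audio window centred on t_s. A single sample is only
    one amplitude value; a perceivable sound needs a window of many samples."""
    half = window_ms * sample_rate // 2000  # half the window, in samples
    centre = round(t_s * sample_rate)       # sample nearest to t_s
    return range(max(0, centre - half), centre + half)


print(frame_index_at(202.5, 24))                # frame shown 3:22.5 into the video
print(len(audio_sample_range(1.0, 48000, 20)))  # samples in a 20 ms window at 48 kHz
```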
The present application and its related family applications apply this understanding of DEVSA to the case in which the actual video and audio are compressed (as an illustration only) by factors of a thousand or more yet nonetheless remain very large files. Due to the complex encoding techniques employed, those files cannot be disrupted or manipulated without a severe risk to the integrity of the underlying video and audio content.
The conventional manner in which users edit digitized data, whether numbers, text, graphics, photos, or DEVSA, is to display that data in viewable form, make desired changes to that viewable data directly and then re-save the now-changed data in digitized form.
The phrase above, “make desired changes to that viewable data”, could also be stated as “make desired changes to the manner in which that data is viewed”, because in the normative modality what a user views changes only because the underlying data changes. In contrast, the proposed invention changes the viewing of the data without changing the data itself. This distinction is material and fundamental.
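One common way to change how media is viewed without changing the media itself is an edit list: the source file stays untouched, and playback of the “edited” version is driven by a list of source time ranges (a similar idea appears, for example, as edit lists in some container formats). The sketch below illustrates that general technique only; it is not presented as the invention's data model, and the names `Segment` and `source_time` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """A span of the unchanged source video, in seconds (hypothetical model)."""
    start_s: float
    end_s: float

    @property
    def duration(self) -> float:
        return self.end_s - self.start_s


def source_time(edit_list: List[Segment], t_out: float) -> float:
    """Map a time in the edited *view* to a time in the untouched source file."""
    if t_out < 0:
        raise ValueError("time must be non-negative")
    elapsed = 0.0
    for seg in edit_list:
        if t_out < elapsed + seg.duration:
            return seg.start_s + (t_out - elapsed)
        elapsed += seg.duration
    raise ValueError("time is past the end of the edited view")


# An "edit" keeping 0:10-0:40 and 3:22-4:02; the source file is never re-encoded.
view = [Segment(10.0, 40.0), Segment(202.0, 242.0)]
print(sum(s.duration for s in view))  # 70.0 seconds in the edited view
print(source_time(view, 35.0))        # 207.0 (5 s into the second segment)
```

Because the edit is only a short list of numbers, it can be revised, undone, or shared without decoding or re-encoding the video.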
In conventional data changes, where storage cost is not an issue to the user, the user can choose to save both the original and the changed version. Some sophisticated commercial software for text and number manipulation can remember a limited number of user changes and, on request, display them and undo prior changes.
This latter approach is much less feasible for photos than for text or numbers due to the large size and the extensive encoding required of photo files. It is additionally far less feasible for DEVSA than for photos because the DEVSA files are much larger and because the DEVSA encoding is much more complex and processor intensive than that for photo encoding alone.
In a similar analysis, the processing and storage costs associated with saving multiple old versions of number or text documents is a small burden for a typical current user. However, processing and storing multiple old versions of photos is a substantial burden for typical consumer users today. Most often, consumer users store only single compressed versions of their photos. Ultimately, processing and storing multiple versions of DEVSA is simply not feasible for any but the most sophisticated users even assuming that they have use of suitable editing tools.
As will be discussed, this application proposes new methodologies and systems that address the tremendous conventional challenges of editing heavily encoded digitized media such as DEVSA.
A parallel problem, known to those with skill in the conventional arts associated with heavily encoded digitized media such as DEVSA, is searching for content by various criteria within large collections of such DEVSA.
Simple examples of searching digitized data include searching through all of one's accumulated emails for the text word “Anthony”. Means to accomplish such a search are conventionally known and straightforward because text is not heavily encoded and is stored linearly. On the Internet, companies like Google and Yahoo and many others have developed and used a variety of methods to search out such text-based terms (for example “Washington's Monument”). Similarly, number-processing programs follow a related approach in finding instances of a desired number (for example the number “$1,234.56”).
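Because text is stored as a plain linear sequence, the email search described above reduces to scanning that sequence for a substring. The sketch below uses Python's built-in `str.find` on a few made-up message bodies (the sample emails are purely illustrative); no decoding step is needed, which is exactly what DEVSA lacks.

```python
from typing import List


def find_all(text: str, term: str) -> List[int]:
    """Return every character offset at which term occurs in text."""
    hits, start = [], text.find(term)
    while start != -1:
        hits.append(start)
        start = text.find(term, start + 1)  # continue scanning after this hit
    return hits


emails = [
    "Lunch with Anthony on Friday?",
    "Quarterly numbers attached: $1,234.56",
    "Anthony said the Washington's Monument tour is booked.",
]
for i, body in enumerate(emails):
    print(i, find_all(body, "Anthony"))
```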
However, when the conventional arts approach digitally encoded graphics or, more challengingly, digitally encoded photos and, even more challengingly, DEVSA, the search problem becomes severe because the object of the search becomes less and less well-defined in terms that (1) a human can explain to a computer and (2) a computer can understand and use algorithmically. Moreover, the data is ever more deeply encoded as one goes from graphics to photos to DEVSA.
Conventional efforts to employ image recognition techniques for photos and video, and speech recognition techniques for audio and video/audio, require that the digitized data be decoded back to viewable/audible form prior to application of such techniques. As will be discussed later, repetitive encoding/decoding with edits introduces substantial risks for graphical, photographic, audio and video data.
As an example of this search challenge, consider the superficially simple graphics search question: “Search the file ‘XYZ graph’, which includes 75 figures, and find all the elements that are ‘ovals’.”
If the search is being done with the same software that created the original file, the search may be possible. However, if all the user has are images of the figures, the challenges are substantial. To name a few:
1. The user and the computer first have to agree on what “oval” means. Consider the fact that circles are “ovals” with equal major and minor axes.
2. The user and computer have to agree whether embedded figures, such as pictures or drawings of a dog, should be included in the search, since the dog's eyes may be “oval”.
3. The user and computer have to agree whether “zeros” and/or “O's” are ovals or just text.
The point is that recognizing shapes gets tricky.
Turning to photos, unless there are metadata names or tags tied to the photo, which explain the content of the photo, determining the content of the photo in a manner susceptible to search is a largely unsolved problem outside of very specialized fields such as police ID photos. Distinguishing a photo of Mt. Hood from one of Mt. Washington by image recognition is extremely difficult.
This application proposes new methods, systems, and techniques to enable and enhance use, editing and searching of DEVSA files via novel types of metadata and novel types of user interactions with integrated systems and software. Specifically related to the distinction made above, this application addresses methods, systems and operational networks that provide the ability to change the manner in which users view digitized data, specifically DEVSA, without necessarily changing the underlying digitized data.
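To illustrate how metadata can make DEVSA searchable without decoding or altering it, the sketch below assumes user-supplied tags attached to time ranges of a video and searches only those tags. The `SegmentTag` structure, file names, and tags are hypothetical examples chosen for illustration (echoing the Mt. Hood and Mt. Washington photos mentioned earlier), not the specific metadata types of the invention.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class SegmentTag:
    """User-supplied metadata attached to a time range of a video (hypothetical)."""
    video_id: str
    start_s: float
    end_s: float
    tags: FrozenSet[str]  # stored in lower case


def search(index: List[SegmentTag], term: str) -> List[Tuple[str, float, float]]:
    """Return (video_id, start_s, end_s) for every tagged segment matching term.
    Only the metadata is examined; the encoded video is never decoded or modified."""
    term = term.lower()
    return [(s.video_id, s.start_s, s.end_s) for s in index if term in s.tags]


index = [
    SegmentTag("hike.mp4", 0.0, 95.0, frozenset({"trailhead", "mt. hood"})),
    SegmentTag("hike.mp4", 95.0, 310.0, frozenset({"summit", "mt. hood"})),
    SegmentTag("trip.mp4", 40.0, 120.0, frozenset({"mt. washington"})),
]
print(search(index, "Mt. Hood"))
```

The result identifies granular segments rather than whole files, which addresses the difficulty, noted earlier, of locating segments of interest within larger videos.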
Those of skill in the art will recognize that there has been a tremendous commercial and research demand to cure the long-felt problem of data loss when manipulating the underlying DEVSA data in situ. Repetitive encoding and decoding cycles are very likely to introduce accumulating errors, with resultant degradation of the quality of the video and audio. Therefore there is strong demand to retain copies of original files in addition to re-encoded files. Since, as stated previously, these are large files even after efficient encoding, economic pressures make it very difficult to keep many copies of the same original videos.
Thus, the related art in video editing and manipulation favors light, repetitive encoding, which uses a great deal of storage and also requires keeping more and more copies of successive versions of the encoded data to avoid degradation, thus requiring even more storage. As a consequence, those of skill in the art will recognize a need to overcome the particular detriments presented by the current solutions to manipulation of time-based media.
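The storage pressure described above can be illustrated with a simple comparison. All sizes below are hypothetical values chosen only for illustration: one approach saves a full re-encoded copy of the video for every version alongside the original, while the other keeps the original once and stores each version as a small set of viewing instructions (for example, an edit list).

```python
# Hypothetical sizes, chosen only for illustration.
SOURCE_MB = 500.0    # one encoded source video
EDIT_LIST_KB = 4.0   # one set of viewing instructions for a version


def storage_full_copies(versions: int) -> float:
    """MB used if every version is re-encoded and saved in full,
    in addition to the retained original."""
    return SOURCE_MB * (1 + versions)


def storage_edit_lists(versions: int) -> float:
    """MB used if the original is kept once and each version is only metadata."""
    return SOURCE_MB + versions * EDIT_LIST_KB / 1024


for n in (1, 5, 20):
    print(n, storage_full_copies(n), round(storage_edit_lists(n), 3))
```

Under these assumptions, full copies grow linearly with the size of the video, while metadata-only versions add only kilobytes each; the full-copy approach also repeats the lossy encoding step for every version.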
As an illustrative example only, those of skill in the art should consider the following comparison between DEVSA and other, somewhat related data types.
Originally, the most common data type on computers was numbers. This problem was well solved on computers in the 1950s, and as a material example of this success, one can buy a nice calculator today for $9.95 at a local non-specialty store. As another example, the Lotus® and now Excel® software systems solve most number display problems on the desktop.
Today the most common data type on computers is text. Text is a one-dimensional array of data: a sequence of characters. That is, the characters have an X component (no Y or other component), and all that matters is their sequence. The way in which the characters are displayed is the choice of the user: on an 8×10 inch page, on a scroll, on a ticker-tape, in a circle or in a spiral. The format, font type, font size, margins, etc. are all easily applied after the fact because the text data type has only one dimension and places only a single logical demand on the programmer, that is, to keep the characters in the correct sequence.
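The point that text layout is applied after the fact can be illustrated with Python's standard `textwrap` module: the same stored character sequence is rendered at two arbitrary display widths, and the underlying sequence is unchanged. The sample sentence and the widths are arbitrary choices for illustration; `break_on_hyphens=False` keeps hyphenated words intact so that rejoining the lines reproduces the original exactly.

```python
import textwrap

# The stored data: nothing but an ordered sequence of characters.
text = "Text is a one-dimensional array of data: a sequence of characters."

# Layout is chosen at display time; the stored sequence never changes.
for width in (70, 24):
    print(f"--- width {width} ---")
    print("\n".join(textwrap.wrap(text, width=width, break_on_hyphens=False)))

# Rejoining a rendering recovers exactly the same sequence.
assert " ".join(textwrap.wrap(text, width=24, break_on_hyphens=False)) == text
```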
More recently, a somewhat more complex data type has become popular: photos or images. Photos have two dimensions: X and Y. A photo has a set of pixels arranged in a fixed X-Y plane, and the relationship among those pixels does not change. Thus, those of skill in the art will recognize that a photo can be treated as a single object, fixed in time, and manipulated accordingly.
While techniques have been developed to allow one to “edit” photos by cropping, brightening, changing tone, etc., those techniques require one to make a new data object, a new “photo” (a newly saved image), in order to store and/or retrieve this changed image. This changed image retains the same restrictions as the original: if one user wants to “edit” the image, the user needs to change the image and re-save it. It turns out that there is little “size”, “space”, or “time” penalty to that approach to photos because, compared to DEVSA, images are relatively small and fixed data objects.
In summary, DEVSA should be understood as a type of data with very different characteristics from data representing numbers, text, photos or other commonly found data types. Recognizing these differences and their impacts is fundamental to the proposed invention. As a consequence, ideas and techniques that have been applied to those other, substantially less complex data types do not extend to, and have no corollary in, the conceptions and solutions noted below. The present invention provides a new manner of (and a new solution for) dealing with DEVSA type data that both overcomes the detriments of such data noted above and results in a substantial improvement demonstrated via the present system and method.
The present invention also recognizes the earlier-discussed need for a system or method to manage DEVSA data while providing extremely rapid response to user input without changing the underlying DEVSA data.
What is also needed is a new manner of dealing with DEVSA that overcomes the detriments inherent in such data and that enables immediate and timely response both to initial DEVSA data and, especially, to DEVSA data and time-based media in general that are amended or updated on a continual or rapidly changing basis.
What is not appreciated by the related art is the fundamental data problem involving DEVSA and current systems for manipulating the same in a consumer responsive manner.
What is also not appreciated by the related art is the need for providing a data model that accommodates (effectively) all present modern needs involving high speed and high volume video data manipulation.
Accordingly, there is a need for an improved system and data model for shared viewing and editing of time-based media.