As portable electronic devices have proliferated and become increasingly powerful and capable, the purposes for which they are commonly used have shifted. As pocket-sized devices have transitioned from purely communication devices to content-consumption devices and then to content-creation devices, users have likewise become prodigious content creators. It is estimated that ten percent of all photographs ever captured were taken in 2012. Similar creation rates apply to video footage. The advent of head-mounted video capture devices such as the GoPro camera and Google Glass is accelerating the capture of video in the general field of view of users. Unfortunately, this glut of image capture has not raised the quality of the created content. Particularly with video footage, the time required to inspect, process, edit, and/or export clips of interest is roughly proportional to the amount of footage recorded. Thus, as the amount of captured footage increases, the time required to extract worthwhile content increases roughly linearly.
For all disclosures and claims within the present application, a “media image” is defined as at least one of a video image and a still image.
With any type of media image, a typical goal for a content creator is to produce desirable content for a specific audience. The definition of “desirable” may change based on the audience. With specific regard to video images, one method or set of criteria for selecting and editing video images may be appropriate for one audience but not for another. Furthermore, images that are captured close in time to one another may be desirable for different reasons. These various incarnations of desirability and relevancy may be referred to simply as “saliency.”
A media image may be considered salient for any number of reasons: it may contain a notable event, it may include a particular friend or relative, it may contain an occurrence that others on social media outlets consider interesting, it may have been captured at a particular location, and/or it may contain emotions that a user wishes to capture. It is assumed that adding eye tracking to other sensors gives a user a level of analysis and control over the process of identifying and editing salient media images that would not be available without eye tracking.
Careful consideration is required when discussing the scope intended by the word “editing.” In photo and video applications, “editing” typically connotes manipulation of images or, in the case of video, also the process of rearranging trimmed images into a more desirable order. “Editing” often excludes the steps of selecting or tagging images on which further steps will be performed, even though those steps should formally be considered part of the editing process. However, for purposes of the disclosure and claims within the present application, “editing” shall include the selecting and tagging steps. Furthermore, in the era before digital media creation, all editing (including selecting and tagging) necessarily occurred considerably after the time of capture. Features are now included in video and still cameras that allow the editing process to occur immediately after the time of capture, or “in-camera.” The disclosure herein describes how the process of editing may shift to include times during or even before capture. Doing so, however, has not been practically feasible without the systems and methods described herein.
Unfortunately, for many users, the time commitment required to convert as-captured video images into consumable finished video is a terminal impediment to the process. Encountering this impediment commonly leads to one of two outcomes. In the first, the entire process is abandoned, and no video images are ever shared with the audience. In the second, all editing is eschewed, and images of extremely low quality and relevance are shared with the audience. Neither outcome is desirable for either the creator or the audience. For the creator, this may reduce his or her willingness to record video, knowing that it is too difficult to edit it into a presentable form. For the audience, watching poor video images provides negative reinforcement and may discourage them from watching video images in the future.
As technology advances, the form factor of the devices a user may carry to create content has shifted as well. Content-creation devices were once dedicated devices devoid of other technology. Then smartphones and tablets became capable of capturing video, ushering in a previously unimaginable era of miniaturization. Now, head-mounted displays are starting to become feasible as consumer devices, marking a shift in wearable technology that allows it to create content instead of merely logging data from sensors or otherwise. Further, contact lenses and artificial retinas are viable enhancements to the human visual system. The systems and methods herein are applicable to these modes of capturing video, tracking eye direction, and editing salient video as well, and are considered part of the present invention. Because the technology required to determine a user's gaze through eye tracking can now be incorporated into wearable and implanted devices, the eyes become a feasible tool for device input and editing.
Applicant(s) believe(s) that the material incorporated above is “non-essential” in accordance with 37 CFR 1.57, because it is referred to for purposes of indicating the background of the invention or illustrating the state of the art. However, if the Examiner believes that any of the above-incorporated material constitutes “essential material” within the meaning of 37 CFR 1.57(c)(1)-(3), applicant(s) will amend the specification to expressly recite the essential material that is incorporated by reference as allowed by the applicable rules.