A number of applications would benefit from the automatic analysis of video images, such as the analysis of video images for the purpose of understanding the semantics present in the video content. For example, automated video analysis may support video search services, the automatic organization of personal media collections, the automatic summarization of individual videos, human-computer visual interaction and the like. The demand for automated analysis of video images has increased in recent years as the quantity of video images has correspondingly increased due to, for example, the proliferation of mobile terminals, such as smartphones, that carry or otherwise include relatively high-quality cameras that permit the recording of video images at almost any time. Indeed, as a result of the proliferation of mobile terminals that include cameras, another example of an application that would benefit from the automated analysis of video images is the generation of a multi-camera edited video in which the video images of the same scene that have been recorded by multiple cameras, such as by the cameras carried by a plurality of spectators at a concert or sporting event, may be automatically combined once the video images captured by the different cameras are analyzed and reconciled with one another.
Video images include a variety of different types of semantics. For example, video images include semantics that define the video genre (e.g., sports, travel, music, etc.), the presence of a specific object, the presence of a specific face, the location at which the video recording was performed, and the scene type (e.g., outdoor versus indoor, cityscape versus landscape, etc.). Another type of semantic that may be present within a video recording is a salient event, such as a specific human action, e.g., running, drinking, falling down, etc.
The detection of salient events in an automated fashion may be useful to support various applications or services, such as an application that reviews video images and detects instances in which a person has fallen. Additionally, the automated detection of salient events would permit summaries of a video, such as summaries of a video of a sporting event, to be generated. For example, the video images captured by a plurality of cameras of the same scene, such as a concert or a sporting event, could be automatically reviewed to detect the salient events, thereby permitting a summary of the video images captured by the plurality of cameras to be constructed. The resulting summaries could include the video images of the salient events with other less important portions of the original video images having been omitted.
However, the automated detection of salient events within a video presents a number of challenges. For example, the detection of salient events generally has a relatively high computational complexity as a result of the visual analysis that is required. The computational complexity is particularly substantial in instances in which the video images recorded by a plurality of cameras of the same event are to be analyzed due to the sizeable amount of content that is to be analyzed. The computational complexity associated with the automated detection of salient events not only increases the time required for salient event detection, but may also limit the hardware platforms that may perform the image analysis. In this regard, the automated detection of salient events within video images may be particularly challenging in instances in which the visual analysis is to be performed by mobile terminals as a result of their increasing, but still limited, computational resources.