The specification relates to image processing systems. In particular, the specification relates to a system and method for scene determination and prediction associated with a video including one or more frames.
A system for determining a scene surrounding a road is beneficial in many ways. For example, the system may alert a driver to watch for animals crossing the road if the system determines that the driver is driving in a forest scene. However, categorizing a scene from one or more images captured in the scene is affected by a variety of factors, such as the presence of trees and buildings, traffic information on the road, etc. Even scenes in the same category may have a number of variations. For example, a first forest scene may only include trees crowded along the road; a second forest scene may have sporadic cabins distributed among the trees; and a third forest scene may have no trees within a short distance of the road. Because of these variations, a scene captured by the images is easily misclassified.
Existing solutions for scene determination have numerous problems. First, the existing solutions only perform spatial analysis on individual images captured in the scene. For example, the existing solutions extract features for spatially distributed objects (e.g., trees) in individual images and determine the scene based on the spatially distributed objects. However, different scenes may include the same objects, and it is very easy to misclassify the scenes based only on the spatial information in the individual images. For example, both a forest scene and a suburban scene include trees, and it is difficult to distinguish a forest scene from a suburban scene based only on the detection of trees in individual images.
Second, the existing solutions ignore distribution characteristics of the objects across a plurality of images. These distribution characteristics are referred to as time-domain information because the images are captured at different times while the driver is driving an automobile. The lack of the time-domain information in existing solutions may reduce the accuracy of the scene determination. For example, if there is an instantaneous variation such as an absence of trees within a short distance in a forest scene, the existing solutions fail to determine that the scene is a forest scene because of the absence of trees within the short distance, even though the distribution characteristics of tree presence across the images still indicate that it is a forest scene.
Third, the existing solutions do not perform scene prediction. For example, the existing solutions fail to predict whether the driver will be driving in the same scene or in a different scene in the next 5 minutes.
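The second problem described above, the loss of time-domain information, can be illustrated with a minimal sketch. The sketch is not the method of this specification. It assumes a hypothetical per-frame tree detector whose output is a list of booleans, and it uses an assumed sliding window of 5 frames and an assumed threshold of 0.6; the function names `per_frame_labels` and `windowed_labels` are likewise hypothetical. It only contrasts a decision made on each image alone with a decision made on the distribution of tree presence across recent images.

```python
from typing import List


def per_frame_labels(tree_detected: List[bool]) -> List[str]:
    """Label each frame from its own detection only (spatial information only)."""
    return ["forest" if detected else "non-forest" for detected in tree_detected]


def windowed_labels(
    tree_detected: List[bool], window: int = 5, threshold: float = 0.6
) -> List[str]:
    """Label each frame from the fraction of recent frames that contain trees.

    `window` and `threshold` are illustrative assumptions, not values
    taken from the specification.
    """
    labels = []
    for i in range(len(tree_detected)):
        # Use the current frame and up to (window - 1) frames before it.
        start = max(0, i - window + 1)
        recent = tree_detected[start : i + 1]
        ratio = sum(recent) / len(recent)
        labels.append("forest" if ratio >= threshold else "non-forest")
    return labels


# Hypothetical detector output: a short gap in tree presence at frame index 4.
frames = [True, True, True, True, False, True, True, True]

print(per_frame_labels(frames))
print(windowed_labels(frames))
```

In this sketch the per-frame decision changes at the single frame without trees, while the windowed decision does not, because most of the recent frames still contain trees. Tree presence alone would still not separate a forest scene from a suburban scene, so the sketch addresses only the instantaneous-variation example.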