Using a computer to analyze and discern the meaning of the content of digital media assets, known as semantic understanding, is an important field for creating an enriched user experience with these digital assets. One type of semantic understanding in the digital imaging realm is the analysis that identifies the type of event the user has captured, such as a birthday party, a baseball game, a concert, or any of many other events at which images are taken. Typically, such events are recognized using a probabilistic graphical model that is learned from a set of training images and then used to compute the probability that a newly analyzed image depicts a given event type. An example of this type of model is found in the published article of L.-J. Li and L. Fei-Fei, "What, where and who? Classifying events by scene and object recognition," Proceedings of ICCV, 2007.
An aerial image of the location where a ground-level image was taken with a personal camera contains complementary information that can serve as additional semantic information for event recognition. The main advantage is that aerial images are free of the distractions and clutter that often hamper computer vision algorithms. Recognizing the environment in which an image was captured, by using the corresponding aerial image, therefore helps in recognizing the event. Incorporating inferences drawn from aerial images into event recognition requires fusing information from the two modalities.
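As an illustration only, one common way to fuse two modalities is late fusion: each modality's classifier produces its own posterior over event types, and the two posteriors are combined. The sketch below assumes a weighted geometric mean (a log-linear combination) renormalized to sum to one; the function name `fuse_posteriors`, the `weight` parameter, and the example probabilities are all hypothetical and are not taken from the method described here.

```python
import math


def fuse_posteriors(ground, aerial, weight=0.5):
    """Hypothetical late fusion of per-modality event posteriors.

    ground, aerial: dicts mapping event type -> probability from the
    ground-image and aerial-image classifiers (assumed, illustrative inputs).
    weight: relative trust in the aerial modality, in [0, 1] (assumption).
    Returns a normalized dict of fused probabilities.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    events = ground.keys() & aerial.keys()  # only events both models score
    if not events:
        raise ValueError("no shared event types")
    eps = 1e-12  # guards against log(0)
    # Weighted sum of log-probabilities = log of a weighted geometric mean.
    scores = {
        e: (1.0 - weight) * math.log(ground[e] + eps)
        + weight * math.log(aerial[e] + eps)
        for e in events
    }
    # Subtract the max before exponentiating for numerical stability.
    top = max(scores.values())
    unnorm = {e: math.exp(s - top) for e, s in scores.items()}
    total = sum(unnorm.values())
    return {e: v / total for e, v in unnorm.items()}


if __name__ == "__main__":
    ground = {"birthday party": 0.40, "baseball game": 0.35, "concert": 0.25}
    aerial = {"birthday party": 0.20, "baseball game": 0.70, "concert": 0.10}
    fused = fuse_posteriors(ground, aerial)
    print(max(fused, key=fused.get))
```

In this made-up example the ground image alone slightly favors "birthday party," while the aerial view of the surroundings (for instance, a stadium) strongly favors "baseball game," and the fused posterior follows the aerial evidence. Setting `weight=0` recovers the ground-only classifier, which makes the contribution of the aerial modality easy to compare.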