Aerial video analysis is challenging due to many factors, such as moving cameras, viewpoint changes, illumination changes caused by changes in view angle, and distorted object appearance caused by oblique views. A previous system by the inventors used a residual saliency (RS) detection approach, a motion-based moving object detection approach known as MogM, and convolutional neural network (CNN) based classification for object recognition, as described in Literature Reference No. 1 (in the List of Incorporated Cited Literature References) and U.S. application Ser. Nos. 14/205,349 and 13/938,196, all of which are hereby incorporated by reference as though fully set forth herein. RS is based on a bio-inspired attention model and, generally, can detect image areas that are significantly different from their surroundings (i.e., salient areas). For example, RS can usually find high-contrast areas as potential objects of interest. However, this approach does not take advantage of samples of the objects of interest (OIs), even when such samples are known in advance.
Thus, a continuing need exists for a system that can better adapt to known object classes for which many training samples are available, and that performs well on aerial videos captured from moving platforms (i.e., airplanes).