Classifiers are statistical models, typically implemented as computer programs executed on computer systems, used to classify real world events based on a set of features of each event. A real world event is an instance of any entity or event in the real world; for example, an instance of a person and an instance of a hockey game are both real world events. Real world events can also be works of imagination, such as a book of fiction, a fake news story, an abstract painting, or a computer-generated digital image. Each of these is still an instance of its respective type.
Videos are one type of real world event that can be classified based on a set of features. A video has various features, which can be based on attributes or elements of the video. An attribute is a numerical or qualitative aspect of an event; for example, a video can have attributes such as an average pitch, an average luminance, a texture parameter, or the like. An element refers to a sub-part of an event. Elements of a video could include a frame, a sequence of frames, or a sound bite.
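To make the attribute/element distinction concrete, the following minimal Python sketch assumes a toy representation in which a video is a list of grayscale frames and each frame is a 2-D list of pixel luminance values. The names `average_luminance` and `frame_sequence` are hypothetical and not part of any library. An attribute is computed as a single number over the whole video, while an element is simply a slice of frames.

```python
from typing import List

# Hypothetical toy representation: a video is a list of grayscale frames,
# and each frame is a 2-D list of pixel luminance values in [0, 255].
Frame = List[List[int]]
Video = List[Frame]


def average_luminance(video: Video) -> float:
    """Attribute: one numerical aspect summarizing the whole video."""
    pixels = [p for frame in video for row in frame for p in row]
    return sum(pixels) / len(pixels)


def frame_sequence(video: Video, start: int, end: int) -> Video:
    """Element: a sub-part of the video (frames start .. end-1)."""
    return video[start:end]


# Three 2x2 frames that get progressively brighter.
video = [
    [[0, 0], [0, 0]],
    [[100, 100], [100, 100]],
    [[200, 200], [200, 200]],
]
clip = frame_sequence(video, 1, 3)
print(average_luminance(video))            # 100.0
print(len(clip), average_luminance(clip))  # 2 150.0
```

Because an element such as `clip` is itself a sequence of frames, attributes can also be computed over elements, which is one way the same attribute can be measured at different granularities.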
In classification, statistical models are generated that reflect the probability that an event belongs to a labeled class of events based on its set of features. Events may be labeled according to any system that creates distinct classes of events that can be characterized by a set of features. For video events, classes can be based on the type of event depicted within the video, a person appearing in one or more frames of the video, or the genre or style of the video. The statistical models generated in classification identify and apply the features with the strongest discriminative value in distinguishing between classes of events. The discriminative value of a feature is a function of the feature's association with a class and of its ability to distinguish members of the class from non-members.
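The description above does not fix a particular measure of discriminative value. As one illustration only, the sketch below uses the Fisher score, a common feature-ranking heuristic swapped in here as an assumption: it compares the squared gap between class means with the within-class variance, so it rises both when a feature is associated with the class and when it separates class members from non-members. The feature names and values are hypothetical.

```python
from statistics import mean, pvariance
from typing import Sequence


def fisher_score(values: Sequence[float], labels: Sequence[int]) -> float:
    """Fisher score of one feature for a binary label (1 = in class, 0 = not).

    Large values mean the class means are far apart relative to the spread
    within each class, i.e. the feature separates the classes well.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    gap = (mean(pos) - mean(neg)) ** 2
    spread = pvariance(pos) + pvariance(neg)
    if spread == 0:
        # Perfectly tight classes: infinitely discriminative if the means differ.
        return float("inf") if gap > 0 else 0.0
    return gap / spread


labels = [1, 1, 1, 0, 0, 0]
# Hypothetical per-video feature values.
peak_pitch = [0.9, 0.8, 0.85, 0.2, 0.3, 0.25]  # strongly tied to the class
luminance = [0.5, 0.4, 0.6, 0.45, 0.55, 0.5]   # barely related to the class

features = {"peak_pitch": peak_pitch, "luminance": luminance}
ranked_names = sorted(features, key=lambda name: fisher_score(features[name], labels), reverse=True)
print(ranked_names)  # ['peak_pitch', 'luminance']
```

A feature with a large score is a natural candidate for the classifier to weight heavily. The Fisher score looks at one feature at a time, so interactions between features are ignored in this sketch.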
Features used in video classification are time series features, meaning they are generated and evaluated over a series of time points either sampled from the video or determined continuously for the video. The manipulation and comparison of time series feature data create several challenges in the classification of videos and other time series events. One problem with representing features over a series of time points is that features with strong discriminative value for a class can be found at multiple different time scales of a video or other time series event. For instance, some features with strong discriminative value may occur only over a small time interval or scale (e.g. at the millisecond scale), while others may occur over a larger time interval or scale (e.g. at a scale of minutes or the entire duration of the time series event). A maximum value over a small interval of time (e.g. a high sound pitch caused by a scream in a horror movie) may have discriminative value equal to that of an average feature value taken over several minutes of a video (e.g. the number of different shots in a video showing a sporting event).
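One way to expose a feature at several time scales, sketched here under the assumption of a single scalar feature track sampled at regular time points, is to pool it over windows of different lengths. The hypothetical function `multiscale_max_mean` takes, for each window length, the largest mean over non-overlapping windows, so a brief spike dominates at short scales and is diluted at long ones.

```python
from typing import Dict, Sequence


def multiscale_max_mean(series: Sequence[float], scales: Sequence[int]) -> Dict[int, float]:
    """For each window length w, return the largest mean over the
    non-overlapping windows of w samples (a trailing partial window is kept).

    At w = 1 this is the global maximum of the series; at w = len(series)
    it is the global mean.
    """
    features: Dict[int, float] = {}
    for w in scales:
        windows = [series[i:i + w] for i in range(0, len(series), w)]
        features[w] = max(sum(win) / len(win) for win in windows)
    return features


# Hypothetical per-second pitch track for a one-minute clip:
# quiet throughout, with a single one-second scream at the end.
pitch = [0.1] * 59 + [0.9]
features = multiscale_max_mean(pitch, [1, 10, 60])
print({w: round(v, 3) for w, v in features.items()})
# {1: 0.9, 10: 0.18, 60: 0.113}
```

In this toy track the scream is obvious at the 1-sample scale but nearly invisible at the 60-sample scale, which is why a representation restricted to a single scale can discard a highly discriminative feature.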
The ordering of time series values over time creates additional problems in time series classification. Time series features are typically represented as an ordered vector of values corresponding to features over time or space. Although order matters in determining time series features, features with high discriminative value for a label can often occur in different portions of a video. For instance, adult content is often spliced into other videos at different time points, which makes it harder to detect using time series features that are bound to a temporal model.
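The following sketch, using a hypothetical per-frame detector score, contrasts a position-bound comparison with an order-invariant summary. The same three-frame segment is placed at two different time points: comparing the series time point by time point reports a large difference, whereas taking the maximum mean over sliding windows gives the same value for both. Both functions are illustrative assumptions, not a prescribed method.

```python
from typing import Sequence


def position_bound_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Compare two series time point by time point (order matters)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def max_window_mean(series: Sequence[float], w: int) -> float:
    """Largest mean over any sliding window of w samples; ignores where
    in the series that window occurs."""
    return max(sum(series[i:i + w]) / w for i in range(len(series) - w + 1))


# Hypothetical per-frame detector output (1 = frame flagged).
template = [0, 0, 1, 1, 1, 0, 0, 0, 0, 0]  # flagged segment early in the video
spliced = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]   # same segment spliced in later

print(position_bound_distance(template, spliced))                # 6
print(max_window_mean(template, 3), max_window_mean(spliced, 3))  # 1.0 1.0
```

The order-invariant summary discards where the segment occurs, which is what helps with spliced content, but it also discards ordering information that may matter for other labels.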
Other problems in classifying time series events based on time series features are caused by the periodicity and sparseness of time series features. Certain features may have discriminative value based on their periodicity, that is, their recurrence over semi-regular time intervals. For instance, music videos often include the sound of applause on the audio soundtrack, which acts as a recurrent, periodic event that can be used to discriminate these videos from other types of videos. Other time series features may be sparse, meaning that the time series feature occurs sporadically over the video or other time series event and/or only over a brief interval of time.
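Periodicity and sparseness can both be summarized with simple statistics. The sketch below assumes a hypothetical applause detector that outputs 1 at time steps where applause is heard and 0 otherwise. It uses normalized autocorrelation (a standard technique, substituted here for illustration) to find the dominant repetition period, and the fraction of active time steps as a crude sparseness measure. The threshold and lag range are arbitrary assumptions.

```python
from typing import Sequence


def autocorrelation(series: Sequence[float], lag: int) -> float:
    """Normalized autocorrelation at a given lag (near 1.0 = strong repetition)."""
    n = len(series)
    mu = sum(series) / n
    centered = [x - mu for x in series]
    denom = sum(c * c for c in centered)
    if denom == 0:
        return 0.0  # constant series: no periodic structure
    num = sum(centered[i] * centered[i + lag] for i in range(n - lag))
    return num / denom


def dominant_period(series: Sequence[float], max_lag: int) -> int:
    """Lag in 1..max_lag with the highest autocorrelation."""
    return max(range(1, max_lag + 1), key=lambda lag: autocorrelation(series, lag))


def activity_ratio(series: Sequence[float], threshold: float) -> float:
    """Fraction of time points where the feature is active; small = sparse."""
    return sum(1 for x in series if x > threshold) / len(series)


# Hypothetical applause detector output: a burst every 8 time steps.
applause = [1 if t % 8 == 0 else 0 for t in range(64)]
print(dominant_period(applause, 20))  # 8
print(activity_ratio(applause, 0.5))  # 0.125
```

Here the bursts recur every 8 time steps and the feature is active in only 12.5% of them; a feature that is both periodic and sparse like this may be poorly represented by a single average over the whole video.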