Precise detection of advertisements (ads) in a video (TV) stream is of paramount importance for companies in the field of TV analytics and measurement, in part because it allows for accurate downstream analysis. Whether the task is to measure audience engagement, to gain deeper insight into consumer behavior and attribution, or to strengthen automated content recognition and categorization, accurate and automated ad detection is an essential first step.
Recent results in activity recognition on the DeepMind Kinetics human action video dataset (Kinetics dataset) demonstrate that high accuracy (80%) can be achieved at low computational cost with 3D convolutional neural networks (CNNs) such as ResNet, Res3D, ArtNet, and others. Spatial features are first extracted efficiently from individual frames in a temporal neighborhood with a 2D convolutional architecture. A 3D network then extracts the temporal context between these frames and can improve significantly over the belief obtained from individual frames alone, especially for complex long-term activities. Variants of 3D CNNs have recently held the top positions on the activity recognition leaderboard (www.actionrecognition.net).
The present invention applies these techniques to a completely new purpose, namely ad detection in a video stream, and more specifically ad vs. non-ad classification.
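For illustration only, the two-stage arrangement described above (per-frame spatial features followed by a temporal stage across neighboring frames) can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not the implementation of the invention or of any of the named networks: the function names `spatial_features`, `temporal_mix`, and `ad_score` are hypothetical, the kernels are fixed by hand rather than learned, and the temporal stage is a 1D filter along time at each spatial position, which is a simplification of a full 3D convolution.

```python
import math


def spatial_features(frame, kernel):
    """2D stage: valid cross-correlation of a single frame with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(frame) - kh + 1
    out_w = len(frame[0]) - kw + 1
    return [
        [
            sum(frame[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def temporal_mix(feature_maps, weights):
    """Temporal stage: valid 1D cross-correlation along time at each position.

    Simplification (assumption): a full 3D kernel would also span space.
    """
    k = len(weights)
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [
            [sum(feature_maps[t + d][i][j] * weights[d] for d in range(k))
             for j in range(w)]
            for i in range(h)
        ]
        for t in range(len(feature_maps) - k + 1)
    ]


def ad_score(clip, spatial_kernel, temporal_weights, bias=0.0):
    """Hypothetical ad vs. non-ad score in (0, 1) for a clip of grayscale frames."""
    # Stage 1: independent spatial features for every frame in the clip.
    maps = [spatial_features(frame, spatial_kernel) for frame in clip]
    # Stage 2: combine features of neighboring frames to capture temporal context.
    mixed = temporal_mix(maps, temporal_weights)
    # Global average pooling followed by a logistic squashing function.
    values = [v for fmap in mixed for row in fmap for v in row]
    mean = sum(values) / len(values)
    return 1.0 / (1.0 + math.exp(-(mean + bias)))
```

Here a clip is a list of frames, each frame a list of rows of pixel intensities; calling `ad_score(clip, [[1, 1], [1, 1]], [1, -1])` returns a single value that a downstream threshold could map to "ad" or "non-ad". In a practical system both stages would be deep, learned networks with many channels, and the weights would be trained on labeled ad and non-ad clips.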