The rapid advancement of internet technologies has enabled users to upload audio/video content or stream live audio/video feeds at a dizzying pace. Social networks and other web-based services have traditionally attempted to classify such audio/video content for a variety of reasons. For example, such services may wish to tag audio/video content based on objects detected, faces recognized, locations identified, etc., in order to index that content. Similarly, a service may want to recognize, flag, and/or remove prohibited content, such as copyrighted material, offensive or pornographic material, hate speech, and other violations of the service's terms of service. As the volume of audio/video content uploaded and/or streamed by users increases, however, organizing and classifying such content may become increasingly challenging.
Conventional audio/video classification systems typically apply various machine-learning-based classifiers, with each classifier configured to detect a specific classifiable feature (e.g., objects, faces, music, etc.). Each classifier may also require video data in a specific format, such as a particular resolution, aspect ratio, etc. Unfortunately, conventional classification systems typically perform a separate decode operation for each discrete classifier. Performing multiple decode operations may, in turn, add significant processing overhead, making real-time (or even near-real-time) classification a near impossibility. In addition, when classifiers are updated, conventional classification systems may be unable to gracefully phase out older classifiers as additional audio/video streams are input into the classification system. For example, a classifier update may necessitate waiting for all current classification operations to finish and/or pausing new classification operations, which may result in unwanted operational delays.
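The redundant-decode problem described above can be sketched as follows. This is a hypothetical, simplified illustration, not an implementation of any particular system: the names (`decode_video`, `Classifier`, the resolutions, etc.) are assumptions chosen for clarity, and the "decode" is simulated by a counter rather than actual media decoding.

```python
# Hypothetical sketch: a conventional pipeline in which each classifier
# triggers its own decode of the same source video. All names are
# illustrative, not drawn from any real library.

decode_count = 0

def decode_video(source, fmt):
    """Simulate decoding `source` into frames in format `fmt` (expensive)."""
    global decode_count
    decode_count += 1
    return [f"{source}-frame@{fmt}"]  # stand-in for decoded frame data

class Classifier:
    def __init__(self, name, fmt):
        self.name = name
        self.fmt = fmt  # each classifier expects its own input format

    def classify(self, source):
        # Conventional design: every classifier decodes independently.
        frames = decode_video(source, self.fmt)
        return f"{self.name}: processed {len(frames)} frame(s)"

classifiers = [
    Classifier("object-detection", "1080p"),
    Classifier("face-recognition", "720p"),
    Classifier("music-identification", "audio-only"),
]

for c in classifiers:
    c.classify("upload.mp4")

# One decode per classifier, so the decode overhead grows linearly with
# the number of classifiers applied to each uploaded stream.
print(decode_count)  # 3
```

With three classifiers the source is decoded three times; applying ten classifiers to the same stream would incur ten decodes, which is the per-classifier overhead that makes real-time classification difficult in such systems.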