Multimedia data, such as video data (e.g., from surveillance cameras), has proliferated rapidly. This data may be stored in a large data storage structure or dispersed over multiple data storage structures, and it can be difficult to sort through such large multimedia archives to find relevant data.
Because the relevant data is vastly outnumbered by irrelevant data, naive searching over the multimedia archive can be extremely inefficient. To avoid re-processing the video data each time the archive is queried, systems usually extract a set of features representing interesting aspects (human activity, objects, etc.) of each multimedia file. These features may, for example, be divided into classes of files with common aspects and stored in an index that provides fast search for near neighbors of a given query feature.
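As a minimal illustration of the feature-index idea described above, the following Python sketch stores pre-extracted feature vectors in a uniform grid so that a near-neighbor query inspects only nearby cells rather than the whole archive. The names `GridIndex`, `cell_of`, and `near_neighbors`, the grid layout, and the file names are assumptions made for illustration; the text above does not specify any particular index structure.

```python
import itertools
import math
from collections import defaultdict


def cell_of(feature, cell_size):
    """Map a feature vector to the integer grid cell that contains it."""
    return tuple(math.floor(x / cell_size) for x in feature)


class GridIndex:
    """Hypothetical index: buckets feature vectors by grid cell."""

    def __init__(self, cell_size=1.0):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def add(self, file_id, feature):
        """Store a feature that was extracted once, ahead of query time."""
        self.cells[cell_of(feature, self.cell_size)].append((file_id, feature))

    def near_neighbors(self, query, radius):
        """Return (distance, file_id) pairs within radius, nearest first."""
        # Any point within `radius` lies at most `reach` cells away per axis.
        reach = math.ceil(radius / self.cell_size)
        center = cell_of(query, self.cell_size)
        hits = []
        offsets = itertools.product(range(-reach, reach + 1), repeat=len(query))
        for offset in offsets:
            cell = tuple(c + o for c, o in zip(center, offset))
            for file_id, feature in self.cells.get(cell, ()):
                distance = math.dist(query, feature)
                if distance <= radius:
                    hits.append((distance, file_id))
        return sorted(hits)


# Illustrative usage with made-up two-dimensional features.
index = GridIndex(cell_size=1.0)
index.add("clip_a", (0.2, 0.3))
index.add("clip_b", (0.4, 0.1))
index.add("clip_c", (5.0, 5.0))
matches = index.near_neighbors((0.25, 0.25), radius=0.5)
```

A grid of this kind is practical only for low-dimensional features, because the number of cells visited grows exponentially with the dimension; it is used here solely because it is short enough to read in full.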
However, nearest neighbor-type searching often fails to provide accurate matches when the features do not cleanly separate these classes. To improve accuracy, discriminative classifiers (e.g., Support Vector Machines (SVMs)) are often used to probabilistically label features with respect to the classes. Unfortunately, such classifiers often do not lend themselves to indexing, and applying them to large datasets is slow.
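The cost noted above can be seen in a small Python sketch, which assumes a linear decision function whose output is squashed into a probability by a logistic function. The weights, bias, and threshold are hypothetical placeholders, not a trained SVM; in practice the parameters would be learned from labeled data and the probability mapping calibrated (for example, by Platt scaling). The point is only that every stored feature must be scored, so query time grows linearly with the archive.

```python
import math


def class_probability(feature, weights, bias):
    """Squash the linear decision value w.x + b into (0, 1)."""
    score = sum(w * x for w, x in zip(weights, feature)) + bias
    return 1.0 / (1.0 + math.exp(-score))


def scan_archive(features, weights, bias, threshold=0.5):
    """Score every stored feature; there is no index to skip entries."""
    hits = []
    for file_id, feature in features:
        probability = class_probability(feature, weights, bias)
        if probability >= threshold:
            hits.append((file_id, probability))
    return hits


# Illustrative usage with made-up features and placeholder parameters.
archive = [("clip_a", (2.0, 0.0)), ("clip_b", (-2.0, 0.0))]
labeled = scan_archive(archive, weights=(1.0, 0.0), bias=0.0)
```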