As computing devices increasingly include cameras, and as a greater quantity of image content becomes available, searching with an input image, as opposed to input query text, becomes more useful. However, current mechanisms by which computing devices search exceptionally large volumes of digital content are primarily text-based, with input queries being received in textual form. Where an input image is provided as the query, computer classification mechanisms are utilized to enable computing devices to recognize aspects of the image, convert the input image into textual content, and then search utilizing traditional text-based searching. More specifically, trained computer classifiers are utilized to deduce textual content from images. Training classifiers, however, is difficult. The deduction of textual content from images often requires many classifiers, each of which can require tedious training, so that the training difficulty grows quickly with the number of classifiers. Moreover, errors in classification propagate through the system: if a classifier error or misclassification makes the resulting textual content wrong, the search that is based on that textual content is wrong as well. Furthermore, there are many instances in which classifiers have difficulty producing accurate results, such as instances where there are many different classifications that could be applied, sometimes referred to in the art as classification problems having large cardinality.
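By way of illustration only, the classify-then-search flow described above, and the manner in which a single misclassification can propagate into the search results, can be sketched as follows. The sketch is a toy under stated assumptions and does not represent any particular system: the index contents, the function names, and the dictionary that stands in for an image are all hypothetical, and the "classifiers" merely read back a stored prediction rather than analyzing pixels.

```python
# Hypothetical toy text index; the documents and ids are illustrative only.
INDEX = {
    "doc1": "red running shoe with white sole",
    "doc2": "brown leather hiking boot",
    "doc3": "red high heel sandal",
}


def classify_color(image):
    # Stand-in for a trained color classifier (assumption: the "image" is a
    # dict that already carries the label such a classifier would output).
    return image["predicted_color"]


def classify_category(image):
    # Stand-in for a second, separately trained category classifier.
    return image["predicted_category"]


def image_to_query(image):
    # Several classifiers together deduce the textual query from the image.
    return [classify_color(image), classify_category(image)]


def text_search(terms):
    # Conventional text search: ids of documents containing every term.
    return sorted(
        doc_id
        for doc_id, text in INDEX.items()
        if all(term in text.split() for term in terms)
    )


# The same photograph of a red shoe, once labeled correctly and once with
# the category classifier making a mistake.
correctly_labeled = {"predicted_color": "red", "predicted_category": "shoe"}
misclassified = {"predicted_color": "red", "predicted_category": "sandal"}

good_results = text_search(image_to_query(correctly_labeled))
bad_results = text_search(image_to_query(misclassified))

print(good_results)  # ['doc1']
print(bad_results)   # ['doc3']
```

In this sketch the text search itself behaves identically in both cases; the second result differs only because the textual query deduced from the image was wrong, which is the propagation of classifier error noted above.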