When a user wants to search for a video, the user can input a string of text (e.g., words) and receive search results that list videos that match the text. However, sometimes a user may see something visually; for example, the user may be standing in front of a memorable location and wish to know which show or movie has been filmed at that location. The user may not know the exact name of the place and may find it more convenient to perform a search by image. Also, the user may like the style of a show or movie, but it may be difficult to describe that style in words. In that case, the user may be able to take a picture that captures the style, such as a picture of costumes for a movie in the 1920s, for use in video searching.
Some image searches are based on low-level visual features of images, such as using the color histogram or texture of an image to search for similar images with the same color histogram or texture. Designing a search framework around color histograms or texture requires manually-designed operations, which makes the design time-consuming. Also, low-level features ignore higher-level features, such as the objects found in the image (e.g., a person, dog, cat, etc.). Other image search frameworks may detect higher-level features, such as objects or faces, in the images and search for similar images with the same objects or faces. However, this type of image search focuses only on a specific part of the image and may not be appropriate for determining which show or movie is associated with a particular image.
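As an illustrative, non-limiting sketch of the low-level approach described above, a color-histogram search may be approximated as follows. The function names, the bin count, the use of histogram intersection as the similarity measure, and the sample pixel data are all assumptions chosen for illustration and do not describe any particular search framework.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Return a normalized, quantized RGB histogram.

    `pixels` is a list of (r, g, b) tuples with values in 0-255.
    The bin count is a hand-picked parameter (an assumption here).
    """
    if not pixels:
        raise ValueError("image has no pixels")
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bin_key: n / total for bin_key, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means the two histograms are identical."""
    return sum(min(weight, h2.get(bin_key, 0.0)) for bin_key, weight in h1.items())


def rank_by_color(query_pixels, library):
    """Rank library images (name -> pixels) by color similarity to the query."""
    query_hist = color_histogram(query_pixels)
    scored = [
        (histogram_intersection(query_hist, color_histogram(pixels)), name)
        for name, pixels in library.items()
    ]
    return sorted(scored, reverse=True)


# Hypothetical sample data: tiny "images" given as flat pixel lists.
library = {
    "sunset": [(250, 120, 30)] * 8 + [(40, 20, 60)] * 2,
    "forest": [(20, 130, 40)] * 10,
}
query = [(240, 110, 20)] * 7 + [(30, 30, 50)] * 3

ranking = rank_by_color(query, library)
print(ranking[0][1])  # name of the closest match by color alone
```

The bin count and the similarity measure are the kind of manually-designed choices noted above. The score also depends only on color proportions, so two images with similar colors but entirely different objects would score as close matches.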