1. Field of Art
The present invention generally relates to the field of digital video, and more specifically, to methods of training accurate classifiers for inferring a location depicted in a video.
2. Background of the Invention
Video hosting services, such as YOUTUBE™, have become an increasingly popular way of sharing and viewing digital videos, with users contributing tens of millions of videos each year. Accurate labeling of a video is of great value in such systems, permitting users to search for videos corresponding to given labels, permitting the video hosting service to more accurately match videos with relevant advertising, and the like.
One property with which a video can be labeled is the location that the video depicts, such as a broad area like a city, a state, or a country, or a specific area like a particular school, a business, a park, or the like. The ability to accurately label a video with the location represented in the video (hereinafter also referred to simply as the video's location) would have numerous benefits both for users of the video hosting service and for the video hosting service itself.
However, automatic identification of the geographic location in a video is challenging, and conventional systems have thus far been confined to identifying locations of simpler types of media, such as images, that are less complex to analyze. Videos often have lower resolution than images, and their content is thus harder to recognize using visual features alone. A further difficulty inherent in identifying the geographic locations of both videos and images is the visual similarity of different locations. For example, distinct urban areas, beaches, deserts, and the like tend to have very similar visual features, which makes them difficult to distinguish solely from their appearances.