Mobile devices with access to the Internet and the World Wide Web have become increasingly common, serving as personal Internet-surfing concierges that provide users with access to ever-increasing amounts of data while on the go.
Mobile devices do not currently provide a platform that is conducive to some types of searching, in particular searching video content without expending the resources to record the search subject and send that recording as a query.
Some search applications for mobile devices support photographs taken with a camera built into the mobile device as a visual query, an approach called capture-to-search. In capture-to-search, a picture is typically snapped first, and that snapshot is then submitted as the query to search for a match in various vertical domains. Other search applications support audio recorded from a microphone built into the mobile device as an audio query. For example, INTONOW allows users to record audio for use as a query. The sound is recorded for a period of up to about 12 seconds, and that sound recording is then submitted as a query to search for a match in various vertical domains. This process does not work well if the recording conditions are noisy, or in the case of a video without sound, where the recording is silent.
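The record-then-submit audio query flow described above can be sketched in a few lines. This is a hypothetical toy, not INTONOW's actual algorithm: the recorded clip is reduced to a bit-string fingerprint of frame-energy changes, and the query fingerprint is matched against an index of known clips by Hamming distance.

```python
# Toy illustration of a record-then-submit audio query (hypothetical scheme,
# not any product's actual fingerprinting method).

def fingerprint(samples, frame=8):
    """Reduce a recorded clip to one bit per frame: does energy rise or fall?"""
    frames = [samples[i:i + frame]
              for i in range(0, len(samples) - frame + 1, frame)]
    energies = [sum(s * s for s in f) for f in frames]
    return [1 if cur > prev else 0 for prev, cur in zip(energies, energies[1:])]

def hamming(a, b):
    """Count differing bits between two equal-length fingerprints."""
    return sum(x != y for x, y in zip(a, b))

def match(query_bits, index):
    """Return the name of the indexed clip closest to the query fingerprint."""
    return min(index, key=lambda name: hamming(query_bits, index[name]))
```

Because the fingerprint keeps only coarse energy trends, a mildly noisy recording of a clip still matches that clip, but heavy noise can corrupt enough bits to defeat the match, which reflects the noise sensitivity noted above.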
Some search engines for audio files use an even longer recording time. However, typical audio search engines do not search for audio in combination with video, and they still require that the actual recording be submitted as the query.
Yet other search applications support video images taken with a camera built into the mobile device as a visual query, an approach that can be called video capture-to-search. VIDEOSURF is an example of video capture-to-search. In VIDEOSURF, a video image is captured for a period of at least 10 seconds and stored. A user then selects the discriminative visual content to use for the search, and that video clip is submitted as a query to search for a matching video.
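The matching step behind such a video query can be sketched as follows. This is a minimal illustration under simplifying assumptions, not VIDEOSURF's actual method: each frame is compressed to a single mean-intensity value, and the query clip is located in a longer reference video by sliding-window comparison of those per-frame values.

```python
# Toy sketch of matching a captured video clip against a reference video
# (illustrative only; real systems use far richer frame descriptors).

def descriptor(frames):
    """Compress each frame (a list of pixel intensities) to its mean value."""
    return [sum(f) / len(f) for f in frames]

def locate(query_desc, ref_desc):
    """Slide the query descriptor over the reference and return the
    offset with the smallest total absolute difference."""
    best_off, best_cost = 0, float("inf")
    for off in range(len(ref_desc) - len(query_desc) + 1):
        cost = sum(abs(q - r) for q, r in
                   zip(query_desc, ref_desc[off:off + len(query_desc)]))
        if cost < best_cost:
            best_off, best_cost = off, cost
    return best_off
```

Even this toy shows why clip length matters: a longer query clip produces a longer descriptor to store and transmit, which is exactly the resource cost discussed below.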
Existing mobile video search applications expend significant resources to store a relatively long audio and/or video clip and to send the recorded clip to the search engine. Only after the search engine receives the recorded clip can it perform matching based on the clip. Moreover, the existing methods require a clip of fixed duration, e.g., 10 or 12 seconds.
Most research related to video search on mobile devices has focused on designing compact descriptors on the mobile device. The most popular approach compresses descriptors using image-coding techniques for near-duplicate video search; such methods can be classified into three categories according to the data modality they rely on: audio-based, video-based, and fusion-based. However, most existing approaches to near-duplicate video search focus on desktop scenarios, where the query video is usually a subset of the original video without significant distortion, rather than video captured by a mobile device. Moreover, the computational cost and compactness of the descriptors are often neglected in existing approaches, because conventional approaches to duplicate video search do not take the aforementioned mobile challenges into account. Conventional approaches to duplicate video search are therefore not suitable for mobile video search.
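The descriptor-compactness concern above can be made concrete with a small sketch. This is a hypothetical scheme, not a published method: a floating-point descriptor is binarized around its median so each dimension costs one bit instead of 32, the bits are packed into an integer for cheap transmission, and clips are compared by Hamming distance on the packed bits.

```python
# Toy sketch of descriptor compression for near-duplicate matching
# (hypothetical binarization scheme, for illustration only).

def binarize(vec):
    """Binarize a float descriptor around its median: 1 bit per dimension."""
    med = sorted(vec)[len(vec) // 2]
    return [1 if v > med else 0 for v in vec]

def pack(bits):
    """Pack a bit list into a single int, the compact form to transmit."""
    word = 0
    for b in bits:
        word = (word << 1) | b
    return word

def hamming(a, b):
    """Hamming distance between two packed descriptors."""
    return bin(a ^ b).count("1")
```

Binarization keeps transmission and comparison cheap on the device, but it discards magnitude information, which is one reason compact descriptors must be designed carefully for the distorted video a mobile camera actually captures.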