One existing method for discovering and indexing video in a networked environment, such as, for example, the Internet, another network, or a combination of networks, is a manual-based approach. Using the manual-based approach, a small number of top video sites is targeted. By manually exploring each of the targeted sites, one can learn uniform resource locator (URL) patterns of video play pages of the respective site. The learned URL patterns may then be applied within the targeted sites, such that any URL within a targeted site that matches a learned URL pattern for a video play page of that site is considered to be a video page. Websites, or domains, corresponding to each of the targeted sites may be manually explored to learn corresponding LinkPage patterns of links to downloadable video. Deep crawling of the targeted sites may then be performed by following the corresponding LinkPage pattern for each site. A debugging tool may then be used to monitor browser/server communications in order to reverse engineer a process for generating a video link. A video may then be downloaded and indexed.
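The URL-pattern matching step of the manual-based approach may be illustrated by the following minimal sketch. The domains, the regular expressions, and the names PLAY_PAGE_PATTERNS and is_video_page are hypothetical and are assumed only for illustration. They do not describe any actual site or any particular implementation.

```python
import re
from urllib.parse import urlparse

# Hypothetical, manually learned play-page patterns, keyed by site host name.
# Both the hosts and the path patterns are illustrative assumptions.
PLAY_PAGE_PATTERNS = {
    "video.example.com": re.compile(r"/watch/\d+"),
    "clips.example.org": re.compile(r"/play/[a-z0-9-]+\.html"),
}


def is_video_page(url: str) -> bool:
    """Return True if the URL belongs to a targeted site and its path
    matches the play-page pattern learned for that site."""
    parsed = urlparse(url)
    pattern = PLAY_PAGE_PATTERNS.get(parsed.hostname or "")
    if pattern is None:
        # The site is not one of the targeted sites, so no pattern is known.
        return False
    return pattern.fullmatch(parsed.path) is not None
```

In this sketch, each targeted site requires its own hand-written table entry, and a URL from any site without an entry is never classified as a video page.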
The above-mentioned method for discovering and indexing video works well for a small number of selected sites. However, the above-mentioned method is not scalable to a large number of sites, such as, for example, 100,000 or more sites. For example, each time a target site changes, new URL patterns of video play pages and new LinkPage patterns of links to downloadable video must be learned, and a new process for generating a video link must be reverse engineered, which makes maintenance burdensome.