The Internet and other computing networks make vast quantities of inter-linked hypertext documents and other electronic content available for access all around the globe. A person is able to access a specific web page from amongst billions of available web pages, for example, by entering a Uniform Resource Locator (URL) that identifies the location from which the electronic content associated with the web page can be requested and received.
Electronic content can also be identified using a search engine. Search engines allow a person to enter search terms that are then used by the search engine to identify search results, for example, web pages and other electronic content that contain or are otherwise associated with the search terms. The search engine may provide a user interface as a web page. Search terms that are entered on such a user interface are provided to a server that processes a search and returns search results for display to the person who initiated the search. Search engines perform searches for electronic content using an index that associates electronic content with particular search terms. Using an index facilitates quick identification of search results based on the given search terms. To provide accurate search results, a search engine provider creates and maintains such an index with information that accurately associates search terms with electronic content.
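The association between search terms and electronic content described above can be illustrated with a minimal sketch of an inverted index. The function names `build_index` and `search`, the tokenization rule, and the example URLs are assumptions made for illustration only; no particular search engine's index is being described.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase term to the set of document identifiers containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the identifiers of documents that contain every term in the query."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical content items keyed by URL.
documents = {
    "http://example.com/a": "Hiking trails in the mountains",
    "http://example.com/b": "Mountain bike trails and maps",
}
index = build_index(documents)
```

Because the term-to-content mapping is computed ahead of time, a query such as `search(index, "hiking trails")` only intersects precomputed sets rather than scanning every content item.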
Search engine providers use indexing applications to perform indexing of web pages and other electronic content. The indexing applications associate such content with search terms in the search engine provider's index. An indexing application typically downloads and caches a set of web content and then goes through the set to provide the information used in the search index. An indexing application can identify search terms in some types of content, such as within Hypertext Markup Language (HTML) documents, by simply identifying text within the documents. Identifying search terms for other types of electronic content, however, has required supplemental search term identification capabilities. An indexing application may encounter electronic content that does not expose content-defining text or in which it is otherwise more difficult to identify search terms. For example, appropriate search terms may be more difficult to identify for a rich Internet application, such as a .swf file that executes on an Adobe® Flash® Player. To index such content, indexing applications have used supplemental applications to identify appropriate search terms for use in the index.
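The contrast between HTML text and embedded rich content can be sketched as follows, using the `html.parser` module from the Python standard library. The class name `TextExtractor`, the function `extract_terms`, and the sample markup are hypothetical and serve only to illustrate the point; this is not a description of any actual indexing application.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of script and style elements."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_terms(html):
    """Return the lowercase words found in the text of an HTML document."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks).lower().split()


# Hypothetical page: plain text plus an embedded .swf that exposes no text.
html = (
    "<html><head><title>Trail Maps</title>"
    "<style>p {color: red}</style></head>"
    "<body><p>Hiking <b>trails</b></p>"
    "<object data='F1.swf'></object></body></html>"
)
terms = extract_terms(html)
```

In this sketch the title and paragraph text yield candidate search terms, while the embedded `F1.swf` object contributes nothing, which is the gap that supplemental search term identification is meant to fill.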
In various circumstances, a search engine provider may require separation of its primary indexing application, which provides its core indexing functionality, from a secondary application that provides supplemental indexing capabilities. For example, such separation may facilitate use of supplemental capabilities provided to a search engine provider by a third party. The search engine provider may desire to develop its core, primary indexing application on its own and then supplement those indexing capabilities by implementing additional indexing functionality provided in supplemental indexing applications provided by one or more third parties. As a specific example, a third party may provide end users with plug-in components for playing specific types of files or other content and may provide the search engine provider with a customized supplemental indexing application that is specially targeted to facilitate indexing of those specific types of content. The third party may be better positioned to update and otherwise provide the supplemental indexing application given its familiarity with the plug-in components. With the separately-provided supplemental indexing application, the search engine provider can use its primary indexing application to identify search terms in other types of content and also use the supplemental indexing application provided by the third party to identify search terms in the specific type of content that plays through the third party's plug-in components.
Existing indexing capabilities do not adequately address search engine providers' requirements of performing indexing in a contained, accelerated environment in which Internet access is unavailable or limited. Search engine providers have found it advantageous, for example, to download groups of web pages and other content for indexing in an isolated environment that is not encumbered by repeated access to a network to retrieve individual content items. The use of supplemental indexing applications in such environments has created unresolved issues. For example, supplemental indexing applications have encountered errors in attempting to identify terms for rich Internet and other non-text content that requires access to other content that is not available to the supplemental indexing application in the isolated indexing environment. The supplemental indexing application must identify search terms in an execution context that differs from the context in which end users would typically execute the content. While an end user may view a rich Internet application as part of a web page that includes other electronic content, a supplemental indexing application is required to analyze that rich Internet application without access to that other electronic content. For example, an indexing application may provide individual files to a supplemental indexing application without providing external content that is referenced by the provided file. Because the external content is not available, the specified external interactions are not adequately examined.
Accordingly, existing supplemental indexing applications do not adequately identify search terms that account for externally-referenced content because of the isolated environment in which they are frequently required to execute. As another specific example, a supplemental indexing application may execute content and identify execution branches, but fail to properly handle content that specifies external interactions by invoking external content items or waiting for invocation from such external content. Supplemental indexing applications have encountered runtime errors and/or failed to fully identify appropriate search terms in such circumstances. As another specific example, if a web page includes an HTML file (H1.html) and embeds two .swf files (F1.swf and F2.swf), prior indexing techniques have involved providing F1.swf to the supplemental indexing application in isolation. Supplemental indexing applications have not previously been able to fully provide search terms that account for the interactions of such a .swf file with external content, such as interaction with the separate, but related, H1.html and F2.swf content.
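The H1.html, F1.swf, and F2.swf example can be made concrete with a small sketch that reports which referenced content items are absent from the set handed to a supplemental indexing application. The `references` table and the function `unresolved_references` are hypothetical; in particular, the assumption that F1.swf refers to both H1.html and F2.swf is made only to mirror the interactions named in the example.

```python
def unresolved_references(provided, references):
    """Return, for each provided item, the referenced items that were not provided."""
    missing = {}
    for item in sorted(provided):
        absent = [ref for ref in references.get(item, []) if ref not in provided]
        if absent:
            missing[item] = absent
    return missing


# Assumed interactions between the content items of the example web page.
references = {
    "H1.html": ["F1.swf", "F2.swf"],
    "F1.swf": ["H1.html", "F2.swf"],
    "F2.swf": [],
}

# F1.swf provided in isolation, as in the prior indexing techniques described above.
isolated = unresolved_references({"F1.swf"}, references)

# All related content items provided together.
complete = unresolved_references({"H1.html", "F1.swf", "F2.swf"}, references)
```

When F1.swf is provided alone, both of its assumed external interactions are unresolved, so any search terms that depend on those interactions cannot be identified; when the related items are provided together, nothing is left unresolved.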