1. Field
Embodiments described herein generally relate to techniques for accessing web content and, more specifically, to accessing rich Internet content to support content-related processes such as indexing of rich Internet content.
2. Description of the Related Art
Rich Internet Applications (RIAs) provide interactive functionality for multimedia content, e.g., user-interactive controls and application-generated controls that create an exciting and interesting multimedia experience. Consequently, RIAs have become a very popular multimedia presentation tool on websites throughout the Internet.
An RIA typically is a collection of content that is wrapped within programming code to be executed by a playback routine. For example, some RIAs may comprise animations, interfaces, games, video clips, audio clips, and/or other interactive or passive content (referred to herein as “rich Internet content”). In addition, the RIA typically includes program code to instruct a playback routine (referred to as an “RIA Player”) how to display and progress through the content of the RIA. One such RIA Player is a FLASH® player (FLASH is a registered trademark of Adobe Systems Incorporated) that executes an RIA in the form of a SWF file to present rich Internet content to a viewer. Another RIA Player is the open-source Gnash software. Other varieties of RIA Players include frame-based players and their associated RIAs.
To broadly utilize the content within an RIA, the content needs to be accessible to content-related processes such as, for example, content searching and indexing. However, searching and indexing an RIA has been a challenge for search engines, particularly because of the multiple layers of interactive functionality contained in the RIA. According to a conventional technique, search engine crawlers (or other indexing agents) are configured to execute an RIA, triggering various interactive or other functionalities. As the crawler executes the RIA, each functionality event is triggered and the crawler captures the text within a resulting “page,” frame, or other textual output to produce an index representing the content presented by the RIA. At various points during the RIA execution, a function may have alternative branches that either branch automatically or branch in response to user input (e.g., help button, game selection, and the like). The crawler exercises one branch, then returns (retraces) to the branch point and exercises the alternate branch, all the while capturing the textual information produced by the RIA.
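The conventional branch-and-retrace crawl described above can be illustrated with a minimal sketch. The toy RIA below is modeled as a dictionary of states, each with display text and a list of branches; this model, the state names, and the `crawl` function are hypothetical and are chosen only for illustration. They are not taken from any actual RIA Player or search engine crawler.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy RIA: each state has text to display and branches to other states.
TOY_RIA: Dict[str, dict] = {
    "start": {"text": "Welcome", "branches": ["help", "game"]},
    "help": {"text": "How to play", "branches": []},
    "game": {"text": "Choose a level", "branches": ["level1"]},
    "level1": {"text": "Level one", "branches": []},
}


def crawl(
    ria: Dict[str, dict],
    state: str = "start",
    visited: Optional[Set[str]] = None,
    index: Optional[List[str]] = None,
) -> List[str]:
    """Exercise every branch depth-first, capturing the text of each state."""
    visited = set() if visited is None else visited
    index = [] if index is None else index
    if state in visited:
        return index  # already captured; do not exercise this state again
    visited.add(state)
    index.append(ria[state]["text"])  # capture the textual output
    for branch in ria[state]["branches"]:
        # Returning from this call corresponds to retracing to the branch point
        # before exercising the alternate branch.
        crawl(ria, branch, visited, index)
    return index


print(crawl(TOY_RIA))
```

In this sketch the crawl terminates because every state produces the same text each time it is reached, so a visited set is sufficient to detect that a branch has already been exercised.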
In some instances, a branch point or a point along the branch executes a call function that retrieves information from outside of the RIA, e.g., current time, current location or position, a random number, and the like. The RIA may use this information to determine the content to be displayed. When the crawler returns (retraces) to the branch point or branch that contains the call function, the information retrieved in response to the call function may have changed, thus making the content associated with the branch non-deterministic. Such branching may create erroneous results or cause the crawler to loop indefinitely through the branch point and its potentially infinite number of branches. Any resulting information is inconsistent and, therefore, is not useful for search engine indexing.
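The effect of such a call function can be shown with a small sketch. Here `make_clock` is a hypothetical stand-in for an external call (such as reading the current time) that returns a new value on every invocation, and `crawl_branch` models a crawler that keeps retracing to a branch until the branch yields text it has already captured. The `max_visits` cap is an assumption added so that the example terminates; it is not part of the conventional technique.

```python
import itertools
from typing import Callable, List


def make_clock() -> Callable[[], int]:
    """Hypothetical external call: returns a different value each time it is called."""
    ticks = itertools.count(1)
    return lambda: next(ticks)


def branch_text(external_call: Callable[[], int]) -> str:
    """Content of the branch depends on information retrieved from outside the RIA."""
    return f"Offer valid at tick {external_call()}"


def crawl_branch(external_call: Callable[[], int], max_visits: int) -> List[str]:
    """Revisit the branch until its text repeats, or until the safety cap is hit."""
    seen = set()
    captured: List[str] = []
    for _ in range(max_visits):  # cap added for this sketch only
        text = branch_text(external_call)
        if text in seen:
            break  # a deterministic branch is recognized as already exercised
        seen.add(text)
        captured.append(text)
    return captured


# Deterministic branch: one entry is captured and the crawl stops.
print(crawl_branch(lambda: 0, max_visits=5))
# Non-deterministic branch: every visit looks new, so only the cap stops the crawl.
print(crawl_branch(make_clock(), max_visits=5))
```

With the changing external value, each retrace appears to be a new branch, so the number of captured entries grows with the cap rather than settling, which is the inconsistent, potentially unbounded behavior described above.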
Accordingly, there exists a need in the art for a method and apparatus to access rich Internet content and improve content-related processes such as indexing of RIA-related content.