1. Technical Field
Embodiments of the present disclosure generally relate to a search engine support system and, in particular, to a method and apparatus for indexing rich Internet content using contextual information.
2. Description of the Related Art
Numerous technological innovations have driven the rapid proliferation of multimedia content (e.g., user interactive controls and application generated controls that create an exciting and interesting multimedia experience) throughout the Internet. Such multimedia content may be referred to as rich Internet content, and users spend a significant amount of time with it conducting various activities (e.g., surfing educational websites, viewing detailed product demonstrations, accessing digital libraries and participating in expert discussion forums related to the multimedia content). Rich Internet content includes video, audio, text, animation, and combinations thereof. Users often download rich Internet content from various Internet resources (e.g., web pages, multimedia clips and/or content, emails and/or the like) and view it on various display devices (e.g., a mobile phone, an electronic book reader, a Personal Digital Assistant (PDA), a hand-held gaming device and/or the like).
Rich Internet Applications (RIAs) provide interactive functionality for rich Internet content and, consequently, have become a very popular multimedia tool on websites throughout the Internet. An RIA is typically a collection of rich Internet content wrapped within programming code that is executed by a playback routine. For example, an RIA may comprise animations, interfaces, games, video clips, audio clips, and/or other interactive or passive content (referred to herein as “rich Internet content”). In addition, an RIA typically includes program code that instructs the playback routine (referred to as an “RIA Player”) how to display and progress through the content of the RIA. One such RIA Player is the FLASH player (from Adobe Systems Incorporated), which executes an RIA in the form of a SWF file to present rich Internet content to a viewer. The SWF file format is defined by the SWF File Format specification (version 10), published at http:www.adobe.com/devnet/swf/pdf/swf_file_format_v10.pdf by Adobe Systems Incorporated of San Jose, Calif.
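To make concrete what an RIA in SWF form looks like to an indexing application, the following minimal Python sketch reads the fixed 8-byte prefix at the start of a SWF file: a three-byte signature, a one-byte version, and a little-endian 32-bit file length. The function name `read_swf_header` and the returned dictionary are illustrative assumptions; only the header layout comes from the SWF specification, and decompressing the body or parsing its frames and tags is omitted.

```python
import struct

# Three-byte signatures from the SWF File Format specification:
# "FWS" = uncompressed, "CWS" = zlib-compressed body,
# "ZWS" = LZMA-compressed body (used by later SWF versions).
SIGNATURES = {b"FWS": "uncompressed", b"CWS": "zlib", b"ZWS": "lzma"}


def read_swf_header(data: bytes) -> dict:
    """Parse the fixed 8-byte prefix that every SWF file starts with."""
    if len(data) < 8:
        raise ValueError("data too short to be a SWF file")
    signature = bytes(data[:3])
    if signature not in SIGNATURES:
        raise ValueError(f"unknown SWF signature: {signature!r}")
    version = data[3]  # single-byte SWF version number
    # FileLength: little-endian unsigned 32-bit integer giving the total
    # length of the file after any decompression.
    (file_length,) = struct.unpack_from("<I", data, 4)
    return {
        "compression": SIGNATURES[signature],
        "version": version,
        "file_length": file_length,
    }


if __name__ == "__main__":
    # A synthetic header (not a playable file): "FWS", version 10, 1024 bytes.
    sample = b"FWS" + bytes([10]) + struct.pack("<I", 1024)
    print(read_swf_header(sample))
```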
Internet search engines seek to index rich Internet content so that Internet users can locate and access it (e.g., web page menus, SWF files and/or the like) from home computers and/or mobile devices. Much of this content is revealed only through interaction; for example, activating certain items (e.g., buttons, links and/or the like) of a web page menu generates a web effect or loads text and/or video. Indexing applications therefore traverse RIAs, simulating such interactions, to index the rich Internet content. During the indexing process, the rich Internet content is often restarted numerous times.
Currently, the rich Internet content is loaded and advanced for a certain duration (i.e., a stabilization time) until a stable point (i.e., a state) is reached. The state is scanned for any interactive entities (e.g., buttons or sprites) for which events have been registered; these are identified as new transitions. The indexing application then chooses a particular transition and generates an event on the corresponding interactive entity by simulating a selection (e.g., a mouse click, touch screen contact, and the like), and the rich Internet content is advanced until it arrives at a new state. The list of transitions taken from the initial state to the current state is called a transition path, and the number of transitions in that path is the depth of the current state. The indexing application must stop at the current state and restart the rich Internet content to choose a different transition path if, for example, the current state has no transitions or its depth exceeds a pre-configured value. This process is repeated until every state has been visited and no new transitions remain, or until a predefined time limit is reached.
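The restart-and-replay procedure described above can be sketched as a depth-limited, depth-first traversal. In this minimal Python sketch, `ToyRia` is a hypothetical stand-in for an RIA player that replaces loading, stabilization, and event simulation with lookups in a fixed state graph; `crawl`, `max_depth`, and `time_limit_s` are likewise illustrative names, not part of any existing indexing product. No visited-state check is performed.

```python
import time


class ToyRia:
    """Hypothetical stand-in for an RIA player driven by a fixed state graph.

    graph maps each state to {interactive_entity: next_state}; a real
    player would instead execute the RIA and wait out a stabilization time.
    """

    def __init__(self, graph, initial):
        self.graph = graph
        self.initial = initial
        self.state = initial
        self.restarts = 0
        self.events = 0

    def restart(self):
        # Reload the content and return to the initial state.
        self.state = self.initial
        self.restarts += 1

    def transitions(self):
        # Interactive entities with registered events in the current state.
        return sorted(self.graph.get(self.state, {}))

    def fire(self, entity):
        # Simulate a selection (e.g., a click) and advance to the next state.
        self.events += 1
        self.state = self.graph[self.state][entity]


def crawl(ria, max_depth, time_limit_s=60.0):
    """Depth-limited traversal that restarts whenever it must change paths."""
    deadline = time.monotonic() + time_limit_s
    visited = []        # states in the order reached (duplicates included)
    pending = [()]      # transition paths still to be followed
    while pending and time.monotonic() < deadline:
        path = pending.pop()
        ria.restart()
        for entity in path:     # replay the transition path from the start
            ria.fire(entity)
        while True:
            visited.append(ria.state)
            options = ria.transitions()
            if not options or len(path) >= max_depth:
                break           # no transitions or depth limit: restart needed
            # Defer the alternative transitions to later restarts.
            for entity in reversed(options[1:]):
                pending.append(path + (entity,))
            path += (options[0],)
            ria.fire(options[0])
    return visited


if __name__ == "__main__":
    site = {
        "home": {"about": "about", "products": "products"},
        "about": {"back": "home"},
        "products": {"demo": "demo"},
        "demo": {},
    }
    ria = ToyRia(site, "home")
    print(crawl(ria, max_depth=2), ria.restarts, ria.events)
```

In this toy run, `home` is reached twice (once initially and once through `about` then `back`); without a visited-state check, the indexer cannot tell that the second visit adds nothing.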
Current indexing applications (e.g., Adobe FLASH Search) enable search engines to extract data from within rich Internet content (e.g., FLASH content) by dynamically traversing its states. These applications implement various traversal techniques in a module (e.g., a Virtual User Module (VUM)) that simulates actions a user typically performs when browsing through the content. Current solutions compare the display list of the currently reached state with those of the states that have already been traversed. If a match is found, the current state is marked as previously traversed and the solution continues with the next transition. Such indexing techniques can lead to looping within an RIA (e.g., ping-ponging between states and/or repeating list traversals).
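The display-list comparison described above can be sketched as follows. This minimal Python sketch assumes a simplified, hypothetical display-list representation (`DisplayObject` with depth, kind, name, text, and position fields) and an exact-match policy in the illustrative `TraversedStates` class; real display lists carry far more properties. It also illustrates one plausible way such comparisons contribute to looping: a logically identical screen captured at a different animation position does not match, so it is treated as a new state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisplayObject:
    """Hypothetical, simplified entry in an RIA display list."""
    depth: int          # stacking depth on the stage (assumed unique)
    kind: str           # e.g., "Button", "TextField", "Sprite"
    name: str           # instance name
    text: str = ""      # visible text, if any
    x: float = 0.0      # horizontal position; may change during animation


def signature(display_list):
    """Order-independent key: the display objects sorted by depth."""
    return tuple(sorted(display_list, key=lambda obj: obj.depth))


class TraversedStates:
    """Marks a state as traversed when its display list was seen before."""

    def __init__(self):
        self._seen = set()

    def is_new(self, display_list):
        key = signature(display_list)
        if key in self._seen:
            return False    # identical display list: previously traversed
        self._seen.add(key)
        return True


if __name__ == "__main__":
    home = [
        DisplayObject(1, "Button", "about", "About"),
        DisplayObject(2, "TextField", "title", "Welcome"),
    ]
    states = TraversedStates()
    print(states.is_new(home))                  # first visit
    print(states.is_new(list(reversed(home))))  # same objects, other order
    # Same logical screen captured while the button is still sliding in:
    moving = [DisplayObject(1, "Button", "about", "About", x=12.5), home[1]]
    print(states.is_new(moving))
```

Which properties to ignore when comparing is a design choice: keeping transient ones such as position causes missed matches, while ignoring too many risks merging genuinely different states.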
Currently, searching rich Internet content involves following different transition paths and simulating events on all of the interactive entities. This requires triggering multiple events and may also require restarting the rich Internet content to take different paths. The problem with the current approach is that both the selection of interactive entities on which to generate events and the comparison of states to identify whether a state has already been visited are performed in an ad hoc manner. As a result, a significant amount of computing power and time is spent restarting the content and triggering events. To reduce these inefficiencies, redundancy must be avoided by determining whether following a transition path will lead to a state that has already been reached and/or whether a newly reached state has already been traversed.
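The cost of restarting can be estimated for an idealized case. Assume, purely for illustration, that the state graph is a tree without cycles, that every state has exactly b transitions, and that traversal stops at a depth limit d. Each maximal transition path then requires one load or restart, and reaching its end requires replaying all d of its events from the initial state, so the number of loads or restarts R, the number of simulated events E, and the number of distinct transitions T are:

```latex
R = b^{d}, \qquad E = d\,b^{d}, \qquad
T = \sum_{k=1}^{d} b^{k} = \frac{b\,(b^{d}-1)}{b-1} \quad (b > 1)
```

For example, with b = 3 and d = 4, the indexer performs R = 81 loads or restarts and E = 324 simulated events to cover only T = 120 distinct transitions, so 204 events merely replay transitions that were already explored, each followed by its own stabilization time.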
Therefore, there is a need in the art for a method and apparatus for indexing rich Internet content.