The development of the EDVAC computer system of 1948 is often cited as the beginning of the computer era. Since that time, computer systems have evolved into extremely sophisticated devices, and computer systems may be found in many different settings. Computer systems typically include a combination of hardware, such as semiconductors and circuit boards, and software, also known as computer programs. As advances in semiconductor processing and computer architecture push the performance of the computer hardware higher, more sophisticated and complex computer software has evolved to take advantage of the higher performance of the hardware, resulting in computer systems today that are much more powerful than just a few years ago.
Years ago, computers were isolated devices that did not communicate with each other. But today, computers are often connected in networks, such as the Internet or World Wide Web, and a user at one computer, often called a client, may wish to access information at multiple other computers, often called servers, via a network. Searching is the primary mechanism used to retrieve information from the Internet. Users typically search the web pages of the Internet using a search engine, such as AltaVista, Yahoo, or Google. These search engines index hundreds of millions of web pages and respond to tens of millions of queries every day.
To accomplish this formidable task, search engines typically employ three major elements. The first is an agent, often called a spider, robot, or crawler. The crawler visits a web page, reads it, and then follows links to other pages within the site. The crawler typically returns to the site on a regular basis, such as every month or two, to look for changes. The crawler stores the information it finds in the second part of the search engine, which is the index. Sometimes new pages or changes that the crawler finds may take some time to be added to the index. Thus, a web page may have been “crawled” but not yet “indexed.” Until the web page has been added to the index, the web page is not available to those searching with the search engine. Search engine software is the third part of a search engine. This is the program that interrogates the millions of pages recorded in the pre-created index to find matches to a search and ranks them in order of what the program believes is most popular, which is often referred to as the page rank. Page rank is extremely important to the user because a simple search using common terms may match thousands or even tens of thousands of pages, which would be virtually impossible for the user to sort through individually in an attempt to determine which pages best serve the user's needs.
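The distinction between a page that has been crawled and a page that has been indexed can be illustrated with a minimal sketch. The class name `SearchIndex` and its methods are hypothetical and chosen only for illustration; they do not describe any particular search engine, and the sketch assumes a simple in-memory inverted index that maps each term to the set of pages containing it.

```python
import re
from collections import defaultdict


class SearchIndex:
    """Toy index: pages are stored when crawled, searchable only after build()."""

    def __init__(self):
        # Pages the crawler has fetched but that are not yet searchable.
        self.pending = {}
        # Inverted index: term -> set of URLs whose text contains the term.
        self.postings = defaultdict(set)

    def store_crawled(self, url, text):
        """Record a crawled page; it is 'crawled' but not yet 'indexed'."""
        self.pending[url] = text

    def build(self):
        """Move every pending page into the inverted index."""
        for url, text in self.pending.items():
            for term in re.findall(r"[a-z0-9]+", text.lower()):
                self.postings[term].add(url)
        self.pending.clear()

    def search(self, query):
        """Return the URLs that contain every term of the query."""
        terms = re.findall(r"[a-z0-9]+", query.lower())
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```

In this sketch, a page passed to `store_crawled` is not returned by `search` until `build` has run, which mirrors the delay between crawling and indexing described above. Ranking of the matched pages is deliberately left out.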
In order to aid the user, search engines typically determine relevancy by following a set of rules, which are commonly known as a page-ranking algorithm. Exactly how a particular search engine's algorithm works is usually a closely kept trade secret. But all major search engines follow the same generally accepted methods described below. One of the main methods in a page-ranking algorithm involves the location and frequency of keywords on a web page, which is known as the location/frequency method. For example, page-ranking algorithms often assume that terms appearing in a title control-tag are more relevant than terms appearing in other locations in the page. Further, many page-ranking algorithms will also check to see if the search keywords appear near the top of a web page, such as in the headline or in the first few paragraphs of text. They assume that any page relevant to the topic will mention those words at the beginning. Frequency of terms is the other major factor that page-ranking algorithms use to determine relevancy. The page-ranking algorithm analyzes how often keywords appear in relation to other words in a web page and deems more relevant those pages with a higher frequency.
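The location/frequency method can be sketched as a simple scoring function. The function name, the weights `title_weight` and `lead_weight`, and the cutoff `lead_words` are all assumptions made for illustration; actual page-ranking algorithms are trade secrets and are certainly more elaborate.

```python
import re


def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def location_frequency_score(title, body, keywords,
                             title_weight=3.0, lead_weight=2.0, lead_words=50):
    """Score a page for the given keywords by location and frequency.

    The weights and the lead_words cutoff are illustrative values only.
    """
    title_terms = tokenize(title)
    body_terms = tokenize(body)
    # "Near the top of the page" is approximated by the first lead_words terms.
    lead_terms = body_terms[:lead_words]

    score = 0.0
    for keyword in (k.lower() for k in keywords):
        # Location: a keyword in the title counts most.
        if keyword in title_terms:
            score += title_weight
        # Location: a keyword near the top of the body also counts.
        if keyword in lead_terms:
            score += lead_weight
        # Frequency: keyword occurrences relative to all words in the body.
        if body_terms:
            score += body_terms.count(keyword) / len(body_terms)
    return score
```

Under these assumed weights, a page that mentions a keyword in its title and opening text scores higher than a page that mentions the same keyword only in the body.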
In addition to the location/frequency method, which is an on-the-page ranking criterion, search engines also typically make use of off-the-page ranking criteria. Off-the-page factors are those that use data external to the page itself. Chief among these is link analysis. By analyzing how pages link to each other, the page-ranking algorithm attempts to determine both the subject of a page and the relative importance of the page with respect to other pages.
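One way to estimate the relative importance of pages from their links is an iterative scheme in which each page repeatedly passes a share of its score to the pages it links to. The sketch below is a simplified illustration of that general idea, not the algorithm of any particular search engine; the function name `link_scores`, the `damping` value, and the iteration count are assumptions.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Estimate page importance from a link graph.

    links maps each page to the list of pages it links to.
    The damping factor and iteration count are illustrative defaults.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with the same score for every page.
    scores = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline score.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score evenly among its outgoing links.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores
```

For example, with `links = {"a": ["c"], "b": ["c"], "c": ["a"]}`, page `c` ends up with a higher score than page `b`, because two pages link to `c` and none link to `b`. A scheme of this kind can only use the links the crawler is able to discover.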
Although link analysis works reasonably well for web pages created in HTML (Hypertext Markup Language), many web pages today contain animated graphics files, which are created, for example, via Macromedia Flash technology. The animated graphics files typically contain a sequence of frames, which, when displayed in succession, give the appearance of a moving picture. Links or buttons may be embedded in the frames, and crawlers have difficulty finding the embedded links, which makes the off-the-page ranking criteria less valuable than in the HTML case.
In an attempt to address this problem, frame development studios that generate animated graphics files often scan the files for embedded links and embed empty links to the pages in the HTML file that encapsulates the animated graphics files. Unfortunately, while these embedded links help the crawler to jump past the animated graphics files, they do not allow the crawlers to access the data within the animated graphics files themselves. Further, animated graphics files cannot be converted to HTML to allow for crawling because animations encompass the additional dimension of time that cannot be represented in a single HTML document.
Thus, without a better way to provide for the crawling of animated graphics files, search engines will not be able to properly rank search results, which users rely on as a helpful tool for determining relevance.