1. Field of the Invention
The present invention relates to the indexing and searching of various files, in particular binary files such as executable files (including software releases and patches), compressed files (including RAR and ZIP archives), and multimedia files (including digital images, MP3 files, and other audio and video files). The present invention also relates to the ranking of such files after they have been indexed and searched, as well as to methods for searching and retrieving them rapidly and reliably.
2. Description of Related Art
In today's increasingly complex technological world, methods for indexing and searching Internet content play an important role in many content-rich applications, such as general-purpose Internet search engines and enterprise search engines.
A search engine usually consists of four core components: a spider, a parser or indexer, a query engine, and a Web interface. The spider, also called a robot or Web crawler, is the heart of a Web-based search engine. It is an autonomous Web client that automatically connects to Web servers and requests Web pages. Each response is checked, and if the request succeeds, the Web page is fetched and indexed. In the indexing phase, words from textual Web pages are saved into the index together with other information, such as word locations. The resulting search engine index is similar in concept to the index of a book: while a book index provides page references for a particular word, a search engine index contains words along with references to the Web pages that contain those words. Once the keywords are indexed, a query engine can search for the Web pages that contain them. Because a large number of Web pages may contain a particular keyword, these pages must also be ranked according to some rule, for example, the number of references made to each page from other Web pages. Lastly, a Web interface is used to browse the sorted list of matching Web pages. The design and layout of the Web interface is beyond the scope of this document.
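The indexing and ranking steps described above can be illustrated with a minimal inverted index. This is a sketch for explanation only, not the method of the present invention; the helper names (`build_index`, `inbound_counts`, `search`), the sample pages, and the use of a raw inbound-link count as the ranking rule are assumptions chosen to mirror the example given in the text.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, dict[str, list[int]]]:
    """Map each word to the pages containing it and the word's positions in each page."""
    index: dict[str, dict[str, list[int]]] = defaultdict(lambda: defaultdict(list))
    for url, text in pages.items():
        for position, word in enumerate(tokenize(text)):
            index[word][url].append(position)
    return index


def inbound_counts(links: dict[str, list[str]]) -> dict[str, int]:
    """Count how many distinct other pages link to each page (hypothetical ranking rule)."""
    counts: dict[str, int] = defaultdict(int)
    for source, targets in links.items():
        for target in set(targets):
            if target != source:  # ignore self-references
                counts[target] += 1
    return counts


def search(query: str, index: dict, counts: dict[str, int]) -> list[str]:
    """Return pages containing every query word, most-referenced pages first."""
    words = tokenize(query)
    if not words:
        return []
    # .get() does not create entries in the defaultdict, so unknown words yield no match.
    matches = set(index.get(words[0], {}))
    for word in words[1:]:
        matches &= set(index.get(word, {}))
    # Sort by descending inbound-link count, breaking ties by URL for a stable order.
    return sorted(matches, key=lambda url: (-counts.get(url, 0), url))
```

For example, if pages "a.html" and "b.html" both mention "software" and "patch", and "b.html" is linked from two other pages while "a.html" is linked from one, a query for "software patch" returns "b.html" before "a.html". Real engines use far richer ranking signals; the link count here only stands in for "the number of references made from other Web pages."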
In general, text Web search engines perform an incremental scan and analysis of the Web, extract keywords, and generate substantial indexes that can later be searched in response to a user's query. Binary Web search engines are more complicated; an image Web search engine serves as an example. If keywords describing the content of each image are added manually, the images can be indexed and searched in the same way as text Web pages, so the image Web search engine reduces to a text-based search engine. However, this requires considerable manual work to label the images on the Web, which is clearly impractical given the explosively growing number of images on the Web.
To make the search process automatic, Web search engines typically label images using their file names and alternate text. However, this generally produces poor results. Most images on the Web do not have a file name that reasonably represents their content. In addition, file names are normally short, and many consist only of numbers, so they cannot describe the content accurately, which further reduces the accuracy of search results. Moreover, because of the international nature of the Internet, images posted on the Web may be labeled and described in various languages, which further complicates indexing and searching for the correct image files.
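The weakness of file-name and alternate-text labeling can be made concrete with a small hypothetical sketch. The `label_image` helper and the sample file names are illustrative assumptions, not a method taken from any particular search engine; the helper simply tokenizes the file name and alternate text and discards purely numeric tokens, since those say nothing about the image content.

```python
import re


def label_image(file_name: str, alt_text: str = "") -> set[str]:
    """Derive keyword labels for an image from its file name and alt text (hypothetical).

    The file extension and purely numeric tokens are discarded because they
    carry no information about what the image depicts.
    """
    stem = file_name.rsplit(".", 1)[0]  # drop the extension, e.g. ".jpg"
    # [^\W_]+ matches runs of Unicode letters/digits, splitting on "_", "-", spaces, etc.
    tokens = re.findall(r"[^\W_]+", f"{stem} {alt_text}".lower())
    return {token for token in tokens if not token.isdigit()}
```

Under these assumptions, "golden_gate_bridge.jpg" with alternate text "San Francisco at dusk" yields useful labels such as "bridge", whereas a camera-style name like "IMG_0042.jpg" yields only the uninformative "img", and "12345.png" yields no labels at all. A file named in another language, such as "金门大桥.jpg", produces a label that an English query for "bridge" will never match.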
Most of today's indexing and searching technologies rely on text, as described above. A few target image files instead, focusing on particular visual attributes such as color, brightness, and line patterns. Such approaches are described, for example, in the following references: Chad Carson et al., Blobworld: A system for region-based image indexing and retrieval, Third Int. Conf. on Visual Information Systems, June 1999; Anil K. Jain and Aditya Vailaya, Image retrieval using color and shape, Pattern Recognition, 29(8), 1996; and Michael Ortega et al., Supporting Similarity Queries in MARS, ACM Multimedia 97. Although these approaches may improve on text-only search, they remain limited both in the types of binary files they can search and in the parameters they use to find the desired files. Furthermore, because binary files can be extremely large, binary Web search engines may also need to solve the downloading problem, yet few search methods address it. Therefore, there is a strong need in the art for an improved indexing and searching method that can locate and download desired binary files which existing methods do not serve readily or accurately.
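To illustrate the kind of color-based matching these image-retrieval approaches rely on, the following sketch ranks candidate images by the intersection of their quantized color histograms, a common color-similarity measure. It is not taken from Blobworld, MARS, or the Jain and Vailaya paper; representing an image as a list of (r, g, b) tuples, the default of four bins per channel, and the helper names are all assumptions for illustration.

```python
from collections import Counter

Pixel = tuple[int, int, int]  # (red, green, blue), each in 0..255 (assumed representation)


def color_histogram(pixels: list[Pixel], bins_per_channel: int = 4) -> dict[tuple[int, int, int], float]:
    """Quantize each channel into equal-width bins and return normalized bin frequencies.

    Assumes bins_per_channel divides 256 evenly (e.g. 2, 4, 8, 16).
    """
    if not pixels:
        return {}
    width = 256 // bins_per_channel
    counts = Counter((r // width, g // width, b // width) for r, g, b in pixels)
    total = len(pixels)
    return {bin_: n / total for bin_, n in counts.items()}


def histogram_intersection(h1: dict, h2: dict) -> float:
    """Similarity in [0, 1]: sum over bins of the smaller of the two frequencies."""
    return sum(min(freq, h2.get(bin_, 0.0)) for bin_, freq in h1.items())


def rank_by_color(query: list[Pixel], candidates: dict[str, list[Pixel]], bins_per_channel: int = 4) -> list[str]:
    """Return candidate image names ordered from most to least color-similar to the query."""
    q = color_histogram(query, bins_per_channel)
    scored = [
        (histogram_intersection(q, color_histogram(px, bins_per_channel)), name)
        for name, px in candidates.items()
    ]
    # Highest similarity first; ties broken by name for a deterministic order.
    return [name for _, name in sorted(scored, key=lambda s: (-s[0], s[1]))]
```

A mostly orange query image scores about 0.9 against a similar orange "sunset" image but only about 0.1 against a mostly green "forest" image, so the sunset ranks first. The sketch also shows the limitation noted in the text: it applies only to images, and only through a single low-level attribute.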