1. Field of the Invention
The present invention relates generally to systems and methods for conducting computer based text searching. The present invention relates more specifically to systems and methods for carrying out a text search from a server computer system on data and file information located on a client computer system.
2. Background Information
The Internet comprises a vast number of computers and computer networks that are interconnected through communication links. The interconnected computers exchange information using various services such as electronic mail, Gopher, FTP, and the World Wide Web. All of these technologies require a level of knowledge that is greater than that possessed by the average Internet user who might want to share information. Additionally, the Internet is in a constant state of fluctuation. FTP servers and web sites come and go. A person searching for a particular file using a web browser and any of the popular search engines (Lycos(copyright), Yahoo(copyright), Alta Vista(copyright), etc.) can expect to have mixed to poor results because of such factors as stale links, ratio or account requirements on FTP sites, as well as unknown bandwidth availability on a given site.
Typically when a user wishes to share some of his files, he designs a web site and/or acquires and sets up an FTP server. Either of these tasks requires more expertise than the average Internet user possesses, therefore, much of what could be shared on the Internet is not.
When a user decides to look for a file on the Internet, he will typically use his web browser to contact a search engine. Since major search engines face the daunting task of trying to index every single web page and/or FTP site, the information they return will necessarily be aged and incomplete. Often a search of FTP servers will yield the location of a file and whether the FTP server will be online. The owner of the FTP site, however, will typically have further requirements, such as a user account, or he may require users to upload files before he will allow the user to download anything.
What is needed is an efficient method of creating a dynamic and constantly updated index of that information available on the Internet so that when a person conducts a search and locates information, the person knows that the information is immediately available.
In view of the above, the present invention is advantageous in that it provides a dynamic and constantly updated searchable index of information that is available on the Internet. To accomplish this, the disclosed invention provides a suffix array search system that allows the rapid searching of large amounts of information from large numbers of users while minimizing the required amount of bandwidth and minimizing the amount of utilized server system resources. The result is that the present invention enables a person searching the Internet to quickly locate and transfer available information.
The present invention may be summarized as a system and method for conducting a full text search on a client system by creating a full text search index of a string of characters on a client system for use on a server system. When a client system signs on to a server system, the client""s system searches for relevant data and file information about that data which the user is willing to share and creates an original string of characters that contains file information such as file name, location, and size. The original string of characters is transformed using the Burrows-Wheeler transformation method. In the transformation, a rotation matrix is created of the original string of characters and the last column of the matrix is compressed using a standard compression method before being transmitted to the server system. The server system decompresses the data using the same standard decompression method. The transformation of the file information is reversed to recover the original string of characters. While recovering the original string of characters, a suffix array is created. The original string of characters and suffix array are stored in the memory of the server system. A binary search can be conducted of the suffix array to efficiently locate any sub-string of characters within the original string of characters.
A second client system signing on to the server system can initiate a search of the memory of the server system for a selected sub-string of characters. Once the selected sub-string of characters is found, the server system sends the second client system a list of the located relevant information (filename, location, size, user IP, user port, etc.). If the second user wants to obtain a copy of the data, a message is sent directly between the second client system and the first client system without the server system being involved unless the first client system is behind a firewall. When the first client system is behind a firewall, the request for the file is relayed through the server system. The requested data will then be transferred from the first client system to the second client system.
Each client system willing to share data that is signed on to the server system will have the original string of characters and the suffix array created for the client system and stored in the server system memory only while the client system is signed on to the server system. As soon as the client system signs off the server system, that client system""s original string of characters and suffix array are deleted from the server system. This creation of a client system""s original string of characters and suffix array only while the client system is signed on the server system enables the server system to have a dynamic and constantly updated index of data, which is available for transfer between client systems.
Other objects and advantages of the present invention will become apparent from the following description of the preferred embodiment with reference to the drawings.