1. Technical Field
The present invention relates to an improved method, system and program for indexing web page contents, and in particular to an improved method, system and program for providing indexed web page contents to a search engine database. Still more particularly, the present invention relates to a method, system and program for indexing web page contents of each web page requested from the Internet by a user and providing the indexed web page contents to a search engine database.
2. Description of the Related Art
In the prior art, it has been well known that computer systems can be utilized to manage indices of records of databases. Many techniques are known to parse, index, and search databases. However, managing extremely large databases presents special problems.
In recent years, a unique distributed database has emerged in the form of the World Wide Web (Web). The database records of the Web are in the form of web pages accessible via the Internet. Here, tens of millions of pages are accessible by anyone having a communications link to the Internet.
The pages are dispersed over millions of different computer systems all over the world. Users of the Internet constantly desire to locate specific pages containing information of interest. However, a current problem with the Web is the lack of ability to search and browse (collectively these activities are referred to as "navigate") the information in the Web. Searching can be described as looking for the resources that contain particular information of interest, such as a specific set of keywords, while browsing is a less focused "looking around."
Currently, it is impossible to efficiently navigate all of the Web. The amount of information available through web pages and other data available through the Internet grows by vast amounts each day. In an effort to provide a directory to data on the Internet, many search engines have been created whereby a user can search web pages by a keyword, phrase, topic, etc. However, each search engine typically only accesses a directory of web pages that have been previously "crawled" or indexed for that search engine or manually created. The indexing data is typically stored in a database which can be searched in various ways to provide users with locations of web pages which may be relevant to a user's particular interest.
With the amount of information on the Internet growing exponentially, there is little chance that the vast majority of the information will be effectively indexed with the techniques utilized today by the various search engine sites that rely on centralized computers accessing and indexing the distributed content of the Internet in a limited manner. This, in turn, leads to the current statistics which estimate that only 15-20% of the information available on the Internet is readily accessible via current search engine indexing methods.
However, users access a multitude of web pages that have not been indexed by any search engine. Therefore, it would be desirable to retrieve index data from each web page that a user accesses over the Internet in order to update search engine databases with pages that have not yet been indexed. Further, it would be desirable to free bandwidth typically utilized to crawl for non-indexed pages and shift to utilizing data retrieved from user accesses. In particular, indexing pages retrieved from user accesses would be more efficient and could also allow the value of a particular web page to be derived from the number of times indexes of that page are returned to a search engine by different users. Effectively, by receiving index data created during user accesses, a search engine may determine a usefulness value for web pages.
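By way of illustration only, the derivation of a usefulness value from index submissions by different users might be sketched as follows. The class name `UsefulnessTracker`, its method names, and the choice to count distinct users are assumptions made for this sketch and are not taken from the specification.

```python
from collections import defaultdict


class UsefulnessTracker:
    """Hypothetical helper: tracks which users returned index data per page."""

    def __init__(self):
        # Maps a page URL to the set of user identifiers that submitted
        # index data for that page.
        self._users_by_url = defaultdict(set)

    def record_submission(self, url, user_id):
        # A set ignores repeat submissions from the same user, so only
        # different users raise the value.
        self._users_by_url[url].add(user_id)

    def usefulness(self, url):
        # Assumed scoring rule: the value is the number of distinct users.
        return len(self._users_by_url.get(url, ()))


tracker = UsefulnessTracker()
tracker.record_submission("http://example.com/a", "user-1")
tracker.record_submission("http://example.com/a", "user-2")
tracker.record_submission("http://example.com/a", "user-1")  # repeat visit
tracker.record_submission("http://example.com/b", "user-3")
```

Counting distinct users rather than raw submissions is one possible design choice; a raw submission count could be kept alongside it.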
In view of the foregoing, it is therefore an object of the present invention to provide an improved method, system and program for indexing web page contents.
It is another object of the present invention to provide an improved method, system and program for providing web page contents to a search engine database.
It is yet another object of the present invention to provide a method, system and program for indexing web page contents of each web page requested from the Internet by a user and providing the indexed web page contents to a search engine database.
In accordance with the method, system and program of the present invention, in response to each user request for a web page, user access to the web page is provided from a temporary copy of the web page which is stored on a device which accesses the web page and which is accessible to the user. Indexing data is then automatically recorded at that device from the temporarily stored copy of the accessed web page, wherein the indexing data corresponds to contents of the accessed web page. The indexing data is thereafter transmitted from the device to a remote data storage device which provides a search engine database. According to one object of the invention, the indexed data is incorporated into the search engine database, such that previously unknown indexed web page contents are provided to a search engine database in response to a user access of that web page. According to another object of the invention, a statistical count of a number of times that indexing data for a particular web page is provided to a search engine database is maintained.
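The flow summarized above may be illustrated by the following minimal sketch, which assumes the temporarily stored copy is an HTML string and stands in for the remote data storage device with an in-memory object. The names `build_index_data`, `SearchEngineDatabase`, `receive`, and `search`, as well as the word-frequency form of the indexing data, are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping script and style contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def build_index_data(url, cached_html):
    """Record indexing data from the temporarily stored copy of a page."""
    parser = _TextExtractor()
    parser.feed(cached_html)
    parser.close()
    words = re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower())
    return {"url": url, "terms": dict(Counter(words))}


class SearchEngineDatabase:
    """In-memory stand-in for the remote search engine database."""

    def __init__(self):
        self.postings = {}                  # term -> set of page URLs
        self.submission_counts = Counter()  # URL -> times index data arrived

    def receive(self, index_data):
        url = index_data["url"]
        self.submission_counts[url] += 1
        for term in index_data["terms"]:
            self.postings.setdefault(term, set()).add(url)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


cached_copy = (
    "<html><head><title>Rare Orchids</title><style>p{}</style></head>"
    "<body><p>Orchids bloom.</p></body></html>"
)
index_data = build_index_data("http://example.com/orchids", cached_copy)
database = SearchEngineDatabase()
database.receive(index_data)  # first user access
database.receive(index_data)  # a later access of the same page
```

In an actual deployment the call to `receive` would be replaced by a network transmission from the accessing device to the remote storage device; that transport is omitted here.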
All objects, features, and advantages of the present invention will become apparent in the following detailed written description.