A conventional search engine, such as Altavista (http://www.altavista.com), Lycos (http://www.lycos.com) or Yahoo (http://www.yahoo.com), generally includes a database for classifying, storing and managing web site information based on a predetermined rule, a search robot, embodied as software, for constantly traveling over the web and automatically collecting new web site information, and search engine software for storing the collected data in a database and allowing a user of the search engine to search for desired information in the database.
FIG. 1a is a block diagram showing an entire system for providing the search engine service. As shown in FIG. 1a, a user connects to a search engine server 150 over the Internet via a user terminal 110. When the user enters search terms, the search engine server 150 queries search engine software 140 for web site information corresponding to the entered search terms, and the search engine software 140 searches a database 130 and returns the retrieved web site information to the user.
A search robot 120 is an entity embodied as software for constantly traveling over the web and automatically collecting new web site information from a web server 160, as described above. The search robot 120 searches for HTML (Hypertext Markup Language) documents on a network, parses the links described in those HTML documents, and thereby collects data from a number of web sites existing on the network. The data collected by the search robot 120 is databased. The term “databased” refers to a series of processes of performing morphological analysis of information located on a web site, producing a corresponding index table, and storing it in the database 130. The database 130 is provided to store all web site information collected by the search robot 120. The search engine software 140 functions to show search results to users. This software searches a large number of pages stored in the database 130 and lists search results by relevance to the search term.
The conventional search engine as described above registers information about a web site in the search engine and provides the information to users in the following ways.
(1) Information of a web site is collected using the search robot as described above, and the web site information is registered in the search engine after being reviewed by expert surfers.
(2) A category corresponding to the subject of a web site to be registered is selected from a directory of categories classified by subject, registration of the web site in the selected category is requested, and the web site is then registered in the search engine after being reviewed by expert surfers. Some search engines provide a fee-based directory registration service that reduces the time required to register a web site in their directory in exchange for a registration fee.
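The collection, databasing and search flow of FIG. 1a described above can be illustrated with a minimal Python sketch. It is not the implementation of any particular search engine. The in-memory `FAKE_WEB` dictionary (standing in for the web server 160), the class and function names, the whitespace tokenization used in place of morphological analysis, and the ranking by occurrence count are all assumptions made for illustration.

```python
from collections import defaultdict
from html.parser import HTMLParser


class LinkAndTextParser(HTMLParser):
    """Collects link targets and text words from one HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        # Record the href of every anchor tag so the robot can follow it.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Assumption: lowercase whitespace tokens stand in for morphological analysis.
        self.words.extend(data.lower().split())


# Hypothetical stand-in for web server 160: URL -> HTML document.
FAKE_WEB = {
    "http://a.example/": '<p>world cup scores</p><a href="http://b.example/">next</a>',
    "http://b.example/": "<p>world cup history and more world news</p>",
}


def crawl(start_url, fetch):
    """Search robot 120: follow links from start_url and return {url: words}."""
    seen, queue, pages = set(), [start_url], {}
    while queue:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        html = fetch(url)
        if html is None:  # unknown URL: nothing to collect
            continue
        parser = LinkAndTextParser()
        parser.feed(html)
        pages[url] = parser.words
        queue.extend(parser.links)
    return pages


def build_index(pages):
    """'Databasing': build an index table mapping term -> {url: occurrences}."""
    index = defaultdict(dict)
    for url, words in pages.items():
        for word in words:
            index[word][url] = index[word].get(url, 0) + 1
    return index


def search(index, term):
    """Search engine software 140: URLs containing term, most occurrences first."""
    hits = index.get(term.lower(), {})
    return sorted(hits, key=lambda url: (-hits[url], url))


pages = crawl("http://a.example/", FAKE_WEB.get)
index = build_index(pages)
print(search(index, "world"))
```

In this sketch the robot reaches the second page only through the link on the first, and the page that mentions the term more often is listed first. Real engines use far richer relevance measures than a raw occurrence count.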
Web sites registered in the search engine by the above methods are provided to a user who is looking for desired information after they are searched for in various ways, such as integrated web search and directory search, based on search terms entered by the user. The integrated web search is also called “word-based search”, in which Uniform Resource Locators (URLs) of all web sites are stored in a database and desired information is searched for based on a specific keyword entered by the user. The directory search is also called “subject-based search”, in which web sites are organized into subject-based categories and, when a user follows the link to a desired category, the user can view the detailed items thereof. In this manner, the subject-based search allows the user to continue to link to the detailed items and retrieve desired information.
For example, if a user desires to find Korean team match scores in the 2002 Korea-Japan World Cup, the user can search for them via categories such as Sports→Ball Sports→Soccer→FIFA World Cup→2002 Korea-Japan World Cup→Korean team match scores. FIG. 1b is an example screenshot of the directory search method. As shown in this figure, the directory search results for the search terms “world cup” are the three categories “World Cup”, “2002 FIFA Korea-Japan World Cup” and “History of the World Cup”, and the user can search for desired information by moving to whichever of the three categories is most likely to contain it.
A typical search engine based on the integrated web search method is Lycos (http://lycos.cs.cmu.edu), developed by Michael L. Mauldin at Carnegie Mellon University, and a typical search engine based on the directory search method is Yahoo (http://www.yahoo.com). Many current search engines provide hybrid search services based on a combination of the different search methods described above.
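The subject-based search described above can be pictured as a walk through a category tree. The following Python sketch assumes a nested-dictionary layout for the directory. The `DIRECTORY` contents, the example URLs, the placement of “History of the World Cup” under “FIFA World Cup”, and the function names `browse` and `find_categories` are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical directory: nested dicts are categories, lists hold registered site URLs.
DIRECTORY = {
    "Sports": {
        "Ball Sports": {
            "Soccer": {
                "FIFA World Cup": {
                    "2002 Korea-Japan World Cup": {
                        "Korean team match scores": [
                            "http://scores.example/korea-2002",
                        ],
                    },
                    "History of the World Cup": [
                        "http://history.example/world-cup",
                    ],
                },
            },
        },
    },
}


def browse(directory, path):
    """Subject-based search: follow category names one level at a time."""
    node = directory
    for category in path:
        node = node[category]
    return node


def find_categories(directory, terms, trail=()):
    """Return the paths of categories whose names contain every search term."""
    wanted = [term.lower() for term in terms.split()]
    matches = []
    if isinstance(directory, dict):
        for name, child in directory.items():
            path = trail + (name,)
            if all(term in name.lower() for term in wanted):
                matches.append(path)
            # Keep descending so that matching subcategories are also listed.
            matches.extend(find_categories(child, terms, path))
    return matches


sites = browse(DIRECTORY, [
    "Sports", "Ball Sports", "Soccer", "FIFA World Cup",
    "2002 Korea-Japan World Cup", "Korean team match scores",
])
print(sites)
for path in find_categories(DIRECTORY, "world cup"):
    print("→".join(path))
```

Here `browse` mirrors a user clicking down through the categories, while `find_categories` mirrors entering search terms and receiving a list of matching categories to move into.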
The conventional method for registering web sites in the search engine and searching for the registered web sites has the following problems.
As the number of Internet users has rapidly increased, the number of users who desire to search for specific information has rapidly increased, and so has the number of types of information for which they desire to search. As a result, some search terms appear very frequently; such terms will also be referred to as “popular keywords”. This causes a problem in that users who search for information based on the popular keywords may receive information of web sites (hereinafter also referred to as “deceptive sites”) that contain contents of no use to the users but insert the popular keywords in their web pages in various ways. For example, if a user enters the popular keyword “Pikachu” to search for information about Pikachu, information of all registered web sites that contain the word “Pikachu” in their web pages is provided to the user. The web sites provided to the user may include web sites that contain adult or sexual contents and insert the word “Pikachu” in some places in their web pages in various ways (with ill intention in most cases). This popular keyword insertion exposes users of a wide range of ages to the information of web sites that contain adult or sexual contents.
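The effect of popular keyword insertion on a purely word-based search can be shown with a small Python sketch. The page texts, the URLs and the `keyword_search` function are hypothetical, and ranking by raw occurrence count is an assumption chosen to keep the example short.

```python
# Hypothetical registered pages; the second one is a "deceptive site" that
# repeats a popular keyword unrelated to its actual contents.
PAGES = {
    "http://fan.example/pikachu": "pikachu character guide and pikachu episode list",
    "http://deceptive.example/": "adult content pikachu pikachu pikachu pikachu",
    "http://news.example/": "world cup match scores",
}


def keyword_search(pages, keyword):
    """Word-based search: every page containing the keyword, most occurrences first."""
    keyword = keyword.lower()
    counts = {url: text.lower().split().count(keyword) for url, text in pages.items()}
    matching = [url for url, count in counts.items() if count > 0]
    return sorted(matching, key=lambda url: (-counts[url], url))


results = keyword_search(PAGES, "Pikachu")
print(results)
```

Because the search looks only at whether and how often the keyword occurs, the deceptive page is returned together with the genuine one and, under this occurrence-count assumption, is even listed ahead of it.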
The conventional method for overcoming the problems described above relies on complaint reports from users or on specialists, such as expert surfers, constantly monitoring the registered web sites, and therefore cannot be an ultimate solution to the problems. An algorithm that is executed automatically on the Internet to solve the problems would be a useful means of solving them all at once.