A typical search engine, such as Altavista (http://www.altavista.com), Lycos (http://www.lycos.com) or Yahoo (http://www.yahoo.com), includes a database for classifying, storing and managing general website information according to a predetermined criterion; a search robot, implemented in software, that automatically collects new website information while continuously traversing the web; and search engine software that collates the collected data into the database so that a user of the search engine can search the data.
A block diagram of a system for providing the aforementioned search engine service is illustrated in FIG. 1. Referring to FIG. 1, a user accesses a search engine server 150 through a user terminal 110 over the Internet. If the user inputs a predetermined search word, the search engine server 150 queries website information relating to the search word using search engine software 140. The search engine software 140 searches a relevant database 130 and returns the matching website information to the search engine server 150. A search robot 120 is an entity, implemented in software, that automatically collects new website information from a web server 160 while continuously traversing the web, as described above. The search robot 120 retrieves documents written in HTML (HyperText Markup Language) on the network and parses the links contained in their source to collect data from a plurality of websites existing on the network.
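The link-following behavior of such a search robot can be illustrated with a minimal sketch. It is not the implementation of any particular search engine; the names LinkCollector, extract_links and crawl are hypothetical, and the fetch argument is an assumed caller-supplied function that returns the HTML text for a URL (or None if the page is unavailable), so that the sketch does not depend on any network library.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href targets of <a> tags found in an HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page itself.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the absolute URLs linked from one HTML document."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first traversal from start_url.

    fetch is a hypothetical caller-supplied function: URL -> HTML text or None.
    Returns a dict mapping each visited URL to its HTML text.
    """
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        for link in extract_links(html, url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

In this sketch the max_pages limit stands in for whatever scheduling policy a real robot would apply; the source does not specify one.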
The data collected by the search robot 120 in this manner are collated into a database. Here, databasing refers to a series of steps in which a morphological analysis is performed on predetermined information located in a website, and an index table is created and then recorded in the database 130. The database 130 records all website information collected by the search robot 120. The search engine software 140 presents search results to a user. This software searches the numerous pages recorded in the database 130 and provides the search service user with a list of websites containing character strings that match a search word, arranged in an order determined by a specific algorithm. Such a prior search engine registers information on a website in the search engine and provides the information to the user in the following manner.
(1) As described above, predetermined information is collected by the search robot, and the collected information on the website is registered in the search engine under the supervision of an expert surfer.
(2) A directory is selected from among directories sorted according to the titles of websites to be registered, and a request is made for registration of the website in the selected directory. The website is then registered in the search engine under the supervision of an expert surfer. For such directory registration, some search engines provide a service that reduces the time required to register a website upon payment of a predetermined registration fee.
A user who wants to find predetermined information inputs a search word, and the websites registered in the search engine through the above methods or the like are searched in one of various search modes, such as an integrated web search or a directory search, and the results are then provided to the user. The integrated web search is also referred to as a “search by keyword”. In this search method, the URLs (uniform resource locators) of all websites are recorded in a database and desired information is found by inputting a specific keyword.
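The index table and the search by keyword described above can be sketched as follows. This is an illustration under stated assumptions only: simple lowercase tokenization stands in for the morphological analysis, ordering by number of keyword occurrences stands in for the unspecified arrangement algorithm, and the names tokenize, build_index and search are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Assumption: lowercase alphanumeric tokens stand in for morphological analysis.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an index table from pages, a dict mapping URL -> page text.

    Returns a mapping: term -> {url: number of occurrences of the term}.
    """
    index = defaultdict(dict)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term][url] = index[term].get(url, 0) + 1
    return index


def search(index, query):
    """Return the URLs containing every query term, most occurrences first.

    Assumption: ranking by occurrence count is only a placeholder for the
    arrangement order; ties are broken alphabetically by URL.
    """
    terms = tokenize(query)
    if not terms:
        return []
    postings = [index.get(term, {}) for term in terms]
    matches = set(postings[0]).intersection(*postings[1:])
    return sorted(matches, key=lambda url: (-sum(p[url] for p in postings), url))
```

For example, with build_index applied to a small set of pages, search(index, "cheap hotels") returns only the URLs whose text contains both words.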
The prior method of providing a website search service has the following problems.
(1) The content of a website at the time it is first registered in a search engine may differ from its content after registration. For example, although a website contains predetermined content when it is first registered in the search engine, the website may gradually deteriorate over time and become a spam site that generates a number of pop-up windows. Such spam sites usually contain adult content. When a user visits or leaves a URL of the spam site, the spam site generates several to several dozen pop-up windows at the same time, which causes considerable inconvenience to the user.
(2) Furthermore, most search engine companies charge different registration fees for a general website registered under a general keyword and for an adult website registered under a keyword related to adult content. This is because the search engine companies bear a greater burden in managing the registration of adult websites, since an adult website is more likely than a general website to violate the law. Exploiting this fact, a website operator may register a website containing general content in a search engine under a general keyword and then modify the HTML source of the original website so that it directly provides adult content or links to other sites that provide adult content. Such a site may be defined as a “deteriorated site”. The problem is that such deteriorated sites are very difficult to detect without a report from search engine users or a deliberate search by an expert surfer or the like.
As an approach to solving the aforementioned problems, registered websites are continuously monitored through reports from users or by expert personnel such as expert surfers. It is evident, however, that this prior method cannot be a fundamental solution to the aforementioned problems. Therefore, there is a need for a method in which those problems are solved automatically through a predetermined algorithm on the Internet.