This application relates to indexing and searching electronic information on a computer network. More specifically, this invention relates to a method and system for creating vertical search engines.
As is known in the art, the Internet is a world-wide interconnected network of devices including computers, servers, gateways, routers and other devices. The World-Wide-Web is a collection of servers and other devices on the Internet that support electronic document exchange. Electronic documents on the World-Wide-Web are formatted in special languages, called mark-up languages, that support electronic links or “hyperlinks” to other documents, as well as to graphics, audio, video, animation and other types of electronic content. The mark-up languages include Hyper Text Markup Language (“HTML”), Extensible Markup Language (“XML”), and many others.
As is known in the art, a “search engine” is a software program that searches documents for specified keywords and returns a list of hyperlinks to the documents where the keywords were found. Although a search engine is really a general class of programs, the term is often used to specifically describe systems like Yahoo, Lycos, Alta Vista, Excite, Google and others that enable users to search for electronic content on the World-Wide-Web.
Typically, a search engine works by sending out a software “spider” to fetch as many electronic documents as possible. Another program, called an “indexer,” then reads these documents and creates an index of Uniform Resource Locators (“URLs”) based on the keywords contained in each document. Each search engine typically uses a distinct proprietary algorithm to create its indices such that meaningful results are returned for each query.
As is known in the art, a “spider” is an automated program that searches the Internet for new World-Wide-Web documents. An “indexer” indexes the corresponding URLs and content-related information in a database, which can be examined for matches by a search engine. Spiders are generally considered to be a type of “bot,” or Internet robot, and are also called “crawlers.”
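The division of labor between a spider and an indexer can be illustrated with a minimal sketch. It assumes the documents have already been fetched: the hypothetical `fetched_pages` dictionary stands in for a spider's output, and its URLs and text are placeholders. The indexer shown is a plain inverted index from keyword to URLs, which is only one simple approach and not the proprietary algorithm of any particular search engine.

```python
import re
from collections import defaultdict

# Hypothetical spider output: URL -> page text (placeholder URLs and content).
fetched_pages = {
    "http://example.com/golf": "Tiger Woods wins golf tournament",
    "http://example.com/zoo": "The tiger is a large cat",
    "http://example.com/cars": "New sports cars reviewed",
}

def build_index(pages):
    """Map each lowercase keyword to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def search(index, keyword):
    """Return a sorted list of URLs indexed under the keyword."""
    return sorted(index.get(keyword.lower(), set()))

index = build_index(fetched_pages)
print(search(index, "Tiger"))
```

With these placeholder pages, a query for "Tiger" matches both the golf page and the zoo page, since a keyword index alone carries no notion of topic.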
Most search engines are also general portal sites. As is known in the art, a “portal” is a web-site or service that offers a broad array of resources and services, such as e-mail, forums, chat rooms, search engines, on-line shopping malls, etc. America Online (“AOL”), the Microsoft Network (“MSN”) and others are general portal sites.
However, there are a number of problems associated with search engines that are general portals. One problem is that a general search engine is designed to provide all types of general information to all types of users. A general search engine's search algorithms are typically designed to “horizontally” search for a breadth of information to provide general types of information. This horizontal search approach causes individuals looking for specific information on the World-Wide-Web to look through hundreds, if not thousands, of irrelevant pieces of information to finally locate the information they seek, provided they find it at all.
Another problem is that general search engines often return indexes including a large number of links to information that is not closely related to a search requested by a user. This makes it difficult for a user to locate desired information and often leads to user confusion and user dissatisfaction.
Another problem is that vague search terms used in a general search engine return a huge number of results. However, a vague search term may be a term of art or may be all the user knows. The vague search term may not be vague at all when applied to a specific topic.
There have been attempts to solve some of the problems associated with general search engines using vertical search engines or “vortals.” As is known in the art, a vortal is a specific type of search engine that provides information and resources related only to one specific topic (or a small number of specific topics). These sites typically contain focused information, such as “vertical” or “in-depth” information pertinent only to their particular targeted topic of interest. Vortals include information pertinent to a targeted topic of a very small horizontal breadth, but a larger depth. Vortals are designed to include “the” source of pertinent information on the World-Wide-Web for a “community of interest.”
Vortals typically provide news, research and statistics, discussions, newsletters, online tools, and many other services that educate users about a specific topic. Vortals typically use specialized searching algorithms to search and provide only information about a specific topic.
For example, a vortal may be created for people interested in the sport of golf. On a general search engine, if a user typed in a search using the vague keyword “Tiger” to search for URLs including hyperlinks to information about the golfer Tiger Woods, the general search engine would return thousands of URLs including animals, product names, nick-names, television programs, movie names and a large amount of other information. The user would have to look through a large number of pages to find information on the golfer Tiger Woods.
A user could qualify a search on a general search engine. For example, a user may enter a search using the keywords “Tiger and Golf” or “Tiger Woods.” However, such a search on a general search engine still returns information un-related to the golfer Tiger Woods, such as information about animals and forestry. In addition, most general search engines require a user to develop some knowledge and expertise on how general search engines work to create and successfully use a qualified search.
In contrast, on a vortal specifically designed for golf, entering a search using the vague keyword “Tiger” would only return information about the golfer Tiger Woods. A user would have to sort through very little, if any, information not related to the golfer Tiger Woods. Even very vague search terms on a vortal can be used to return highly relevant search results for a particular vortal.
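The contrast between the general case and the vertical case can be sketched in a few lines. The sketch assumes a hypothetical set of topic keywords (`GOLF_TERMS`) and placeholder pages. It returns a page in the vertical case only if the page also mentions at least one topic keyword, which is one simple filtering rule among many and not necessarily how any particular vortal works.

```python
import re

# Hypothetical topic vocabulary for a golf vortal (an assumption for illustration).
GOLF_TERMS = {"golf", "golfer", "pga", "birdie", "fairway"}

# Placeholder pages standing in for crawled documents.
pages = {
    "http://example.com/woods": "Tiger Woods, the golfer, wins again",
    "http://example.com/zoo": "The tiger exhibit opens at the zoo",
    "http://example.com/cereal": "Tiger mascot sells breakfast cereal",
}

def words(text):
    """Return the set of lowercase alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def search(pages, keyword, topic_terms=None):
    """Return sorted URLs containing the keyword.

    If topic_terms is given, keep only pages that also mention
    at least one of the topic terms (the vertical case).
    """
    hits = []
    for url, text in pages.items():
        tokens = words(text)
        if keyword.lower() not in tokens:
            continue
        if topic_terms is not None and not (tokens & topic_terms):
            continue
        hits.append(url)
    return sorted(hits)

general = search(pages, "Tiger")               # general search: no topic filter
vertical = search(pages, "Tiger", GOLF_TERMS)  # vertical search: golf pages only
print(len(general), vertical)
```

Here the unfiltered query matches all three placeholder pages, while the topic-restricted query returns only the page about the golfer, even though the user typed the same vague keyword.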
Vortals are also being used for electronic commerce (“e-commerce”) including Business-to-Business (“B2B”), Business-to-Consumer (“B2C”) and other types of e-commerce transactions. For example, buyers and sellers with different procurement and catalog systems use B2B vortals to inter-operate and cooperate effectively.
However, there are also a number of problems associated with vortals. One problem is that it is difficult to create an appropriate list of keywords to be used for a vortal. Another problem is that it is difficult to create indexes including URLs and electronic content from web pages, and to search such indexes. Another problem is that it is difficult to verify whether all indexes including URLs and electronic content for a given vortal are appropriate for a selected topic. These problems and other problems with vortals often lead to user frustration and user dissatisfaction.
Thus, it is desirable to provide new types of vertical search engines for vortals. The vertical search engines should allow vortals to be created that efficiently index and search lists of URLs created from an appropriate list of keywords for a selected topic.
In accordance with preferred embodiments of the present invention, some of the problems associated with vertical search engines are overcome. A method and system for creating a vertical search engine is provided.
One aspect of the invention includes a method for creating a vertical search engine. Another aspect of the invention includes a method for indexing plural domain names associated with a domain name system for a selected set of keywords. Another aspect of the invention includes a method for indexing electronic content from web-sites for plural domain names associated with the domain name system for the selected set of keywords.
The method and system described herein may help vortals to be created that efficiently index and search lists of URLs created from an appropriate list of keywords for a selected topic. Such vortals may provide greater user satisfaction and less user frustration.