Modern search engines commonly provide two categories of search functionality to users: general Web search and vertical search. In general Web search, the searchable objects are generally identified by URLs (Uniform Resource Locators), which the search engine discovers through hyperlinks. Examples of general Web search include Microsoft's Windows Live™ search and Google™ search. Vertical search generally refers to searching for a class of Web objects or information within a certain domain (the term “domain” as used herein broadly refers to any field or area of information or knowledge, and is not used in the narrow sense of a network domain). Because the domain of the objects or information often relates to a specialized body of knowledge, vertical search is often referred to as “specialized search”. Examples of vertical search include product search, image search, academic search, article search, book search, people search, and others.
Vertical search has become an important supplement to general Web search. Unlike general Web search, vertical search commonly deals with information about certain types of real-world objects instead of general Web pages identified by explicit URLs. In vertical search, an object may appear in any type of document, such as a simple text document, an office document (e.g., Microsoft Word), a PDF document, an XML file, an email message, an instant message, a digital image file, or an HTML Web page. One document may describe multiple objects, and the same object can appear in multiple documents of different types. Even if the document itself has an explicit URL that hyperlinks can point to, an object that appears in the document may not be identified by an explicit URL.
For example, a document (e.g., a Web page) may contain a list of books, each identified by a list entry containing an image representative of the book, the author, title and publication information of the book, and perhaps also a brief summary or a snapshot of the book. The list entry of each book may or may not contain an active hyperlink that links to a URL.
For another example, information about a product (e.g., Dell™ Latitude C640) may appear in various Web sites that either offer the product for sale or contain various types of descriptive information (such as an introduction or a user review). Again, each piece of information about the product may or may not contain an active hyperlink that links to a URL.
Some current vertical search engines extract object information from the Web and provide indexing and search services for the extracted objects. For example, the structured product data of the Windows Live™ Product Search (products.live.com) and a portion of the data in Froogle™ (froogle.google.com) are extracted from the Web. Likewise, ZoomInfo (www.zoominfo.com) extracts people information from multiple Web pages and integrates the information.
As vertical search typically focuses on a specific domain (or field) or specific type of objects, it enjoys greater odds of providing rich, precise, and structured information to users by utilizing the special knowledge of the domain or field. However, although the performance of vertical search is enhanced due to its specialized nature, the performance of most vertical search domains still has substantial room for improvement. In addition to the difficulty of extracting structured information from the Web, one factor that affects the performance is that some techniques, which are demonstrated to be quite useful and critical in general Web search, have not yet been applied to most vertical object search engines. In particular, current vertical search engines often have difficulties in ranking the degree of relevance of searchable objects according to a certain search query.
One of the significant contributors to the effectiveness of general Web search is its use of URL-related anchor text. Anchor text is a clickable text string that is associated with an active hyperlink to an explicit URL. The URL points to a Web page, which is a search object in general Web search. A vast number of Web pages are linked to each other in this manner. Modern general Web search engines usually take into account both the number of external Web pages that contain hyperlinks to the object Web page and the anchor texts of those hyperlinks.
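As an illustration of the relationship between a hyperlink, its URL, and its anchor text, the following is a minimal sketch that collects (URL, anchor text) pairs from one HTML page using Python's standard html.parser module. The class name AnchorExtractor and the function extract_anchors are hypothetical names chosen for this example; the sketch does not describe how any particular search engine parses pages.

```python
from html.parser import HTMLParser


class AnchorExtractor(HTMLParser):
    """Collects (href, anchor text) pairs from one HTML page.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> element currently open, if any
        self._text = []      # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # ignore <a> elements that carry no URL
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only text inside an open hyperlink counts as anchor text.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def extract_anchors(html):
    """Return the list of (href, anchor text) pairs found in html."""
    parser = AnchorExtractor()
    parser.feed(html)
    parser.close()
    return parser.links
```

Calling extract_anchors on the markup of a page yields one pair per hyperlink that has an href attribute; text outside hyperlinks, and anchors without a URL, are ignored.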
It is known that the anchor texts of the hyperlinks to a certain object Web page collectively define a valuable description of the object Web page and can be used for ranking the object Web page according to a search query. The descriptive information given by anchor texts tends to be even more valuable than the information contained in the Web page itself. This is because anchor texts are usually found in external Web pages, which tend to be independent of the object Web page and therefore provide a more objective description of it. Anchor texts effectively aggregate the opinions (which can be comprehensive, accurate and objective) that a potentially large number of other Web pages hold about an object Web page. The information contained in anchor text is also less susceptible to spam. Even with link bombing, which targets page ranking and anchor text, anchor texts are much harder to manipulate than the page content itself.
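The aggregation described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the ranking method of any actual search engine: the functions build_anchor_index and rank are hypothetical, the score is a plain count of query-term occurrences in the collected anchor texts, and ties are broken by the number of distinct linking pages. Self-links are skipped on the assumption that they are not independent descriptions.

```python
from collections import Counter, defaultdict


def build_anchor_index(links):
    """Aggregate anchor texts per target URL.

    links: iterable of (source_url, target_url, anchor_text) triples.
    Returns a dict: target_url -> {"sources": set, "terms": Counter}.
    """
    index = defaultdict(lambda: {"sources": set(), "terms": Counter()})
    for source, target, text in links:
        if source == target:
            continue  # assumption: a page describing itself is not independent
        entry = index[target]
        entry["sources"].add(source)
        entry["terms"].update(text.lower().split())
    return index


def rank(index, query):
    """Order target URLs by how often query terms occur in their anchor texts."""
    query_terms = query.lower().split()
    scored = []
    for target, entry in index.items():
        score = sum(entry["terms"][term] for term in query_terms)
        if score > 0:
            scored.append((score, len(entry["sources"]), target))
    # Highest term score first, then most distinct linking pages, then URL.
    scored.sort(key=lambda item: (-item[0], -item[1], item[2]))
    return [target for _, _, target in scored]
```

Because each target page is scored only from text written on other pages, changing the content of the target page itself has no effect on its score in this sketch, which mirrors the independence argument made above.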
Anchor text thus plays an especially important role in improving the performance of general Web search. In fact, most general Web search engines now use anchor text as primary evidence for ranking in order to improve search performance. Some general Web search engines use contextual text in a certain vicinity of the anchor text to automatically compile lists of authoritative Web resources on a range of topics.
Vertical search engines, however, have not been able to take advantage of anchor text to a degree comparable to what general Web search engines have done. This is mainly because vertical search objects generally lack explicit URLs and the corresponding anchor texts that are associated with search objects.