The present invention relates to indexing structured documents.
Servers located on the Internet and within intranets serve content (e.g., pages, documents) to users on demand. A user, interacting with a search engine, enters a text query for information, and the search results are displayed to the user as text, graphics, audio and/or video through a graphical user interface, most often referred to as browser software. A search engine performs several functions, such as information gathering, indexing, categorization, and searching. Information gathering usually uses Web crawlers, which send visited pages to an index engine. The index engine uses some form of inverted file and, given a word, returns a list of references to the pages that contain the word. Categorization, or clustering, attempts to group the pages according to attributes, such as topics. Searching allows the user to pose content-based queries and receive ranked result sets.
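By way of illustration only, the inverted-file lookup described above can be sketched as follows. This is a minimal sketch and not the index engine of any particular search engine; the function names `tokenize`, `build_inverted_index` and `lookup`, the page identifiers, and the sample page text are all hypothetical and chosen for the example. It assumes that a word is a run of letters or digits and that matching is case-insensitive.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words (assumed here: runs of letters or digits)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(pages):
    """Map each word to the set of page references whose text contains it."""
    index = defaultdict(set)
    for ref, text in pages.items():
        for word in tokenize(text):
            index[word].add(ref)
    return index


def lookup(index, word):
    """Given a word, return a sorted list of references that contain it."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical pages, standing in for what a Web crawler might send to the index engine.
pages = {
    "page1": "Web crawlers send visited pages to an index engine",
    "page2": "The index engine returns a list of references",
    "page3": "Users enter text queries",
}

index = build_inverted_index(pages)
print(lookup(index, "index"))
print(lookup(index, "queries"))
```

In this sketch each word maps to a set of references, so a page is listed once per word however often the word occurs on it. A ranked search would additionally need per-page information such as word frequency or position, which the sketch omits.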