There are various categories of search engine software, such as web search or full-text search (e.g., Lucene), database or structured data search (e.g., Dieselpoint), and mixed or enterprise search (e.g., Google Search Appliance). Web search engines can use hundreds of thousands of computers to process billions of web pages and return results for thousands of searches per second. A high volume of queries and text processing may require the software to run in a highly distributed environment with a high degree of redundancy.
In modern search engines, searching for text-based content in databases or other structured data formats (e.g., XML or CSV) may present special challenges. Databases may be slow when resolving complex queries (e.g., queries with multiple logical or string-matching arguments). On the other hand, databases allow logical queries that full-text search may not support (e.g., multi-field boolean logic). While no crawling is necessary for a database, since the data is already structured, it is often necessary to index the data in a more compact form designed to allow for faster search.
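By way of illustration only, the compact index mentioned above can be sketched as a small inverted index keyed by field and term. The record layout, the field names, and the helpers `build_index` and `lookup` are hypothetical and are not drawn from any particular product; the sketch assumes simple whitespace tokenization and records that fit in memory.

```python
from collections import defaultdict

# Hypothetical structured records, e.g. rows already loaded from a database or CSV file.
RECORDS = [
    {"id": 1, "title": "quarterly sales report", "author": "lee"},
    {"id": 2, "title": "sales forecast", "author": "kim"},
    {"id": 3, "title": "quarterly budget", "author": "lee"},
]


def build_index(records, fields):
    """Map each (field, term) pair to the set of record ids containing that term."""
    index = defaultdict(set)
    for record in records:
        for field in fields:
            # Assumption: lowercase whitespace tokenization is sufficient here.
            for term in str(record[field]).lower().split():
                index[(field, term)].add(record["id"])
    return index


def lookup(index, field, term):
    """Return the ids of records whose given field contains the term."""
    return index.get((field, term.lower()), set())


index = build_index(RECORDS, ["title", "author"])

# Multi-field boolean query:
# (title contains "sales" AND author contains "lee") OR title contains "budget".
result = (lookup(index, "title", "sales") & lookup(index, "author", "lee")) | lookup(
    index, "title", "budget"
)
print(sorted(result))  # [1, 3]
```

Because each lookup returns a set of record ids, boolean operators across fields reduce to set intersection and union, which avoids scanning every record for each query.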
Additionally, keyword searches of the Internet typically involve web crawlers identifying words on webpages and indexing the webpages. Ranking algorithms may rank webpages based on commonality with one or more keywords. The ranking of webpages may further be adjusted based on the number of connections to and from a webpage. Email data and file data may also be searched using keywords, and searches of document sets based on keywords are possible.
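As a minimal sketch of the kind of ranking described above, the following scores each page by its keyword overlap with the query and then adjusts that score by the page's number of inbound and outbound connections. The page data, the helper names `connection_count` and `rank`, and the logarithmic weighting are all assumptions chosen for illustration; they do not represent the ranking algorithm of any actual search engine.

```python
import math

# Hypothetical crawled pages: the words found on each page and its outbound links.
PAGES = {
    "a.example": {"words": {"search", "engine", "index"}, "links": {"b.example"}},
    "b.example": {"words": {"search", "ranking"}, "links": {"a.example", "c.example"}},
    "c.example": {"words": {"search", "engine"}, "links": set()},
}


def connection_count(pages, url):
    """Count the connections to (inbound) and from (outbound) a page."""
    inbound = sum(
        1 for other, page in pages.items() if other != url and url in page["links"]
    )
    outbound = len(pages[url]["links"])
    return inbound + outbound


def rank(pages, keywords):
    """Return URLs ordered by keyword overlap, adjusted by connection count."""
    wanted = {keyword.lower() for keyword in keywords}
    scores = {}
    for url, page in pages.items():
        overlap = len(wanted & page["words"])
        if overlap == 0:
            continue  # pages sharing no keyword are not returned
        # Assumed weighting: dampen the connection count with a logarithm.
        scores[url] = overlap * (1.0 + math.log1p(connection_count(pages, url)))
    # Highest score first; ties broken alphabetically by URL.
    return sorted(scores, key=lambda url: (-scores[url], url))


ranking = rank(PAGES, ["search", "engine"])
print(ranking)  # ['a.example', 'c.example', 'b.example']
```

Note that every input to this score is a static property of the page set: the words on each page and the links between pages.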
However, document set and email searches are based on static documents and do not account for the ways in which documents are sent, edited, attached, emailed, or modified. Information relating to the use of a document is relevant to the importance of that document but is not utilized in conventional keyword searches of enterprise systems.