As more and more information is stored and transferred digitally, electronic search engines used to locate desired data are becoming essential tools in computer network environments. Generally, two types of search engine technologies are currently in use: keyword search engines and parametric search engines.
Keyword search engines are typically used to search for data in unstructured documents. Unstructured documents often contain information that is not formatted in any predefined manner. Such documents may include disparate information loosely arranged into paragraphs, lists, tables, and other layouts. Examples of unstructured documents include Hypertext Markup Language (HTML) pages, Portable Document Format (PDF) documents, and word processor documents.
In general, keyword search engines comb through unstructured documents and store keywords in a text index. Each index record is associated with a network location and, often, with additional metadata about the document. When a user submits a keyword search, the search engine examines its records and returns the network locations of documents matching the submitted keywords. Some popular keyword search engines include Google (R), Inktomi (R), and AltaVista (R). Google is a registered trademark of Google Incorporated, Inktomi is a registered trademark of Inktomi Corporation, and AltaVista is a registered trademark of the AltaVista Company.
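The indexing and lookup steps described above can be illustrated with a minimal inverted-index sketch. It assumes whitespace tokenization, lowercase matching, and that a document matches only if it contains every query keyword; the function names `build_index` and `keyword_search` and the example locations are hypothetical, and production engines add stemming, ranking, and metadata storage that are omitted here.

```python
from collections import defaultdict

PUNCTUATION = ".,;:!?"

def build_index(documents):
    """Map each lowercase keyword to the set of network locations containing it.

    `documents` is a dict of {network_location: text} (hypothetical input shape).
    """
    index = defaultdict(set)
    for location, text in documents.items():
        for word in text.lower().split():
            index[word.strip(PUNCTUATION)].add(location)
    return index

def keyword_search(index, query):
    """Return the locations of documents that contain every keyword in the query."""
    terms = [t.strip(PUNCTUATION) for t in query.lower().split()]
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never modified.
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

docs = {
    "http://example.com/a": "Soft cashmere sweater.",
    "http://example.com/b": "Wool sweater, on sale",
    "http://example.com/c": "Cashmere scarf",
}
index = build_index(docs)
print(keyword_search(index, "cashmere sweater"))  # {'http://example.com/a'}
```

Note that such an index stores only words, so a condition like "price less than 100" has nothing to match against; this is the limitation discussed below.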
Parametric search engines, on the other hand, are typically used to search through structured documents. Structured documents often contain information formatted into predefined categories or fields. Structured documents are analogous to a telephone book containing listings in a specific arrangement. One example of a structured document is a database document. Databases typically organize information into tables in which related parameters are grouped together to form a database entry. Because of the organized nature of structured documents, they are particularly conducive to parametric searches. Parametric searches often involve comparison operators, such as less than (<), greater than (>), and equal to (=).
The main functional difference between the two search solutions is that text search solutions generally cannot perform parametric searches of the kind supported by relational databases. A text search solution is best at matching keywords in indexed documents rather than performing range queries. It is thus typically not possible to ask a text-based search engine to find cashmere sweaters that sell for less than 100 dollars.
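For contrast, the cashmere-sweater query is straightforward as a parametric search against a relational database. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `products` table, its columns, the function names, and the sample rows are hypothetical and chosen only to show equality (=) and less-than (<) conditions combined in one query.

```python
import sqlite3

def demo_catalog():
    """Create an in-memory database with a hypothetical products table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (name TEXT, material TEXT, category TEXT, price REAL)"
    )
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?, ?)",
        [
            ("Classic crew", "cashmere", "sweater", 89.0),
            ("Luxe turtleneck", "cashmere", "sweater", 149.0),
            ("Fisherman knit", "wool", "sweater", 65.0),
            ("Travel wrap", "cashmere", "scarf", 59.0),
        ],
    )
    return conn

def find_products(conn, material, category, max_price):
    """Parametric search: two equality conditions plus a less-than range condition."""
    rows = conn.execute(
        "SELECT name FROM products "
        "WHERE material = ? AND category = ? AND price < ? "
        "ORDER BY name",
        (material, category, max_price),
    )
    return [name for (name,) in rows]

conn = demo_catalog()
print(find_products(conn, "cashmere", "sweater", 100))  # ['Classic crew']
```

The price condition works because each entry stores price as a typed numeric field, which is exactly the structure a keyword index lacks.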
Relational databases, on the other hand, typically lack support for fast, fuzzy keyword matching. Moreover, unlike most text search solutions, which are read-only, relational databases are generally transaction-based and thus must spend time locking and unlocking data, even for read-only operations. This design puts relational databases at a disadvantage in terms of speed and scalability.
Despite these differences, it is desirable for an information source, such as an e-commerce web site, to provide both text and parametric search features. However, this requirement implies that the information source must host both types of search solutions: a text search engine and a parametric search engine. To reduce infrastructure management and financial costs, it is advantageous to apply a single search solution that satisfies both requirements.