In the most general sense, a database is a collection of data. Various architectures have been devised to organize data in a computerized database. Typically, a computerized database includes data stored in mass storage devices, such as tape drives, magnetic hard disk drives and optical drives. Three main database architectures are termed hierarchical, network and relational. A hierarchical database assigns different data types to different levels of the hierarchy. Links between data items on one level and data items on a different level are simple and direct. However, a single data item can appear multiple times in a hierarchical database and this creates data redundancy. To eliminate data redundancy, a network database stores data in nodes having direct access to any other node in the database. There is no need to duplicate data since all nodes are universally accessible. In a relational database, the basic unit of data is a relation. A relation corresponds to a table having rows, with each row called a tuple, and columns, with each column called an attribute. From a practical standpoint, rows represent records of related data and columns identify individual data elements. The order in which the rows and columns appear in a table has no significance. In a relational database, one can add a new column to a table without having to modify older applications that access other columns in the table. Relational databases thus provide flexibility to accommodate changing needs.
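The flexibility described above, in which a column is added without disturbing older applications, can be illustrated with a minimal sketch. It assumes Python's built-in sqlite3 module and an in-memory database; the `product` table, its columns, and the `older_application_lookup` function are hypothetical names chosen for illustration only.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (sku TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO product (sku, name) VALUES ('A100', 'Widget')")


def older_application_lookup(connection, sku):
    """Stand-in for an older application: it names the columns it needs."""
    row = connection.execute(
        "SELECT name FROM product WHERE sku = ?", (sku,)
    ).fetchone()
    return row[0] if row else None


name_before = older_application_lookup(conn, "A100")

# Add a new column (attribute) to the existing relation.
conn.execute("ALTER TABLE product ADD COLUMN price REAL")
conn.execute("UPDATE product SET price = 9.99 WHERE sku = 'A100'")

# The older query is unchanged and still returns the same result.
name_after = older_application_lookup(conn, "A100")
```

The older query keeps working in this sketch because it refers to its columns by name; an application that relied on `SELECT *` and positional access would likely be more sensitive to the added column.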
All databases require a consistent structure, termed a schema, to organize and manage the information. In a relational database, the schema is a collection of tables, and each table generally belongs to exactly one schema. Once the schema is designed, a tool known as a database management system (DBMS) is used to build the database and to operate on data within the database. The DBMS stores, retrieves and modifies data associated with the database. Lastly, to the extent possible, the DBMS protects data from corruption and unauthorized access.
A human user controls the DBMS by providing a sequence of commands selected from a data sublanguage. The syntax of data sublanguages varies widely. The American National Standards Institute (ANSI) and the International Organization for Standardization (ISO) have adopted Structured Query Language (SQL) as a standard data sublanguage for relational databases. SQL comprises a data definition language (DDL), a data manipulation language (DML), and a data control language (DCL). The DDL allows users to define a database, to modify its structure and to destroy it. The DML provides the tools to enter, modify and extract data from the database. The DCL provides tools to protect data from corruption and unauthorized access. Although SQL is standardized, most implementations differ subtly from the ANSI standard and from one another. Nonetheless, the standardization of SQL has greatly increased the utility of relational databases for many applications.
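The division of SQL into DDL, DML and DCL can be sketched as follows, assuming Python's built-in sqlite3 module and a hypothetical `customer` table. SQLite does not implement the `GRANT` and `REVOKE` statements, so the DCL statement is shown only as a string and is not executed; the `report_user` account in it is likewise hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define a table and modify its structure.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("ALTER TABLE customer ADD COLUMN city TEXT")

# DML: enter, modify and extract data.
conn.execute("INSERT INTO customer (id, name, city) VALUES (1, 'Ada', 'London')")
conn.execute("UPDATE customer SET city = 'Paris' WHERE id = 1")
rows = conn.execute("SELECT id, name, city FROM customer").fetchall()

# DCL: typical form of an access-control statement in a server DBMS.
# Not executed here, because SQLite has no GRANT/REVOKE support.
dcl_example = "GRANT SELECT ON customer TO report_user"

# DDL: destroy the table.
conn.execute("DROP TABLE customer")
remaining = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' AND name = 'customer'"
).fetchall()
```

In this sketch `rows` holds the single updated record, and `remaining` is empty once the table has been dropped.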
Although standard data sublanguages facilitate access to relational databases, users still must have detailed knowledge of the schema to obtain needed information from a database, because many different schemas can be designed to represent the same collection of information. For example, in one electronic commerce system, product information, such as product SKU, product name, product description, price, and tax code, may be stored in a single table within a relational database. In another electronic commerce system, product SKU, product name, description, and tax code may be stored in one table while product SKU and product price are stored in a separate table. In this situation, a SQL query designed to retrieve a product price from the database of the first electronic commerce system cannot retrieve the price of the same product from the database of the second system, because the price is stored in a different table and a different SQL query is required. As a consequence, developers of retail applications that access product information from relational databases may have to adapt their SQL queries to each individual schema. This, in turn, prevents their applications from being used in environments where there is a wide variety of databases having different schemas, such as the World Wide Web.
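The schema dependence described above can be made concrete with a minimal sketch, assuming Python's built-in sqlite3 module. The two in-memory databases stand in for the two electronic commerce systems; the table names `product` and `product_price`, the column names, and the sample row are hypothetical.

```python
import sqlite3

# First system: all product information in a single table.
store_a = sqlite3.connect(":memory:")
store_a.execute(
    "CREATE TABLE product ("
    "sku TEXT PRIMARY KEY, name TEXT, description TEXT, price REAL, tax_code TEXT)"
)
store_a.execute(
    "INSERT INTO product VALUES ('A100', 'Widget', 'A small widget', 9.99, 'T1')"
)

# Second system: price kept in a separate table keyed by SKU.
store_b = sqlite3.connect(":memory:")
store_b.execute(
    "CREATE TABLE product ("
    "sku TEXT PRIMARY KEY, name TEXT, description TEXT, tax_code TEXT)"
)
store_b.execute("CREATE TABLE product_price (sku TEXT PRIMARY KEY, price REAL)")
store_b.execute(
    "INSERT INTO product VALUES ('A100', 'Widget', 'A small widget', 'T1')"
)
store_b.execute("INSERT INTO product_price VALUES ('A100', 9.99)")

# Each schema needs its own query to answer the same question.
QUERY_A = "SELECT price FROM product WHERE sku = ?"
QUERY_B = "SELECT price FROM product_price WHERE sku = ?"

price_a = store_a.execute(QUERY_A, ("A100",)).fetchone()[0]
price_b = store_b.execute(QUERY_B, ("A100",)).fetchone()[0]

# The first system's query fails against the second system's schema,
# because that schema's product table has no price column.
try:
    store_b.execute(QUERY_A, ("A100",))
    query_a_works_on_b = True
except sqlite3.OperationalError:
    query_a_works_on_b = False
```

Both queries return the same price from their own databases, yet the first query raises an error when run against the second schema, which is why an application written for one system cannot be pointed at the other without changes.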
A further problem with conventional search engines is a tendency either to return very large amounts of data or to require the user to narrow the search parameters. When large amounts of data are presented, the display may span many “pages” before the user has seen all of the data. The time and expense involved in such a data review may be significant.