Now that many companies have instituted a wide variety of the computer-based information management systems, there is great tendency to input all types of data to these systems. This has proliferated with the use of wide-area networks (WANs), municipal-area networks (MANs), local-area networks (LANs), and intranets by companies, groups or divisions within a company that are geographically separated, or groups of users without any corporate affiliation. In many cases, information is input from various locations that is then stored in a number of ways with the most popular being in a number of databases. These databases may be structured to receive information of a particular type or varied. This information is usually loaded into a database is such a way that, hopefully, it can be readily retrieved.
Databases that receive the information from the various users are usually overlaid with a database management system (DBMS) that will include database applications. Typically, DBMSs are configured using four general types of data-structure classes. These are hierarchic, network, relational, and semantic data-structure classes. In particular, the relational data-structures class, with its association with DBMS, has evolved into what is now referred to as Relational Databases.
Relational databases are now viewed, in many cases, as the standard for the design of both large and small databases. The simplest model of a database would consist of a single table, which includes a number of fields for each record that is desired to be stored. In such a system, it will be necessary to repeat a great deal of information within the database. This makes it very difficult to maintain the data, and there may be resulting data errors based on inconsistent records. However, if a relational database is used, activities in the database will be related through, e.g., a customer record. The database application software will manage groups of records that are related.
As an example, if there is a catalog-based sales system, information may be stored in three tables. The tables could be titled Name, Order, Items Ordered. Each of these tables, in turn, would have subsections of information. The subsections for the Name table could include the spelling of the name; the street address; city, state, and zip-code; and phone number. The Order table could have the subsections invoice number, date, and catalog number. Finally, the Items Ordered table could have the subsections catalog item number ordered, quantity, and price. If this information is handled by a relational database system, each of the Order table entries will also include a name identification (NID) field and the Items Ordered table will also include an order number (ONB) field. These two new fields are foreign keys of the relational database and provide a serial link for the records of the database.
The creation of the database in the method just described alleviated the need to repeat the Name table data for each Order table record because you can rely on the serial linking, through the NID. Accordingly, if there is, for example, a change in the address associated with a particular name in the Name record, this change will be made for every order associated with that Name. This same process is applied to the second link, ONB.
In a majority of the cases, the queries that are developed in a relational database are based on Structured Query Language (“SQL”). Using SQL to develop a query, the query, for example, could request to display all of the orders placed by someone with a particular name. The Name and Orders tables are linked by the NID. Once linking of the two tables has taken place, the number of orders may be determined for a particular Name.
A query in SQL, with regard to obtaining information from the three tables defined previously, would consist of a SELECT command followed by a FROM command. The SELECT command or keyword defines the desire to select records from the tables. This could be SELECT name; street address; city, state, and zip-code; order invoice number; and order date. The command or keyword that follows is FROM. The FROM command or keyword indicates from where the data comes. Accordingly, the FROM command or keyword in the previous example would indicate the data was coming from the Name table and the Order table. The FROM command will be associated with the appropriate JOIN command which will provide the proper connection of the data from the two tables.
The decision must be made as to how to maintain the integrity of the database files when there are deletions. If cascading is the desired method, if there is a change in the NID in the Name table, then there would be a change in the related records in the Order table. The second choice is that deletions will not be permitted if there are related records. In this case, if there is a deletion of a NID and there are related records in the Order table, the deletion of the NID will not be permitted until their related records are deleted or are no longer related to the NID being deleted.
The basic structure of a relational database that uses SQL to effect queries has been described. Now, a bit more in depth view will be undertaken to set forth SQL in the context of the present invention.
As discussed above, tables are to represent things in the real world. In a particular database, the row of a table is to be unique. Each of the columns represents a specific field for each table row. This is carried out by each table having a primary key that is usually a column of the table having unique values for the table. Each table will have only one primary key. Together all of the columns of the table with unique values are candidate keys from which the primary key is selected. The other columns are alternate keys. A simple key is made up of one column where a composite key is made up of two or more columns. The easiest way to choose the primary key is to select one column if you can.
The main use of primary keys is for creating relationships between multiple tables in a database. This concept involves the use of foreign keys. A foreign key in a table references a primary key in another table. This joining relationship is the basis of relational database theory. For example, the primary key in a first table would be a foreign key in a second table. If desired, these table-to-table relationships may continue infinitely. They may be a one-to-one relationship, a one-to-many relationship, or a many-to-many relationship, which in reality are multiple one-to-many relationships.
The efficiency of the database tables is accomplished by normalization to remove redundancies from the tables. The three general normalization forms are the First, Second, and Third Norm Forms. The First Norm Form is that for every row-by-column position in a particular table, is the only one value. The Second Norm Form is that every non-primary key column is dependent on the primary key. And, the Third Norm Form is that all of the non-primary key columns are independent of one another. There are higher Norm Forms that solve certain specific inadequacies in the first three, but principally, normalization is satisfied by satisfying the First, Second, and Third Norm Forms.
The integrity rules are of two types: General and Database-Specific. The General Integrity rules may be either Entity Integrity or Referential Integrity. The Entity Integrity rule is that the primary keys cannot have missing data. The Referential Integrity rule is that a database must not contain any unmatched foreign key values. In following these rules, if a referenced primary key changes or a row is deleted, the system may take one of three options. First and second, it may cascade or prevent the change, as described before. Third, for deletions, the system may set all of the foreign keys values associated with the deletion to zero or null.
Database-Specific Integrity rules are ones that are specific to a particular database. These are ones that the database creator develops for enhancing his/her database and especially database management.
SQL was discussed briefly above as a query language for relational databases. is particularly useful to create and manipulate relational databases. The SQL commands are divided into two groups. These are Data Definition language (“DDL”) and Data Manipulation Language (“DML”). DDL includes the commands that are used to create and delete databases and databases objects. On the other hand, DML is used to insert, retrieve, and modify data once a database is defined with DDL.
The four basic commands of DDL are CREATE, USE, ALTER, and DROP. CREATE is a command used to establish a database in a system. The CREATE command may also be used to create tables in a database. USE is a command that is used to specify the database that is desired to work within the system. ALTER is a command that is used once a table is created to modify the definitions on it. That is, changes to the structure of the table may be made without deleting and recreating it. DROP is a command that is used to remove entire database structures from a database.
DML, as stated, is used to retrieve, insert, and modify database information. The DML commands INSERT, SELECT, UPDATE and DELETE are used during routine operations of the database. INSERT is a command that is used to add records to an existing table. As an example, INSERT could be used to add new rows to the existing table. SELECT is a command that is used to retrieve specific information from an operational database. It may be tailored narrowly or broadly. UPDATE is a command that is used to modify information contained in an existing table. This may be done for an individual cell or for more than one cell of values. DELETE is a command that is used to remove a record from an existing table.
The forgoing SQL description provides the basic concepts that are used with relational databases. Although it exists, extensible mark-up language (“XML”), which is the successor to HTML, has not been used to any great extent with relational databases that use SQL as the querying language.
There are four main types of applications that would particularly benefit from the use of XML. The first type is applications that require the user to mediate between two or more heterogeneous databases. The second type is applications that attempt to distribute a significant proportion of the processing load from a server to the user. The third type is applications that require the user to present different views of the same data to different users. The fourth type is applications in which intelligent users attempt to tailor information discovery to the needs of individual users. Accordingly, it would be useful to apply XML to relational databases to effect better system operation.
There is the need for improved methods for relational database management that makes use of XML for ease in the use of database systems, and provides increased speed and efficiency in the manipulation and retrieval of information.