1. Field of the Invention
The present invention relates to a system and a method for restructuring a relational database.
2. Description of the Related Art
As a fundamental theory related to the structure of a relational database, a normalization theory is well known. This normalization theory is described in, e.g., Introduction to Relational Database, Data Model.SQL.Management System, Yoshifumi Masunaga, Science Sha.
As described in this book, when a relational database is to be designed, a third normal form relation must be defined. If a relation having no third normal form is defined, and this relation is used, a memory area required for storing the relation or the number of tuples in a table required to be updated is disadvantageously increased.
In order to explain the reason for posing the above problems, the normalization theory will be briefly described below first.
In a relational database, normal forms of relations defined with reference to a functional dependency include a first normal form, a second normal form, and a third normal form. A relation having the first normal form means that a value of any attribute of the relation cannot be divided any more. Therefore, a relation in which a set appears in the values of attributes or data having a structure appears has no first normal form. This means, in the above book, that a relation having the first normal form in a relational database defines that the "direct product, power set, and power set of direct product in the range of attributes are inhibited from being used as attribute values".
A relation having a second normal form means that all nonprime attributes of a relation having a first normal form are completely functionally dependent on any candidate key of the relation having the first normal form. In addition, a relation having a third normal form means that any attribute in the relation having the first normal form is not narrowly transitively dependent on any candidate key of the relation having the first normal form.
For example, since the value of an attribute &lt;name&gt; is selected from direct products of a range called a family name and a range called a first name (for descriptive convenience, a middle name is omitted) in a relation: 1 &lt;personnel and payroll&gt; shown in FIG. 1, the relation: 1 &lt;personnel and payroll&gt; has no first normal form. When this relation is normalized to have the first normal form, a relation: 2 &lt;personnel and allowance&gt; shown in FIG. 2 is obtained.
In the relation &lt;personnel and allowance&gt;, each attribute &lt;employee number, month&gt; is a main key. In this case, one main key is selected from at least one candidate key which is present in the relation. The candidate key is a minimum attribute set capable of uniquely identifying a tuple. The value of the main key is restricted not to be null.
In the first normal form relation &lt;personnel and allowance&gt; shown in FIG. 2, the following problems are posed. Eddie Adams is employed as a new employee, his employee number is set to be 1006, and his assigned section is set to be a second sales section. When this information is stored in the relation as a tuple, an attribute &lt;month&gt; is set to be null because his salary has not yet be paid. This is inconsistent with the above restriction, i.e., the value of the main key is not set to be null, and the tuple cannot be inserted. In addition, assuming that David Douglas is transferred to a second sales section, three tuples related to David Douglas must be simultaneously changed. This means a plurality of tuples must be changed in spite of a change in one data. Assuming that Mary Smith received no salary in March because she was employed in April, her tuple must be deleted. However, when the tuple is deleted, information in which Mary Smith belongs to the second design section and works in New York is deleted. The above problem is called relation update anomaly.
Since the updating abnormality occurs in the first normal form relation: 2 &lt;personnel and allowance&gt;, the first normal form relation is divided into two second normal form relations &lt;personnel&gt; and &lt;allowance&gt; shown in FIGS. 3A and 3B. In this case, the second normal form relation is a first normal relation as described above, and is a relation having non-key attributes which are completely dependent on an arbitrary candidate key. Attributes A which are completely dependent on attributes B mean that the values of the attributes A functionally determine the values of the attributes B (the values of the attributes B are functionally dependent on the values of the attributes A), and the value of an attribute A' which is an arbitrary proper subset of the attributes A does not functionally determine the values of the attributes B (the values of the attributes B are not functionally dependent on the value of the attribute A'). Note that the above dependency between attributes is called a functional dependency.
For example, in the relation: 2 &lt;personnel and allowance&gt;, since non-key attributes &lt;name, section, location&gt; are functionally dependent on not only candidate keys (main keys) &lt;employee number, month&gt; but also the attribute &lt;employee number&gt; which is a proper subset of the candidate keys, the non-key attributes are not completely dependent on the candidate keys. Therefore, the relation: 2 &lt;personnel and allowance&gt; does not have a second normal form.
On the other hand, the non-key attributes &lt;name, section, location&gt; are completely dependent on the candidate key &lt;employee number&gt; in the relation &lt;personnel&gt;, and the non-key attribute &lt;salary&gt; is completely dependent on the candidate keys &lt;employee number, month&gt; in the relation &lt;allowance&gt;. For this reason, both the relations &lt;personnel&gt; and &lt;allowance&gt; have second normal forms, respectively.
However, even in the second normal form relation &lt;personnel&gt;, the following update anomaly (updating abnormality) occurs. For example, assume that a third sales section is established in L.A. When this information is to be inserted in the relation as a tuple, the value of the &lt;employee number&gt; serving as a main key becomes null. This is inconsistent with the restriction in which the main key is not null. In addition, if Nancy white belonging to the accounting section is retired in the relation &lt;personnel&gt;, when his tuple is deleted, information in which the accounting section is located in Washington, D.C. is lost. In addition, if the location of the first sales section is changed to San Francisco, all tuples corresponding to persons belonging to the first sales section must be updated.
Since the above updating abnormality occurs in the second normal form relation &lt;personnel&gt;, the second normal relation is divided into two third normal form relations &lt;employee&gt; and &lt;organization&gt; shown in FIGS. 4A and 4B. The third normal form relation, as described above, is the second normal form relation, and is a relation in which an arbitrary non-key attribute is not functionally dependent on non-key attributes except for the arbitrary non-key attribute.
For example, since a non-key attribute &lt;location&gt; is functionally dependent on another non-key attribute &lt;section&gt; in the relation &lt;personnel&gt;, the relation &lt;personnel&gt; does not have a third normal form. On the other hand, since non-key attributes are not functionally dependent on other non-key attributes in both the relations &lt;employee&gt; and &lt;organization&gt; shown in FIGS. 4A and 4B, both the relations have three normal forms, respectively. In addition, the relation &lt;allowance&gt; shown in FIG. 3B has a third normal form. Therefore, a database constituted by the relations &lt;employee&gt;, &lt;organization&gt;, and &lt;allowance&gt;, any updating abnormality of the relations does not occur.
The normalization theory in a relational database has been described above. When there is a relation having no third normal form, the above updating abnormality occurs. Therefore, in designing a conventional relational database, dependencies are desirably defined such that all relations have third normal forms.
However, in fact, when the relations are to be defined, an operation of correctly detecting the functional dependencies and complete dependencies between attributes is very difficult. When a large number of relations have a large number of attributes, this operation becomes more difficult. Therefore, while the relational database is designed, i.e., while the attributes of relations are defined, the defined relations include a relation having no third normal form at a high probability. Even when several relations each having no third normal form are defined, and the database is operated, a database manager may notice that the relation having no third normal form is present. This case will be described below using an example.
Assume that a database manager defines the relation: 2 &lt;personnel and allowance&gt; shown in FIG. 2, and an application programmer prepares the following programs to the relation: 2 &lt;personnel and allowance&gt;.