1. Field of the Invention
This invention relates to a method and apparatus for providing inference control and discovering patterns in tabular data in a database and, more particularly, to a method and apparatus for modifying a hypercube so that it satisfies complex criteria.
2. Description of the Related Art
Producing statistical data that can safely be released to external researchers is the subject of statistical inference control. Methods exist for modifying data so that the result is safe to release. The text “Elements of Statistical Disclosure Control” (Willenborg, L. C. and T. de Waal, Lecture Notes in Statistics 155, Springer Verlag, September 2000) describes methods that may be used to modify data with near minimum information loss.
Information content in tabular data is measured in various ways. Commonly, such measurements are “entropy” based, building on the theory developed by Claude E. Shannon in 1948, originally published in the Bell System Technical Journal, vol. 27, and then published in modified form by Weaver and Shannon in “The Mathematical Theory of Communication” in 1949 (see Shannon, C. E., “A mathematical theory of communication,” Bell System Technical Journal, vol. 27, pp. 379–423 and 623–656, July and October 1948, and Weaver, W. and C. E. Shannon, “The Mathematical Theory of Communication,” Urbana, Ill.: University of Illinois Press, 1949, republished 1963). Another useful reference and introduction to information theory is the book “Mathematical Foundations of Information Theory” by A. I. Khinchin (see Khinchin, A. I., “Mathematical Foundations of Information Theory,” Dover Publications, New York, N.Y., 1957).
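As a minimal illustration of an entropy-based measurement, the sketch below computes the Shannon entropy, in bits, of the empirical distribution of values in a single table column. The function name `shannon_entropy` and the sample columns are hypothetical and serve only to show the general idea; this is not necessarily the measure used in the disclosure.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Return the Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    # H = -sum(p * log2(p)) over the observed relative frequencies p.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical column with four equally frequent categories.
detailed = ["A", "B", "C", "D"]
# The same column after merging categories pairwise (a coarser, "safer" release).
coarsened = ["AB", "AB", "CD", "CD"]

print(shannon_entropy(detailed))   # 2.0
print(shannon_entropy(coarsened))  # 1.0
```

Under this measure, merging the four categories into two reduces the entropy from 2 bits to 1 bit, which is one way to quantify the information lost when data is modified for release.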
The disclosure describes methods that may be implemented in an SQL database system. It uses SQL program statements to describe processes in detail. The ANSI documents defining the SQL standard, X3.135-1992, “Database Language SQL,” and ANSI/ISO/IEC 9075, are available from the American National Standards Institute. Furthermore, the textbook “Database Management Systems” by Ramakrishnan and Gehrke teaches many useful database techniques that are applicable to the disclosure (see Ramakrishnan, R. and J. Gehrke, “Database Management Systems,” Second Ed., McGraw Hill, 1999).
The methods disclosed herein are applied to hypercube realizations such as star schemas in data warehouses. Data warehousing techniques and star schemas are discussed and explained in the “Data Warehousing Guide” from Oracle Corporation (see “Data Warehousing Guide” (Part No. A90237-01), June 2001, Oracle Corporation, www.oracle.com) and in the book “Object-Oriented Data Warehouse Design: Building a Star Schema” by William A. Giovinazzo (see Giovinazzo, W. A., “Object-Oriented Data Warehouse Design: Building a Star Schema,” Prentice Hall, February 2000).
U.S. Pat. No. 6,275,824 discloses methods for enforcing and storing privacy constraints or parameters along with data tables in a database or a data warehouse.