It is often desirable to analyze a database to learn statistical information about the population that the database represents. Typically, a query to such a database is of the form “How many members of a set of entries/rows in the database satisfy a particular property?”, where the property may be expressed as a Boolean formula or as some more complex formula.
For example, it may be desirable with regard to a particular database to statistically determine within the population represented thereby whether a correlation may be found between two factors or sets of factors, such as whether, with regard to a medical database, patients who have heart disease also have a history of smoking tobacco. In particular, a query to a medical database might be fashioned to answer a question such as: “How many individuals as represented within the database are tobacco smokers?”, “How many individuals as represented within the database have heart disease?”, “How many individuals as represented within the database are tobacco smokers that suffer from heart disease?”, and the like.
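By way of illustration only, the three example queries above can be sketched as counting queries over Boolean predicates. The `Patient` record, the `count_matching` helper, and the sample rows are hypothetical and are invented here for the example; they are not drawn from any actual database.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Patient:
    """Hypothetical row of a medical database with two Boolean attributes."""
    smoker: bool
    heart_disease: bool


def count_matching(rows: Iterable[Patient],
                   predicate: Callable[[Patient], bool]) -> int:
    """Return how many rows satisfy the given Boolean predicate."""
    return sum(1 for row in rows if predicate(row))


# Made-up rows, for illustration only.
rows = [
    Patient(smoker=True, heart_disease=True),
    Patient(smoker=True, heart_disease=False),
    Patient(smoker=False, heart_disease=True),
    Patient(smoker=False, heart_disease=False),
    Patient(smoker=True, heart_disease=True),
]

# "How many individuals are tobacco smokers?"
smokers = count_matching(rows, lambda p: p.smoker)
# "How many individuals have heart disease?"
heart = count_matching(rows, lambda p: p.heart_disease)
# "How many individuals are tobacco smokers that suffer from heart disease?"
both = count_matching(rows, lambda p: p.smoker and p.heart_disease)
```

Each query returns an exact count in this sketch, which is precisely the kind of answer that may need to be obscured when the privacy of individuals is at stake.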
However, and significantly, it is often necessary, based on a legal or moral standard or otherwise, to protect the privacy of individuals represented within a database under statistical analysis. Thus, a querying entity should not be allowed to query, whether directly or indirectly, for information in the database relating to a particular individual.
Given a large database, then, perhaps on the order of hundreds of thousands of entries where each entry corresponds to an individual, a need exists for a method to learn statistical information about the population represented by such a database without compromising the privacy of any particular individual within that population. More particularly, a need exists for a method by which an interface is constructed between the querying entity and the database, where the interface obscures each answer to a query to a degree large enough to protect privacy, but not to such a degree as to substantively affect statistical analysis of the database.
A recent method uses two algorithms that permit data mining while maintaining privacy. It has been shown, for a single-attribute database, that adding a small amount of noise to each query answer will preserve privacy, provided that the total number of queries is sub-linear in the size of the database. A previous approach to calculating the amount of noise to be added to preserve privacy returned only binary values (0, 1) as results. It would be desirable to remove this restriction of results to 0 or 1.
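The general idea of a noise-adding interface with a limit on the number of queries can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the cited work: the class name `NoisyCountInterface`, the use of Gaussian noise, the `noise_scale` parameter, and the default limit of floor(sqrt(n)) queries (one possible sub-linear bound) are all hypothetical choices made for the example.

```python
import math
import random
from typing import Callable, Optional, Sequence


class NoisyCountInterface:
    """Hypothetical interface between a querying entity and a
    single-attribute database of 0/1 (Boolean) entries."""

    def __init__(self, rows: Sequence[bool], noise_scale: float,
                 max_queries: Optional[int] = None,
                 seed: Optional[int] = None) -> None:
        self._rows = list(rows)
        self._noise_scale = noise_scale
        # Assumption: floor(sqrt(n)) is used as one example of a query
        # limit that is sub-linear in the database size n.
        if max_queries is None:
            max_queries = math.isqrt(len(self._rows))
        self._max_queries = max_queries
        self._answered = 0
        self._rng = random.Random(seed)

    def query(self, predicate: Callable[[bool], bool]) -> float:
        """Return the count of matching rows plus random noise."""
        if self._answered >= self._max_queries:
            raise RuntimeError("query limit reached")
        self._answered += 1
        true_count = sum(1 for row in self._rows if predicate(row))
        # Assumption: zero-mean Gaussian noise; the source does not
        # specify a noise distribution.
        return true_count + self._rng.gauss(0.0, self._noise_scale)


# Made-up database of 100 entries, 25 of which are True.
db = [i % 4 == 0 for i in range(100)]
interface = NoisyCountInterface(db, noise_scale=2.0, seed=0)
noisy_answer = interface.query(lambda bit: bit)  # close to, but not exactly, 25
```

In this sketch the querying entity never sees the exact count, and the interface refuses further queries once the limit is reached, so the noise cannot simply be averaged away by repetition.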
A more-developed discussion of prior techniques may be found in Privacy-Preserving Datamining on Vertically Partitioned Databases, Dwork and Nissim, CRYPTO 2004—The 24th Annual International Cryptology Conference, Aug. 15-19, 2004, Santa Barbara, Calif., USA, Proceedings, page 528, Springer-Verlag, (“Dwork and Nissim”) hereby incorporated by reference in its entirety, and therefore need not be set forth herein in any further detail.
In view of the foregoing, there is a need for systems and methods that overcome such deficiencies.