The present invention relates to database systems, and more specifically, to managing uncertainty in database systems.
The level of uncertainty in data warehouses and other information repositories is increasing rapidly due to entity-resolution processes in data integration, automated information extraction from unstructured text, measurement errors in RFID and sensor systems, and anonymization of data for privacy protection. The operation of virtually any modern enterprise requires risk assessment and decision-making in the presence of such uncertain information. Ignoring uncertainty can put an enterprise at risk, for example, by leading to overly optimistic assessments of the value of a company's assets, or by leading to operating policies that result in violations of customer agreements or government regulations.
Consequently, there has been much research on how to represent and manage uncertain data. Much of this effort has focused on the problem of extending relational database systems to handle uncertainty, including work on data-intensive stochastic modeling to capture uncertainty caused by interpolated or predicted data values. In a common paradigm of uncertainty, the answer to a database query is not deterministic, as in classical query processing. Instead, there is a probability distribution over possible query answers, and the problem of interest is to compute or estimate important features of this query-result distribution (such as its mean, variance, or quantiles). For example, an extended relational model (ERM) has been developed, in which the classical relational model is augmented with attribute-level or tuple-level probability values, which are loaded into the database along with the data itself.
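By way of illustration only, the notion of a query-result distribution can be sketched with a small Monte Carlo simulation over tuple-level probabilities. The sketch below is not taken from any particular system. The relation `ASSETS`, the function names, and the assumption that tuples exist independently of one another are all hypothetical choices made for the example. Each sampled "possible world" keeps a tuple with its stated probability, a SUM query is evaluated on that world, and the mean, variance, and 95% quantile of the resulting answers are estimated.

```python
import random
import statistics

# Hypothetical uncertain relation: (name, value, probability that the tuple exists).
# Assumption for this sketch: tuples exist independently of one another.
ASSETS = [
    ("warehouse", 100.0, 0.9),
    ("inventory", 50.0, 0.5),
    ("patent", 200.0, 0.2),
]


def sample_query_result(relation, rng):
    """Draw one possible world and evaluate SUM(value) over the tuples in it."""
    return sum(value for _, value, prob in relation if rng.random() < prob)


def estimate_features(relation, n_samples=20000, seed=0):
    """Estimate mean, variance, and 95% quantile of the query-result distribution."""
    rng = random.Random(seed)
    results = sorted(sample_query_result(relation, rng) for _ in range(n_samples))
    return {
        "mean": statistics.fmean(results),
        "variance": statistics.pvariance(results),
        "q95": results[int(0.95 * (n_samples - 1))],
    }


def exact_mean(relation):
    """Closed-form expected SUM under the independence assumption."""
    return sum(value * prob for _, value, prob in relation)


if __name__ == "__main__":
    print(estimate_features(ASSETS))
    print("exact mean:", exact_mean(ASSETS))
```

In this toy case the expected sum can be computed directly as the probability-weighted total, which gives a check on the sampled estimate. For more complex queries no such closed form may be available, which is one reason estimation of distribution features is of interest.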