A goal of any dataset, whatever its domain or type of data, is to be as complete, certain, and accurate as possible given the limitations on how the data are collected and stored. Scientific data, for example, are often incomplete (only certain portions of the sky have been studied by astronomers) or inaccurate (instrument readings vary with the sensitivity of each instrument). Similarly, business decisions are often made using uncertain data, e.g., when not all sales figures have been reported before a decision must be made. Previous attempts to alleviate incompleteness and uncertainty centralized data management in data warehouses and used data cleaning to remove inaccuracies. The advent of the Semantic Web has made these centralized approaches largely inapplicable: incomplete and inaccurate data are available everywhere, in massive amounts, and there is no central control over them. There are, however, significant benefits to integrating and cross-linking such data. Because incompleteness cannot be eliminated entirely, methods are needed to handle it and to compensate for it.
In the Semantic Web, the Resource Description Framework (RDF) is the de facto standard for data representation. To represent incomplete information, RDF provides “blank nodes”, a rudimentary mechanism that stands for a resource whose identity is unknown or unstated. Because their semantics are not clearly defined, this support is inadequate; moreover, existing systems implement blank node semantics in ways that are far from the spirit of the standard. The only other support for incompleteness in RDF comes in the form of probabilistic RDF data. Using probabilities is impractical, however, because the probabilities must first be determined and then associated with the data within RDF.
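To make the blank-node mechanism concrete: under the W3C RDF 1.1 Semantics, a blank node behaves roughly like an existentially quantified variable, so a graph containing blank nodes is entailed by any graph that supplies concrete values for them. The following is a minimal, brute-force sketch of that check (simple entailment), assuming triples are plain string 3-tuples and blank nodes carry the `_:` prefix as in N-Triples. The names `is_blank` and `simply_entails` are hypothetical helpers for illustration only; they are not the API of any RDF library, and the exhaustive search is exponential in the number of blank nodes.

```python
from itertools import product

Triple = tuple[str, str, str]


def is_blank(term: str) -> bool:
    # Assumption: blank nodes are written with the "_:" prefix, as in N-Triples/Turtle.
    return term.startswith("_:")


def simply_entails(g: set[Triple], h: set[Triple]) -> bool:
    """Return True if some mapping of h's blank nodes to terms of g
    maps every triple of h onto a triple of g (blank nodes read as 'some value')."""
    blanks = sorted({t for triple in h for t in triple if is_blank(t)})
    terms = sorted({t for triple in g for t in triple})
    # Brute force: try every assignment of g's terms to h's blank nodes.
    for choice in product(terms, repeat=len(blanks)):
        mapping = dict(zip(blanks, choice))
        if all(tuple(mapping.get(t, t) for t in triple) in g for triple in h):
            return True
    return False
```

For example, the incomplete statement "Alice works for some organization located somewhere", `{("ex:alice", "ex:worksFor", "_:b1"), ("_:b1", "ex:locatedIn", "_:b2")}`, is entailed by a complete graph stating that Alice works for `ex:acme`, located in `ex:paris`, but not the other way around: the blank nodes record only that values exist, not what they are.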