In a converged network, user data flows continuously with minimal impedance. Enhanced services will demand the exchange of user data among different users and service providers. User data is exchanged in a transaction via a query-response paradigm. Because of current database design and query language constructs, databases in most cases disclose more information than the context of the query requires.
When a query for user data is received, a database is queried and, depending on the keyword, retrieves a record and discloses information to the requester. The query can be a local database query, a query received over the Internet, a query issued by a web crawler or an automated agent on behalf of an end user, or a query issued by a human or automated agent on behalf of a network operator, an application service provider, or another service provider. By disclosing the complete record based on a keyword, the database compromises the privacy of the end user. For example, a record of user data indexed by name contains street address, city and state. Suppose that a particular application needs to find out in which state the user lives. A query on this user record would typically return the entire address. This may happen because the database developer designed the schema so that any query on an address returns the whole address. Disclosing the street address and city may be a breach of privacy from the user's perspective.
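The over-disclosure in the address example can be sketched with Python's built-in `sqlite3` module. The two table layouts (`users_flat`, `users_split`), the helper names, and the sample record are hypothetical and chosen for illustration only. When the address is stored as one opaque column, an application that only needs the state has to receive the whole address. When the components are stored separately, the query can project just the state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema 1: the whole address is one column.
conn.execute("CREATE TABLE users_flat (name TEXT, address TEXT)")
conn.execute("INSERT INTO users_flat VALUES (?, ?)",
             ("Alice", "12 Main St, Springfield, IL"))

# Hypothetical schema 2: address components are separate columns.
conn.execute("CREATE TABLE users_split (name TEXT, street TEXT, city TEXT, state TEXT)")
conn.execute("INSERT INTO users_split VALUES (?, ?, ?, ?)",
             ("Alice", "12 Main St", "Springfield", "IL"))

def disclosed_by_flat_schema(name):
    # The only way to learn the state is to fetch the entire address.
    row = conn.execute("SELECT address FROM users_flat WHERE name = ?", (name,)).fetchone()
    return row[0]

def disclosed_by_split_schema(name):
    # The query projects exactly the attribute the application needs.
    row = conn.execute("SELECT state FROM users_split WHERE name = ?", (name,)).fetchone()
    return row[0]

print(disclosed_by_flat_schema("Alice"))   # 12 Main St, Springfield, IL
print(disclosed_by_split_schema("Alice"))  # IL
```

Splitting columns only helps when the requester asks for the narrower attribute. Nothing in either schema prevents a query for `street`.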
It is common for organizations to release personal data with explicit identifiers (such as name, address and telephone number) removed, on the assumption that anonymity is maintained because the data looks anonymous. FIG. 1a shows a data set for names A, B, C. FIG. 1b shows a suppression method to exclude sensitive information in the data set. In most of these cases, however, the data can be used to re-identify individuals by linking or matching it to other data, or by looking at unique characteristics found in the released data, as shown in FIG. 2. Generalization techniques are used to prevent publicly available data from being linked to the released data to re-identify individuals. Generalization makes released data ambiguous so that it cannot be linked to other data available in the public domain.
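The difference between suppression and generalization can be sketched as follows. The records and the quasi-identifier fields (`zip_code`, `birth_year`) are hypothetical and are not the data of FIG. 1a. The truncation rules (keep three ZIP digits, bucket birth years by decade) are assumptions. After suppression alone, each remaining ZIP and birth-year pair is still unique and could be linked to a public list. After generalization, several records share the same values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    name: str
    zip_code: str
    birth_year: int
    diagnosis: str

def suppress(record):
    """Suppression: drop the explicit identifier (the name)."""
    return {"zip_code": record.zip_code,
            "birth_year": record.birth_year,
            "diagnosis": record.diagnosis}

def generalize(record, zip_digits=3, year_span=10):
    """Generalization: coarsen quasi-identifiers so many people match them."""
    row = suppress(record)
    hidden = len(record.zip_code) - zip_digits
    row["zip_code"] = record.zip_code[:zip_digits] + "*" * hidden
    start = record.birth_year - record.birth_year % year_span
    row["birth_year"] = f"{start}-{start + year_span - 1}"
    return row

# Hypothetical records for three people named A, B and C.
records = [
    Record("A", "02138", 1965, "flu"),
    Record("B", "02139", 1968, "asthma"),
    Record("C", "02141", 1971, "flu"),
]
for r in records:
    print(suppress(r), generalize(r))
```

Here A and B both generalize to ZIP `021**` and birth years `1960-1969`, so a linking attack on those fields no longer singles either of them out.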
A hippocratic database tags data with the purpose for which it is collected. The “purpose” is part of the data schema. A query on such a database needs to specify a purpose, and data is returned only if the purpose of the query matches the purpose of the data record. Thus, defining a purpose and matching it against the query can restrict disclosure of information. The user can consent to the purpose of data collection or restrict it. However, in a hippocratic database, data records are not always collected and tagged with their purpose accurately, and the approach may not scale well for user data stored in different nodes and in different formats.
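Purpose matching can be sketched as a filter over records that carry a set of consented purposes. The `TaggedRecord` type, the purpose names, and the sample values are hypothetical. A real hippocratic database enforces this inside the query engine instead of in application code.

```python
from dataclasses import dataclass, field

@dataclass
class TaggedRecord:
    owner: str
    attribute: str
    value: str
    # Purposes the user consented to when the data was collected.
    purposes: frozenset = field(default_factory=frozenset)

def purpose_query(records, owner, attribute, purpose):
    """Return values only where the query's purpose matches a record's tag."""
    return [
        r.value
        for r in records
        if r.owner == owner and r.attribute == attribute and purpose in r.purposes
    ]

store = [
    TaggedRecord("alice", "email", "alice@example.com", frozenset({"billing", "support"})),
    TaggedRecord("alice", "phone", "555-0100", frozenset({"support"})),
]
print(purpose_query(store, "alice", "email", "billing"))    # ['alice@example.com']
print(purpose_query(store, "alice", "phone", "marketing"))  # []
```

The sketch also shows the weakness noted above. If a record is tagged with the wrong purposes at collection time, the filter enforces the wrong policy.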
Binning methods of data partitioning may be used to classify data into common buckets. Binning is done on entire database records, mostly classifying data based on functional roles. As a result, binning reduces the usefulness of the data. Because it does not take the structure of user data into account, binning treats the whole database as a unit and partitions it into common bins or buckets. User data may exist in different formats and structures, and data in each format and structure needs to be binned again. The binning method is therefore not interoperable across different systems and database management systems.
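Role-based binning can be sketched as grouping whole records by one field. The two sources and their field names (`role`, `job_function`) are hypothetical. They show that each record lands in a bin in its entirety, and that a source with a different structure has to be binned separately with its own key.

```python
from collections import defaultdict

def bin_by_role(records, role_key):
    """Partition whole records into buckets keyed by a functional-role field."""
    bins = defaultdict(list)
    for rec in records:
        bins[rec[role_key]].append(rec)  # the entire record goes into the bin
    return dict(bins)

# Two hypothetical sources holding similar user data in different structures.
hr_rows = [
    {"role": "engineer", "name": "A", "city": "Austin"},
    {"role": "manager", "name": "B", "city": "Boston"},
]
crm_rows = [
    {"job_function": "engineer", "full_name": "C", "location": {"city": "Chicago"}},
]

hr_bins = bin_by_role(hr_rows, "role")
# The second source uses a different key and layout, so it must be binned
# again; the resulting "engineer" bins do not share a common record format.
crm_bins = bin_by_role(crm_rows, "job_function")
print(hr_bins["engineer"], crm_bins["engineer"])
```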
The method of field level privacy in databases lets a user see certain rows or columns (e.g., 5 rows from a table of 10 rows×10 columns). This is known as a method of “suppression” or “restriction”. Users have restricted access to certain portions of the table. This restriction is implemented by giving users certain “privilege labels” and marking the rows or the table with the same “privilege label”. In the simplest scenario, when a SQL query finds a match between a user label and the label with which the row was marked, the data is retrieved. Such a method resembles the existing role based access control (RBAC) model. The method of field level privacy, although used extensively, has several disadvantages. Pre-assigning user privileges and marking rows and columns with labels creates a static relationship. Only the user's predefined role is considered for access. Other contextual information, such as the kind of application making the request and what is generally done with the data, is not considered. Besides suppression, generalization also helps improve privacy, but the field level technique does not provide generalization. When only suppression is used, privacy can still be breached by methods such as correlation or extrapolation. Generalization reduces the chance of such correlations.
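Label-based row restriction can be sketched as follows. The user-to-label map, the row labels, and the table contents are hypothetical, and the SQL matching step is reduced to a Python filter. Each row is either returned whole or suppressed. The decision depends only on the user's statically assigned labels, so the requesting application and its context never enter into it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Row:
    label: str   # privilege label attached to the row
    data: dict

# Hypothetical static assignment of privilege labels to users.
USER_LABELS = {"clerk": {"public"}, "auditor": {"public", "finance"}}

TABLE = [
    Row("public", {"id": 1, "city": "Austin"}),
    Row("finance", {"id": 2, "salary": 90000}),
]

def select_rows(user):
    """Return only rows whose label matches one of the user's labels."""
    labels = USER_LABELS.get(user, set())
    return [row.data for row in TABLE if row.label in labels]

print(select_rows("clerk"))    # [{'id': 1, 'city': 'Austin'}]
print(select_rows("auditor"))  # both rows
```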
In another prior art model, known as translucent databases, hashing is extended in new and important ways. Translucent databases can be used to map certain information, for example crime trends, where the first column is a hash of the person's name, the second column is a hash of their full address, and the third column is a hash of their block and street. Certain types of information, such as specific incidents, can be grouped together by grouping entries with identical block hashes. Whether the incidents refer to the same person can then be determined by checking whether their name hashes are the same or different.
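The three-column layout can be sketched with SHA-256 from Python's `hashlib`. The choice of SHA-256, the normalization step (strip and lowercase), and the sample rows are assumptions. Incidents are grouped by identical block hashes, and distinct name hashes within a group indicate how many different people are involved, without storing names or addresses in clear text.

```python
import hashlib
from collections import defaultdict

def hash_value(value: str) -> str:
    """One-way hash of a normalized string (SHA-256 is an assumption here)."""
    return hashlib.sha256(value.strip().lower().encode("utf-8")).hexdigest()

def make_row(name, full_address, block_and_street, incident):
    # Columns: hash(name), hash(full address), hash(block and street), incident.
    return (hash_value(name), hash_value(full_address),
            hash_value(block_and_street), incident)

# Hypothetical incident data.
rows = [
    make_row("Jane Roe", "12 Elm St Apt 3", "100 block Elm St", "burglary"),
    make_row("John Doe", "18 Elm St", "100 block Elm St", "theft"),
    make_row("Jane Roe", "12 Elm St Apt 3", "100 block Elm St", "vandalism"),
]

# Group incidents by identical block hash.
by_block = defaultdict(list)
for name_hash, _addr_hash, block_hash, incident in rows:
    by_block[block_hash].append((name_hash, incident))

for block_hash, entries in by_block.items():
    people = len({name_hash for name_hash, _ in entries})
    print(f"{block_hash[:8]}: {len(entries)} incidents, {people} distinct people")
```

An unkeyed hash of a guessable value such as a name can be reversed by hashing candidate names and comparing the results. A deployment would therefore likely use a keyed hash (for example `hmac`) or a secret salt instead.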
FIG. 3a shows how the hierarchy of information elements is built. Using an ontology (data description), the data classifier builds a hierarchy of information elements from the user data record. A taxonomy can be defined that places user data in its proper position in the hierarchy. For example, a user's address can be arranged as a hierarchical tree, with the country at the highest level, the state at the next level, then the city, and the street address at the leaf node. Each node in the hierarchy can have attributes or properties. These properties indicate other identifying information that an entity, such as a person, has based on the location of the entity. Classifying data in this way abstracts the information at each node level.
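The address hierarchy can be sketched as a chain of nodes from country down to street. A query can then be answered at a chosen level of abstraction, which is a form of generalization. The `Node` type, the `LEVELS` ordering, and the `disclose` helper are hypothetical illustrations, not the classifier of FIG. 3a. Node properties are modeled as a dictionary but left empty here.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    level: str
    value: str
    properties: dict = field(default_factory=dict)  # per-node attributes (unused here)
    children: list = field(default_factory=list)

LEVELS = ["country", "state", "city", "street"]  # most general first

def build_address_tree(address):
    """Arrange one user's address from the most general level down to the leaf."""
    root, parent = None, None
    for level in LEVELS:
        node = Node(level, address[level])
        if parent is None:
            root = node
        else:
            parent.children.append(node)
        parent = node
    return root

def disclose(root, up_to_level):
    """Walk from the root and stop at the requested level of abstraction."""
    path, node = [], root
    while node is not None:
        path.append(node.value)
        if node.level == up_to_level:
            break
        node = node.children[0] if node.children else None
    return path

tree = build_address_tree({"country": "US", "state": "IL",
                           "city": "Springfield", "street": "12 Main St"})
print(disclose(tree, "state"))  # ['US', 'IL']
```

An application that only needs the user's state would receive `['US', 'IL']`, without the city or street.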
Ontology and knowledge based systems (KBS) are useful where a semantic query attempts to help a user obtain or manipulate data in a database without knowing its detailed syntactic structure. Unlike syntactic queries, (e.g., SQL, XPath and XQuery queries), which only support retrieval of explicit data based on syntactic information (i.e., elements/documents structure), semantic queries enable retrieval of both explicitly and implicitly derived information based on syntactic and semantic information contained in the database. A user describes what information is needed without having detailed knowledge of how the information is actually represented. For example, in a semantic query, the query “list the name of all employees in the database” is equivalent to the syntactic query “list the name of all faculty, trainees and employees in the database”, provided the database semantics specify that all faculty members and trainees are employees.
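The employee example can be sketched as a subclass expansion step applied before matching. The `SUBCLASSES` map and the `PEOPLE` data are hypothetical. The syntactic query matches only the explicitly stored type. The semantic query first rewrites "employee" into the set of employee, faculty and trainee, and then matches against that set.

```python
# Hypothetical ontology: each class maps to its direct subclasses.
SUBCLASSES = {"employee": {"faculty", "trainee"}, "faculty": set(), "trainee": set()}

def expand(cls):
    """Return cls together with every class that is transitively a subclass of it."""
    result, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in result:
            result.add(c)
            stack.extend(SUBCLASSES.get(c, ()))
    return result

PEOPLE = [("Ada", "faculty"), ("Ben", "trainee"), ("Cy", "employee"), ("Di", "visitor")]

def syntactic_names(cls):
    # Matches only the type explicitly stored with each person.
    return sorted(name for name, t in PEOPLE if t == cls)

def semantic_names(cls):
    # Rewrites the query to include all subclasses before matching.
    classes = expand(cls)
    return sorted(name for name, t in PEOPLE if t in classes)

print(syntactic_names("employee"))  # ['Cy']
print(semantic_names("employee"))   # ['Ada', 'Ben', 'Cy']
```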
FIG. 3b shows examples of an ontology based classification. Ontologies with their queries can be considered a form of database system. An example is the IBM ontology management system (also known as semantic network ontology base (SNOBASE)). Ontologies are often hierarchical, with nodes having a parent/child relationship. An ontology can be effectively exploited to rewrite a user query into another query such that the new query provides additional meaningful results that satisfy the intention of the user. Such query rewriting, however, does not consider privacy when disclosing data.
A second issue regarding the use of data in a converged network is how to extract, organize, process, and transfer data so that it can be obtained from, and transmitted to, different databases or applications securely and efficiently. A database management system (DBMS) is a technology well suited to some of these problems.
An example of the use of a DBMS on a system that runs multiple applications and interacts with multiple data sources occurs in digital rights management (DRM). Such a DBMS can reside on a DRM server on the service provider's side, or on a user's device, such as a mobile device. In a typical scenario of a DRM application on a user's device, the device may receive DRM protected data from a service provider. This data may have been generated for different DRM user agents, may consist of varying items, and may have differing permissions assigned. A user may want to access and use this disparate data in an organized manner. A DBMS can store data in an organized way that allows efficient access to it.
In the prior art, DRM systems and database management systems were integrated only to a limited degree, whereby DBMSs held information that was packaged into DRM rights objects (RO) and content objects (CO) prior to their distribution to a DRM user agent program at a user's device.
Prior systems, such as iTunes, organize DRM content at a DRM user agent using a datastore that may have some aspects of a database. Some DRM technology can generate DRM protected content that can be processed by different DRM rendering systems. For example, RealNetworks Harmony technology allows a consumer to purchase a song that can be played on any device that uses Apple FairPlay DRM, Microsoft Windows Media Audio DRM, or RealNetworks Helix DRM.
However, DBMSs are not integrated to the degree that they can organize the foreseeable multitude of disparate DRM protected data on users' devices, or manipulate data from such a multitude of disparate DRM systems in a unified manner. For example, data from many DRM systems is not organized in a way that enables access to DRM protected content that can then be processed by the corresponding DRM user agent. DRM user agents can currently organize content to a degree, but only content that is either wrapped in their own DRM protection technology or not DRM protected at all. DRM systems currently store data for basic retrieval based on context restrictions (e.g., user ID, time, location), but do not provide for the coordinated retrieval of data contained in different content objects.
With the convergence of wireless devices that can obtain different types of services from a variety of wireless and wire-line access technologies, it is becoming very important that network operators have database systems that can organize, manipulate and transfer various data related to the identification (ID) of a user. This data can be related to the type of devices that are used to obtain services, and the types and characteristics of services, tariffs, billing information, service providers, as well as information about security of the devices and services.
Although prior art techniques for utilizing DBMSs exist, there is a need for DBMSs that organize databases so as to handle new data from unknown sources across multiple access technologies, and that transfer data to other DBMSs while assuring transportability and protecting privacy or other proprietary information.