Databases are widely used for data storage and access in computing applications. A goal of database storage is to store large amounts of information in an organized manner so that it can be accessed, managed, and updated. In a database, data may be organized into rows, columns, and tables. Different database storage systems may be used for storing different types of content, such as bibliographic, full text, numeric, and/or image content. Database systems may also be classified according to how they organize data. There are many different types of databases, including relational databases, distributed databases, cloud databases, object-oriented databases, and others.
Databases are used by various entities and companies to store information that may need to be accessed or analyzed. For example, a retail company may store a listing of all sales transactions in a database. The database may include information about when a transaction occurred, where it occurred, the total cost of the transaction, an identifier and/or description of every item purchased in the transaction, and so forth. The same retail company may also store, for example, employee information in that database, such as employee names, contact information, work history, pay rates, and so forth. Depending on the needs of the retail company, the employee information and the transactional information may be stored in different tables of the same database. When the retail company wants to learn information stored in the database, it may “query” the database. For example, the retail company may want to find the names of all employees working at a certain store, all employees working on a certain date, all transactions for a certain product made during a certain time frame, and so forth.
When the retail company wants to extract certain organized information from its database, a query statement is executed against the database data. The query returns data according to one or more query predicates that indicate what information should be returned. The query extracts specific data from the database and formats that data into a readable form. The query may be written in a language that the database understands, such as Structured Query Language (“SQL”), so that the database system can determine what data should be located and how it should be returned. The query may request any pertinent information stored within the database. When the appropriate data is found, the query results can reveal complex trends and activities; this value is realized only through successfully executed queries.
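To make the role of query predicates concrete, the following is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. The table names, column names, and rows (`employees`, `transactions`, the sample products and dates) are hypothetical and chosen only to mirror the retail example; a production database would differ. Each `WHERE` clause is a query predicate that selects which rows the query returns.

```python
import sqlite3
from typing import List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create a hypothetical retail database with two tables in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE employees (name TEXT, store_id INTEGER);
        CREATE TABLE transactions (product TEXT, sold_on TEXT, total REAL);
        """
    )
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?)",
        [("Ana", 1), ("Ben", 2), ("Cho", 1)],
    )
    conn.executemany(
        "INSERT INTO transactions VALUES (?, ?, ?)",
        [
            ("widget", "2023-01-05", 9.99),
            ("widget", "2023-02-10", 19.98),
            ("gadget", "2023-01-20", 4.50),
        ],
    )
    return conn


def employees_at_store(conn: sqlite3.Connection, store_id: int) -> List[str]:
    """Predicate: store_id = ? (names of all employees at one store)."""
    rows = conn.execute(
        "SELECT name FROM employees WHERE store_id = ? ORDER BY name",
        (store_id,),
    )
    return [name for (name,) in rows]


def sales_of_product(
    conn: sqlite3.Connection, product: str, start: str, end: str
) -> List[Tuple[str, float]]:
    """Predicates: product match AND date within [start, end].

    Dates are stored as ISO-8601 text, so BETWEEN compares them correctly.
    """
    rows = conn.execute(
        "SELECT sold_on, total FROM transactions "
        "WHERE product = ? AND sold_on BETWEEN ? AND ? ORDER BY sold_on",
        (product, start, end),
    )
    return rows.fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(employees_at_store(db, 1))
    print(sales_of_product(db, "widget", "2023-01-01", "2023-01-31"))
```

Parameter placeholders (`?`) are used instead of string formatting so that the predicate values are passed to the database separately from the SQL text.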
In certain implementations of database technology, different organizations or companies may wish to securely link or join their database data. Continuing the example above, the retail company may wish to link or share some of its data with outside organizations, such as a product vendor, a healthcare provider for its employees, a shipping company, and so forth. However, the retail company would want to ensure that its data is secure and that the outside organizations cannot view all of its data with unrestricted access. The retail company may also wish to enable outside organizations to link, join, and/or analyze its data without permitting them to view or export raw data. Depending on the content of the data, it can be imperative to keep the data secure because of privacy concerns, contractual agreements, government agency restrictions, and so forth. For example, personally identifiable information (PII), protected health information (PHI), and other forms of fine-grained data may need to remain secure even when such database data is shared with outside organizations.
In database systems, secure views may be used as a security mechanism to restrict access to specific information stored in the database. A secure view may be designated specifically for data privacy, to limit access to sensitive data, such as PII or PHI, that should not be exposed to outside organizations and/or to all users of the database. The way views are implemented and handled can lead to information leakage. For example, during query optimization, certain filters may be pushed across the view definition closer to the input tables, and information may be leaked to a user if a user-specified filter is evaluated before the secure predicates are evaluated. Secure views ensure that the access restrictions of a regular view cannot be circumvented by cleverly constructed queries against the data underlying that view.
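The leakage caused by evaluating a user-specified filter before the secure predicate can be illustrated with a toy model in plain Python. This is a minimal sketch, not any database's actual optimizer: the base table, the `secure_predicate` rule (only rows owned by `"acme"` are visible), and the "spying" filter are all hypothetical. The spying filter stands in for a user-supplied function with an observable side effect (for example, a logged value or a raised error). Python's short-circuiting `and` models the evaluation order chosen by the optimizer.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]

# Hypothetical base table behind a view; only rows with owner == "acme"
# are supposed to be visible through the view.
BASE_TABLE: List[Row] = [
    {"owner": "acme", "customer": "alice", "ssn": "111-11-1111"},
    {"owner": "other", "customer": "bob", "ssn": "222-22-2222"},
]


def secure_predicate(row: Row) -> bool:
    """The view's own access rule (hypothetical)."""
    return row["owner"] == "acme"


def run_view_query(user_filter: Callable[[Row], bool], secure_first: bool) -> List[Row]:
    """Evaluate the view with the user's filter applied in a chosen order.

    secure_first=True: the secure predicate runs first, so the user filter
    only ever sees rows the view exposes.
    secure_first=False: models a filter pushed below the view definition,
    so the user filter also sees rows the view should hide.
    """
    if secure_first:
        return [r for r in BASE_TABLE if secure_predicate(r) and user_filter(r)]
    return [r for r in BASE_TABLE if user_filter(r) and secure_predicate(r)]


def make_spying_filter(seen: List[str]) -> Callable[[Row], bool]:
    """Build a hypothetical user filter that records every value it inspects."""
    def spy(row: Row) -> bool:
        seen.append(row["ssn"])  # side effect: leaks whatever rows it sees
        return True              # accepts everything, so results look normal
    return spy


if __name__ == "__main__":
    for order in (False, True):
        seen: List[str] = []
        rows = run_view_query(make_spying_filter(seen), secure_first=order)
        print(f"secure_first={order}: result={[r['customer'] for r in rows]}, observed={seen}")
```

Both orderings return the same visible result (`alice` only), which is why this kind of leak is easy to miss: the hidden row is exposed through the filter's side effect, not through the query result. Forcing the secure predicate to run first, as a secure view does, closes that channel in this model.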
In some instances, two or more organizations may wish to join data to make certain determinations about data that they have in common, or so that one organization can enrich the data of the other. For example, two companies may wish to determine how many customers they have in common. This is a common question between, for example, buyers and sellers of advertising, healthcare payers and providers, and so forth. It can be challenging to answer without one party exposing its entire customer list to the other. The customer list may include sensitive information that should not be shared with the other party and/or information that the sharing party does not wish to expose for contractual or business reasons. The parties may therefore wish to securely join their data to identify data they have in common, make other beneficial determinations, or enrich each other's data, without exposing all of the underlying data. Disclosed herein are methods, systems, and devices for securely joining database data.