1. Field of the Invention
The present invention relates generally to data processing environments and, more particularly, to a system and methodology for securing databases in online, offline, and archive modes.
2. Description of the Background Art
Computers are very powerful tools for storing and providing access to vast amounts of information. Computer databases are a common mechanism for storing information on computer systems while providing easy access to users. A typical database is an organized collection of related information stored as “records” having “fields” of information. As an example, a database of employees may have a record for each employee where each record contains fields designating specifics about the employee, such as name, home address, salary, and the like.
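By way of illustration only, the employee example above can be sketched as a record type whose attributes are its fields. The class name `EmployeeRecord`, the field names, and the sample values are hypothetical and serve only to make the record/field terminology concrete; they are not drawn from any particular database product.

```python
from dataclasses import dataclass, fields


@dataclass
class EmployeeRecord:
    """One record of the hypothetical employee database."""
    name: str            # field: employee name
    home_address: str    # field: home address
    salary: float        # field: salary


# The "database" is an organized collection of related records.
employees = [
    EmployeeRecord("A. Example", "1 Sample Street", 50000.0),
    EmployeeRecord("B. Example", "2 Sample Street", 62000.0),
]

# Every record carries the same set of fields.
field_names = [f.name for f in fields(EmployeeRecord)]
```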
Between the actual physical database itself (i.e., the data actually stored on a storage device) and the users of the system, a database management system or DBMS is typically provided as a software cushion or layer. In essence, the DBMS shields the database user from knowing or even caring about the underlying hardware-level details. Typically, all requests from users for access to the data are processed by the DBMS. For example, information may be added to or removed from data files, information retrieved from or updated in such files, and so forth, all without user knowledge of the underlying system implementation. In this manner, the DBMS provides users with a conceptual view of the database that is removed from the hardware level. The general construction and operation of database management systems are well known in the art. See, e.g., Date, C., “An Introduction to Database Systems, Seventh Edition”, Part I (especially Chapters 1-4), Addison Wesley, 2000.
Most enterprise implementations of database systems today use volume manager technology. FIG. 1A is a high-level block diagram of a database server system 1 using volume manager technology. As shown, the system 1 includes a server computer 10, running under the control of an operating system, that may host or incorporate one or more volume managers 20 that effectively sit on top of the operating system's file system 30. Server computer 10 communicates with other computers, including ones that are “database clients” (i.e., ones that use the database services provided by the server computer 10). The file system 30 stores and manages the various objects, such as SQL databases 40 (e.g., Sybase ASE database). Volume manager(s) 20, for example available from Veritas (division of Symantec, Inc. of Cupertino, Calif.), provide virtualization of storage devices at the operating system level, thus making it easy to administer storage subsystems. While the volume manager provides storage-level virtualization, the storage devices can be used as “raw” or “cooked” devices. Technology like Veritas Storage Foundation provides additional functionality on top of a virtualized storage subsystem in the form of a special file system that understands database read/write patterns, with accompanying accelerated performance of I/O to the subsystem. On top of the file system or raw device, the SQL database 40 (e.g., database engine of Sybase ASE) writes a file system-like structure to manage the allocation of the storage from the database engine level.
In a typical deployment, a computer system provisions resources or devices for use by a given database, such as hard disk resources. Usually, the task of provisioning resources falls to the System Administrator (SA), who is a user with special privileges (“superuser”) that allow special access to underlying hard disk resources. After the SA has provisioned a computer system's hard disk for use, another user, the Database Administrator (DBA), provisions the database for use. Provisioning the database includes logically setting up database tables and granting users various rights to use the database. Thus, for system 1 in FIG. 1A, the components (i.e., server/computer, volume managers, file systems or raw devices) are managed by the system administrator (SA). Once the SA allocates the storage devices to the SQL database 40, the database administrator (DBA) decides which table(s) are placed on which devices, and manages the placement. However, for system 1, since the SA has ownership of the SA domain, he or she may copy the storage device and start a bootleg copy of the SQL database (e.g., Sybase ASE database) on another machine.
For purposes of providing basic checks and balances (i.e., security), typically the SA and DBA are not the same person. The approach is hardly foolproof, however. If knowledgeable enough, the SA (or an intruder posing as the SA) has sufficient control over the computer system's physical devices to hijack the database. After all, the computer's hard disks are merely physical devices, and can be manipulated as such (e.g., for copying files). For example, an unscrupulous SA may copy the database onto another hard disk (e.g., online copy), thereby instantiating a second copy of the database all without the knowledge of the DBA. In particular, in this online scenario there is no mechanism for the DBA to uncover that the database has been compromised, since the database has been copied to a second machine (that the DBA is completely unaware of). Accordingly, today there exists a basic flaw with the approach used to set up database systems.
Besides the foregoing online scenario, there is also a basic flaw with the way databases are archived. To archive data, the DBA will use a utility to copy the database files to some disk. In doing so, the DBA will likely expose the disk to the SA (who has superuser privileges). Thus, the SA may gain access to those archival database files, and in turn may reload those files in a manner to re-create the database (again, an instance that the DBA is completely unaware of). Although one normally assumes that there is some degree of checks and balances between the SA and the DBA, the foregoing illustrates two examples where the database may be compromised.
Although the foregoing has focused on instances where an unscrupulous SA may compromise the database, it should also be understood that the ordinary checks and balances do not serve to prevent collusion between the DBA and the SA. An auditing subsystem provides the checks and balances. While user security controls who “can” do what, auditing provides a solution to non-repudiation in terms of who “did” what, when. Suppose, for instance, that an important database is subject to auditing by a third-party auditor (i.e., independent of the DBA). Normally, the auditor would be able to monitor the DBA's actions (by reviewing audit logs) for detecting unauthorized activities from an unscrupulous DBA. In this normal scenario, the DBA would not be able to tamper with the audit logs since the DBA does not have access to the underlying devices (e.g., hard disk). If the DBA were to conspire with the SA, however, the two may be able to compromise the database in a manner that is undetectable to the auditor. Present-day data centers do not address these issues, and therefore leave their underlying databases exposed. In this age of ever-increasing identity theft, leaving database systems exposed poses a substantial security risk.
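By way of illustration only, the distinction drawn above between user security (who “can” do what) and auditing (who “did” what, when) can be sketched as follows. The names `PERMISSIONS`, `AUDIT_LOG`, and `perform`, as well as the user and action names, are hypothetical; the sketch does not represent the auditing subsystem of any particular database system. Note that the log here is an ordinary in-memory list, so it offers no protection against a party with access to the underlying storage, which is precisely the exposure described above.

```python
from datetime import datetime, timezone

# User security: which actions each (hypothetical) user *can* perform.
PERMISSIONS = {
    "dba": {"create_table", "grant"},
    "clerk": {"select"},
}

# Auditing: a record of who *did* (or attempted) what, and when.
AUDIT_LOG = []


def perform(user, action):
    """Check the permission, then record the attempt in the audit log."""
    allowed = action in PERMISSIONS.get(user, set())
    AUDIT_LOG.append({
        "who": user,
        "what": action,
        "when": datetime.now(timezone.utc).isoformat(),
        "allowed": allowed,
    })
    return allowed
```

An auditor reviewing `AUDIT_LOG` would see every attempt, including denied ones, but only for as long as the log itself remains intact.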
The current approaches to addressing this problem are cumbersome. Just as a database system may have auditing capabilities, certain high security operating systems can be fitted with auditing facilities (i.e., nonrepudiation log of “who did what when”). Similarly, certain systems may employ a superuser shell with auditing capability, so that actions of the SA may be tracked. However, these are manual solutions that are highly customized for a given customer's deployment. As a result, the solutions do not operate transparently (i.e., in the background, without detection), and since they are very specific to a customer's situation (e.g., requiring manual editing of shell scripts and manual auditing of log files) they do not scale properly for widespread deployment across thousands of systems. Importantly, there is no built-in intelligence in these approaches that would prevent a database system from being compromised. Instead, the approach is at best a “postmortem” or after-the-fact log that an auditor would have to search through manually after the damage has already been done. Accordingly, a better solution is sought.
Another security problem is also present. As more unstructured data (e.g., free-form data such as images, .PDF files, documentation types, blob (binary) data, and other non-SQL data types) are stored in the database and managed, these unstructured data can be created and stored in the database without the benefit of antiviral protection, as present-day antivirus (AV) protection technology detects and cleans viruses at email gateways and through add-on modules (e.g., Symantec's Norton Anti-Virus (NAV) add-on to the Microsoft Windows operating system). Thus, with increasing storage of unstructured or binary data in the database, coupled with capabilities to embed viruses in that data (e.g., in images), customers are increasingly exposed to viruses present within stored unstructured data objects in databases.
Related to this issue is the security of stored data. Database engines have been able to separate responsibilities between data creators (who encrypt using public keys) and data owners (who decrypt using private keys), thus assuring security. The same cannot be said at the operational level between SAs (who are aware of the file system details) and DBAs (who are aware of the data schemas). This creates a risk, as the SA and the DBA may act in concert as cohorts who can compromise data security. SAs can hijack raw data stores (devices) and work with DBAs to reconstruct the hijacked data stores, thereby compromising the entire system. This can include online disks or database archive dumps.
What is needed is a database system implementing methodologies that address security issues that may arise due to the interaction of two domains, the operating system domain and the database system domain. Such a solution should prevent database compromise that results from the current loophole that allows collusion between the system and database administrators. Specifically, the solution may store data in a database encrypted file system that prevents compromise to online, streaming, and archive data. The present invention fulfills this and other needs.