Today, Personally Identifiable Information (PII) is stored in many commercial software systems, such as hierarchical, columnar, and relational database systems, as well as in Hadoop/Big Data processing infrastructures. Conventional systems may protect PII with, for example, access restrictions based on authentication and authorization.
In addition, conventional systems may protect PII with, for example, encryption. Such encryption includes encryption of data at rest (backups, etc.) and encryption of data in transmission, e.g., by encrypting the communication channel (e.g., Secure Socket Layer (SSL)), by encrypting the data in transfer (message encryption), or by a combination of encrypting the communication channel and the data in transfer. Conventional systems may also use audit trails.
Some conventional systems detect PII in a database. However, PII is just one example of sensitive information, and there are other types of sensitive information (such as salary information, performance reviews, confidential product plans, etc.).
Within conventional database systems, sensitive information (e.g., PII) should be destroyed once it is no longer needed. However, some conventional systems do not properly destroy such sensitive information. For example, if a table containing sensitive information is dropped in a database, but no measure is taken to overwrite the areas on the hard disk where the table was stored, the sensitive information remains on the hard disk and is vulnerable to hard disk discovery tools.
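The overwriting step mentioned above can be illustrated at the file level. The following is a minimal Python sketch, not a technique taken from any particular database product: the function names overwrite_with_zeros and secure_delete, the pass count, and the chunk size are all assumptions made for illustration. Note that on solid-state drives and on journaling or copy-on-write file systems, an in-place overwrite may not reach the physical blocks that originally held the data.

```python
import os


def overwrite_with_zeros(path, passes=3, chunk_size=64 * 1024):
    """Overwrite every byte of the file at `path` with zeros, `passes` times."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                f.write(b"\x00" * n)
                remaining -= n
            # Push the zeros through Python's buffer and the OS cache.
            f.flush()
            os.fsync(f.fileno())


def secure_delete(path, passes=3):
    """Overwrite the file contents first, then remove the directory entry."""
    overwrite_with_zeros(path, passes=passes)
    os.remove(path)
```

A plain delete removes only the directory entry, which is why the overwrite has to happen before os.remove is called.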
A single node database system may be described as a database installed on a server that has four tablespaces, where tablespace TS1 contains the tables T1, T2, and T3; tablespace TS2 contains the tables T4, T5, and T6; tablespace TS3 contains the tables T7, T8, and T9; and tablespace TS4 contains the tables T10 and T11. As storage for the database, there are two storage systems, each containing six hard disks. Of these twelve hard disks, eight have file systems managed by an operating system and four are used as raw devices, which means that there are no file systems managed by the operating system on these four hard disks.
With reference to the single node database system, assume that the tablespace TS4 has been created with the following Statement 1 using the four raw devices:
Statement 1
CREATE TABLESPACE TS4 MANAGED BY DATABASE USING (
    DEVICE '/dev/rhdisk0' 10000,
    DEVICE '/dev/rhdisk1' 10000,
    DEVICE '/dev/rhdisk2' 10000,
    DEVICE '/dev/rhdisk3' 10000
)
After tablespace TS4 has been created with Statement 1, table T10 and table T11 are created in tablespace TS4. Assume table T10 contains sensitive information (e.g., PII) and is dropped. In this scenario, conventional techniques used with file systems for deleting PII cannot be used to ensure that the portions of the four raw devices that contained the sensitive information are securely deleted (e.g., by overwriting the sensitive information several times with zeros, etc.) so that the information is not recoverable with disk recovery tools. This is because the operating system and file system cannot affect the hard disks used as raw devices, which are managed by the database.
Continuing with the example of the single node database system, assume that the tablespace TS1 has been created with the following Statement 2:
Statement 2
CREATE TABLESPACE TS1 MANAGED BY DATABASE USING (
    FILE 'C:\db2\file1' 1 M,
    FILE 'D:\db2\file2' 1 M
) AUTORESIZE YES INCREASESIZE 2 M MAXSIZE 100 M
In this case, tablespace TS1 uses two file containers with an initial size of 1 megabyte (MB), a growth rate of 2 MB, and a maximum size of 100 MB. A file container may be described as a file on a file system. This means that tables T1, T2, and T3 in tablespace TS1 can jointly allocate a maximum of 200 MB (2 file containers with a maximum size of 100 MB each). Now, assume T2 is a table containing sensitive information and T2 is dropped. Unlike the scenario using Statement 1, there is now a file system layer between the operating system and the database. However, only the database knows which portions of the two file containers file1 and file2 were used by table T2 and, thus, need to be cleaned (e.g., by overwriting the appropriate portions with zeros, etc.) to ensure the sensitive information cannot be recovered. Therefore, the well-understood techniques used with file systems for deleting PII cannot be used to ensure that the portions that contained the sensitive information are securely deleted.
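The point that only the database knows which portions of a container belong to a dropped table can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the extent map, expressed here as a list of (offset, length) byte ranges, is hypothetical and stands in for whatever internal bookkeeping a database keeps; zero_extents and build_demo_container are names invented for this example.

```python
import os
import tempfile


def zero_extents(container_path, extents):
    """Overwrite only the listed (offset, length) byte ranges with zeros."""
    with open(container_path, "r+b") as container:
        for offset, length in extents:
            container.seek(offset)
            container.write(b"\x00" * length)
        container.flush()
        os.fsync(container.fileno())


def build_demo_container():
    """Create a toy container holding data for T1, T2, and T3 back to back."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as container:
        container.write(b"T1-public-" + b"T2-secret-" + b"T3-public-")
    return path


if __name__ == "__main__":
    path = build_demo_container()
    # Hypothetical extent map entry: T2 occupies bytes 10..19 of the container.
    zero_extents(path, [(10, 10)])
    with open(path, "rb") as container:
        print(container.read())
    os.remove(path)
```

Because the container file is shared with T1 and T3, deleting or wiping the whole file is not an option. Without the extent map, a file-system-level tool has no way to tell which ranges held T2.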
A multi-node database system includes multiple nodes. A node may be described as a separate computing device, such as a server system. For this example, a database is partitioned across multiple nodes. As storage for the database, there are three storage systems accessed by Network Attached Storage (NAS), with each of the storage systems containing six hard disks where, in two storage systems: six of the hard disks have file systems managed by an operating system, six of the hard disks are used as raw devices and do not have file systems managed by the operating system, and four of the hard disks have Encrypting File Systems (EFS).
A tablespace may span one or more nodes. A partition group clause may be used to adjust, in a fine-grained manner, the number of nodes that the tablespace may span. A table may be split across several partitions.
If combined with file system techniques, such as General Parallel File System (GPFS), there is an additional abstraction layer between the storage devices and the file systems involved. This layer hides the details of the underlying physical storage hardware from file space consumers, such as the database, to improve business resiliency.
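To illustrate how a table split across partitions scatters its rows, and therefore any sensitive information, over the storage of several nodes, the following Python sketch assigns rows to the nodes of a partition group by hashing a key column. It is a simplified illustration only: real database systems use their own hashing schemes and partition maps, and the names node_for_key and distribute, as well as the use of SHA-256, are assumptions of this sketch.

```python
import hashlib


def node_for_key(key, partition_group):
    """Map a partitioning key to one node of the partition group."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(partition_group)
    return partition_group[index]


def distribute(rows, key_column, partition_group):
    """Group rows by the node that would store them."""
    placement = {node: [] for node in partition_group}
    for row in rows:
        node = node_for_key(row[key_column], partition_group)
        placement[node].append(row)
    return placement


if __name__ == "__main__":
    rows = [{"id": i, "salary": 1000 * i} for i in range(8)]
    for node, stored in distribute(rows, "id", ["node1", "node2", "node3"]).items():
        print(node, [row["id"] for row in stored])
```

Dropping such a table leaves remnants on every node that stored one of its partitions, so each of those nodes would need to be cleaned.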
As with the single node database system, with a multi-node database system it is difficult to destroy sensitive information when a tablespace is dropped.
In conventional systems, only the database knows which portions of a hard disk or file system belong to a table and should be cleaned for destruction of sensitive information.
Some conventional systems use file system encryption techniques. Such file system encryption techniques may not be usable where the database has high-end performance requirements, because Input/Output (I/O) operations are a performance constraint for any database operation and file system encryption adds to their cost. Moreover, in the case of raw devices, there is no file system involved, and so these file system encryption techniques are unavailable.