1. Field of the Invention
The illustrative embodiments relate generally to an improved data processing system and, in particular, to a computer-implemented method, data processing system, and computer program product for optimizing the layout of a relational database on a solid state disk.
2. Description of the Related Art
The central concept of a database is that of a collection of records, or pieces of knowledge. Typically, for a given database, there is a structural description of the type of facts held in that database, known as a schema. The schema describes the objects that are represented in the database and the relationships among them. There are a number of different ways of organizing a schema, known as database models or data models. The model in most common use is the relational model, which represents all information in the form of multiple related tables, each consisting of rows and columns. This model represents relationships by the use of values common to more than one table. Relational databases are prevalent in applications that require persistent storage of structured data. It is to be appreciated that the term database as used herein is intended to refer to relational database systems such as IBM DB2®, which is a product of International Business Machines Corporation, located in Armonk, N.Y., and Oracle 10g®, which is a product of Oracle Corporation, located in Redwood Shores, Calif.
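By way of illustration only, the following minimal sketch shows how a relational model may represent a relationship through values common to more than one table. It uses the sqlite3 module of the Python standard library rather than any of the products named above, and the table names, column names, and data (customer, sale, customer_id, and so on) are hypothetical and chosen solely for this example.

```python
import sqlite3


def build_example_db():
    """Create an in-memory database with two related tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE sale (sale_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO sale VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 75.0);
        """
    )
    return conn


def sales_with_customer_names(conn):
    """Relate the two tables through the value they share: customer_id."""
    return conn.execute(
        """
        SELECT c.name, s.amount
        FROM sale AS s
        JOIN customer AS c ON s.customer_id = c.customer_id
        ORDER BY s.sale_id
        """
    ).fetchall()


if __name__ == "__main__":
    for name, amount in sales_with_customer_names(build_example_db()):
        print(name, amount)
```

In this sketch, no pointer or physical link connects a sale row to a customer row; the relationship exists only because both tables carry the same customer_id value.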
An important application of relational databases is business intelligence. A business intelligence workload comprises querying and aggregating large volumes of data records to report and predict business trends. Data records are brought from disks into the main memory of a database server. The speed of a database server running a business intelligence workload is often determined by the speed of disk input/output (I/O). The speed of disk I/O has been improved with faster interfaces and striped arrays; however, there is a significant and widening gap between the central processing unit (CPU) speed and disk I/O speed. The performance of a typical business intelligence workload is therefore often limited by the disk I/O speed.
A hard disk is a device that contains a set of one or more magnetic disks on which computer data is stored. The term hard is used to distinguish the disk from a soft, or floppy, disk. The disks, or platters, of a hard disk are rigid, as opposed to the flexible disk in a floppy disk. A single hard disk usually consists of several platters. A hard disk has one or more mechanical disk heads, which must move around on the surface of the disks to access data. Mechanical movement creates significant delay, in particular when data records are not laid out sequentially. Seek time is the time needed to locate data on a disk. Because the disk heads must be repositioned for each non-consecutive read, the seek time for retrieving non-sequential data on a hard disk is orders of magnitude longer than the seek time for retrieving sequential data. Therefore, it is typically not desirable to store data records non-sequentially on a hard disk.
The increasing acceptance of solid state disks, also known as flash disks or flash drives, presents new opportunities to narrow the gap between the speed of a CPU and the speed of disk I/O. Made of flash memory, solid state disks exhibit both faster data access and shorter seek time as compared to a hard disk.
A solid state disk has no mechanical parts. The seek time on a solid state disk is approximately the same as its sequential read/write time. The ratio of seek time to read/write time on a solid state disk is therefore significantly lower than on a hard disk.
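The effect of this ratio may be illustrated with a simplified cost model. The sketch below is not a measurement of any actual device: the function names are hypothetical, and the timing values (an 8 ms seek for the hard disk, a 0.1 ms seek for the solid state disk, and a 0.1 ms transfer time per page for both) are assumptions chosen only to make the arithmetic concrete. The model charges one seek for a sequential read of a run of pages and one seek per page for a non-sequential read.

```python
def read_time_ms(num_pages, seek_ms, transfer_ms_per_page, sequential):
    """Estimated time to read num_pages under a simplified seek-plus-transfer model.

    Sequential layout: one seek, then the pages are read back to back.
    Non-sequential layout: one seek before every page.
    """
    seeks = 1 if sequential else num_pages
    return seeks * seek_ms + num_pages * transfer_ms_per_page


def non_sequential_penalty(num_pages, seek_ms, transfer_ms_per_page):
    """Ratio of non-sequential read time to sequential read time."""
    scattered = read_time_ms(num_pages, seek_ms, transfer_ms_per_page, sequential=False)
    contiguous = read_time_ms(num_pages, seek_ms, transfer_ms_per_page, sequential=True)
    return scattered / contiguous


if __name__ == "__main__":
    pages = 1000
    # Assumed, illustrative device parameters (not measured values).
    hard_disk = non_sequential_penalty(pages, seek_ms=8.0, transfer_ms_per_page=0.1)
    solid_state = non_sequential_penalty(pages, seek_ms=0.1, transfer_ms_per_page=0.1)
    print(f"hard disk penalty:        {hard_disk:.1f}x")
    print(f"solid state disk penalty: {solid_state:.1f}x")
```

Under these assumed values, a non-sequential layout costs the hard disk many times its sequential read time, whereas the solid state disk pays roughly a factor of two, because its seek time is comparable to its per-page transfer time.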