The present invention relates generally to the field of database systems, and more particularly to data partitioning for high-performance data retrieval.
Generally speaking, a database is an organized collection of entities and their relationships. An entity is a thing which can be uniquely identified and about which information can be collected. An entity is known as a first-class entity if it can exist in a system on its own and without dependence on the existence of any other first-class entity. For example, “person” and “physical address/location” might be first-class entities in given data model because both entities may be created and managed without any dependence on the existence of any other entity. In contrast, “driver's license” or “phone number” might be dependent entities because they must be associated with an existing person (or organization or device in the case of phone number) prior to their creation in the data model.
One common type of database is a relational database, where data is logically organized as a collection of tables of rows and columns. Each row, or record, in a table represents an ordered tuple of data, and each column represents an attribute of the tuple. Tables are commonly linked together in a structured fashion using one or more attributes of a row in one table as a key to access a related row of data in another table. A unique key, or candidate key, is a single column or set of columns in a database table whose values in any given row can be used to uniquely identify that row. Typically, a single unique key is selected to act as that table's primary key. Other rows in other tables (or even in the same table) can then use the values of this primary key as a reference, called a foreign key, to link back to the table and row where that primary key value is defined. Among the classes of linking relationships are one-to-one (for example, every department has only one manager, and vice versa), one-to-many (for example, a department has multiple employees, but each employee only works in a single department), and many-to-many (for example, many employees may work on a given project, and a given employee may work on multiple projects). A database management system (DBMS) may use clustering, via a clustering index, in an attempt to maintain the physical order of data on data pages in a storage object in the key order of the index when records are inserted and updated in a table.
A database may be partitioned into multiple instances, or nodes. These nodes may be logical nodes if they share the same physical hardware and/or operating system, or physical nodes if each resides on its own hardware and operating system. As defined by the database partitioning strategy, each node may have the same full set of tables, and/or only part of any given table may reside on each node. A partitioned database is common for business-intelligence-type workload and long-running analytical queries. Multiple database nodes can improve performance via parallelism, including process speed-ups for complicated queries.
A partitioned table in a database, on the other hand, is a table that has been broken into multiple partitions, or storage objects. All these partitions may reside in a single database node on the same storage system, or in multiple nodes across a partitioned database system. Known benefits of table partitioning include: (i) easy roll-in and roll-out of table data; (ii) higher table capacities; (iii) higher availability; (iv) easier administration; (v) flexible index placement; and/or (vi) higher performance/better query processing. One common type of table partitioning is range partitioning, wherein each row in a table is assigned a partition key value, and the range into which that value falls determines the partition for that row. Other types include list partitioning, hash partitioning, and composite partitioning.
Master data management (MDM) concerns itself with the management of master data. Master data is basic business data that is used across multiple systems, applications, and/or processes, and may, for example, include data about customers, products, employees, materials, suppliers, and vendors. Master data often includes slow-changing, non-transactional reference data, but is not necessarily limited to data of this type.