With the rapid development and widespread use of computer technologies over the last few decades, large volumes of digital data are generated on a daily basis. The need to organize and manage such a huge amount of data has driven the development of database technologies. Relational database management systems (“RDBMS”), such as Oracle Database Management System, Microsoft SQL Database Management System and MySQL Database Management System, have thus been proposed and have gained broad acceptance for data management. Such database management systems store data by rows of tables. Querying and retrieving data from conventional databases oftentimes includes retrieving a list of records even though such records contain information that is not requested. For example, the following illustrative SQL query causes a conventional database management system to read all fifty matching rows, in their entirety, from a disk drive storing the rows:
select column1 from table1 where key>100 and key<151
In the illustrative SQL query, column1 is a column of table1, and key is another column (such as a primary key) of table1. While only data in column1 is requested, data in the other columns of table1 is also read from the storage disk drive. Furthermore, conventional database management systems do not store data in an ordered manner on physical disk drives. However, many types of data (such as network logs, network access data, financial transaction data, weather data, etc.) are of extremely high volume and are ordered by time. Accordingly, there is a need for a highly parallel and efficient database system that is optimized for managing large volumes of time based data. There is a further need for a highly parallel and efficient database system for storing data by columns for faster and more efficient data retrieval.
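By way of illustration only, the difference between row storage and column storage for the illustrative SQL query may be sketched as follows. The sketch is a simplified in-memory model and not an implementation of any particular database management system. The four-column table, its contents, and the function names are hypothetical, and the sketch assumes that both layouts locate the matching key range without counting that lookup as a read.

```python
import bisect

# Hypothetical table1 with 1000 rows and four columns, sorted by key.
KEYS = list(range(1, 1001))
ROWS = [
    {"key": k, "column1": f"a{k}", "column2": f"b{k}", "column3": f"c{k}"}
    for k in KEYS
]
# The same data laid out by column instead of by row.
COLUMNS = {name: [row[name] for row in ROWS] for name in ROWS[0]}


def matching_range(keys, lo, hi):
    """Positions of rows with lo < key < hi (assumption: keys are sorted)."""
    return bisect.bisect_right(keys, lo), bisect.bisect_left(keys, hi)


def row_store_query(rows, keys, lo, hi):
    """Row layout: each matching row is read whole, then projected."""
    start, stop = matching_range(keys, lo, hi)
    values_read = 0
    result = []
    for row in rows[start:stop]:
        values_read += len(row)  # every column of the row is read
        result.append(row["column1"])
    return result, values_read


def column_store_query(columns, lo, hi):
    """Column layout: only the requested column is read."""
    start, stop = matching_range(columns["key"], lo, hi)
    result = columns["column1"][start:stop]
    return result, len(result)


row_result, row_values = row_store_query(ROWS, KEYS, 100, 151)
col_result, col_values = column_store_query(COLUMNS, 100, 151)

if __name__ == "__main__":
    print(len(row_result), row_values, col_values)
```

Under these assumptions both layouts return the same fifty values of column1, but the row layout touches every column of each matching row while the column layout touches only the requested column.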
Conventional database management systems typically generate a large number of indexes for data. Such indexes logically identify rows (also referred to herein as records). Rows of data within a table are stored on disk drives. Related rows, such as rows ordered by time, are usually not stored consecutively on disk drives. Rows could also be related by other factors. Retrieving a set of related records thus involves multiple disk reads of data dispersed at different locations on a disk drive. Accordingly, there is a need for a highly parallel and efficient database system for storing related data consecutively or nearby on a disk drive to reduce the number of disk reads in serving a data request, and for providing an efficient structure for locating such data on a disk drive. There is a further need for the new database management system to load the structure in memory for higher performance in locating data on disk drives.
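By way of illustration only, the effect of storing time-related records consecutively may be sketched as follows. The sketch is a simplified model under stated assumptions and does not describe the structure of the disclosed system. A sorted list of time stamps held in memory stands in for a structure that locates data, each run of adjacent disk positions is counted as one disk read, and the scattered layout is a hypothetical permutation chosen only for the example.

```python
import bisect


def locate(sorted_timestamps, start_ts, end_ts):
    """Indexes of records with start_ts <= time stamp < end_ts."""
    first = bisect.bisect_left(sorted_timestamps, start_ts)
    last = bisect.bisect_left(sorted_timestamps, end_ts)
    return range(first, last)


def count_reads(positions):
    """Count runs of adjacent disk positions; each run is one read."""
    reads = 0
    previous = None
    for pos in sorted(positions):
        if previous is None or pos != previous + 1:
            reads += 1
        previous = pos
    return reads


# Hypothetical data: one record per second for 100 seconds.
timestamps = list(range(100))
records = locate(timestamps, 20, 30)

# Time-ordered layout: record i is stored at disk position i.
ordered_reads = count_reads(list(records))

# Scattered layout (assumption): record i is stored at position (i * 37) % 100.
scattered_reads = count_reads([(i * 37) % 100 for i in records])

if __name__ == "__main__":
    print(ordered_reads, scattered_reads)
```

In this model the ten requested records are served by a single contiguous read when stored in time order, whereas the scattered layout requires a separate read for each record.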
To improve data retrieval performance, conventional database management systems take advantage of high-end hardware platforms, such as a computer with multiple sockets and a large amount of memory. Each of the sockets includes one or more processing units (also interchangeably referred to herein as cores). A processing unit housed in one socket can access resources (such as disk drives and memory) local to another socket. Such cross-socket access incurs a performance penalty due to the latency and bandwidth limitations of the cross-socket interconnect. Accordingly, there is a need for a highly parallel and efficient database management system that improves performance by avoiding access across socket boundaries. The present disclosure incorporates novel solutions to overcome the above-mentioned shortcomings of conventional database management systems.
Objects of the Disclosed System, Method, and Apparatus
Accordingly, it is an object of this disclosure to provide a parallel database management system optimized for managing large volumes of time based data.
Another object of this disclosure is to provide a parallel database management system with silo systems that utilize only local resources for faster performance.
Another object of this disclosure is to provide a parallel database management system with silo systems that utilize only local resources to avoid latency and bandwidth limitations inherent in interconnect access.
Another object of this disclosure is to provide a parallel database management system with silo systems that utilize only local memory and local disk drives for faster performance.
Another object of this disclosure is to provide a parallel database management system with a signal rich manifest describing physical location of data stored on a disk drive for locating the maximum amount of data while taking the least amount of memory and disk space.
Another object of this disclosure is to provide a parallel database management system with a hierarchical manifest describing physical location of data stored on a disk drive for faster data retrieval from a storage disk by direct reads.
Another object of this disclosure is to provide a parallel database management system with a manifest for each segment.
Another object of this disclosure is to provide a parallel database management system with a manifest stored in each segment.
Another object of this disclosure is to provide a parallel database management system with a manifest stored at the end of each segment.
Another object of this disclosure is to provide a parallel database management system with a hierarchical manifest in a physically backed memory region for faster access minimizing page faults.
Another object of this disclosure is to provide a parallel database management system with a hierarchical manifest organizing data by cluster keys.
Another object of this disclosure is to provide a parallel database management system with a hierarchical manifest organizing data by time based data buckets for each cluster key.
Another object of this disclosure is to provide a parallel database management system with a hierarchical manifest organizing data by time based data buckets of equal time frames.
Another object of this disclosure is to provide a parallel database management system with time based data stored on disk drives based on the order of time stamps of the data.
Another object of this disclosure is to provide a parallel database management system with time based data of different time periods stored in different segment groups.
Another object of this disclosure is to provide a parallel database management system with data records stored by columns for faster performance in retrieving data from physical disk drives.
Another object of this disclosure is to provide a parallel database management system with data records stored by columns for reducing reads of physical disk drives in data retrieval.
Another object of this disclosure is to provide a parallel database management system with data records stored by columns in coding blocks of different coding lines on a segment to allow fewer reads in data retrieval.
Another object of this disclosure is to provide a parallel database management system to store data records by columns in a segment with a manifest indicating the location of the data on the physical disk drive of the segment for faster data retrieval.
Another object of this disclosure is to provide a parallel database management system to store a data record along with a confidence about the accuracy of the data record.
Another object of this disclosure is to provide a parallel database management system that prioritizes analytical calculations on large datasets.
Another object of this disclosure is to provide a parallel database management system that prioritizes analytical calculations on large datasets based on characteristics of the analytical calculations and characteristics of the dataset.
Another object of this disclosure is to provide a parallel database management system that prioritizes an analytical calculation based on the rank of a similar analytical calculation based on characteristics of the two analytical calculations.
Other advantages of this disclosure will be clear to a person of ordinary skill in the art. It should be understood, however, that a system or method could practice the disclosure while not achieving all of the enumerated advantages, and that the protected disclosure is defined by the claims.