Data can be an abstract term. In the context of computing environments and systems, data can generally encompass all forms of information storable in a computer readable medium (e.g., memory, hard disk). Data, and in particular, one or more instances of data can also be referred to as one or more data objects. It is generally known in the art that a data object can, for example, be an actual instance of data, a class, a type, or a particular form of data, and so on.
Generally, storing and processing data are important aspects of computing systems and environments. Today, there is an ever increasing need to manage storage and processing of data in computing environments.
Data can be stored in memory. Memory is an important aspect of computing and computing systems. Memory can be volatile (e.g., Random Access Memory (RAM)) or non-volatile (e.g., flash memory). Generally, volatile memory requires power to maintain the stored information, whereas non-volatile memory does not. Today, both volatile and non-volatile forms of memory are used extensively, in various types and in numerous devices.
Databases provide a very good example of a computing environment or system where the storage and processing of data can be crucial. As such, to provide an example, databases are discussed below in greater detail.
The term database can also refer to a collection of data and/or data structures typically stored in a digital form. Data can be stored in a database for various reasons and to serve various entities or “users.” Generally, data stored in the database can be used by one or more “database users.” A user of a database can, for example, be a person, a database administrator, a computer application designed to interact with a database, etc. A very simple database or database system can, for example, be provided on a Personal Computer (PC) by storing data (e.g., contact information) on a Hard Disk and executing a computer program that allows access to the data. The executable computer program can be referred to as a database program, or a database management program. The executable computer program can, for example, retrieve and display data (e.g., a list of names with their phone numbers) based on a request submitted by a person (e.g., show me the phone numbers of all my friends in Ohio).
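The very simple database program described above can be sketched in a few lines. This is only an illustrative sketch: the contact records, the `friend` flag, and the function name `phones_of_friends_in` are assumptions made up for this example, and an in-memory list stands in for data stored on a hard disk.

```python
# Hypothetical contact records; in the example above these would be stored on disk.
contacts = [
    {"name": "Alice", "phone": "555-0101", "state": "Ohio", "friend": True},
    {"name": "Bob", "phone": "555-0102", "state": "Texas", "friend": True},
    {"name": "Carol", "phone": "555-0103", "state": "Ohio", "friend": True},
    {"name": "Dave", "phone": "555-0104", "state": "Ohio", "friend": False},
]


def phones_of_friends_in(state, records):
    """Return (name, phone) pairs for every friend who lives in `state`."""
    return [
        (record["name"], record["phone"])
        for record in records
        if record["friend"] and record["state"] == state
    ]


# The request "show me the phone numbers of all my friends in Ohio".
ohio_friends = phones_of_friends_in("Ohio", contacts)
for name, phone in ohio_friends:
    print(f"{name}: {phone}")
```

Here the "database program" is a single function that filters the stored records according to the request and returns the matching names and phone numbers.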
Generally, database systems are much more complex than the example noted above. In addition, databases have evolved over the years and are used in various businesses and organizations (e.g., banks, retail stores, governmental agencies, universities). Today, databases can be very complex. Some databases can support several users simultaneously and allow them to make very complex queries (e.g., give me the names of all customers under the age of thirty-five (35) in Ohio that have bought all the items in a given list of items in the past month and also have bought a ticket for a baseball game and purchased a baseball hat in the past 10 years).
Typically, a Database Manager (DBM) or a Database Management System (DBMS) is provided for relatively large and/or complex databases. As known in the art, a DBMS can effectively manage the database or data stored in a database, and serve as an interface for the users of the database. For example, a DBMS can be provided as an executable computer program (or software) product as is also known in the art.
It should also be noted that a database can be organized in accordance with a Data Model. Some notable Data Models include a Relational Model, an Entity-Relationship Model, and an Object Model. The design and maintenance of a complex database can require highly specialized knowledge and skills by database application programmers, DBMS developers/programmers, database administrators (DBAs), etc. To assist in the design and maintenance of a complex database, various tools can be provided, either as part of the DBMS or as free-standing (stand-alone) software products. These tools can include specialized Database languages (e.g., Data Description Languages, Data Manipulation Languages, Query Languages). Database languages can be specific to one data model or to one DBMS type. One widely supported language is Structured Query Language (SQL), which was developed, by and large, for the Relational Model and can combine the roles of a Data Description Language, a Data Manipulation Language, and a Query Language.
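To illustrate how SQL can combine the three roles noted above, the following minimal sketch uses Python's built-in `sqlite3` module with an in-memory database. The `customer` table, its columns, and the sample rows are hypothetical and were chosen only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Data Description role: define the structure of the data.
conn.execute(
    "CREATE TABLE customer ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " state TEXT NOT NULL,"
    " age INTEGER NOT NULL)"
)

# Data Manipulation role: insert and update stored data.
conn.executemany(
    "INSERT INTO customer (name, state, age) VALUES (?, ?, ?)",
    [("Ann", "Ohio", 29), ("Bob", "Ohio", 41), ("Cy", "Texas", 22), ("Dee", "Ohio", 34)],
)
conn.execute("UPDATE customer SET age = age + 1 WHERE name = ?", ("Ann",))

# Query role: ask a question about the stored data.
young_ohio = conn.execute(
    "SELECT name FROM customer WHERE state = ? AND age < ? ORDER BY name",
    ("Ohio", 35),
).fetchall()
print(young_ohio)
```

The same language is used to create the table, to change its contents, and to retrieve the customers in Ohio under the age of thirty-five.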
Today, databases have become prevalent in virtually all aspects of business and personal life. Moreover, usage of various forms of databases is likely to continue to grow even more rapidly and widely across all aspects of commerce, social and personal activities. Generally, databases and the DBMSs that manage them can be very large and extremely complex, partly in order to support an ever increasing need to store and analyze data. Typically, larger databases are used by larger organizations. Larger databases are supported by a relatively large amount of capacity, including computing capacity (e.g., processor and memory), to allow them to perform many tasks and/or complex tasks effectively at the same time (or in parallel). On the other hand, smaller database systems are also available today and can be used by smaller organizations. In contrast to larger databases, smaller databases can operate with less capacity.
A currently popular type of database is the relational database with a Relational Database Management System (RDBMS), which can include relational tables (also referred to as relations) made up of rows and columns (also referred to as tuples and attributes). In a relational database, each row represents an occurrence of an entity defined by a table, with an entity, for example, being a person, place, thing, or another object about which the table includes information.
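The terminology above (relation, tuple, attribute) can be made concrete with a small sketch. It is not a description of how any particular RDBMS stores data; the `Person` entity, its attributes, and the helper functions `select` and `project` are assumptions introduced only for illustration.

```python
from typing import NamedTuple


class Person(NamedTuple):
    """One row (tuple) of a hypothetical `person` table; fields are its attributes."""
    person_id: int
    name: str
    city: str


# The relation (table): each element is one occurrence of the Person entity.
person_table = [
    Person(1, "Ada", "Dayton"),
    Person(2, "Ben", "Columbus"),
    Person(3, "Cal", "Dayton"),
]


def select(relation, predicate):
    """Keep only the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attribute):
    """Keep only one column (attribute) of every row."""
    return [getattr(row, attribute) for row in relation]


dayton_names = project(select(person_table, lambda row: row.city == "Dayton"), "name")
print(Person._fields)  # the columns (attributes) of the table
print(dayton_names)    # names of the persons whose city is Dayton
```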
Other types of databases that are not relational, at least in the traditional sense, or that use Alternate Data Processing (ADP) have been developed more recently. Alternate Data Processing can generally refer to the ability to issue a request (e.g., a query) against data that does not necessarily conform to the Relational Model employed by databases. Such data could, for example, be semi-structured (Key, Value pairs), pure text, encoded sensor data, etc., and the processing operations conducted against it might be relational, procedural, functional, or mapper or reducer based (Map Reduce is a technique that can be applied to this type of alternate data to essentially turn it into a result set form by mapping input data against some pre-determined structure and reducing the resulting output to a final set by applying a selection algorithm). Two examples of these Alternate Data Processing (ADP) environments are Aster Data and Hadoop-based environments, as generally known in the art, where Aster Data can combine a parallel database approach as a means to store the data with a SQL-wrapped Map Reduce capability (SQL-MR) to provide for ADP, and Hadoop can combine a distributed file system with a Map Reduce framework to provide for ADP.
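The Map Reduce technique described above can be sketched on a single machine as follows. This is a simplified illustration and not the Aster Data or Hadoop interface: the input format (`sensor_id,value` text lines), the function names, and the choice of "maximum reading per sensor" as the selection algorithm are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical semi-structured input: one "sensor_id,value" reading per line.
raw_lines = ["s1,20.5", "s2,18.0", "s1,22.1", "s2,17.4", "s3,30.0"]


def map_phase(line):
    """Map a raw line onto a pre-determined (key, value) structure."""
    sensor_id, value = line.split(",")
    yield sensor_id, float(value)


def shuffle(pairs):
    """Group the mapped values by key before reduction."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce each group to a final value; here the selection is the maximum."""
    return key, max(values)


def map_reduce(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    groups = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in sorted(groups.items()))


max_per_sensor = map_reduce(raw_lines)
print(max_per_sensor)
```

In a distributed environment the map and reduce phases would run on many nodes in parallel; the sketch keeps everything in one process so that only the data flow is visible.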
However, despite their different uses, applications, and workload characteristics, most systems can run on a common Database Management System (DBMS) using a standard database programming language, such as Structured Query Language (SQL). Most modern DBMS implementations (e.g., Oracle, IBM DB2, Microsoft SQL Server, Sybase, MySQL, PostgreSQL, Ingres) are implemented on relational databases, which are well known to those skilled in the art.
Typically, a DBMS has a client side where applications or users submit their queries and a server side that executes the queries. On the server side, most enterprises employ one or more general-purpose servers. However, although these platforms are flexible, general-purpose servers are not optimized for many enterprise database applications. In a general-purpose database server, all SQL queries and transactions are eventually mapped to low-level software instructions called assembly instructions, which are then executed on a general-purpose microprocessor (CPU). The CPU executes the instructions, and its logic is busy as long as the operand data are available, either in the register file or in the on-chip cache. To extract more parallelism from the assembly code and keep the CPU pipeline busy, known CPUs attempt to predict the outcome of branch instructions and speculatively execute down the predicted code path. Execution time is reduced if the speculation is correct; the success of this speculation, however, is data dependent. Other state-of-the-art CPUs attempt to increase performance by employing simultaneous multithreading (SMT) and/or multi-core chip multiprocessing (CMP). To take advantage of these, changes have to be made to the application or DBMS source code to manually create the process/thread parallelism for the SMT or CMP CPUs. This is generally considered very complex to implement and is not always applicable to general-purpose CPUs because it is workload dependent.
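The kind of manual restructuring mentioned above can be illustrated with a small sketch in which a sequential scan is split by hand into chunks that are handed to worker threads. The function names, the chunking scheme, and the worker count are assumptions for illustration only. The sketch shows the code changes involved rather than a performance gain; in standard CPython, for example, threads do not execute such CPU-bound work in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def count_matches(chunk, threshold):
    """Sequential scan of one chunk: count values above the threshold."""
    return sum(1 for value in chunk if value > threshold)


def parallel_count(values, threshold, workers=4):
    """Manually partition the scan and combine the partial results."""
    size = max(1, (len(values) + workers - 1) // workers)  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(count_matches, chunks, [threshold] * len(chunks))
        return sum(partials)


values = list(range(1000))
total = parallel_count(values, threshold=899)
print(total)
```

Even in this toy case the programmer has to decide how to partition the data, how many workers to use, and how to merge the partial results, and the best choices depend on the workload.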
Unfortunately, general-purpose CPUs are not efficient for database applications. Branch prediction is generally not accurate because database processing involves tree traversal and linked-list traversal (pointer chasing), both of which are very data dependent. Known CPUs employ the well-known instruction-flow (or von Neumann) architecture, which uses a highly pipelined instruction flow (rather than a data flow in which operand data is pipelined) to operate on data stored in the CPU's tiny register files. Real database workloads, however, typically require processing gigabytes to terabytes of data, which overwhelms these tiny registers with loads and reloads. The on-chip cache of a general-purpose CPU is not effective because it is too small for real database workloads. This requires that the database server frequently retrieve data from its relatively small memory or from long-latency disk storage. Accordingly, known database servers rely heavily on squeezing the utilization of their small system memory and their disk input/output (I/O) bandwidth. Those skilled in the art recognize that these bottlenecks between storage I/O, the CPU, and memory are very significant performance factors.
However, overcoming the bottlenecks between storage I/O, the CPU, and memory (noted above) can be very complex because typical database systems consist of several layers of hardware, software, etc., that influence the overall performance of the system. These layers comprise, for example, the application software, the DBMS software, the operating system (OS), the server processor system (such as its CPU, memory, and disk I/O), and the infrastructure. Traditionally, performance has been optimized in a database system “horizontally,” i.e., within a particular layer. For example, many solutions attempt to optimize an individual component, such as DBMS query processing, caching, or disk I/O. These solutions employ a generic, narrow approach that still fails to truly realize the large performance potential of the database system, especially for relational database systems having complex read-intensive applications.
Accordingly, it would be very desirable and useful to provide improved database solutions.