High-performance analytic solutions involve co-locating data and analytic code. Co-location can reduce I/O overhead because large amounts of data can be loaded into memory across a grid of nodes and processed in parallel. Two techniques for co-locating data and analytic code are the in-database model and the outside-the-database model.
With the in-database model, analytic code executes on the nodes of a distributed database system in a shared-nothing environment. The process usually begins when a SQL query calls a user-defined function (UDF) that was pre-installed on the database management system. The data is either already local to the nodes or is moved to the nodes as requested by the SQL query.
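As a minimal sketch of a SQL query calling a pre-installed UDF, the following uses Python's built-in sqlite3 module. SQLite is a single-node engine, so it serves here only as a stand-in for a distributed database management system. The function name log_odds, the table scores, and the helper score_in_database are hypothetical names chosen for illustration.

```python
import math
import sqlite3


def log_odds(p):
    """Hypothetical analytic UDF: convert a probability to log-odds."""
    if p is None or p <= 0.0 or p >= 1.0:
        return None  # becomes SQL NULL for inputs outside (0, 1)
    return math.log(p / (1.0 - p))


def score_in_database(probabilities):
    """Install the UDF, then invoke it from a SQL query."""
    conn = sqlite3.connect(":memory:")
    try:
        # "Pre-install" the UDF so that SQL statements can call it by name.
        conn.create_function("log_odds", 1, log_odds)
        conn.execute("CREATE TABLE scores (id INTEGER PRIMARY KEY, p REAL)")
        conn.executemany(
            "INSERT INTO scores (p) VALUES (?)", [(p,) for p in probabilities]
        )
        # The query drives the computation; the engine calls the UDF per row.
        rows = conn.execute(
            "SELECT id, log_odds(p) FROM scores ORDER BY id"
        ).fetchall()
    finally:
        conn.close()
    return rows
```

Calling score_in_database([0.5, 0.9]) returns one (id, log-odds) pair per stored row. The analytic code never pulls the table out of the database; it runs where the query runs.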
The in-database model is a SQL-centric, shared-nothing environment in which UDFs execute under the control of the database management system. That is, nodes cannot communicate with each other; information typically cannot persist between queries on the database nodes unless it is written to the database management system in the form of tables; and the database management system controls the resources consumed by the UDF. In this model, the database can provide failover, replication, and support for transactions.
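The shared-nothing constraints can be illustrated with a small simulation, offered as a sketch rather than a description of any particular product. Each "node" is modeled as a separate in-memory SQLite database, so nodes cannot see each other's data, and the only state that outlives a node's query is the table it writes. The names run_node_query, combine_partials, obs, and partial are hypothetical.

```python
import sqlite3


def run_node_query(partition):
    """Run one node's share of the work in its own private database."""
    conn = sqlite3.connect(":memory:")  # nothing is shared with other nodes
    try:
        conn.execute("CREATE TABLE obs (x REAL)")
        conn.executemany("INSERT INTO obs VALUES (?)", [(x,) for x in partition])
        # Intermediate state persists only because it is written to a table.
        conn.execute(
            "CREATE TABLE partial AS "
            "SELECT COUNT(x) AS n, TOTAL(x) AS s FROM obs"
        )
        return conn.execute("SELECT n, s FROM partial").fetchone()
    finally:
        conn.close()


def combine_partials(partials):
    """Coordinator step: merge the per-node partial tables into a mean."""
    n = sum(count for count, _ in partials)
    s = sum(total for _, total in partials)
    return s / n if n else None
```

Because the nodes never exchange messages, any algorithm that needs a global quantity has to be expressed as independent per-node queries followed by a merge of the resulting tables.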
With the outside-the-database model, a gridded computing environment is employed in which data resides in memory on the compute nodes, and the analytic code, rather than the database management system, controls the entire process. Data is co-located by pre-distributing it to the grid nodes, where the analytic code loads the local data into memory.
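The pre-distribution step can be sketched as follows, assuming a round-robin partitioning scheme (the text does not specify one) and using threads as stand-ins for grid compute nodes. The names distribute, analyze_local, and run_on_grid are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def distribute(rows, n_nodes):
    """Pre-distribute rows round-robin; partition i belongs to node i."""
    partitions = [[] for _ in range(n_nodes)]
    for i, row in enumerate(rows):
        partitions[i % n_nodes].append(row)
    return partitions


def analyze_local(partition):
    """Analytic code on one node: load local data into memory, summarize it."""
    local = list(partition)  # in-memory copy of this node's data
    return len(local), sum(local)


def run_on_grid(rows, n_nodes):
    """The analytic code, not a database, drives the whole process."""
    partitions = distribute(rows, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        # Each worker touches only its own partition.
        return list(pool.map(analyze_local, partitions))
```

For example, run_on_grid([1, 2, 3, 4, 5], 2) returns [(3, 9), (2, 6)]: node 0 holds rows 1, 3, and 5, and node 1 holds rows 2 and 4.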
The outside-the-database model can be a shared-everything environment and could also be called the “without-the-database” model because there is no immediate connection to a distributed database. The data might come from, for example, a Teradata® database, but it is distributed onto the grid compute nodes prior to the analysis, and the connection with the distributed database where the data might have originated is severed. In this environment, the analytic code gains full control over node-to-node communication by adding a message-passing protocol.
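Node-to-node communication under the control of the analytic code can be sketched with threads and queues standing in for grid nodes and a message-passing layer. A real deployment would more likely use a message-passing library such as MPI; this substitution is an assumption made to keep the example self-contained. The pattern shown is a reduce to node 0 followed by a broadcast, and the names node and global_mean are hypothetical.

```python
import queue
import threading


def node(rank, local_data, inboxes, results):
    """One grid node: send a partial result to rank 0, receive the mean."""
    local = (len(local_data), sum(local_data))
    n_nodes = len(inboxes)
    if rank == 0:
        count, total = local
        # Reduce: collect one partial result from every other node.
        for _ in range(n_nodes - 1):
            c, t = inboxes[0].get()
            count += c
            total += t
        mean = total / count if count else None
        # Broadcast: send the global result back to every other node.
        for peer in range(1, n_nodes):
            inboxes[peer].put(mean)
    else:
        inboxes[0].put(local)       # message to rank 0
        mean = inboxes[rank].get()  # wait for the broadcast
    results[rank] = mean


def global_mean(partitions):
    """Start one thread per partition and return each node's view of the mean."""
    inboxes = [queue.Queue() for _ in partitions]
    results = [None] * len(partitions)
    threads = [
        threading.Thread(target=node, args=(rank, part, inboxes, results))
        for rank, part in enumerate(partitions)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With partitions [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]], every node ends with the same global mean, 3.5. Unlike the in-database model, the intermediate values travel directly between nodes instead of being written to tables.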