It is not uncommon for the amount of data associated with a business venture to grow at an exponential pace. For example, enterprises are increasingly capturing, storing, and mining a wide range of information about their communications with customers. This information is often stored and indexed within databases. Once the information is indexed, queries are developed as needed to mine it for a variety of organizational goals, such as planning, analytics, and reporting.
The stored and indexed information is often created, mined, updated, and manipulated by programs that developers write on behalf of analysts. Such programs are referred to as user-defined functions (UDFs).
The information stored in the databases also gives enterprises an opportunity to derive relationships or patterns from it, and those relationships and patterns can be expressed as functions. When supplied with certain input variables, these functions transform the input data into projected output values on which an enterprise may rely for its business operations. For example, such functions may be useful in projecting the impact of certain anticipated conditions on sales. Mathematical regression algorithms are sometimes used to derive such functions.
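To make the idea of a derived function concrete, the following is a minimal sketch, assuming a single input variable and ordinary least-squares linear regression. This is only one of many regression techniques, and the text above does not say which one is used. The names `fit_linear` and `predict` and the sample values are hypothetical and chosen purely for illustration.

```python
from typing import Sequence, Tuple


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Fit y = intercept + slope * x by ordinary least squares.

    Hypothetical helper: returns (intercept, slope), which together
    define the function that maps an input variable to a projected output.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Unnormalized covariance and variance; the 1/n factors cancel in the ratio.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("input variable has no variance")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(coeffs: Tuple[float, float], x: float) -> float:
    """Apply the fitted function to a new input value."""
    intercept, slope = coeffs
    return intercept + slope * x


if __name__ == "__main__":
    # Made-up observations (input condition -> observed outcome).
    coeffs = fit_linear([1, 2, 3, 4], [3, 5, 7, 9])
    print(predict(coeffs, 5))  # prints 11.0
```

Note that every observation contributes to the sums, so the fit needs access to all of the relevant rows. That requirement is what makes the volume and placement of the data matter.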
One issue with regression analysis is the large amount of information typically needed to produce meaningful and reliable results. That information may be spread across many rows, possibly in a system that uses a "shared nothing" architecture, in which each node is independent and self-sufficient and there is no single point of contention across the system. These benefits sometimes come at a cost: a shared nothing architecture may provide relatively slow access to information stored across the system. For example, when multiple rows of data serve a single business calculation, communicating that data between nodes can be slow.