Business intelligence refers to techniques for identifying, processing, and analyzing business data. Business intelligence systems can provide historical, current, and predictive views of business operations. Business data is generated during the course of business operations and includes data generated by business processes as well as additional data created by employees and customers. Depending on the context and knowledge surrounding it, business data may be structured, semi-structured, or unstructured. In many cases, data generated by business processes is structured, whereas data generated by customer interactions with the business is semi-structured or unstructured. Because of the volume of data typically generated during business operations, business intelligence systems are commonly built on top of, and make use of, a data warehouse.
Data warehouses are used to store, analyze, and report on data, for example business data. Data warehouses use databases to store, analyze, and harness the data in a productive and cost-effective manner. A variety of databases are commonly used, such as a relational database management system (RDBMS) like the Oracle Database from the Oracle Corporation of Santa Clara, Calif., or a massively parallel processing analytical database like Teradata from the Teradata Corporation of Miamisburg, Ohio. Business intelligence (BI) and analytical tools, such as SAS from SAS Institute, Inc. of Cary, N.C., are used to access the data stored in the database and provide an interface for developers to generate reports, manage and mine the stored data, and perform statistical analysis, business planning, forecasting, and other business functions. Most reports created using BI tools are created by database administrators, and the underlying database may be tuned for the expected access patterns. A database administrator may index, pre-aggregate, or restrict access to specific relations, or allow ad-hoc reporting and exploration.
Online transaction processing (OLTP) systems are designed to facilitate and manage transaction-based applications. OLTP may refer to a variety of transactions, such as database management system transactions and business or commercial transactions. OLTP systems typically respond to user requests with low latency.
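To make the notion of a database transaction concrete, the following is a minimal sketch using Python's built-in sqlite3 module. The accounts table, its columns, and the transfer and balances helpers are hypothetical and chosen only for illustration. The transfer either commits both updates or, if a constraint is violated, rolls both back, which is the all-or-nothing behavior OLTP systems rely on.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical accounts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " id INTEGER PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts (id, balance) VALUES (?, ?)", [(1, 100), (2, 50)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> bool:
    """Move `amount` from src to dst as a single atomic transaction.

    Returns True if the transaction committed, False if it was rolled back.
    """
    try:
        # Using the connection as a context manager commits on success
        # and rolls back if an exception escapes the block.
        with conn:
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; nothing was applied.
        return False


def balances(conn: sqlite3.Connection) -> dict:
    """Return {account_id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

For example, transferring 30 from account 1 to account 2 commits, while a later attempt to transfer 500 from account 1 is rolled back and leaves both balances unchanged.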
Online analytical processing (OLAP), a modification of OLTP, is an approach to answering multidimensional analytical queries. OLAP tools enable users to analyze multidimensional data using three basic analytical operations: consolidation (aggregating data), drill-down (navigating into the details of the data), and slice and dice (taking specific subsets of the data and viewing them from multiple viewpoints). The basis of any OLAP system is an OLAP cube. An OLAP cube is a data structure that allows fast analysis of data and the ability to manipulate and analyze data from multiple perspectives. OLAP cubes are typically composed of numeric facts, called measures, categorized by dimensions. These measures and dimensions are commonly created from a star schema or a snowflake schema of tables in an RDBMS.
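The three basic OLAP operations can be illustrated on a tiny in-memory data set. The sketch below is a simplification, not how a production OLAP engine stores data: it assumes hypothetical fact records with three dimensions (region, product, quarter) and one measure (sales). Consolidation sums the measure over every dimension that is not kept, drill-down is the same roll-up with one more dimension retained, and slice and dice filter the facts to a sub-cube before aggregating.

```python
from collections import defaultdict

# Hypothetical facts: three dimensions and one numeric measure ("sales").
FACTS = [
    {"region": "East", "product": "Widget", "quarter": "Q1", "sales": 100},
    {"region": "East", "product": "Gadget", "quarter": "Q1", "sales": 80},
    {"region": "West", "product": "Widget", "quarter": "Q1", "sales": 120},
    {"region": "West", "product": "Widget", "quarter": "Q2", "sales": 90},
    {"region": "East", "product": "Widget", "quarter": "Q2", "sales": 60},
]


def consolidate(facts, dims, measure="sales"):
    """Roll up: sum `measure` over every dimension not listed in `dims`.

    Drilling down is the same call with an additional dimension in `dims`.
    """
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[d] for d in dims)
        totals[key] += row[measure]
    return dict(totals)


def dice(facts, **allowed):
    """Dice: keep rows whose value for each named dimension is in the given set."""
    return [r for r in facts if all(r[d] in values for d, values in allowed.items())]


def slice_(facts, dim, value):
    """Slice: fix a single dimension to a single value."""
    return dice(facts, **{dim: {value}})
```

Rolling up by region yields 240 for East and 210 for West; drilling down to region and product splits East into Widget (160) and Gadget (80); slicing on quarter Q1 and then rolling up by region yields 180 for East and 120 for West.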
A snowflake schema is an arrangement of tables in an RDBMS, with a central fact table connected to one or more dimension tables. The dimension tables in a snowflake schema are normalized into multiple related tables; for a complex schema there will be many relationships between the dimension tables, resulting in a schema that looks like a snowflake. A star schema is a specific form of a snowflake schema having a fact table referencing one or more dimension tables. However, in a star schema, each dimension is denormalized into a single table: the fact table is the center and the dimension tables are the "points" of the star.
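A small star schema can be sketched in SQLite through Python's sqlite3 module. The table names (fact_sales, dim_product, dim_store), their columns, and the sample rows are all hypothetical. Each dimension is a single denormalized table; in a snowflake variant, the category column of dim_product would instead move into its own table referenced by dim_product. A typical analytical query joins the fact table to its dimensions and groups by dimension attributes.

```python
import sqlite3

# Hypothetical star schema: one fact table, two denormalized dimension tables.
SCHEMA = """
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT, region TEXT);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    units      INTEGER,
    revenue    REAL
);
"""


def build_star() -> sqlite3.Connection:
    """Create and populate the schema in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(1, "Widget", "Hardware"), (2, "Gizmo", "Hardware"), (3, "Manual", "Books")])
    conn.executemany("INSERT INTO dim_store VALUES (?, ?, ?)",
                     [(10, "Boston", "East"), (20, "Denver", "West")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                     [(1, 10, 5, 50.0), (2, 10, 2, 40.0), (3, 20, 1, 15.0), (1, 20, 3, 30.0)])
    conn.commit()
    return conn


def revenue_by_category_and_region(conn: sqlite3.Connection) -> list:
    """Join the fact table to both dimensions and aggregate the revenue measure."""
    return conn.execute(
        """
        SELECT p.category, s.region, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_product AS p ON f.product_id = p.product_id
        JOIN dim_store   AS s ON f.store_id   = s.store_id
        GROUP BY p.category, s.region
        ORDER BY p.category, s.region
        """
    ).fetchall()
```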
Returning to OLAP systems, measures are derived from fact tables, which are typically composed of the measurements or data of a business process. Dimensions are derived from the dimension tables. In other words, each measure has a set of labels, and the meaning of those labels is described in the corresponding dimension. Two varieties of OLAP tools are commonly used: relational OLAP (ROLAP) and multidimensional OLAP (MOLAP). Both ROLAP and MOLAP are designed to allow analysis of data through the use of a multidimensional data model.
ROLAP tools access the data in a relational database and generate SQL queries to calculate information at the appropriate level when an end user requests it. With ROLAP, it is possible to create additional database tables (summary tables or aggregations) that summarize the data at any desired combination of dimensions. While ROLAP uses a relational database source, the database generally must be carefully designed for ROLAP use; a database designed for OLTP will not function well as a ROLAP database. Therefore, ROLAP still involves creating an additional copy of the data. However, because the ROLAP store is itself a relational database, a variety of technologies can be used to populate it. One example of a ROLAP tool is the Pentaho BI Suite from the Pentaho Corporation of Orlando, Fla.
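The ROLAP approach of translating a user request into SQL at the requested level, and of materializing summary tables, can be sketched as follows. This is a simplified illustration, not the behavior of any particular ROLAP product: the sales_fact table, its columns, the agg_ naming convention, and the helper functions are hypothetical. Dimension names are checked against a whitelist before being interpolated into SQL, because column names cannot be passed as query parameters.

```python
import sqlite3

# Hypothetical dimension columns of the fact table.
ALLOWED_DIMS = ("region", "product", "quarter")

ROWS = [
    ("East", "Widget", "Q1", 100),
    ("East", "Gadget", "Q1", 80),
    ("West", "Widget", "Q1", 120),
    ("West", "Widget", "Q2", 90),
    ("East", "Widget", "Q2", 60),
]


def load(conn: sqlite3.Connection) -> None:
    """Create and populate the hypothetical fact table."""
    conn.execute("CREATE TABLE sales_fact (region TEXT, product TEXT, quarter TEXT, sales INTEGER)")
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", ROWS)
    conn.commit()


def rolap_query(dims) -> str:
    """Translate a request for the given dimensions into a GROUP BY query."""
    unknown = set(dims) - set(ALLOWED_DIMS)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    if not dims:
        # No dimensions requested: return the grand total.
        return "SELECT SUM(sales) AS sales FROM sales_fact"
    cols = ", ".join(dims)
    return f"SELECT {cols}, SUM(sales) AS sales FROM sales_fact GROUP BY {cols} ORDER BY {cols}"


def build_summary(conn: sqlite3.Connection, dims) -> str:
    """Materialize a summary (aggregate) table at the given dimension level."""
    name = "agg_" + "_".join(dims)
    conn.execute(f"CREATE TABLE {name} AS {rolap_query(dims)}")
    return name
```

A summary table at the region and product level can then answer a region-level request without scanning the full fact table, since rolling it up further gives the same totals as querying sales_fact directly.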
MOLAP tools differ from ROLAP tools in that MOLAP tools often involve pre-computing information and storing it in an OLAP cube. Most MOLAP solutions store this data as an in-memory multidimensional array rather than in a relational database. This pre-processing and storage of data allows fast query performance due to optimized storage, multidimensional indexing and caching, and automated computation of higher-level aggregates of the data. However, the pre-processing and storage of data has some disadvantages, such as a long processing step, especially when dealing with large volumes of data. MOLAP tools have traditionally had difficulty querying models whose dimensions have very high cardinality or that have a large number of dimensions. One example of a MOLAP tool is the Cognos PowerPlay system from International Business Machines of Armonk, N.Y.
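The trade-off described above, fast lookups bought with an up-front pre-computation step, can be seen in a minimal sketch. As a simplifying assumption, it uses a Python dict keyed by coordinate tuples as a stand-in for the dense multidimensional arrays MOLAP engines typically use, and the facts and the ALL sentinel are hypothetical. Every fact is added to all 2^d cells obtained by replacing any subset of its d coordinates with ALL, so every higher-level aggregate is available by direct lookup. The max_cells helper shows why high-cardinality or high-dimensional models are costly: the number of possible cells is the product of (cardinality + 1) over all dimensions.

```python
import math
from itertools import product

ALL = "*"  # hypothetical sentinel meaning "aggregated over this dimension"

FACTS = [
    {"region": "East", "product": "Widget", "quarter": "Q1", "sales": 100},
    {"region": "East", "product": "Gadget", "quarter": "Q1", "sales": 80},
    {"region": "West", "product": "Widget", "quarter": "Q1", "sales": 120},
    {"region": "West", "product": "Widget", "quarter": "Q2", "sales": 90},
    {"region": "East", "product": "Widget", "quarter": "Q2", "sales": 60},
]


def build_cube(facts, dims, measure):
    """Pre-compute every aggregate of `measure` over `dims`.

    Each fact contributes to 2**len(dims) cells: one for every way of
    replacing a subset of its coordinates with ALL.
    """
    cube = {}
    for row in facts:
        coords = [row[d] for d in dims]
        for mask in product((False, True), repeat=len(dims)):
            key = tuple(ALL if rolled_up else value for value, rolled_up in zip(coords, mask))
            cube[key] = cube.get(key, 0) + row[measure]
    return cube


def max_cells(cardinalities):
    """Upper bound on cube size: each dimension has its members plus ALL."""
    return math.prod(c + 1 for c in cardinalities)


CUBE = build_cube(FACTS, ("region", "product", "quarter"), "sales")
```

After the build step, a query such as total East sales over all products and quarters is a single lookup, CUBE[("East", ALL, ALL)], rather than a scan of the facts.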
Predictive analytics encompasses a variety of statistical techniques from modeling, data mining and game theory that analyze current and historical facts to make predictions about future events. Generally, when referring to business intelligence systems, the term predictive analytics is used to mean predictive modeling, “scoring” data with predictive models, and forecasting.
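As one deliberately simple instance of forecasting, the sketch below fits an ordinary least-squares linear trend to a series of historical values and extrapolates it forward. The helper names and the assumption of equally spaced time periods are illustrative; real predictive analytics systems use considerably richer models. Applying the fitted model to future periods is a basic form of "scoring".

```python
def fit_trend(ys):
    """Fit y = a + b*t by ordinary least squares, with t = 0, 1, ..., n-1."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    # Slope = covariance(t, y) / variance(t); intercept passes through the means.
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    b = sxy / sxx
    a = y_mean - b * t_mean
    return a, b


def forecast(ys, horizon):
    """Score the fitted trend on the next `horizon` periods."""
    a, b = fit_trend(ys)
    n = len(ys)
    return [a + b * (n + h) for h in range(horizon)]
```

For a history of [10, 12, 14, 16], the fitted trend has intercept 10 and slope 2, so the next two periods are forecast as 18 and 20.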