Data aggregation relates to many different computing fields including data analysis, reporting, data mining, data integration, and so forth. A definition of data aggregation may include any process in which information is gathered, arranged, and/or expressed in summary form for purposes such as statistical analysis. In one example relating to data aggregation, the detailed activities of a business may be converted by data aggregation into a high-level view of the business, which can then be used by business managers and non-technical users for evaluation purposes. In another example, data aggregation may be utilized by a website to collect data relating to searches performed on the website. The resultant data may then provide information regarding the most frequently requested search terms submitted to the website or even information regarding frequent users of the website.
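The website example above can be sketched as a simple counting step. This is a minimal illustration only: the log format, the sample records, and the name summarize_searches are hypothetical and are not taken from any particular system.

```python
from collections import Counter

# Hypothetical search log: (user_id, search_term) pairs; illustrative data only.
search_log = [
    ("u1", "star schema"),
    ("u2", "data mining"),
    ("u1", "star schema"),
    ("u3", "star schema"),
    ("u2", "data warehouse"),
    ("u1", "data mining"),
]


def summarize_searches(log):
    """Reduce the detailed log to two summaries: searches per term and per user."""
    term_counts = Counter(term for _, term in log)
    user_counts = Counter(user for user, _ in log)
    return term_counts, user_counts


term_counts, user_counts = summarize_searches(search_log)

# Most frequently requested term and most frequent user, each with its count.
top_term = term_counts.most_common(1)[0]
top_user = user_counts.most_common(1)[0]
```

Here the individual search events play the role of detailed data, and the two counters are the summary form from which frequent terms and frequent users can be read.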
There are many purposes for data aggregation. One specific purpose of data aggregation is to acquire information about particular groups, based on specific variables, such as age, profession, or income. Such information may be used to choose content and advertising likely to appeal to an individual belonging to one or more groups for which data has been collected. Another purpose of data aggregation may be to provide a generalized view of detailed data, which can be used to facilitate all manner of business analysis and strategizing.
A star schema is one method of performing data aggregation. Specifically, a star schema is a database design that may be used for modeling data warehouses to enable analytical querying of numeric data. The star schema may relate to two types of data: dimensional data, which describes how data is commonly aggregated, and fact or event data, which describes individual transactions. Accordingly, the star schema may comprise two types of database tables (i.e., facts tables and dimension tables). Facts tables contain aggregated, additive numeric values, generally referred to as measurements. Dimension tables contain values for the different perspectives by which the measurements can be interpreted.
In one example relating to the star schema, a facts table may contain columns that store aggregated sales amounts of a company. Additionally, associated dimension tables may contain columns that may be used to query the sales amounts by time, location, and customer. In accordance with this example, a star schema may comprise one facts table (i.e., aggregated sales amounts) and three dimension tables (i.e., time, location, and customer). Each dimension table may contain detailed information about one dimension (e.g., city, state, and country name fields in the location table). Additionally, the facts table presented in this example may contain three foreign key columns (one for each dimension table's primary key), and one or more columns for storing numeric measurements (e.g., the sales amount). A primary key may be an attribute or group of attributes that uniquely identifies a tuple (i.e., a set of attribute values pertaining to a given item in a database) within a database or table. For example, a client table might have a client number as its primary key. A foreign key, on the other hand, may be a table field that need not be a key in its current table, but that corresponds to the primary key of another database table.
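The sales example above might be laid out as follows, using Python's built-in sqlite3 module. This is a sketch under stated assumptions: the table names (fact_sales, dim_time, dim_location, dim_customer), the column names, and all sample rows are hypothetical and chosen only to mirror the one facts table and three dimension tables described.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One facts table with three foreign keys, plus three dimension tables.
# All names here are illustrative assumptions.
conn.executescript("""
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY,
                           city TEXT, state TEXT, country TEXT);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (
    time_id INTEGER REFERENCES dim_time(time_id),
    location_id INTEGER REFERENCES dim_location(location_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    sales_amount REAL
);
""")

conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?)",
                 [(1, 2023, 1), (2, 2023, 2)])
conn.executemany("INSERT INTO dim_location VALUES (?, ?, ?, ?)",
                 [(1, "Austin", "TX", "USA"),
                  (2, "Dallas", "TX", "USA"),
                  (3, "Boston", "MA", "USA")])
conn.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, 100.0), (1, 2, 2, 250.0),
                  (2, 3, 1, 75.0), (2, 1, 2, 50.0)])


def sales_by_state(connection):
    """Sum the sales measurement along the location dimension, grouped by state."""
    return connection.execute("""
        SELECT l.state, SUM(f.sales_amount)
        FROM fact_sales AS f
        JOIN dim_location AS l ON l.location_id = f.location_id
        GROUP BY l.state
        ORDER BY l.state
    """).fetchall()


result = sales_by_state(conn)
```

The same join pattern applies to the time and customer dimensions: the facts table supplies the measurement, and the dimension table reached through the foreign key supplies the perspective by which it is grouped.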
The design of a star schema may require knowledge of data relationships and underlying semantics, which may be obtained by analysis of the data. Accordingly, the star schema design may be resource intensive. In fact, some tasks related to determining a star schema may utilize manual analysis or human input to explain data semantics and relationships. It may be difficult to determine which data fields can be placed into the facts or dimension tables and to determine whether there exist any hierarchical or functional relationships among the fields that are chosen for dimension tables. Additionally, even aspects that do not utilize human input may require specific tools and applications that assume the existence of semantic and relational knowledge about the data.
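One piece of the analysis described above, testing whether one field functionally determines another, can be sketched as a scan over sample rows. This is an illustration only: the function name functionally_determines and the sample rows are hypothetical, and a dependency that holds on a sample suggests, but does not establish, a semantic relationship in the underlying data.

```python
def functionally_determines(rows, determinant, dependent):
    """Return True if each value of `determinant` maps to one value of `dependent`."""
    seen = {}
    for row in rows:
        key, value = row[determinant], row[dependent]
        # setdefault records the first value seen for this key; any later
        # row with a different value violates the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical flat records before any split into facts and dimension tables.
rows = [
    {"city": "Austin", "state": "TX", "sales": 100.0},
    {"city": "Dallas", "state": "TX", "sales": 250.0},
    {"city": "Boston", "state": "MA", "sales": 75.0},
    {"city": "Austin", "state": "TX", "sales": 50.0},
]

city_to_state = functionally_determines(rows, "city", "state")
state_to_city = functionally_determines(rows, "state", "city")
```

In this sample, city determines state but state does not determine city, which is consistent with a city-within-state hierarchy in a location dimension. Deciding whether such a pattern is meaningful is the kind of judgment that may still call for human input.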