Business Intelligence generally refers to a category of software systems and applications used to improve business enterprise decision-making and governance. These software tools provide techniques for analyzing and leveraging enterprise applications and data. They are commonly applied to financial, human resource, marketing, sales, service provision, customer, and supplier analyses. More specifically, Business Intelligence tools can include reporting and analysis tools to analyze, forecast and present information, content delivery infrastructure systems to deliver, store and manage reports and analytics, data warehousing systems to cleanse and consolidate information from disparate sources, integration tools to analyze and generate workflows based on enterprise systems, database management systems to organize, store, retrieve and manage data in databases, such as relational, Online Transaction Processing (“OLTP”) and Online Analytical Processing (“OLAP”) databases, and performance management applications to provide business metrics, dashboards, and scorecards, as well as best-practice analysis techniques for gaining business insights.
In many organizations, data is stored in multiple formats and data sources that are not readily compatible. Data sources include sources of data that enable data storage and retrieval. Data sources may include databases, such as relational, transactional, hierarchical, multidimensional (e.g., OLAP), and object-oriented databases, and the like. Further data sources may include tabular data (e.g., spreadsheets, delimited text files), data tagged with a markup language (e.g., XML data), transactional data, unstructured data (e.g., text files, screen scrapings), hierarchical data (e.g., data in a file system, XML data), files, a plurality of reports and any other data source accessible through an established protocol, such as Open Database Connectivity (“ODBC”) and the like. Data sources may also include data sources in which the data is not stored, such as data streams, broadcast data, and the like. Data sources comprise values and objects, such as dimensions, columns, rows, attributes, measures and the like, otherwise referred to as data model objects.
Because of the complexities of organizational data and their underlying data sources, it is advantageous to work with data within a semantic context. This can be accomplished by using a level of semantic abstraction that provides terms and abstract logic associated with the underlying data in order to manage, manipulate and analyze the data. A semantic layer or domain is the term for a level of abstraction based on a relational, OLAP, or other data source, or on a combination of more than one data source or existing semantic layer. The semantic layer includes data model objects that describe the underlying data sources and define dimensions, attributes and measures that can be applied to the underlying data sources. The semantic layer may also include data foundation metadata that describes a connection to, structure for, and aspects of the underlying data sources.
A semantic layer can be used as a level of abstraction to combine partial data sets from any number of original data sources. A semantic layer can also be used to provide logical sets with which data can be associated, so that data from a large number of sources can be meaningfully aggregated. Metadata concerning the data, such as a value for data freshness, can also be associated with the data within the logic of a semantic domain. Semantic domain technology is disclosed in the following commonly-owned U.S. Pat. Nos. 5,555,403; 6,247,008; 6,578,027; and 7,181,435, which are incorporated herein by reference.
Typically, a data model object is assigned a common business term such that the user does not need to understand the specific logic of the underlying data source but can work with familiar terminology when constructing queries or otherwise accessing the data. Examples of common business terms include customer, employee, product line, revenue, profit, attrition, fiscal year, quarter, and the like.
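By way of illustration only, the mapping from common business terms to underlying data model objects can be sketched as a simple lookup. The sketch below is not taken from any of the referenced patents or products; the class name, the source identifiers (`crm_db`, `finance_olap`) and the source expressions are all hypothetical and chosen only to show how a user could request data by business term without knowing where each object resides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataModelObject:
    business_term: str  # familiar name presented to the user
    source: str         # hypothetical identifier of the underlying data source
    expression: str     # hypothetical source-specific expression
    kind: str           # "dimension" or "measure"


# Hypothetical semantic layer: business term -> data model object.
SEMANTIC_LAYER = {
    obj.business_term: obj
    for obj in [
        DataModelObject("customer", "crm_db", "customers.name", "dimension"),
        DataModelObject("quarter", "finance_olap", "[Time].[Quarter]", "dimension"),
        DataModelObject("revenue", "finance_olap", "[Measures].[Revenue]", "measure"),
    ]
}


def resolve(terms):
    """Group the requested business terms by the data source that holds them."""
    by_source = {}
    for term in terms:
        obj = SEMANTIC_LAYER[term]
        by_source.setdefault(obj.source, []).append(obj.expression)
    return by_source


resolved = resolve(["customer", "revenue", "quarter"])
```

In this sketch the user supplies only the terms "customer", "revenue" and "quarter"; the lookup determines which source-specific expressions are needed and from which data source each must be retrieved.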
For example, organizational data for a retail institution may be distributed among an OLTP database for storing sales transactions, a relational database for storing data pertaining to customers, an OLAP database for storing financial data according to geographical regions, time period, and products, and various spreadsheets storing sales performance figures for each member of the sales team. The data may be stored as various data objects spread among the different data sources, for example, the OLTP data source may store data objects such as “quantities sold” and “products sold,” the relational database may store data objects such as “customer names” and “customer addresses,” and the OLAP database may store data objects such as “revenues per region,” “revenues per quarter,” and so on. Retrieving data for analysis may therefore require multiple queries to multiple data sources.
There are a number of commercially available tools that can retrieve data from multiple data sources automatically. These tools can also integrate the data into a single “report” to facilitate analysis of the retrieved data. For example, Business Objects™ of San Jose, Calif., sells a number of widely used report generation tools, including Crystal Reports™, Business Objects OLAP Intelligence™, Business Objects Voyager™, Business Objects Web Intelligence™, and Business Objects Enterprise™.
As used herein, the term report refers to information automatically retrieved (i.e., in response to computer executable instructions) from a data source (e.g., a database, a data warehouse, a plurality of reports, and the like), where the information is structured in accordance with a report schema that specifies the form in which the information should be presented. A non-report is an electronic document that is constructed without the automatic retrieval of information from a data source. Examples of non-report electronic documents include typical business application documents, such as a word processor document, a presentation document, and the like.
A report document specifies how to access data and format it. A report document where the content does not include external data, either saved within the report or accessed live, is a template document for a report rather than a report document. Unlike other non-report documents that may optionally import external data within a document, a report document by design is primarily a medium for accessing and formatting, transforming or presenting external data.
A report is specifically designed to facilitate working with external data sources. In addition to information regarding external data source connection drivers, the report may specify advanced filtering of data, information for combining data from different external data sources, information for updating join structures and relationships in report data, and logic to support a more complex internal data model (that may include additional constraints, relationships, and metadata).
In contrast to a spreadsheet, a report is generally not limited to a table structure but can support a range of structures, such as sections, cross-tables, synchronized tables, sub-reports, hybrid charts, and the like. A report is designed primarily to support imported external data, whereas a spreadsheet equally facilitates manually entered data and imported data. In both cases, a spreadsheet applies a spatial logic that is based on the table cell layout within the spreadsheet in order to interpret data and perform calculations on the data. In contrast, a report is not limited to logic that is based on the display of the data, but rather can interpret the data and perform calculations based on the original (or a redefined) data structure and meaning of the imported data. The report may also interpret the data and perform calculations based on pre-existing relationships between elements of imported data. Spreadsheets generally work within a looping calculation model, whereas a report may support a range of calculation models. Although there may be an overlap in the function of a spreadsheet document and a report document, these documents express different assumptions concerning the existence of an external data source and different logical approaches to interpreting and manipulating imported data.
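The distinction between spatial logic and logic based on the structure and meaning of the data can be illustrated with a minimal sketch. The grid, the records and the function name `revenue_by` below are hypothetical and serve only as an example; they do not represent any particular spreadsheet or reporting product.

```python
# Spreadsheet-style: the calculation refers to cell positions.
grid = [
    ["Region", "Quarter", "Revenue"],
    ["East", "Q1", 100],
    ["East", "Q2", 150],
    ["West", "Q1", 80],
]
# Analogous to summing cells C2 through C3: the second and third rows
# of the third column, whatever those cells happen to contain.
spatial_total = sum(row[2] for row in grid[1:3])

# Report-style: the calculation refers to named fields of the data.
records = [
    {"region": "East", "quarter": "Q1", "revenue": 100},
    {"region": "East", "quarter": "Q2", "revenue": 150},
    {"region": "West", "quarter": "Q1", "revenue": 80},
]


def revenue_by(records, dimension):
    """Total the revenue measure for each value of the given dimension."""
    totals = {}
    for rec in records:
        key = rec[dimension]
        totals[key] = totals.get(key, 0) + rec["revenue"]
    return totals


by_region = revenue_by(records, "region")
by_quarter = revenue_by(records, "quarter")
```

If the rows of the grid were reordered, the positional sum would silently total different cells, whereas the field-based calculation would be unaffected because it depends on the dimension and measure names and not on where the values are displayed.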
Reports may be large because of the amount of data retrieved and the number of computations required for presenting the data to a user. Using the retail institution example above, consider a regional manager trying to access sales performance for a given region. The regional manager may have to periodically generate a report with sales revenue for various stores within the region. The report may list customer information, sales associate information, store information, region information, and quantities sold for various products and brands, and the like.
In presenting the information to the regional manager, the report must include references to various data objects and their associated queries, filters, presentation formats, and so on. If the regional manager desires to generate a report listing only a subset of these data objects, for example, listing only store information, region information, and sales revenue per region, references to the other data objects may still be included in the report unnecessarily. These unused references increase the size and complexity of the report. Managing the reporting needs of a business organization can therefore be more time consuming and computationally intensive than necessary.
Accordingly, it would be desirable to provide techniques to remove unnecessary data object references from reports. In particular, it would be desirable to provide techniques that reduce the size of reports, the complexity of data computations, and the amount of data retrieved from different data sources by stripping unused references and unused data objects from the reports.
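As a purely illustrative sketch of what stripping unused references could involve, consider a report represented as a dictionary of data object references together with a list of displayed elements that each name the data objects they use. This representation, the key names (`references`, `elements`, `uses`), the source identifiers and the function `strip_unused_references` are assumptions made for the example only and do not describe any particular technique or product.

```python
def strip_unused_references(report):
    """Return a copy of the report that keeps only the data object
    references used by at least one displayed element."""
    used = set()
    for element in report["elements"]:
        used.update(element["uses"])
    pruned = dict(report)
    pruned["references"] = {
        name: ref for name, ref in report["references"].items() if name in used
    }
    return pruned


# Hypothetical report for the regional manager example.
report = {
    "references": {
        "customer information": {"source": "relational_db"},
        "sales associate information": {"source": "spreadsheets"},
        "store information": {"source": "oltp_db"},
        "region information": {"source": "olap_db"},
        "sales revenue per region": {"source": "olap_db"},
    },
    "elements": [
        {
            "type": "table",
            "uses": [
                "store information",
                "region information",
                "sales revenue per region",
            ],
        },
    ],
}

pruned = strip_unused_references(report)
```

Under these assumptions, the pruned report retains three of the five references, so the relational database and the spreadsheets in the example would no longer need to be queried when the report is generated.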