Business Intelligence (BI) generally refers to software tools used to improve decision-making. These tools are commonly applied to financial, human resource, marketing, sales, customer and supplier analyses. More specifically, these tools can include: reporting and analysis tools to present information, content delivery infrastructure systems for delivery and management of reports and analytics, data warehousing systems for cleansing and consolidating information from disparate sources, and data management systems to collect, store, and manage raw data.
A subset of business intelligence tools are report generators, On-Line Analytic Processing (OLAP) tools, Enterprise Information Management (EIM) tools, Extract, Transform and Load (ETL) tools, analytics, and the like. There are a number of commercially available products to produce reports from stored data. For instance, Business Objects Americas of San Jose, Calif., sells a number of widely used report generation products, including Crystal Reports™, Business Objects Voyager™, Business Objects Web Intelligence™, and Business Objects Enterprise™. As used herein, the term report refers to information automatically retrieved (i.e., in response to computer executable instructions) from a data source (e.g., a database, a data warehouse, a plurality of reports, and the like), where the information is structured in accordance with a report schema that specifies the form in which the information should be presented. A non-report is an electronic document that is constructed without the automatic retrieval of information from a data source. Examples of non-report electronic documents include typical business application documents, such as a word processor document, a presentation document, and the like.
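The definition of a report given above (information retrieved automatically from a data source and structured according to a report schema) can be illustrated with a minimal sketch. The names `ReportSchema`, `run_report`, `demo_connection`, and the `sales` table are hypothetical and chosen only for illustration; they do not correspond to any of the commercial products named above.

```python
import sqlite3
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ReportSchema:
    """Hypothetical report schema: what to retrieve and how to present it."""
    title: str
    query: str                 # instructions for retrieving the data
    headings: Sequence[str]    # presentation form: one heading per column


def run_report(conn: sqlite3.Connection, schema: ReportSchema) -> str:
    """Retrieve rows from the data source and lay them out per the schema."""
    rows = conn.execute(schema.query).fetchall()
    lines = [schema.title, " | ".join(schema.headings)]
    for row in rows:
        lines.append(" | ".join(str(value) for value in row))
    return "\n".join(lines)


def demo_connection() -> sqlite3.Connection:
    """Build a small in-memory data source (illustrative data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("East", 100), ("West", 250), ("East", 50)],
    )
    return conn


SALES_BY_REGION = ReportSchema(
    title="Sales by region",
    query="SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region",
    headings=("Region", "Total"),
)

if __name__ == "__main__":
    print(run_report(demo_connection(), SALES_BY_REGION))
```

In this sketch the schema alone is comparable to a template document: it becomes a report only once data has been retrieved from the data source and laid out according to it.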
A report document specifies how to access data and format it. A report document where the content does not include external data, either saved within the report or accessed live, is a template document for a report rather than a report document. Unlike other non-report documents that may optionally import external data within a document, a report document by design is primarily a medium for accessing, formatting, transforming, and/or presenting external data.
A report is specifically designed to facilitate working with external data sources. In addition to information regarding external data source connection drivers, the report may specify advanced filtering of data, information for combining data from different external data sources, information for updating join structures and relationships in report data, and instructions including logic to support a more complex internal data model (that may include additional constraints, relationships, and metadata).
In contrast to a spreadsheet type application, a report generation tool is generally not limited to a table structure but can support a range of structures, such as sections, cross-tables, synchronized tables, sub-reports, hybrid charts, and the like. A report design tool is designed primarily to support imported external data, whereas a spreadsheet application equally facilitates manually entered data and imported data. In both cases, a spreadsheet application applies a spatial logic that is based on the table cell layout within the spreadsheet in order to interpret data and perform calculations on the data. In contrast, a report design tool is not limited to logic that is based on the display of the data, but rather can interpret the data and perform calculations based on the original (or a redefined) data structure and meaning of the imported data. The report may also interpret the data and perform calculations based on pre-existing relationships between elements of imported data. Spreadsheet applications generally work within a looping calculation model, whereas report generation tools may support a range of calculation models. Although there may be an overlap in the function of a spreadsheet document and a report document, the applications used to generate these documents contain instructions with expressly different assumptions concerning the existence of an external data source and different logical approaches to interpreting and manipulating imported data.
A data source is an information resource. Data sources include sources of data that enable data storage and retrieval. Data sources may include databases, such as relational, transactional, hierarchical, multi-dimensional (e.g., OLAP), and object oriented databases, and the like. Further, data sources may include tabular data (e.g., spreadsheets, delimited text files), data tagged with a markup language (e.g., XML data), transactional data, unstructured data (e.g., text files, screen scrapings), hierarchical data (e.g., data in a file system, XML data), files, a plurality of reports, and any other data source accessible through an established protocol, such as Open DataBase Connectivity (ODBC) and the like. Data sources may also include a data source where the data is not stored, such as data streams, broadcast data, and the like.
Hierarchical data is data organized into a tree-like structure. Hierarchical data is structured using parent-child relationships. A schema defines the valid parent-child relations. Each node in the structure can be a parent, a child, or both, with the exception of a root node that may only be a parent. Hierarchical data is defined by its schema or implied schema. Examples of hierarchical data sources include hierarchical databases, folder and file systems, Web Services (WS), web feed formats including Really Simple Syndication (RSS), and eXtensible Markup Language (XML) files and schemas.
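The role of a schema in defining valid parent-child relations can be sketched as follows. This is an illustrative example only: the `Node` class, the `FILE_SYSTEM_SCHEMA` mapping, and the `validate` function are hypothetical names, and the folder-and-file schema is an assumed simplification of one of the hierarchical data sources listed above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Node:
    """One node of a tree; it may act as a parent, a child, or both."""
    kind: str
    name: str
    children: List["Node"] = field(default_factory=list)


# Hypothetical schema: each parent kind maps to the child kinds it may contain.
# "root" never appears as an allowed child, so it can only be a parent.
FILE_SYSTEM_SCHEMA: Dict[str, Set[str]] = {
    "root": {"folder", "file"},
    "folder": {"folder", "file"},
    "file": set(),
}


def validate(node: Node, schema: Dict[str, Set[str]], path: str = "") -> List[str]:
    """Return a description of every parent-child relation the schema forbids."""
    here = f"{path}/{node.name}"
    allowed = schema.get(node.kind, set())
    errors: List[str] = []
    for child in node.children:
        if child.kind not in allowed:
            errors.append(f"{here}: '{node.kind}' may not contain '{child.kind}'")
        # Recurse so that violations deeper in the tree are also reported.
        errors.extend(validate(child, schema, here))
    return errors


if __name__ == "__main__":
    tree = Node("root", "root", [Node("folder", "docs", [Node("file", "a.txt")])])
    print(validate(tree, FILE_SYSTEM_SCHEMA))
```

A tree that passes validation conforms to the schema; a tree in which, for example, a file contains a folder yields one reported violation.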
In the context of a business intelligence application, it is desirable to extract information from text. The information extracted from the text adds significant value to the business intelligence application.
Prior art systems for achieving such a purpose usually perform an intensive standalone analysis of the text to extract information. The analysis is usually time consuming, which is a material drawback when the application has to be integrated in an environment having limited resources. Moreover, because of the nature of the analysis, such systems need to be highly customized in order to be efficient. This is a drawback because customization limits the scope of an application and requires substantial resources for setting up, operating, and maintaining the customized system. Even when such systems are highly customized, much pertinent information is missed, thereby limiting the efficiency of the system.
Accordingly, it would be desirable to provide techniques to overcome at least one of the above-identified drawbacks.