The present invention relates to information processing and more particularly to query and reporting systems and methods related to information processing.
Organizations have always been large collectors of data. During the past two decades, the sharply increasing popularity and advancement of personal computers have introduced many new options and compelling opportunities for amassing great stores of data. Notwithstanding, much of the value in this captured data lies in the ability to retrieve and review it in certain ad hoc selections and presentations in order to discover information contained in the data.
Historically, institutions have generated countless reports an extracts of data to alleviate this problem. In most instances, programmers who understand the systems used to capture data develop these reports and extracts. However, developing suitable extraction and reporting tools for generating ad hoc queries against available data frequently has required too much time and human capital, significantly limiting the usefulness of the data.
Legacy systems have significantly contributed to the problem. In the 1970s, mainframe computers comprised the center of most database systems development. The 1980s brought new mini-computer platforms. The late eighties and early nineties brought the client-server architecture running on personal computer hardware and popular server platforms such as Unix, NetWare and Windows. Despite these changes in platforms, architectures, tools, and technologies, large amounts of data and large numbers of data analysis applications continue to reside in the mainframes of the 1970s. By some estimates, more than 70 percent of existing institutional data still resides on mainframes.
To cope with this problem, several significant technological advances have been made. First, the advent of more powerful personal computers and operating systems has resulted in an explosion of data storage in standardized data stores or databases and desktop tools to manipulate this data. Second, technological advances have resulted in newer enterprise-class applications to manage data and an understanding of the fundamental differences between transactional and analytical systems. These advances have given rise to the “data warehouse” concept for consolidating resources around data retrieval. Third, display applications have made significant progress in providing users with easier and better access to data.
Although the foregoing technological advances have made significant progress to wards solving the problems associated with extracting useful information from data, several problems still exist. To be truly useful, information needs to be accessible and useable by a broad cross-section of information consumers. Current systems are too complex, too expensive, too rigid and too insecure to support broad access and use of information.
Prior systems are too complex to support access and use by a broad cross-section of information consumers. The bulk of the world's population simply does not have the skills necessary to use them. Most will likely never have the skills necessary to compose and submit database queries. Some have suggested that desktop applications are an answer to this problem. However, desktop database solutions typically result in fragmented data that is oriented towards very specific needs. Moreover, this approach to data management assumes the end user has the time to expend on managing the data in spreadsheets, files, and desktop databases. White some users may be proficient at data management, most undertake these tasks as a necessity. Given the choice, most users find it more efficient to focus on actual analysis.
Moreover, the solutions proffered by prior systems to address this complexity often do not work well. Many require information users to work through experts. However, many information consumers cannot provide specific, accurate requirements to a report writing expert on the first try. Thus, a typical query/report cycle involves several iterations to get the data and presentation correct and to validate the results.
Typically, prior systems offer few tools, if any, to unsophisticated information consumers for quickly and easily creating reports without programming assistance. Existing tools are limited to simplified wizard and web interfaces that simplify complexity by minimizing the number of options and inputs required of users. However, because designers of these wizards and web interfaces presume that specific types of data will be manipulated, these interfaces may be inflexible and lead to inaccurate tables or cross sections of data when applied to different types of data. They may also restrict the information user from accessing needed information.
Prior systems have often been too expensive to support broad access and use. The emergence of special data warehouses deploying online analytical processing (OLAP) and complex data models requires that data be transformed into multidimensional arrays or cubes, limiting both flexibility and timeliness of the data to some extent and requiting expensive applications and programming support. Because of such, current systems are so complex that they can only be deployed successfully in large enterprises that can afford them. Millions of small-to-midsize institutions also need access to business data for operational and decision-making purposes. While these institutions may not have formal, specialized data warehouses, they do need to perform reporting and analysis functions. Regardless of an organization's size, improving the ability to access and use data is critical to success.
Prior systems have frequently been too rigid to support broad access and use. Current systems limit user interaction with data. Users run reports typically created by experts. Users cannot generally look at fields or relations other than those included by experts. Users generally cannot drill down on data or drill through data to determine if the data makes sense.
Moreover, the various physical data layouts used in analytical systems have diminished the value of the data for many information users. Most transactional systems employ an entity-relationship (ER) data model. ER is a modeling technique that seeks to remove redundancy in data, thus increasing transactional speed. ER removes redundancy by recognizing that pieces or fields of information are related to other fields of information through one-to-one, one-to-many, and many-to-many relationships and by organizing these fields in several interrelated tables. A byproduct of ER modeling is that the relations incorporated into the data model generally embody business or logical relations. These relations are inherently understandable by humans because they are based upon the way humans manage data. Humans inherently know that customers have names, address and phone numbers, and that they place orders. Humans inherently know that orders are shipped through shippers, occur in sales regions, are taken by specific sales personnel and are for certain products.
Because ER-modeled systems result in numerous database tables, analytical systems typically do not handle them very well. As described above, transactional systems are generally modeled for data integrity and transactional speed. Typically, transactional systems only need to handle relatively small amounts of information. Most transactional systems only maintain data for a day, week, month or year. Accordingly, and because they typically utilize an ER data model, retrieving data from a transactional system often requires joining several database tables. Because transactional databases are relatively small, these joins do not require extensive processing power or memory. Analytical systems, however, are built to analyze large quantities of data. Accordingly, analytical systems contain a great deal of data. Frequently, data in analytical systems span several years or the entire life cycle of the associated institution. Because they contain extensive data, executing joins on multiple tables, typically requires a great deal of processing power and memory.
Analytical systems have attempted to overcome excessive processing and memory needs through alternative physical data modeling strategies. Most of the existing data modeling strategies require transformations of transactional data from their ER-modeled structures into more analytically friendly, flat data models. While these efforts are designed to increase processing speeds, they often result in the loss of some data and important business logic.
Because entity-relations are generally not maintained by analytical systems, existing ad hoc data query tools present flat data layouts to users. Typically, a user wishing to create a query against an analytical system is presented with a myriad of fields from which to choose. Sometimes, these fields are grouped into logical groupings. However, in each case the inherent business logic relations among the fields are typically lost. Further, once the user has selected certain fields, existing data query systems either automatically decide what joins to make on the tables associated with the fields, or present the user with a clumsy Boolean interface that itself is not aware of the relations among the selected fields. This can produce inconsistent results as fields are added or removed from a query, and makes generating a meaningful query very difficult. Missing or automatically interpreted join information can cause cross joins to be performed, among other errors, some of which are detected by the query reporting engine, while others are not, transparently returning erroneous results to the user and making report validity and data integrity a major issue for an organization.
Prior systems are also typically too insecure to support broad use. Most organizations cannot provide broad access to analytical data because of a lack of security for the data. Current systems rely on the underlying database system to provide security for the data. However, most database systems enforce security on an entire database or on tables or rows of the database. Therefore, organizations wanting to provide broad access to data are either required to expose sensitive data or restrict data access.
Thus, there is a need for a data query and reporting system that (1) allows non-technical computer users to build complex queries, (2) minimizes the need to have technical computer users build complex database queries by hand; (3) accommodates the need for a simple, easy-to-understand iterative system for generating and validating queries; (4) reduces the complexity and costs associated with retrieving useful information for organizations of all sizes, especially smaller businesses; (5) minimizes data transformations from transactional to analytical systems, preserving data and business logic, and promoting smarter queries; and (6) supports a robust security model, enabling organizations to protect sensitive data while providing broad data access.