1. The Field of the Invention
The present invention relates to the automated calculation, navigation, and display of statistical information. More specifically, the present invention relates to automated interfaces for extracting information from unformatted data and for generating and displaying that information for a user in a user-selected format as a readily comprehensible graphical display.
2. The Relevant Technology
Most organizations have a need for large-scale statistical analysis to adequately compare themselves and assess their standing in relation to other peer organizations. To do this on a large scale requires not only that a large amount of data be gathered, but also that some manner of formatting the data, sorting through the data, and presenting the data in a comprehensible manner be available.
As one example, libraries, such as public libraries, school libraries and the like, may desire to know how they stand with regard to other similarly situated libraries. For instance, they may want to know how their expenditures per capita, holdings per capita, size, staff, etc. compare to similarly situated institutions. For a library to conduct such a study, the library would have to poll each library it wished to include in the comparison. Additionally, the library would have to include in the poll at the outset all statistical categories for which it wished to compare itself.
Once the data was collected, the library would then have to compile the data. Such a process is very laborious and time consuming. It is also very inflexible, because each type of statistical comparison, whether it be average holding, mean holding, population represented or size of school, and all combinations of these would have to be separately compiled for each statistical category. Furthermore, if the researcher desired to alter the composition of the control group or, as termed herein, the “peer group,” all new calculations would be required.
It has been recognized by those in the relevant art that computer automation can help to meet these needs. Computer automation in the past several decades has contributed greatly to the amount of data available for answering pressing organizational questions. A large amount of data is available from many sources, including U.S. Census data, NCES library statistics, records of financial transactions in the world's markets, etc. In the past few years there has been an incredible proliferation of data in every area. We are “drowning in data and starved for information.” The big problem yet to be solved is to make the information contained in this massive amount of data comprehensible and readily accessible to those who need it.
The art has also seen the arrival of several types of information analysis programs. Two types in particular are very relevant for exploiting the value of the many large scale accumulations of data: (1) database management systems and (2) statistical analysis systems. Database management systems, such as Oracle®, Sybase®, Microsoft Access®, etc., have been available for a number of years now. The past thirty years have also seen great advances in the development and availability of statistical analysis systems such as SAS®, SPSS®, S-PLUS®, and others. Database management systems are particularly powerful for structuring and organizing large and complex databases. Modern statistical analysis software systems provide an incredible variety of analytical methods for transforming and extracting the information in large databases.
Most database management systems have at least a minimum amount of statistical analysis capability, and statistical analysis software systems usually also have some database manipulation capability. However, the combination of the two is much more powerful in analyzing large scale databases than either alone. In the hands of skilled practitioners, the combination of a good database management system and a good statistical analysis system can create a massive amount of information out of a large scale database. The big problem is in organizing, communicating and interpreting these results. In an afternoon, a user skilled in the use of a powerful database management system and a state-of-the-art statistical analysis system could create a room full of printed statistical results: stacks of output that would take perhaps years to interpret by usual methods.
A need exists for a way to deal with this threefold problem of organizing, communicating and interpreting statistical results. The people who have the need of statistical comparisons are frequently not trained in statistical methods. The results of statistical calculations on data are generally indecipherable by a large part of the people who need the information contained therein. To make that information accessible to such people, some manner of navigating through the raw data and presenting the data in a readily comprehensible form to unindoctrinated users is needed.
The key to large scale data analysis is multivariate visualization. That is, one needs holistic pictures of data that communicate information about many variables simultaneously. A number of holistic graphing methods have been developed in recent years, some by the authors of this patent application, based upon multivariate statistical methods. Most of these are in the public domain, and some have been incorporated into commercial packages such as DataDesk®, McSpin®, and SAS's INSIGHT® and JMP®. These programs, however, have limited data searching capabilities, and are not capable of calculating and manipulating higher order interactions among variables in truly large data sets. Such programs are also relatively inflexible, quite complex, and generally require the operators of the programs to be highly trained in statistics in order to properly operate the programs to glean useful information from data being operated upon.
The intent of the present invention is to provide a structure and an architecture for bringing a wide variety of graphical and statistical methods together. A need exists to incorporate the incredible data organizing and data analyzing capabilities of database management systems and statistical analysis systems with a wide variety of methods for presenting both the holistic and also the specific properties of data.
In filling this need, a navigation component is needed that combines the best of the multivariate visualization techniques with a variety of traditional graphs and a large variety of statistical indices to create a total analysis and data navigation system that makes high level analysis available to users untrained in statistical methodology.
A data gathering and extracting process is also needed that can be used in real time, such as in manufacturing processes, to sort through the large amounts of data generated about the ongoing processes of a manufacturing plant.
Furthermore, it would also be advantageous to provide a service in connection with a system for analyzing data in which the data is provided in a properly formatted manner and periodically updated by the vendor.
The apparatus of the present invention has been developed in response to the present state of the art, and in particular, in response to the problems and needs in the art that have not yet been fully solved by currently available technology. Thus, it is an overall objective of the present invention to provide an improved automated data generation, navigation, and analysis system that overcomes the problems and shortcomings discussed above.
To achieve the foregoing objects, and in accordance with the invention as embodied and broadly described herein in the preferred embodiment, a Statistical Comparator Interface is provided. The Statistical Comparator Interface of the present invention is a large-scale system for complex data navigation and analysis. Additionally, the Statistical Comparator Interface is adaptable for use to analyze both numerical and nonnumerical data. The Statistical Comparator Interface also preferably provides an efficient way of structuring data and complex transformations of that data within memory storage in order to enable optimal interpretation of the entire complex of data structure.
The present invention also encompasses a method, embodied in a set of processes for creating this unique and useful data structure from a broad variety of data input types and for navigating the data structure. Finally, the present invention also encompasses an apparatus. The Statistical Comparator Interface can convert an ordinary digital computer into a high level holistic monitoring device for tracking and ordering complex data sets such as might be seen in manufacturing systems.
The Statistical Comparator Interface provides a user with the capability to reduce vast amounts of raw data into a readily navigable and quickly comprehendible data architecture. In a presently preferred embodiment, the Statistical Comparator Interface is embodied in two basic stages. In a data compilation stage, a compilation module, which is preferably implemented with computer software, receives the raw data and compiles the raw data into a data architecture format that is readily navigable by a navigation engine.
The data architecture preferably comprises a multidimensional matrix. The data architecture matrix in one embodiment is organized into blocks arranged with columns and rows. Each block may, likewise, comprise columns and rows (fields). An initial block may contain the raw data arranged in fields. Additional blocks may reflect statistical manipulations of the raw data, including quantitative data, composite data, and internal field structure coefficients. Indices, including higher order indices (indices of indices) may also be included to facilitate rapid navigation of the data architecture matrix.
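By way of illustration only, the block organization described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the field names, the sample library figures, the choice of per-capita ratios as the derived block, and the function name build_architecture are all hypothetical.

```python
# Hypothetical raw block: one row per library, one column per field.
RAW_FIELDS = ["population", "holdings", "expenditures"]
RAW_ROWS = {
    "Library A": [10000, 50000, 200000],
    "Library B": [20000, 60000, 500000],
    "Library C": [40000, 200000, 600000],
}


def build_architecture(fields, rows):
    """Compile raw rows into blocks: raw data, a derived block, and an index."""
    pop = fields.index("population")
    derived_fields = [f for f in fields if f != "population"]

    # Derived block (assumed here to be per-capita ratios of the raw fields).
    per_capita = {
        name: [vals[fields.index(f)] / vals[pop] for f in derived_fields]
        for name, vals in rows.items()
    }

    # Index block: row names ordered by each derived field, so a navigation
    # step can look up rankings without re-sorting the data.
    index = {
        f: sorted(per_capita, key=lambda n, j=j: per_capita[n][j])
        for j, f in enumerate(derived_fields)
    }

    return {
        "raw": {"fields": fields, "rows": rows},
        "per_capita": {"fields": derived_fields, "rows": per_capita},
        "index": index,
    }


ARCH = build_architecture(RAW_FIELDS, RAW_ROWS)
```

In this sketch each block keeps its own field list and rows, which loosely mirrors the description of blocks that themselves comprise columns and rows. A higher order index could be added as a further block built over the "index" entry.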
In a navigation stage, the navigation engine is employed to allow a user to generate data profiles from among the previously raw data, including selecting one or more peer groups with selected characteristics and selecting raw data and statistical derivations in a palette of categories to be calculated and displayed for each selected peer group.
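The peer group selection and profile generation described above can be illustrated with a short sketch. It assumes, purely for illustration, that records are simple dictionaries and that a profile consists of the mean and median of each selected category; the names select_peer_group and profile and the sample data are hypothetical and do not appear in the disclosure.

```python
from statistics import mean, median

# Hypothetical records standing in for the compiled data.
LIBRARIES = [
    {"name": "A", "type": "public", "population": 10000, "holdings": 50000},
    {"name": "B", "type": "public", "population": 20000, "holdings": 60000},
    {"name": "C", "type": "school", "population": 1000, "holdings": 8000},
    {"name": "D", "type": "public", "population": 40000, "holdings": 200000},
]


def select_peer_group(records, predicate):
    """Return the records whose characteristics satisfy the user's criteria."""
    return [r for r in records if predicate(r)]


def profile(peers, categories):
    """Compute mean and median for each selected category over a peer group."""
    result = {}
    for label, extract in categories.items():
        values = [extract(r) for r in peers]
        result[label] = {"mean": mean(values), "median": median(values)}
    return result


# Example selection: public libraries, compared on holdings per capita.
PEERS = select_peer_group(LIBRARIES, lambda r: r["type"] == "public")
PROFILE = profile(
    PEERS,
    {"holdings_per_capita": lambda r: r["holdings"] / r["population"]},
)
```

Changing the peer group amounts to calling select_peer_group with a different predicate and recomputing the profile, which is the kind of recalculation that the manual approach described in the background would require to be redone by hand.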
The navigation stage may also be configured to allow the user to select from among a plurality of graphical depiction schemes to display the selected statistical comparisons in a manner that is readily perceivable to a lay, unindoctrinated user. The graphical depiction schemes preferably employ holistic multivariate analysis techniques such as multivariate analysis of variance (MANOVA) and multivariate multiple regression to create a variety of univariate and multivariate graphs to display the highly complex data in a readily understandable, two dimensional or three dimensional format.
These and other objects, features, and advantages of the present invention will become more fully apparent from the following description and appended claims, or may be learned by the practice of the invention as set forth hereinafter.