1. Technical Field
The invention relates generally to graphing tools for the computer. More particularly, the invention relates to a system and method for automated graphing of trends using massive datasets.
2. Description of the Prior Art
Off-the-shelf graphing tools can be suboptimal for use by analysts dealing with real-world datasets. Real-world datasets often contain a large number of records and a large number of variables per record.
In order to produce a sensible, aesthetically pleasing graph for every variable in a dataset with an off-the-shelf graphing tool, an analyst must typically adjust and audit each graph by hand, a task that quickly becomes intractable as the size of the dataset increases.
Real-world datasets often present one or more of the following problems that typically frustrate regular graphing tools:
A high likelihood of containing instances of bad or corrupted data that could distort the graph;
Little or no documentation about the type of each variable, such as continuous, categorical, or mixed; and
The presence of arbitrarily encoded missing or special values.
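By way of illustration only, the following minimal sketch shows how a single arbitrarily encoded missing value can stretch the axis range of a graph. The sentinel value -999.0, the sample readings, and the helper names axis_range and drop_sentinels are hypothetical and are not taken from any cited work.

```python
def axis_range(values):
    """Return the (min, max) pair a naive graphing tool would use for its axis."""
    return min(values), max(values)


def drop_sentinels(values, sentinels):
    """Remove values known to encode 'missing' or 'special' (assumed known here)."""
    return [v for v in values if v not in sentinels]


# Hypothetical readings; -999.0 is assumed to be an undocumented "missing" code.
readings = [12.1, 11.8, -999.0, 12.4, 13.0, -999.0, 12.7]
SENTINELS = {-999.0}

raw_range = axis_range(readings)                               # axis dominated by the sentinel
clean_range = axis_range(drop_sentinels(readings, SENTINELS))  # axis fits the real data
```

In practice the set of sentinel codes is usually not documented, which is why the sketch treats it as an assumption supplied by the analyst.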
David R. Turner, in Error Detection and Principle Components Analysis on a Large Semiconductor Data Set (May 2001), a manuscript received to satisfy course requirements, which work was supported by LSI Logic Corporation through a grant to Portland State University, discusses an outlier detection methodology, the Histc method, geared toward large datasets. Turner discusses the method in the context of 5th to 95th percentile filtering and outlier detection through dual variance. Turner found that, given data with a large concentration of outliers at a given value, the Histc filtering preserved 62.9% of the data with one replacement, while the 5% to 95% filtering process preserved 13.7% of the data with one replacement. Turner also discussed that one can derive and apply meta-parameters to simplify other analyses, such as time series trend detection, because it is believed that some of the principal components will probably be more sensitive to certain types of changes in the process.
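For orientation, a generic 5th to 95th percentile filter can be sketched as follows. This is a minimal sketch under stated assumptions, not Turner's procedure: it uses a nearest-rank percentile definition chosen for simplicity, it does not implement the Histc method or the replacement step, and it does not reproduce the percentages reported above. The function names percentile and percentile_filter are hypothetical.

```python
def percentile(sorted_values, p):
    """Nearest-rank percentile (integer p in 1..100) of a sorted, non-empty list."""
    n = len(sorted_values)
    rank = max(1, (p * n + 99) // 100)  # integer ceiling of p * n / 100
    return sorted_values[rank - 1]


def percentile_filter(values, low=5, high=95):
    """Keep only the values lying between the low and high percentiles, inclusive."""
    ordered = sorted(values)
    lower_bound = percentile(ordered, low)
    upper_bound = percentile(ordered, high)
    return [v for v in values if lower_bound <= v <= upper_bound]


# Usage: on the integers 1..100 the filter keeps 5 through 95.
kept = percentile_filter(list(range(1, 101)))
```

A fixed percentile band of this kind trims the tails of the data whether or not the trimmed records are actually bad.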
Edward Tufte, in an excerpt of a few pages from the 18-page chapter on sparklines in Beautiful Evidence (2006), available at http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0001OR&topic_id=1, discusses sparklines as simple, word-sized graphics and as a way to capture one or more values in context.
However, Tufte is completely silent on the problems introduced hereinabove, namely, a high likelihood of containing instances of bad or corrupted data that could distort the graph; little or no documentation about the type of each variable; and the presence of arbitrarily encoded missing or special values. Turner is completely silent on two of these problems, namely, little or no documentation about the type of each variable, and the presence of arbitrarily encoded missing or special values.
It would be advantageous to provide a method and apparatus that solves any one, or any combination, of the problems disclosed hereinabove.