Processing of complex types of data, such as millions to hundreds of millions of records of individuals, businesses and other entities, has historically been performed as a batch process using large mainframe computers. These large quantities of data were typically input into the processing system using a physical medium such as magnetic tape or an electronic or magnetic disk. Once data processing began, it would continue largely uninterrupted, over the course of several hours to several weeks, after which output data would be provided.
For marketing campaigns such as catalogue mailings, promotional mailings and offers, a client or other system user would request the use of various databases and mailing lists, as input data, and would then be required to specify, in advance of data processing, how the data should be segmented to provide the resulting campaign list, such as a mailing list. For example, the various lists and databases may include hundreds of millions of records of individuals, while the resulting campaign would be for a mailing to 10,000 individuals who meet certain criteria, such as home ownership, previous purchasing patterns, and so on.
Similarly, in various scientific and medical research areas, such as phase three drug evaluations, huge amounts of data may be generated which must be processed to detect various statistical patterns, such as efficacy in a larger population, dosage requirements, significant side effects and interactions with other drugs. In addition, many studies are conducted in numerous locations, with data collected throughout the world. Again, vast quantities of data must be processed, and must result in a selection of individuals who meet certain criteria, such as having certain adverse reactions.
In other areas such as speech and signal processing, vast quantities of data may be collected and must be analyzed. For accurate speech recognition and speech generation, vast data stores may be generated: thousands of analog electronic signals must be digitized and parsed into corresponding phonemes, for thousands of words and thousands of sentences, in any of numerous languages, each with potentially different pitch, timing and loudness (collectively, prosody), each with different co-articulations based on preceding and subsequent words and phonemes, and each from thousands of individuals. In addition, huge amounts of data may be collected for analysis, such as by intelligence services analyzing speech signals received from mobile communications for potentially unlawful or dangerous activities. Again, vast quantities of data must be processed, and must result in a selection of words and corresponding pronunciations that meet certain criteria, such as having a likelihood of fit to selected phoneme patterns from a plurality of different speakers of a plurality of different languages, with high discrimination and noise immunity.
Because of the batch processing environment of the computing systems required to manage such large data volumes, prior art systems required all such segmentation or other selection criteria to be specified in advance. Unfortunately, the selection criteria may not be known in advance, particularly where the determination of the selection criteria is itself dependent upon the accumulated data, as in marketing campaigns, scientific research, and speech and signal processing. In addition, based upon the data results, a user may want to modify the selection criteria, but in prior art systems is unable to do so without repeating all of the processing with the modified criteria.
Other prior art forms of real-time data analysis have largely been confined to significantly less complex data types, typically solely numerical data, such as sales and revenue data, which are amenable to straightforward arithmetic and algebraic manipulations (e.g., sums and averages) and numerical methods of analysis (e.g., Riemann summation). Prior art data analysis systems have not succeeded at providing real-time analysis of more complicated data, particularly complex data that requires set operations rather than arithmetic manipulations. For example, prior art data analysis systems have not allowed for real-time analysis of voluminous personal attribute data for marketing campaign determination and management, to provide a resulting set of individuals or households who meet certain criteria, particularly where the criteria may be determined dynamically and interactively, in real-time.
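By way of illustration only, the distinction between set operations and arithmetic manipulations may be sketched as follows. The sketch assumes that each personal attribute is represented as a set of record identifiers. The attribute names, the identifiers and the helper function `select` are hypothetical, chosen only for this example, and are not taken from any particular system.

```python
# Hypothetical attribute indexes: each maps one attribute to the set of
# record IDs that have it. All names and values are illustrative only.
homeowners = {1, 2, 3, 5, 8}
prior_purchasers = {2, 3, 5, 7}
already_mailed = {3}

def select(include, exclude=()):
    """Return the IDs found in every `include` set and in no `exclude` set."""
    result = set.intersection(*include)  # new set; the inputs are not modified
    for excluded in exclude:
        result -= excluded
    return result

# Initial criteria: homeowners who have purchased before.
campaign = select([homeowners, prior_purchasers])

# Criteria modified after viewing the first result: also drop those already mailed.
refined = select([homeowners, prior_purchasers], exclude=[already_mailed])
```

The result in each case is a set of records rather than a number, and a change to the criteria is expressed as a further set operation on the same attribute sets, whereas a batch process would have to be run again from its input data.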
A need remains, therefore, for a database system architecture which can process such vast amounts of complex data, in parallel and asynchronously for higher data throughput, which provides for set operations, and which allows real-time query processing for user interactivity, such as for data analysis and modifying selection criteria. Such a database system architecture should be capable of processing complicated data types, from personal attribute data to speech and signal processing data.