Data warehousing involves the gathering, storage, and retrieval of large amounts of information. It is now common to perform the data acquisition aspect of data warehousing in real time, with streaming information placed into databases as it arrives. This is especially useful in areas such as manufacturing, where sensor information from various machines may arrive simultaneously and at a high rate.
Simply acquiring the data in real time, however, is only part of the equation. It is also beneficial to have the data stored in a way that makes it easy to mine. Relational databases are often used for storage, but the organization of the data within the relational database can be critical to efficient mining later. It therefore becomes a priority not only to store the data in real time, but also to store it in a format suited to easy mining.
Additionally, relational databases are typically not used for real-time data analysis. The delay in actually storing data into the relational database, and the complexity of the storage design, typically make real-time analysis of incoming data impractical. In the manufacturing world, however, real-time data analysis can be critical in keeping the production line as efficient as possible. For example, in a microchip fabrication plant, it would be beneficial to have a “feedback loop”-type system in which information from sensors examining the production of one portion of the chip may be used in real time to modify or delay the production of another portion of the chip (or another chip entirely). This allows a manufacturer to correct for deficiencies that might otherwise have resulted in an expensive loss. Such a feedback loop would help manufacturers increase their yield.
Unfortunately, this problem has proved difficult to solve, as the vast amount of data gathered by these types of systems in a short amount of time makes scaling any solution a major challenge.
Previous solutions have attempted to address the problems mentioned above by using a name-value schema to store real-time information quickly in persistent storage. While such solutions do achieve very fast save rates, queries against the data are very slow, making them unusable for the real-time data analysis described above.
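The trade-off of a name-value schema can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from any particular prior system: the table name reading, its columns, the attribute names temperature and pressure, the sample values, and the use of an in-memory SQLite database. Saving is a plain append of one row per attribute, but a question that involves two attributes of the same sample already requires joining the table to itself.

```python
import sqlite3


def build_name_value_store(readings):
    """Create a hypothetical name-value table and append every reading to it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reading (sample_id INTEGER, name TEXT, value REAL)"
    )
    # Saving is fast and simple: one row per (sample, attribute) pair.
    conn.executemany("INSERT INTO reading VALUES (?, ?, ?)", readings)
    return conn


def hot_high_pressure_samples(conn, min_temp, min_pressure):
    """Return ids of samples whose temperature and pressure both exceed limits.

    Each attribute lives in its own row, so the table must be joined to
    itself once per additional attribute in the question being asked.
    """
    rows = conn.execute(
        """
        SELECT t.sample_id
        FROM reading AS t
        JOIN reading AS p ON p.sample_id = t.sample_id
        WHERE t.name = 'temperature' AND t.value > ?
          AND p.name = 'pressure' AND p.value > ?
        ORDER BY t.sample_id
        """,
        (min_temp, min_pressure),
    ).fetchall()
    return [row[0] for row in rows]


# Illustrative sensor readings (made-up values).
readings = [
    (1, "temperature", 410.0), (1, "pressure", 2.1),
    (2, "temperature", 455.0), (2, "pressure", 3.4),
    (3, "temperature", 460.0), (3, "pressure", 1.9),
]
conn = build_name_value_store(readings)
result = hot_high_pressure_samples(conn, 450.0, 3.0)
```

With only two attributes the query needs a single self-join. Each further attribute in a question adds another join over the same table, which is one plausible reason queries against such a schema slow down as data volume grows.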
What is needed is a solution that overcomes these deficiencies.