In May 2010, International Data Corporation (IDC), a global provider of market intelligence, advisory services, and events for the information technology, telecommunications, and consumer technology markets, published a forecast of the size of the information storage universe. According to the IDC study, information storage in the 2020s is expected to reach 35 zettabytes (i.e., 35,000,000,000,000,000,000,000 bytes), nearly 44 times the amount that exists today. Consequently, solutions well beyond the capabilities of existing storage technology must be found to deal with this explosion of information.
This explosion of data storage is due in part to three categories of information: human-generated tabular data, which is typically stored in relational databases, tables, or arrays; human-generated unstructured data; and machine-generated data, the newest category. Given the speed at which computers operate, machine-generated data will likely be the greatest contributor to this growth.
Machine-generated data has a number of unique characteristics that do not exist in the more traditional corpora of data created by organizations. Among these characteristics, the data is immutable, persistent, and typically very large. In addition, because the machines used to create the data typically cost significant amounts of money, the data is critical to the business process that created it, and thus its retention period is typically significantly longer than that of more traditional forms of data.
With this growth in machine-generated data, storing, retrieving, and analyzing the data becomes prohibitively expensive using traditional data storage architectures.
For example, our military and homeland defenders are in the midst of a transformation that will increasingly rely upon speed, mobility, and information to find, confront, and defeat the enemy. Remotely Piloted Aircraft that carry multiple sensors are growing rapidly in number and are becoming critical to mission success. The operational edge is rapidly moving to forward-deployed bases and expeditionary forces, which must rely on very limited resources and infrastructure, yet there is a growing requirement to capture, analyze, and exploit massive amounts of machine-generated data in this harsh environment. Current enterprise architectures cannot scale up to handle the increase in information that is occurring now or that is predicted for the future. New approaches to storing and accessing vast amounts of data must therefore be developed.
Similarly, many private industry and governmental operations are generating huge amounts of data that must be stored, retrieved, and analyzed in order to be useful in business, industrial, and governmental settings. For instance, in the oil and gas industry, major corporations must routinely transfer data off their exploitation platforms, which perform sensory surveys of potential underwater oil fields, because the amount of data being captured cannot be adequately stored and processed on these state-of-the-art ships.
Currently, various business, financial, and governmental organizations attempt to use a wide variety of sources (computers, sensors, data capture devices) to achieve specific operational outcomes. However, these sources produce massive amounts of information, which must be transferred to a central location for further processing, analysis, and storage. This approach is not scalable because current and projected network transfer capacity is orders of magnitude too small to move that much data from the capture location to the central processing location with acceptable latency.
It is highly desirable to be able to store and exploit such data from a desired source in real time or near-real time to meet the needs of the user. It is equally important to be able to move this archive of data and information to a different location so that analysts can use the data for their ongoing tasks.