An increasing amount of data is stored and communicated in electronic format. In many cases, data may exist only in electronic form, which makes access and security considerations for such data important, inasmuch as the data may not be readily accessed or protected in any other manner.
Combined human activity creates two-and-a-half quintillion (2.5E18) bytes of electronic data every day. Ninety percent of all electronic data has been created in just the last two years. Data sets are rapidly growing in size due, at least in part, to numerous inexpensive information-sensing devices, cameras, microphones, radio-frequency identification (RFID) readers, wireless sensor networks, and the like. At the same time, there has been a dramatic increase in posts to social media, digital pictures and videos, business documents, software logging, and the like.
Concurrent with the growth of newly generated data, the world's per-capita capacity to store digital information has doubled every 40 months since the 1980s. Developed economies increasingly make use of data-intensive technologies. There are 4.6 billion mobile-phone subscriptions worldwide and 1.5 billion people accessing the Internet. Between 1990 and 2005, more than 1 billion people worldwide entered the middle class, which means there will be increased data growth as more people become more educated and more engaged with information technologies.
The world's total effective capacity to communicate information through information networks was 281 petabytes in 1986, 471 petabytes in 1993, 2.2 exabytes in 2000, and 65 exabytes in 2007. It is predicted that the amount of data traffic communicated over the Internet on an annual basis will exceed 667 exabytes after 2014. Approximately one third of total stored data is in the form of alphanumeric text and still image data, which are the preferred data formats for most user applications.
In view of the growth trend toward increasingly large and complex data sets, conventional data management and data processing systems and methods are strained and, in some cases, unequal to the task. Challenges include analysis, capture, curation, search, sharing, security, storage, transfer, visualization, and information privacy. Electronic data can be described as generally having the following characteristics:
Volume: The quantity of data generated is important. The size of a data set can determine the value and potential utility of the subject data.
Variety: The category to which data belongs. Knowing the category helps those who use the data, or are associated with it, to employ the data to their advantage.
Velocity: The speed at which new data is generated and processed to meet the demands and challenges of growth and development.
Variability: The inconsistency that the data can exhibit, which can impair effective management and use of the data.
Veracity: The quality and precision of the data being captured can vary greatly. Accurate analysis depends on the accuracy of the source data.
Complexity: Management (or even awareness) of intrinsic value or correlations within a data set can become a difficult issue to address, especially when large volumes of data come from multiple sources.
Problems posed by the growth trend toward increasingly large and complex data sets are not going away and will only become greater in the future. A challenge for large enterprises is determining how to implement data initiatives that straddle an entire organization while optimizing the above-described characteristics and other data management, data processing, and data communication considerations.