Digitizing information allows vast amounts of data to be stored in very small amounts of space. The process, for example, permits the contents of a library to be captured on a single computer hard drive. This is possible because the data is converted into binary states that can be stored via digital encoding devices onto various types of digital storage media, such as hard drives, CD-ROM discs, and floppy disks. As digital storage technology progresses, storage devices become denser, allowing substantially more data to be stored in a given amount of space, with the density of the data limited mainly by physics and manufacturing processes.
With increased storage capacity, the challenges of effective data retrieval also increase, making it paramount that the data be easily accessible. For example, a library that has a book but cannot locate it does not help a patron who would like to read the book. Likewise, just digitizing data is not a step forward unless the data can be readily accessed. This has led to the creation of data structures that facilitate efficient data retrieval. These structures are generally known as “databases.” A database contains data in a structured format to provide efficient access to the data. Structured data storage permits more efficient retrieval of the data than unstructured data storage does. Indexing and other organizational techniques can be applied as well. Relationships between the data can also be stored along with the data, enhancing the data's value.
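The library analogy above can be sketched in code. The following is a minimal illustration, not an implementation of any particular database: the function names `find_by_scan` and `build_title_index` and the record layout (a dictionary with a `title` key) are assumptions made for this example. It contrasts searching unstructured records one by one with looking a record up through an index built ahead of time.

```python
def find_by_scan(books, title):
    """Unstructured retrieval: inspect every record until the title matches."""
    for book in books:
        if book["title"] == title:
            return book
    return None


def build_title_index(books):
    """Structured retrieval: build a title -> record index once, up front.

    If two records share a title, the later one replaces the earlier one.
    """
    return {book["title"]: book for book in books}


if __name__ == "__main__":
    # Hypothetical sample records, for illustration only.
    shelf = [
        {"title": "Alpha", "shelf": 3},
        {"title": "Beta", "shelf": 7},
    ]
    index = build_title_index(shelf)
    print(find_by_scan(shelf, "Beta"))  # walks the list record by record
    print(index.get("Beta"))            # single dictionary lookup
```

The scan takes time proportional to the number of records, while the index lookup takes roughly constant time on average, at the cost of building and storing the index.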
In the early period of database development, a user would generally view “raw data,” that is, data viewed exactly as it was entered into the database. Techniques were eventually developed to allow the data to be formatted, manipulated, and viewed in more efficient ways. This allowed a user, for instance, to apply mathematical operators to the data and even create reports. Business users could access information such as “total sales” from a database that contained only individual sales. User interfaces continued to be developed to further facilitate retrieving and displaying data in a user-friendly format. Users eventually came to appreciate that different views of the data, such as total sales derived from individual sales, allowed them to obtain additional information from the raw data in the database. This gleaning of additional data is known as “data mining” and produces “meta data” (i.e., data about data). Data mining allows valuable additional information to be extracted from the raw data. This is especially useful in business, where information can be found to explain business sales and production output beyond what the raw input data of a database shows directly.
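As a concrete illustration of deriving “total sales” from a database that holds only individual sales, the sketch below uses Python's standard `sqlite3` module with an in-memory database. The table name `sales`, its columns `region` and `amount`, and the function name `total_sales_by_region` are hypothetical and chosen only for this example.

```python
import sqlite3


def total_sales_by_region(rows):
    """Load individual sales as (region, amount) pairs and return totals per region."""
    conn = sqlite3.connect(":memory:")
    try:
        # The raw data: one row per individual sale.
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        # A different view of the same data: totals grouped by region.
        cursor = conn.execute(
            "SELECT region, SUM(amount) FROM sales GROUP BY region"
        )
        return dict(cursor.fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    individual_sales = [("east", 10.0), ("west", 5.0), ("east", 2.5)]
    print(total_sales_by_region(individual_sales))
```

The totals are not stored anywhere in the table; they exist only as a computed view over the individual rows, which is the kind of additional information the passage describes.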
Thus, data manipulation allows crucial information to be extracted from raw data. This manipulation is possible because of the digital nature of the stored data. Vast amounts of digitized data can be viewed from different aspects substantially faster than if attempted by hand. Each new perspective of the data may enable a user to gain additional insight about the data. This is a very powerful concept that can drive a business to success when it is used, or to failure when it is not. Trend analysis, cause and effect analysis, impact studies, and forecasting, for example, can be derived from raw data entered into a database, with their value and timeliness predicated on having intuitive, user-friendly access to the digitized information.
Currently, data manipulation to increase data mining capabilities requires substantial user input and knowledge to ensure that erroneous data is not included in various data perspectives. A user must therefore have intimate knowledge of the data and insight into what types of errors can occur in it. Without this prior knowledge, a user must try a ‘hit and miss’ approach, hoping to catch data anomalies buried in a given data perspective. This approach is typically beyond the casual user and too time consuming for an advanced user. The amount of stored data is generally too vast, and its relationships too complex, for a user to efficiently develop a usable strategy to ensure that all data anomalies are uncovered.