Field of the Invention
The present invention relates to information handling systems. More specifically, embodiments of the invention relate to extraction of information from large data repositories.
Description of the Related Art
As the value and use of information continue to increase, individuals and businesses seek additional ways to process and store information. One option available to users is information handling systems. An information handling system generally processes, compiles, stores, and/or communicates information or data for business, personal, or other purposes, thereby allowing users to take advantage of the value of the information. Because technology and information handling needs and requirements vary between different users or applications, information handling systems may also vary regarding what information is handled, how the information is handled, how much information is processed, stored, or communicated, and how quickly and efficiently the information may be processed, stored, or communicated. The variations in information handling systems allow for information handling systems to be general or configured for a specific user or specific use such as financial transaction processing, airline reservations, enterprise data storage, or global communications. In addition, information handling systems may include a variety of hardware and software components that may be configured to process, store, and communicate information and may include one or more computer systems, data storage systems, and networking systems.
It is known to use information handling systems to collect and store large amounts of data. However, a mismatch exists between the technologies available to collect and store data and the technologies and capabilities available to extract useful information from large data sets within a reasonable amount of time. It is known to deploy technologies such as Hadoop and HDFS for large and unstructured data storage across various industries. Many technologies are being developed to process large data sets (often referred to as “big data” and defined as an amount of data that is larger than what can be copied in its entirety from the storage location to another computing device for processing within time limits acceptable for timely operation of an application using the data). However, the ability to collect and store data often outpaces the ability to process all of the data.
Most known big data technologies focus on how to process and analyze all of the data within a large data repository. This approach is bound to become inefficient, or might even fail, in practical applications because data volumes can and usually will grow at a very fast rate while the information contained in the data will not.