A portion of the disclosure of this document may contain command formats and other computer language listings, which are subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of this document or the disclosure itself, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever. EMC and PIVOTAL are registered trademarks of the respective companies in the US and other countries.
Computer systems are constantly improving in terms of speed, reliability, and processing capability. As is generally known, computer systems process and store large amounts of data, typically in communication with a shared data storage system that holds the data. The data storage system may include one or more storage devices, e.g., disk drives, which are usually fairly robust and suited to storage spanning various temporal requirements.
Data may be hosted in a data lake, which is a storage repository that holds a vast amount of raw data in its native format until it is needed. While a hierarchical data warehouse stores data in files or folders, a data lake uses a flat architecture to store data. A data lake is therefore very complex in nature, and typically consists of structured data, unstructured data, or a combination thereof. Companies that sell data storage systems and perform analytics on data lakes and the like are concerned with providing customers with information on the data stored in such large data stores efficiently, and with doing so at an optimal cost benefit to both the clients and the service providers.