With the incredible number of devices being interconnected, these interactions generate enormous amounts of data that need to be stored somewhere. This does not include the enormous amount of data created by device users, shoppers, consumers, producers, etc., all of which also needs to be stored somewhere. Beyond simple storage, there is also demand for security, redundancy, fast access, and reliability of stored data. There are many options for implementing a “back end,” and two well-known implementations, free and hence popular, are based on Ceph and Hadoop® technology. These two platforms will be used as exemplary environments in which various aspects of inventive concepts disclosed in the detailed description may be practiced. It is assumed the reader is familiar with implementing both Ceph and Hadoop® (see, for example, Internet Uniform Resource Locators (URLs) ceph.com and hadoop.apache.org) and that the reader understands how data is stored, distributed, and validated for correctness.
As will be appreciated, because “Big Data” typically uses many systems and storage environments distributed across various networks, scrubbing (e.g., validating) stored data may incur considerable overhead. This overhead can be particularly substantial, for example, when disaggregated storage such as Nonvolatile Memory Express over Fabrics (NVMe-oF) storage targets/disks is used.
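By way of illustration only, the following minimal sketch shows one way the scrubbing overhead described above may arise. It assumes, hypothetically, that each replica of an object is held by a separate storage target and that scrubbing compares a SHA-256 digest of every replica. The function name scrub, the target names, and the majority-vote rule are assumptions made for this sketch and do not represent how Ceph or Hadoop® actually perform scrubbing. The point is that every replica must be read in full, and with disaggregated storage each such read may traverse the network.

```python
import hashlib
from collections import Counter


def scrub(replicas: dict[str, bytes]) -> tuple[list[str], int]:
    """Compare replica digests; return (suspect targets, total bytes read).

    Hypothetical helper: `replicas` maps a storage target name to the
    bytes that target holds for one object.
    """
    digests = {}
    bytes_read = 0
    for target, data in replicas.items():
        # Every replica is read in full; this is the scrubbing overhead.
        bytes_read += len(data)
        digests[target] = hashlib.sha256(data).hexdigest()

    # Assumption for this sketch: the most common digest is treated as correct.
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    suspects = sorted(t for t, d in digests.items() if d != majority_digest)
    return suspects, bytes_read


# Three replicas of one object; the third has a corrupted byte pattern.
replicas = {
    "target-a": b"payload" * 1024,
    "target-b": b"payload" * 1024,
    "target-c": b"payl0ad" * 1024,
}
suspects, bytes_read = scrub(replicas)
print(suspects, bytes_read)
```

In this sketch, validating a single 7 KiB object requires reading three times that amount, once per replica, which suggests how the cost may grow with replica count and object size.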