Cloud storage has evolved over the last decade from a model presented as a solution to evolving data storage needs into the main form of storage for many enterprises, organizations and individuals. In 2013 over 1,000 Petabytes of data were stored in the cloud, i.e. over 1,000,000,000 Gigabytes. By 2014 a single social network, Facebook™, alone stored approximately 400 Petabytes of data. Cloud storage represents a data storage model where data is stored in logical pools, the physical storage spans multiple servers and often multiple locations, and the physical environment is typically owned and managed by a hosting company and/or service provider. These cloud storage providers are responsible for keeping the data available and accessible, and the physical environment protected and running. People and organizations buy or lease storage capacity from the providers to store user, organization, or application data. Cloud storage services may be accessed through a co-located cloud computing service, a web service application programming interface (API) or by applications that utilize the API, such as cloud desktop storage, a cloud storage gateway or Web-based content management systems.
Accordingly, cloud storage is based on a highly virtualized infrastructure and is like the broader concept of cloud computing in terms of accessible interfaces, near-instant elasticity and scalability, multi-tenancy, and metered resources. Cloud storage, a form of network-based storage, is made up of many distributed resources but still acts as one (often referred to as federated storage clouds), is highly fault tolerant through redundancy and distribution of data, is highly durable through the creation of versioned copies, and is generally what is known as “eventually consistent” with regard to data replicas.
However, cloud storage also comes with drawbacks and limitations in how information is uploaded (or ingested) and in how users subsequently access that information, compared with the management tools to which those users are accustomed. Whilst tools such as Microsoft OneDrive offer individual users functionality similar to Microsoft Explorer for managing files, and integrate with software applications such as Microsoft's own Word, Excel and PowerPoint, there is a lack of automated tools for managing tens, hundreds or thousands of users within enterprises and organizations. For such enterprises and organizations, migrating to the cloud is a massive undertaking.
Accordingly, it would be beneficial to provide knowledge workers, e.g. users, with a human interface, e.g. a graphical user interface on their electronic devices, to the data ingested from third-party systems that presents the data organized in the original folder contexts and determines what folder locations each knowledge worker will see in the interface.
It would be further beneficial for organizations, enterprises, and knowledge workers to have access to tools for the incremental ingestion of changes from a data source to a cloud storage repository. It would be further beneficial for the method of determining what needs to be written to the cloud storage repository from the data sources to be centralized.
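By way of illustration only, and not as a description of any particular embodiment, one way a centralized component might determine what needs to be written during an incremental ingestion is to compare a snapshot of the data source against the last known state of the repository. The sketch below assumes each item is fingerprinted by its modification time and size; the names ItemState and plan_incremental_ingest, and the sample paths, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemState:
    """Hypothetical per-item fingerprint: modification time and size in bytes."""
    modified: float
    size: int


def plan_incremental_ingest(source, repository):
    """Compare a source snapshot with the repository's last known state.

    Both arguments map an item path to its ItemState. Returns the paths to
    write (new or changed in the source) and the paths to delete (present in
    the repository but no longer in the source).
    """
    to_write = sorted(
        path for path, state in source.items()
        if repository.get(path) != state
    )
    to_delete = sorted(path for path in repository if path not in source)
    return to_write, to_delete


# Illustrative snapshots (assumed data, not taken from any real system).
source = {
    "/finance/q1.xlsx": ItemState(modified=1700000000.0, size=2048),
    "/finance/q2.xlsx": ItemState(modified=1700500000.0, size=4096),
    "/hr/policy.docx": ItemState(modified=1690000000.0, size=1024),
}
repository = {
    "/finance/q1.xlsx": ItemState(modified=1700000000.0, size=2048),
    "/finance/q2.xlsx": ItemState(modified=1699000000.0, size=3072),
    "/legal/old.pdf": ItemState(modified=1680000000.0, size=512),
}

to_write, to_delete = plan_incremental_ingest(source, repository)
```

In this sketch only the changed item and the new item are scheduled for writing, and the unchanged item is skipped. Because the comparison is made in one place against the repository's recorded state, individual data sources do not need to track what has already been ingested.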
In many instances user activities with cloud storage can result in millions, hundreds of millions and even billions of items. Accordingly, it would be beneficial to provide knowledge workers with a means for efficient querying and maintenance of statistics for large data sets. It would be further beneficial for this to be a responsive and computationally light method of refreshing statistics and obtaining query results for policies that evaluate one or more clauses against a cloud storage repository and provide statistics on their resulting data set.
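By way of illustration only, one computationally light approach to refreshing such statistics is to update running totals as each change arrives, rather than re-evaluating the policy against every item in the repository. The sketch below assumes a policy is a conjunction of clauses, each a predicate over an item, and that the statistics of interest are a match count and a total size; the class PolicyStats and the item fields "ext" and "size" are hypothetical.

```python
class PolicyStats:
    """Running count and total size of items that satisfy every clause of a policy."""

    def __init__(self, clauses):
        self.clauses = list(clauses)  # each clause maps an item dict to a bool
        self.count = 0
        self.total_bytes = 0

    def matches(self, item):
        """Return True when the item satisfies all clauses of the policy."""
        return all(clause(item) for clause in self.clauses)

    def apply_change(self, old_item, new_item):
        """Update the statistics for a single change.

        Pass old_item=None for an added item and new_item=None for a deleted
        item. The cost depends on the number of clauses, not on the number of
        items in the repository.
        """
        if old_item is not None and self.matches(old_item):
            self.count -= 1
            self.total_bytes -= old_item["size"]
        if new_item is not None and self.matches(new_item):
            self.count += 1
            self.total_bytes += new_item["size"]


# Assumed policy: PDF items larger than 1000 bytes.
stats = PolicyStats([
    lambda item: item["ext"] == "pdf",
    lambda item: item["size"] > 1000,
])

a = {"ext": "pdf", "size": 5000}
b = {"ext": "pdf", "size": 500}
c = {"ext": "docx", "size": 9000}

stats.apply_change(None, a)   # added, matches the policy
stats.apply_change(None, b)   # added, too small to match
stats.apply_change(None, c)   # added, wrong type
b_grown = {"ext": "pdf", "size": 2000}
stats.apply_change(b, b_grown)  # modified, now matches
stats.apply_change(a, None)     # deleted, no longer counted
```

After these five changes the policy's statistics reflect a single matching item, obtained without rescanning the data set. A full re-evaluation would still be needed when the clauses of the policy themselves change.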
Other aspects and features of the present invention will become apparent to those ordinarily skilled in the art upon review of the following description of specific embodiments of the invention in conjunction with the accompanying figures.