Cloud computing environments continue to gain in popularity with numerous types of consumers, ranging from global companies to individuals. Storage is a central element of the cloud computing environment. However, storage is also one of the characteristics that makes optimization difficult, because the variety of applications each have unique performance needs. Applications running in the cloud need to store data for various purposes, but the way an application accesses its stored data may vary greatly from application to application; therefore, a system optimized for one application may not be the optimum system for another application. In other words, the workload on the data, which may be organized as a dataset by the applications, may differ greatly. Dataset as used herein refers to a collection of related or similar data that is used by one or more applications, for example, a collection of movie files for a video streaming service, or a collection of files containing the tables of a database. Further, the expected performance of the dataset on the storage backend may also vary greatly. For example, video streaming applications read large video files sequentially, while a database engine handling online shopping carts for an online store needs to access blocks of information that may be situated at different places in the database files, i.e., it does not read large files sequentially.
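By way of illustration only, the difference between the two access patterns described above can be sketched by measuring how many accesses in a trace begin exactly where the previous access ended. The function name, the trace format (offset and length pairs) and the two sample traces are hypothetical and are not taken from any existing storage solution.

```python
def sequential_fraction(accesses):
    """Return the fraction of accesses that start where the previous one ended.

    `accesses` is a list of (offset, length) pairs in the order issued.
    This trace format is an assumption made for illustration.
    """
    if len(accesses) < 2:
        return 1.0
    sequential = 0
    # Compare each access with the one issued immediately before it.
    for (prev_off, prev_len), (off, _) in zip(accesses, accesses[1:]):
        if off == prev_off + prev_len:
            sequential += 1
    return sequential / (len(accesses) - 1)


# Hypothetical trace of a streaming reader: contiguous 4 KiB reads.
streaming_trace = [(i * 4096, 4096) for i in range(8)]

# Hypothetical trace of a database engine: 4 KiB blocks at scattered offsets.
database_trace = [(81920, 4096), (0, 4096), (40960, 4096),
                  (4096, 4096), (122880, 4096)]

print(sequential_fraction(streaming_trace))
print(sequential_fraction(database_trace))
```

A value near one suggests a sequential workload such as video streaming, while a value near zero suggests a random access workload such as the shopping cart database.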
Another characteristic of storage that makes optimization difficult is the expected lifespan of the data stored in the cloud computing environment. The lifespan may vary from application to application. For example, pictures or other media stored in the cloud for archiving purposes are expected to be available for years but are seldom accessed. In another example, an online shopping cart may need multiple accesses during an online transaction, but the online shopping cart may be deleted once the transaction is completed, cancelled or expired. Therefore, the lifespan of the data or datasets stored in the cloud computing environment may vary greatly.
Some existing cloud computing environments address the variability in dataset storage access, lifespan and workload generated by an application by providing specific storage solutions designed to work well with a specific type of dataset and workload. One example of an existing storage solution is the Hadoop Distributed File System (HDFS). HDFS provides good performance for supporting big data collections and analytics, but its performance suffers for the high performance random read/write access that is required for supporting real-time application databases used by virtual network functions (VNFs).
In other words, depending on the dataset and the associated workload on the dataset by the application, an operator or cloud provider has to select among different existing storage solutions that may provide good performance for certain aspects of the workload, e.g., supporting big data collections, while possibly providing weak performance for other aspects of the workload, e.g., high performance random read/write access. In order to address this problem in existing storage solutions, some vendors deploy storage systems that cover a wide scope of workloads and applications. However, even these “wide scope” storage solutions fail to cover all possible scenarios. For example, this type of storage solution often uses rudimentary workload characterizations based on data access patterns, e.g., least recently used (LRU), etc., for a given set of storage resources to perform tiering between media types within a single storage backend, thereby leading to accuracy and configurability issues.
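As a minimal sketch of what such LRU-based tiering within a single backend might look like, the following example keeps a fixed number of the most recently accessed blocks on a fast tier and demotes the rest to a slow tier. The class name, the two tier labels and the capacity parameter are hypothetical and do not describe any particular vendor's implementation.

```python
from collections import OrderedDict


class LruTiering:
    """Keep the `fast_capacity` most recently used blocks on the fast tier.

    Illustrative only: real backends track far more state than block ids.
    """

    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity
        self.fast = OrderedDict()  # block id -> None, oldest access first
        self.slow = set()          # block ids demoted to the slow tier

    def access(self, block_id):
        # Promote the block to the fast tier and mark it most recently used.
        self.slow.discard(block_id)
        self.fast[block_id] = None
        self.fast.move_to_end(block_id)
        # Demote the least recently used block when the fast tier overflows.
        if len(self.fast) > self.fast_capacity:
            demoted, _ = self.fast.popitem(last=False)
            self.slow.add(demoted)

    def tier_of(self, block_id):
        if block_id in self.fast:
            return "fast"
        if block_id in self.slow:
            return "slow"
        return None  # block has never been accessed


tiering = LruTiering(fast_capacity=2)
for block in ["a", "b", "a", "c"]:
    tiering.access(block)

print(tiering.tier_of("a"), tiering.tier_of("b"), tiering.tier_of("c"))
```

In this sketch the only input to the placement decision is how recently a block was accessed, and the only tunable parameter is the fast tier capacity; nothing about the dataset's expected lifespan or performance needs is taken into account.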
Moreover, once the dataset has been deployed on a given existing storage solution, it is difficult to move the dataset to another storage solution if the workload pattern changes drastically, e.g., the dataset becomes dormant because the application is retired but the data must be maintained for regulatory reasons. Moving the dataset in this situation requires costly manual intervention from the operator to copy data from the current active system to a more passive system. Even if automatic migration is possible, the automatic migration fails to account for workload variations on the system and other cloud computing tenants' needs, thereby negatively impacting overall performance of the storage backend. In current cloud computing environments, the application, and not the existing storage solution, is responsible for migrating data from high performance to low performance backend tiers when data is less frequently accessed.
Further, other existing cloud computing environments fail to adequately address the variability in dataset access, lifespan and workload generated by an application. For example, these existing cloud computing environments offer a single generic storage solution or a limited choice with different service level agreements (SLAs) at different costs, e.g., use of a solid state drive (SSD) or a hard disk drive (HDD) for hosting the dataset. While the single generic storage solution or the limited choice of storage solutions may provide an acceptable compromise for most applications, these storage solutions are insufficient for applications requiring more specific performance in other dimensions of the storage solution, e.g., high performance read/write access. Therefore, application providers typically will select a specific cloud provider having an offered storage solution that best fits their expected needs. However, such a storage solution will still fall short of being the optimum storage solution.
Further, there is a lack of integration between various backends in the cloud computing environment, the result being the inability to provide seamless cloud storage services such as automatic data migration. Also, once the application is deployed in the cloud computing environment, there are limited tunable parameters that can be adjusted to adapt the performance of the existing storage solution to the possibly changing behavior of the application over time.