This disclosure relates generally to systems, methods, and devices for feature implementation and workload capture in database systems. A database is an organized collection of data that enables the data to be easily accessed, manipulated, and updated. Databases serve as a means of storing, managing, and retrieving information in an efficient manner. Traditional database management requires companies to provision infrastructure and resources to manage the database in a data center. Management of a traditional database can be very costly and can require oversight by multiple persons having a wide range of technical skill sets.
Traditional relational database management systems (RDBMS) require extensive computing and storage resources and have limited scalability. Large amounts of data may be stored across multiple computing devices, and a server may manage the data such that it is accessible to customers with on-premises operations. An entity that wishes to have an in-house database server must make a significant capital investment in hardware and infrastructure for the database, and must dedicate significant physical space to housing that infrastructure. Further, the database may be highly susceptible to data loss during a power outage or other disaster situation. Such traditional database systems come with significant drawbacks that may be alleviated by a cloud-based database system.
A cloud database system may be deployed and delivered through a cloud platform that allows organizations and end users to store, manage, and retrieve data from the cloud. Some cloud database systems include a traditional database architecture that is implemented through the installation of database software on top of a computing cloud. The database may be accessed through a Web browser or an application programming interface (API) for application and service integration. Some cloud database systems are operated by a vendor that directly manages backend processes such as database installation, deployment, and resource assignment on behalf of a client. The client may have multiple end users that access the database by way of a Web browser and/or API. Cloud databases may provide significant benefits to some clients by mitigating the risk of losing database data and allowing the data to be accessed by multiple users across multiple geographic regions.
There exist multiple architectures for traditional database systems and cloud database systems. One example architecture is a shared-disk system. In the shared-disk system, all data is stored on a shared storage device that is accessible from all processing nodes in a data cluster. In this type of system, all data changes are written to the shared storage device to ensure that all processing nodes in the data cluster access a consistent version of the data. As the number of processing nodes increases in a shared-disk system, the shared storage device (and the communication links between the processing nodes and the shared storage device) becomes a bottleneck that slows data read and data write operations, and the bottleneck worsens with each additional processing node. Thus, existing shared-disk systems have limited scalability due to this bottleneck problem.
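The shared-disk bottleneck described above can be illustrated with a deliberately simple arithmetic model. This is a minimal sketch, not a description of any particular system: it assumes every node requests the same I/O rate and that the shared storage device has a single fixed bandwidth ceiling that is divided evenly among nodes. The function name and all numbers are hypothetical.

```python
def per_node_throughput(num_nodes: int, node_demand: float, shared_capacity: float) -> float:
    """Return the I/O rate each node actually achieves in a toy shared-disk model.

    Assumptions (illustrative only): every node asks for `node_demand` units of
    I/O per second, and the shared storage device serves at most
    `shared_capacity` units per second in total, split evenly.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    # Total delivered I/O is capped by the shared device.
    aggregate = min(num_nodes * node_demand, shared_capacity)
    return aggregate / num_nodes


# Hypothetical figures: each node wants 500 units/s, the shared device tops out at 2000.
results = {n: per_node_throughput(n, 500.0, 2000.0) for n in (1, 2, 4, 8, 16)}
for nodes, rate in results.items():
    print(f"{nodes:>2} nodes: {rate:.1f} units/s per node")
```

Under these assumed numbers, each node gets its full requested rate up to four nodes; past that point, every added node reduces the share available to all the others, which is the scalability limit the paragraph describes.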
Another existing data storage and retrieval system is referred to as a “shared-nothing architecture.” In this architecture, data is distributed across multiple processing nodes such that each node stores a subset of the data in the entire database. When a processing node is added or removed, the shared-nothing architecture must rearrange data across the multiple processing nodes. This rearrangement of data can be time-consuming and disruptive to data read and write operations executed during the data rearrangement. In addition, the affinity of data to a particular node can create “hot spots” on the data cluster for popular data. Further, since each processing node also performs the storage function, this architecture requires at least one processing node to store data. Thus, the shared-nothing architecture fails to store data if all processing nodes are removed. Additionally, management of data in a shared-nothing architecture is complex due to the distribution of data across many different processing nodes.
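The cost of rearranging data in a shared-nothing cluster can be sketched as follows. This is an illustration under stated assumptions, not the placement scheme of any specific product: it assumes rows are assigned to nodes by hashing a key and taking the result modulo the node count. The helper names (`owner`, `moved_fraction`) and the synthetic keys are hypothetical.

```python
import hashlib
from collections import Counter


def owner(key: str, num_nodes: int) -> int:
    """Return the index of the node that stores `key` (assumed hash-modulo placement)."""
    if num_nodes < 1:
        # With no processing nodes there is nowhere to store the data.
        raise ValueError("a shared-nothing cluster needs at least one node")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def moved_fraction(keys: list[str], old_nodes: int, new_nodes: int) -> float:
    """Fraction of keys whose owning node changes when the cluster is resized."""
    moved = sum(1 for k in keys if owner(k, old_nodes) != owner(k, new_nodes))
    return moved / len(keys)


keys = [f"row-{i}" for i in range(10_000)]   # synthetic keys for illustration
load = Counter(owner(k, 4) for k in keys)    # rows held by each of 4 nodes
fraction = moved_fraction(keys, 4, 5)        # effect of adding a fifth node

print("rows per node:", dict(sorted(load.items())))
print(f"rows that must move when growing from 4 to 5 nodes: {fraction:.0%}")
```

With this naive placement, most rows change owners when a single node is added, which is why the rearrangement can be slow and disruptive. Calling `owner` with zero nodes raises an error, mirroring the point that the architecture cannot store data once all processing nodes are removed.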
Particularly in a database as a service (DBaaS) implementation where database management is provided in a cloud-based environment, it may be desirable to continually release new features or programs to database clients. These new programs may improve functionality of the database, may provide new features to database clients, may provide increased security, may provide faster runtimes, and so forth. However, there are always risks associated with releasing new programs that have not been rigorously tested in a real-world environment. In the case of database technology, there are inherent risks associated with releasing a new program that has not been tested on actual database data or with actual client queries on the database data. The program may have errors or bugs that could cause damage to the database data, cause issues with performance or runtime, perpetuate additional errors throughout other functionality in the database system, and so forth.
However, there are numerous challenges associated with testing a fast-moving cloud database service. For example, a large-scale cloud-based database service may run tens of millions of client queries per day and may have an online upgrade process to ensure continuous availability. Features or programs may be released continuously, and clients may expect a fast turnaround for new functionality. This fast turnaround may leave only a short time window for testing features or programs before release. Disclosed herein are improved systems, methods, and devices for testing features or programs and for capturing client workloads in database systems. The disclosures herein enable substantial improvements to the program development cycle so that new features or programs may be released to clients only after undergoing rigorous real-world testing.