Cloud computing is rapidly changing the Internet into a collection of clouds, which provide a variety of computing resources, storage resources, and, in the future, a variety of resources that are currently unimagined.
Specifically, cloud computing is a technology infrastructure that facilitates supplementing, consuming, and delivering Information Technology (IT) services. The cloud environment provides elastic provisioning of dynamically scalable virtual services.
A tenant is a subscriber to some amount of storage in the cloud, or an application that owns part of the shared storage environment. Multi-tenancy is an architecture in which a single instance of software runs on a server and serves multiple tenants. In a multi-tenant environment, all tenants and their users consume the service from the same technology platform, sharing all components in the technology stack, including the data model, servers, and database layers. Further, in a multi-tenant architecture, the data and configuration are virtually partitioned, and each tenant works with a customized virtual application instance.
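The virtual partitioning described above can be illustrated with a minimal sketch. It assumes a single in-memory application object shared by all tenants; the class name `SharedApplication`, its methods, and the `quota_gb` setting are hypothetical names chosen for illustration and are not taken from any particular product.

```python
class SharedApplication:
    """One shared software instance; data and configuration are keyed by tenant (hypothetical)."""

    def __init__(self, defaults):
        self._defaults = dict(defaults)  # settings shared by every tenant
        self._config = {}                # tenant id -> per-tenant overrides
        self._data = {}                  # tenant id -> that tenant's private records

    def configure(self, tenant, **overrides):
        # Customize the virtual application instance seen by one tenant.
        self._config.setdefault(tenant, {}).update(overrides)

    def setting(self, tenant, key):
        # A tenant sees its own override if present, otherwise the shared default.
        return self._config.get(tenant, {}).get(key, self._defaults[key])

    def put(self, tenant, key, value):
        # Writes land only in the calling tenant's partition.
        self._data.setdefault(tenant, {})[key] = value

    def get(self, tenant, key):
        # Reads never cross into another tenant's partition.
        return self._data.get(tenant, {}).get(key)


app = SharedApplication({"quota_gb": 10})
app.configure("tenant_a", quota_gb=50)
app.put("tenant_a", "report", "a-data")
app.put("tenant_b", "report", "b-data")
```

Both tenants here share the same code, process, and storage structures, yet `tenant_a` reads only its own `report` and its own quota, while `tenant_b` falls back to the shared default.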
A multi-tenant storage controller hosts multiple storage tenants. A group of such storage controllers, operating together, provides High Availability (HA) of storage services for tenants. Traditional high availability methods fail if both controllers in the pair fail.
Moreover, in today's storage controllers, HA of storage services is achieved by bringing up all the storage services in a monolithic fashion. The limitations of such controllers and the corresponding architecture are that the storage services are treated as global, and that the state of a storage service in one storage controller is not periodically replicated to the other storage controller. As a result, when a failure occurs in the controller, the failure is always observed with respect to the controller and not from the perspective of the consumer (tenant) of the storage provided by the controller.
With such monolithic architectures, when a partial failure happens in the controller, all the storage services are moved to a stand-by controller even though only some of them are affected. This includes the services that are not affected by the failure.
For example, when a partial fault occurs at storage controller 1 (SC1) and only service 1 (S1) is affected, the traditional approach is to move control to storage controller 2 (SC2). In this process, all the unaffected services, including service 2 (S2) and service 3 (S3), are also moved to SC2. This happens because of the monolithic architecture of conventional storage controllers. Another problem with traditional storage controllers is that if a fault occurs in both controllers (SC1 and SC2), storage services become completely unavailable.
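The SC1/SC2 example can be sketched as a small simulation of the conventional, monolithic behavior. This is an illustrative model only: the `Controller` class and the `monolithic_failover` function are hypothetical names, and the logic simply mirrors the description above (move everything on a partial fault, and become unavailable when the stand-by is also faulty).

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Hypothetical model of a storage controller hosting named services."""
    name: str
    healthy: bool = True
    services: set = field(default_factory=set)


def monolithic_failover(active, standby, affected):
    """Move every service from `active` to `standby` if any hosted service is affected.

    Returns the services that were moved although the fault did not affect them.
    Raises RuntimeError when the stand-by controller is also faulty.
    """
    if not (affected & active.services):
        return set()  # nothing hosted here is affected, so no failover happens
    if not standby.healthy:
        raise RuntimeError("both controllers failed: storage services unavailable")
    moved_unaffected = active.services - affected
    standby.services |= active.services  # all services move, affected or not
    active.services = set()
    return moved_unaffected


sc1 = Controller("SC1", services={"S1", "S2", "S3"})
sc2 = Controller("SC2")
needlessly_moved = monolithic_failover(sc1, sc2, affected={"S1"})
print(sorted(needlessly_moved))  # ['S2', 'S3']
print(sorted(sc2.services))      # ['S1', 'S2', 'S3']
```

Although only S1 is faulty, S2 and S3 are relocated as well, and calling the same function with an unhealthy stand-by raises an error that stands in for complete unavailability of storage services.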