In the computer systems architecture world, cloud computing has recently received some attention. Although there are many competing definitions for “cloud computing,” it is fairly well accepted that cloud computing generally involves (1) the delivery of computing as a service rather than a product, and (2) providing shared processing and/or storage resources, software, information, etc., to computers and other devices as an oftentimes metered service over a network (typically the Internet). In a cloud computing environment, end users do not necessarily need to know the physical location and configuration of the system that delivers the services. Applications typically are delivered to end users as the service, enabling transparent access to the cloud-based resources.
In a scalable, distributed multi-tenant scenario such as a cloud computing environment, information about where components such as services, backend databases supporting a service infrastructure, and the like, are located oftentimes is stored via a registration service or the like. A web application or the like can request a particular resource via a registry service and thus does not need to know where the components themselves are located. In this regard, FIG. 1 is an example block diagram showing multiple tenants using a web application 102 to access components in a backend datacenter 104 via a registry service 106 in an illustrative distributed cloud environment.
As will be appreciated from FIG. 1, each tenant (in this example, tenants 1-4) has a corresponding assignment 108a-108d in this registry 106, and every application trying to retrieve data from a specific tenant in the backend, e.g., a database server, can ask the registry service 106 where exactly the database instance is located. In this FIG. 1 example, the datacenter 104 includes a first virtual machine 110 that supports database nodes 112a-112n which, in turn, each has one or more repositories. The assignments indicate which of the database nodes 112a-112n serve as master and replication instances for the tenants. With respect to the master and replication instance assignments, it can be seen that the tenants assigned in a given node's master and replication instances do not overlap, e.g., so that there is some fault tolerance.
The FIG. 1 example also demonstrates how a request to the registry service to the datacenter instance may be processed. In that regard, when a tenant using the web application 102 makes a request to the registry service 106, the registry service 106 checks the corresponding assignments 108a-108d and attempts to connect to the appropriate database node 112a-112n that is the master instance for the specific tenant making the request. It is noted that the FIG. 1 example is simplified in that there may be multiple datacenters, each hosting one or more physical and/or virtual machines that support the callable services and their supporting backend components. There also may be more than four tenants in some implementations. The registry service thus may be quite helpful, and indeed required, in order to help keep track of everything in such a complicated system.
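By way of illustration only, the lookup described in connection with FIG. 1 may be sketched as follows. This is a minimal sketch rather than an actual implementation: the class names `Assignment` and `RegistryService`, the method `resolve_master`, and the tenant and node identifiers are all hypothetical and chosen solely for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    """Registry entry for one tenant (hypothetical structure)."""
    master: str       # node hosting the tenant's master instance
    replication: str  # node hosting the tenant's replication instance


class RegistryService:
    """Maps each tenant to its master/replication assignment."""

    def __init__(self, assignments: dict[str, Assignment]):
        self._assignments = dict(assignments)

    def resolve_master(self, tenant: str) -> str:
        """Return the node that holds the master instance for the tenant."""
        try:
            return self._assignments[tenant].master
        except KeyError:
            raise LookupError(f"no assignment registered for {tenant!r}") from None


# Master and replication roles for a tenant sit on different nodes,
# so that losing one node does not lose both copies of a tenant's data.
registry = RegistryService({
    "tenant1": Assignment(master="node-a", replication="node-b"),
    "tenant2": Assignment(master="node-b", replication="node-a"),
})
```

In this sketch, the web application would call `registry.resolve_master("tenant1")` and connect to the returned node, without itself storing any knowledge of where the database instance is located.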
The FIG. 1 example configuration is different from a conventional computing environment, where each application is precisely connected and configured to use a static bundled backend. Such an arrangement is shown in FIG. 2. As will be appreciated from FIG. 2, there is one application 202 connected to one database 204 in a first server 206. If there is a system crash (e.g., if a connection cannot be made between the application 202 and the backend database 204, as shown in the bottom portion of FIG. 2), it is easy to restart the environment and make things functional again, as there is only one data source and its details (e.g., where it is located, how it is configured, how it is instantiated, etc.) are known.
Unfortunately, however, the ability to restore instances and recover from system crashes, failures, or the like, becomes increasingly complicated as the complexity of a scalable, distributed multi-tenant environment grows, and as more complicated techniques for processing requests are introduced. For instance, a “rolling failover” procedure may be implemented in connection with the FIG. 1 example and, thus, more than two instances can be used as a master/replication pair in the lifetime of the tenant. To keep track of everything, each assignment 108a-108d in the registry service 106 may include the information needed for executing a request to the backend datacenter 104. This arrangement may help reduce the likelihood of the web application 102 being unable to access the needed backend services from the datacenter 104.
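As a non-limiting illustration of why more than two instances may serve a tenant over its lifetime, a rolling failover may be sketched as follows. The sketch assumes that, when a master instance fails, the replication instance is promoted to master and a free instance (if one exists) takes over the replication role. The function name `rolling_failover`, the dictionary layout, and the instance identifiers are hypothetical.

```python
def rolling_failover(registry, tenant, free_instances):
    """Promote the replication instance and draw a new replica from the pool.

    registry: dict mapping tenant -> {"master": id, "replication": id or None}
    free_instances: list of unassigned instance ids (mutated in place)
    """
    pairing = registry[tenant]
    new_master = pairing["replication"]
    if new_master is None:
        raise RuntimeError(f"{tenant}: no replication instance to promote")
    # A free instance, if any is left, takes over the replication role.
    new_replication = free_instances.pop(0) if free_instances else None
    registry[tenant] = {"master": new_master, "replication": new_replication}
    return registry[tenant]


registry = {"tenant1": {"master": "db-1", "replication": "db-2"}}
free = ["db-3", "db-4"]
rolling_failover(registry, "tenant1", free)  # db-1 became unavailable
rolling_failover(registry, "tenant1", free)  # db-2 became unavailable later
```

After the two failovers in this example, the pairing for the tenant no longer refers to either of the original instances, which is why the registry entry is the only record of where the current data resides.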
Yet if the registration information itself is lost, e.g., as a result of a system crash, failure, or the like, it could be very difficult to recover the still-available data (e.g., because the locations of the backend components, the mappings as between master/replication pairs, etc., may not be known). Similar difficulties could be presented if, for example, the registry information becomes inconsistent as a result of misuse from the application side, a faulty entry by an administrator or other party, a system crash, etc. These situations need to be handled, but all information may be lost and there might not be an easy-to-implement recovery process for recovering the entries (e.g., compared to the FIG. 2 scenario, where the data recovery is trivial because of the known, static configuration details). In the end, this could mean total data loss.
One way of attempting to reduce the likelihood of such problems in a scalable, multi-tenant distributed environment involves simple backup of the registry entries. But this approach may not be of much help in a dynamic system, if inconsistencies are encountered, etc. Indeed, new tenants can be created at a high frequency and may leave at a high frequency. The same may be true with failover events, which can lead to the reassignment of tenants to replication instances. As a result, a real-time or substantially real-time backup of the registry might be needed, even though it can be hard to implement such an approach. And even in that case, a real-time or substantially real-time backup of the registry might not be able to handle data loss resulting from inconsistencies.
As alluded to above, in a conventional single-tenant, static, and non-distributed environment with a database and an application using this database, it is trivial to recover from a crash, because the areas that can be looked to for data after a system crash are known. In addition, the application is still configured so that it can access the database backend again. The issues of how to recover from the problems noted above do not arise in such non-distributed environments, where every application is configured exactly so that it always knows where to look for the database. But even in a simplistic distributed scenario with something like the registry service discussed above, if the registry entries are lost after a system crash, there may not be a reliable way of retrieving the information and returning the environment to a state in which the data of a specific tenant is restored and in which it is known whether the restored data is the most up-to-date data.
Although other somewhat related attempts have been made, they unfortunately do not help in resolving the problems discussed above. For instance, U.S. Publication No. 2013/0276070 (which is hereby incorporated herein by reference) describes the authentication of a user or, more generally, the search for credentials in a distributed environment at different instances hosted at special geo-locations. Different datacenters at distributed locations may be used, and users can be replicated to different instances. Although the '070 publication uses a kind of registry service like a master user table, there is only a load balancer that decides to which geo-location a request to an authentication instance will be sent. This request will be rerouted until an instance is found. Thus, in this approach, even in a failure or system crash, there is no need to recover or restore important registry information.
In the scenario covered in “Dynamic Database Replica Provisioning through Virtualization” (written by Sergey Savinov and Khuzaima Daudjee), the authors only use one master instance and several replication instances for a specific user without considering a second dimension that covers multi-tenant-aware backends. The replication is triggered by a batch job using transaction logs, which is referred to therein as “refreshing.” A load balancer alone selects the destination server that is, according to the description, configured for a specific backend. In this case, there is no need for a registry service. Thus, even if this arrangement is implemented in connection with a virtual machine and in a dynamic manner, it represents a more traditional installation that can handle traffic peaks and does not, for example, address data inconsistency issues.
Furthermore, it will be appreciated that this Savinov and Daudjee paper does not involve the recreation of a registry service, e.g., because the approach described therein does not use any such registration service at all and instead relies on a load balancer for traffic shaping. In a failure scenario, this approach can only use the replica containing the present data according to the transaction logs, and implementations therefore may encounter data losses because not all data has been replicated if the batch job was not triggered after the latest write operations to the master instance. This in turn implies that no real time switching is possible, because the replications may not all be up-to-date, even with respect to the instance that was refreshed last. This Savinov and Daudjee paper thus does not account for the added dimension of a multi-tenant-aware infrastructure with registry service at all, as the authors only discuss how to switch to a replication instance and how to rebuild the most up-to-date data (but likely not real-time replication), performed via batch jobs reading the transaction logs.
U.S. Pat. No. 8,429,134 (which is hereby incorporated herein by reference) describes a way to recover a distributed database in the case of faulty cache flushes after a failure of a database node. The '134 patent uses a buffer cache per instance for performance improvements. This cache contains so-called blocks of data. If a block has to be modified, it is said to be quicker to save the changes to a redo-log and flush them after a period (e.g., at so-called checkpoints). If one or more database nodes fail, a single surviving instance (the recovery instance) will take care of the recovery and will read the redo-logs of each crashed instance. All log entries dated after a certain checkpoint will be written to the database to be able to restore all of the data written only to the redo-logs. In this case, there is no central registry that must be recovered. The database itself handles the recovery of cache entries that were not flushed. However, this approach does not take care of lost registry entries or inconsistencies in a central registry service.
U.S. Publication No. 2012/0259894 (which is hereby incorporated herein by reference) generally discusses the problems that one faces by replicating database systems but is scant on details concerning recovery procedures, the recreation of registry services, etc.
It therefore will be appreciated that it would be desirable to solve one or more of the above-described and/or other problems. For example, it will be appreciated that it would be desirable to provide systems and/or methods for data recovery in distributed, scalable multi-tenant environments to handle problems that arise when a registry service itself goes down, when inconsistent data entries arise, and/or the like.
An aspect of certain example embodiments relates to techniques for recovering registry information and recreating the entire registry for all available tenants in a scalable, multi-tenant distributed environment, while also potentially looking into all of the running services where the most up-to-date data could exist.
Another aspect of certain example embodiments relates to the dynamic real-time or substantially real-time recreation of connections between a web application and the latest instance of a tenant in a multi-tenant environment hosted in a highly distributed multi datacenter environment (such as, for example, a cloud computing environment) following a failure of one of the databases holding the tenant data, a corruption of the registry entry pointing to the tenant, and/or the like.
Another aspect of certain example embodiments relates to an automatic recovery solution that compares the timestamps of the last written entities in order to examine the most current data and “re-bundle” the last master/replication instances, e.g., for restores in an environment where a rolling failover procedure is implemented.
In certain example embodiments, there is provided a method of recovering from a fault in a multi-tenant distributed environment comprising processing resources including at least one processor and in which a registry stores information indicating which of a plurality of nodes in the multi-tenant distributed environment are assigned to host master and replication instances storing data for the respective tenants. The method comprises, in response to a detected fault: obtaining a list of running instances in the multi-tenant distributed environment; identifying from the list of running instances, for each said tenant, one or more candidate instances that might host master and/or replication instances for the respective tenant; and for each tenant for which exactly one candidate instance is identified, re-registering with the registry this identified candidate instance as the master instance for the respective tenant. In addition, for each tenant for which exactly two candidate instances are identified: a determination is made as to whether timestamps of the last changes for each of these candidate instances are available; and in response to a determination that timestamps of the last changes for each of these candidate instances are available, the two candidate instances are re-registered with the registry as master and replication instances for the respective tenant based at least in part on the timestamps, if possible.
In addition to the features of the previous paragraph, in certain example embodiments, for each tenant for which exactly one candidate instance is identified: a determination may be made as to whether there is a free instance on a node that does not host this candidate instance; and in response to a determination that there is not a free instance on a node that does not host this candidate instance, this identified candidate instance may be re-registered with the registry as the master instance for the respective tenant and a replication instance for the respective tenant is not re-registered. Furthermore, the method may include in response to a determination that there is a free instance on a node that does not host this candidate instance: re-registering with the registry this identified candidate instance as the master instance for the respective tenant, replicating this identified candidate instance, and re-registering with the registry this replicated identified candidate instance as the replication instance for the respective tenant.
In addition to the features of either of the two previous paragraphs, in certain example embodiments, for each tenant for which exactly two candidate instances are identified and in response to a determination that timestamps of the last changes for each of these candidate instances are available, the method may further include: determining whether the timestamps fall within a predefined latency tolerance; and in response to a determination that the timestamps fall within the predefined latency tolerance, re-registering with the registry the two candidate instances as master and replication instances for the respective tenant based at least in part on the timestamps and based at least in part on which instance has later written entities.
In addition to the features of the previous paragraph, in certain example embodiments, for each tenant for which exactly two candidate instances are identified, and in response to determinations that (a) the timestamps of the last changes for each of these candidate instances are unavailable, and (b) the timestamps do not fall within the predefined latency tolerance: the two candidate instances may be re-registered with the registry as master and replication instances for the respective tenant based at least in part on information about views defined in the instances and based at least in part on which instance has later written entities, e.g., when it is possible to gather the information about the respective views of the instances.
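For purposes of illustration only, the per-tenant decisions summarized in the preceding paragraphs may be sketched as follows. This sketch rests on several assumptions that are not taken from any particular implementation: timestamps are plain numbers, the latency tolerance is compared against the absolute difference between the two timestamps, and the view-based comparison is left as an optional caller-supplied hook because its details are not given here. The names `Candidate`, `order_pair`, `recover_tenant`, and `views_fallback` are hypothetical, and the actual copying of data to a new replica is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    instance_id: str
    node: str
    last_change: Optional[float]  # timestamp of the last change, if available


def order_pair(a, b, tolerance, views_fallback=None):
    """Return (master, replication) for two candidates, or None if undecided."""
    if a.last_change is not None and b.last_change is not None:
        if abs(a.last_change - b.last_change) <= tolerance:
            # The instance with the later written entities becomes master.
            return (a, b) if a.last_change >= b.last_change else (b, a)
    # Timestamps unavailable or outside the tolerance: defer to the
    # view-based hook, if the caller supplied one.
    return views_fallback(a, b) if views_fallback else None


def recover_tenant(candidates, free_instances, tolerance, views_fallback=None):
    """Build a registry entry for one tenant.

    free_instances: list of (instance_id, node) tuples that are unassigned.
    """
    if len(candidates) == 1:
        master = candidates[0]
        # Only a free instance on a different node may become the replica.
        free = next((f for f in free_instances if f[1] != master.node), None)
        return {"master": master.instance_id,
                "replication": free[0] if free else None}
    if len(candidates) == 2:
        pair = order_pair(candidates[0], candidates[1], tolerance, views_fallback)
        if pair is not None:
            return {"master": pair[0].instance_id,
                    "replication": pair[1].instance_id}
    return None  # no automatic decision possible in this sketch
```

In this sketch, a return value of `None` simply signals that none of the illustrated criteria was decisive for the tenant in question.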
In certain example embodiments, a method of operating a distributed multi-tenant environment is provided. Master/replication instance pairings indicating, for each said tenant in the distributed multi-tenant environment, which backend nodes serve as master and replication data instances for the respective tenant, are stored in a registry. Using processing resources including at least one processor, a web application is operated in response to a request from a client device, with the web application accessing an appropriate master instance in dependence on the tenant using the client device and based on a lookup using the registry. Time-related information is saved for each operation performed on each entity in each said instance. In response to a first fault type causing a master instance to become unavailable, the corresponding master/replication instance pairing is updated in the registry such that the associated replication instance becomes the new master instance in that pairing and such that a free instance becomes the new replication instance in that pairing, e.g., using processing resources. In response to a second fault type causing the registry to become unavailable: at least one candidate instance is identified for each said tenant to be used in recreating the registry and master/replication instance pairings in the recreated registry; and for each tenant for which two or more candidate instances are identified, time-related information is processed to recreate the registry and master/replication instance pairings in the recreated registry, e.g., using the processing resources.
In certain example embodiments, a distributed multi-tenant computing system is provided. The system includes processing resources including at least one processor. A non-transitory computer readable storage medium tangibly stores a registry including master/replication instance pairings indicating, for each said tenant in the distributed multi-tenant computing system, which backend nodes serve as master and replication data instances for the respective tenant. A web application is operable in connection with the processing resources and in response to a request from a client application running on a client device, with the web application being configured to access an appropriate master instance in dependence on the tenant using the client device and based on a lookup using the registry. The processing resources are configured to at least: save time-related information for each operation performed on each entity in each said instance; in response to a first fault type causing a master instance to become unavailable, update the corresponding master/replication instance pairing in the registry such that the associated replication instance becomes the new master instance in that pairing and such that a free instance becomes the new replication instance in that pairing; and in response to a second fault type causing the registry to become unavailable, identify at least one candidate instance for each said tenant to be used in recreating the registry and master/replication instance pairings in the recreated registry, and for each tenant for which two or more candidate instances are identified, process the time-related information to recreate the registry and master/replication instance pairings in the recreated registry.
In addition to the features of either of the two previous paragraphs, in certain example embodiments, for each tenant for which only one candidate instance is identified, the one identified candidate instance may be registered in the registry as the master instance in the master/replication instance pairing for the respective tenant; and, if possible, a free instance may be assigned as the replication instance for the respective tenant and the assigned free instance may be registered in the master/replication instance pairing.
In addition to the features of any of the three previous paragraphs, in certain example embodiments, the processing of time-related information may further comprise for each tenant for which two or more candidate instances are identified: limiting the number of identified candidate instances to two when more than two candidate instances are identified; and registering in the registry one identified candidate instance as the master instance and the other identified candidate instance as the replication instance in the master/replication instance pairing for the respective tenant, based at least in part on the time-related information for each of these identified candidate instances.
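As a non-limiting illustration, recreating the registry after the second fault type may be sketched as follows. The sketch assumes that each running instance reports the tenant it serves and the time of its last write, and that, when more than two candidate instances are found for a tenant, the two most recently written ones are kept. That limiting rule is an assumption made for this example, as are the function name `recreate_registry` and the tuple layout.

```python
from collections import defaultdict


def recreate_registry(running_instances):
    """Rebuild master/replication pairings from the running instances.

    running_instances: iterable of (tenant, instance_id, last_write) tuples.
    """
    by_tenant = defaultdict(list)
    for tenant, instance_id, last_write in running_instances:
        by_tenant[tenant].append((last_write, instance_id))

    registry = {}
    for tenant, candidates in by_tenant.items():
        # Newest first; keep at most two candidates (assumed limiting rule).
        candidates.sort(reverse=True)
        newest = candidates[:2]
        master = newest[0][1]
        replication = newest[1][1] if len(newest) == 2 else None
        registry[tenant] = {"master": master, "replication": replication}
    return registry


instances = [
    ("tenant1", "db-1", 10.0),
    ("tenant1", "db-2", 30.0),
    ("tenant1", "db-3", 20.0),
    ("tenant2", "db-4", 5.0),
]
rebuilt = recreate_registry(instances)
```

Here the tenant with three candidates is reduced to the two with the latest writes, while the tenant with a single candidate is registered with a master instance only, leaving the replication role to be filled by a free instance if one is available.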
Certain example embodiments relate to a distributed multi-tenant computing system. Processing resources include at least one processor. A non-transitory computer readable storage medium tangibly stores a registry including master/replication instance pairings indicating, for each said tenant in the distributed multi-tenant computing system, which backend nodes serve as master and replication data instances for the respective tenant. A server-side application is operable in connection with the processing resources and in response to a request from a client application running on a client device, the server-side application being configured to access an appropriate master instance in dependence on the tenant using the client device and based on a lookup using the registry. The processing resources are configured to at least: automatically detect faults of different fault types; and, in response to a detected fault of a first fault type that causes a master instance to become unavailable, update the corresponding master/replication instance pairing in the registry in accordance with a rolling failover scheme. The processing resources are further configured, in response to a detected fault of a second fault type, different from the first fault type, that causes the registry to become unavailable, to at least: identify at least one candidate instance for each said tenant to be used in recreating the registry and master/replication instance pairings therein; for each tenant for which only one candidate instance is identified, register the one identified candidate instance as the master instance in the master/replication instance pairing for the respective tenant in recreating the registry; and for each tenant for which two or more candidate instances are identified, process corresponding aspects of each said identified candidate instance in order to (a) select, from the two or more identified candidate instances, a candidate master instance and a candidate replication instance, and (b) register the selected candidate master instance and the selected candidate replication instance as master/replication instance pairings in recreating the registry.
In addition to the features of the previous paragraph, in certain example embodiments, the processing resources may be further configured, for each tenant for which only one candidate instance is identified, to at least assign a free instance as the replication instance for the respective tenant and register the assigned free instance in the respective master/replication instance pairing in recreating the registry, if possible.
In addition to the features of either of the two previous paragraphs, in certain example embodiments, for each tenant for which two or more candidate instances are identified, a plurality of corresponding aspects may be processed in a predetermined order until either (a) it becomes possible to select, from the two or more identified candidate instances, a candidate master instance and a candidate replication instance, or (b) all corresponding aspects have been processed. For instance, the corresponding aspects may include time-related information, size-related information, and/or the like.
In addition to the features of the previous paragraph, in certain example embodiments, once all corresponding aspects have been processed, candidate master and replication instances may be selected at random from the two or more identified candidate instances.
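By way of example and without limitation, processing aspects in a predetermined order with a random fallback may be sketched as follows. The sketch assumes that each candidate instance is described by a dictionary, that a larger value of an aspect (a later write time, a larger size) indicates the better master candidate, and that an aspect is decisive only when the top-ranked candidate strictly exceeds the next one. The function name `select_pair` and the keys `last_write` and `size` are hypothetical.

```python
import random


def select_pair(candidates, aspects, rng=random):
    """Pick (master, replication) from two or more candidate instances.

    candidates: list of dicts describing instances
    aspects: keys to compare, in the predetermined processing order
    rng: source of randomness, replaceable for reproducible runs
    """
    for aspect in aspects:
        if any(c.get(aspect) is None for c in candidates):
            continue  # aspect unavailable for some candidate, try the next
        ranked = sorted(candidates, key=lambda c: c[aspect], reverse=True)
        if ranked[0][aspect] != ranked[1][aspect]:
            # This aspect separates the top candidate from the rest.
            return ranked[0], ranked[1]
    # All aspects processed without a decision: select at random.
    master, replication = rng.sample(candidates, 2)
    return master, replication


a = {"id": "db-1", "last_write": 50.0, "size": 10}
b = {"id": "db-2", "last_write": 50.0, "size": 12}
master, replication = select_pair([a, b], ["last_write", "size"])
```

In this example the write times are identical, so the first aspect is not decisive and the size-related aspect determines that `db-2` is selected as the candidate master instance.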
Non-transitory computer readable storage mediums tangibly storing instructions for performing the above-summarized and/or other approaches also are provided by certain example embodiments, as well as corresponding computer programs.
These features, aspects, advantages, and example embodiments may be used separately and/or applied in various combinations to achieve yet further embodiments of this invention.