Computer applications with database connectivity oftentimes have been designed such that backend components (such as, for example, tables, stored procedures, views, and/or the like), and also the database itself, need to be configured before the start of the applications that use them. For instance, relational databases have been used to provide information to computer applications, and the components of the relational databases typically are designed and created before such applications try to access them. As a result, information about the contents of the databases is needed before a database server can be started. Although this approach can work well in small, single-tenant environments, it can become more difficult as complexities are introduced, e.g., by virtue of the size of the application growing and/or its functionality increasing, the number of tenants using the application rising, etc.
One particularly problematic area involves highly distributed and scalable environments, e.g., of the sort oftentimes associated with cloud computing scenarios. In the computer systems architecture world, cloud computing has recently received some attention. Although there are many competing definitions for “cloud computing,” it is fairly well accepted that cloud computing generally involves (1) the delivery of computing as a service rather than a product, and (2) providing shared processing and/or storage resources, software, information, etc., to computers and other devices as an oftentimes metered service over a network (typically the Internet). In a cloud computing environment, end users do not necessarily need to know the physical location and configuration of the system that delivers the services. Applications typically are delivered to end-users as the service, enabling transparent access to the cloud-based resources.
Regardless of whether a cloud computing environment is implemented, it is not uncommon for every tenant in a multi-tenant environment to have its own repositories. These repositories may be used in providing backend database functionality and/or the like. Moreover, the number and type(s) of repositories for a given tenant can vary depending on the applications actually used by that tenant, and if users associated with a tenant want to use a specific function of an application for the first time, an initial process of creating the repositories required for that functionality may be triggered. For instance, an implicated database management system (DMS) may create several repositories for the tenant including, for example, one repository for permanent storage, and perhaps three additional repositories associated with other technical requirements (e.g., for storing metadata, providing a search index, enabling caching functionality, etc.). Further additional repositories for the tenant may need to be created on demand, e.g., when needed by a different part of the application (e.g., an application instance) or another service that uses the application for storing data. Of course, as alluded to above, it will be appreciated that this description could in some implementations apply to each tenant in a multi-tenant system, thereby requiring a large amount of processing during runtime by virtue of the need to create many different repositories for many different tenants, applications, and/or underlying purposes.
In this regard, in a public cloud installation, e.g., where tenants can be created and deleted at virtually any time (for instance, when a customer buys access to a software solution or cancels an existing contract), it would be desirable to react to such events as they occur, even if they occur during runtime of the associated underlying application. But when a new tenant is created, the backend may not yet be prepared for the tenant and/or the application may not have the information necessary to connect to the repositories of the newly created tenant, e.g., if conventional approaches are leveraged. Thus, it will be appreciated that the dynamic scaling of the database (or more generally, the data storage service) and the use of the same database instance for several tenants and different datasets can be very difficult, particularly if a predefined schema of table structure must be maintained.
In view of the foregoing, it will be appreciated by those skilled in the art that highly distributed and scalable environments, like those associated with cloud computing, can oftentimes present problems because information regarding what is needed for creating and/or maintaining a database (or more generally, a repository) for storing data from specific tenants, applications, and/or the like, generally is not known and in fact is sometimes not knowable, at least at startup time. For instance, it is not always possible to predict when new users may purchase an application, when current users might terminate their use of an application, when users might change tenants (e.g., by virtue of a merger, acquisition, reorganization, etc.), and so on. Similarly, there may be problems associated with changing the schema for differently structured data, creating new tables, starting a new database server, etc. These problems may be manifested in scenarios where, for example, it would be desirable to use the same database for newly created tenants who want to access their databases or repositories from a specific part of an application. Additionally, downtime generally is seen as unacceptable in a public cloud scenario—and it therefore may not be feasible to repeatedly stop and restart an application, e.g., as new components are configured and made available, as new tenants are introduced, etc.
In conventional, single-tenancy architectures, applications and/or application instances generally will access databases or other repositories with predefined structures. Index mappings (e.g., used for searching) will be linked to the repositories. The applications and/or application instances, in turn, will be configured to access exactly these repositories. All these components generally will be “hardwired” in the system and configured before the various components and/or associated services are started. As a result, conventional, single-tenancy architectures generally will implement static, predefined installations and deployment processes. Indeed, as alluded to above, the system oftentimes will need to create repositories that in turn have to be configured with all tables, stored procedures, views, etc., as well as a “hardwired” index mapping for an associated search engine. After that, it becomes possible to configure the applications and/or application instances and start everything in the correct order.
Pitfalls in conventional, single-tenancy architectures can come into play to an even greater extent when scalability and multi-tenancy issues are considered, when it oftentimes is not possible to provide a unique and complete installation for each tenant, and/or in other situations. Further, conventional, single-tenancy installations generally involve static and disjunct subsystems that are not scalable and/or have difficulties when attempting to scale. Unique installations generally will have no connection to one another, thereby implying that the components used in a first environment cannot be used in another installed environment, e.g., in order to handle failures of services of another installation. Yet the ability to provide failover mechanisms oftentimes is an important feature in cloud computing scenarios, and the limited ability to provide them can be problematic. Another problem involves each installation only being used for one application at a time. For instance, if two different applications and/or application instances want to use an application for storing data, it may be necessary to provide one installation for each application or application instance.
FIG. 1 is an example single-tenancy architecture. As shown in FIG. 1, for a first installation (Installation A), an application instance (e.g., a web application instance) 100a attempts to access a predefined repository or database 104a. The information needed for accessing this database 104a was configured and stored as application settings before the application instance 100a was started. At the backend side 102a in Installation A, everything is properly configured for a specific tenant and a specific application instance (in this case, application instance 100a). This predefined configuration is “hardwired” in the system and cannot be dynamically changed for another tenant, application, and/or application instance. Thus, it will be appreciated that each application/tenant (or application instance/tenant) combination is provided with its own installation settings to connect to the backend and, in the FIG. 1 example, separate Installations A and B are provided. In this regard, a parallel structure to that described in connection with Installation A is provided for Installation B, with like reference numerals having the “a” and “b” suffixes being provided for, and designating like components in, Installations A and B respectively.
Referring once again to Installation A, the database 104a is configured with a static repository. Thus, it can only be used for one application instance (in this case, application instance 100a) and one tenant. If another application instance (e.g., application instance 100b) and/or tenant would like to use the web application, it would need another complete installation (e.g., as in Installation B) that is unique to that combination.
To be able to search documents and/or objects stored in the repository 104a, an index mapping 106a that is configured to match the repository 104a is provided and configured for a specific application/tenant (or application instance/tenant) combination. The naming convention for the search index 108a is static and non-unique, and it only works with one repository 104a and provides one searchable index 108a. As shown in FIG. 1, there is only one search index per installation. The index 108a is updated every time a new document is stored to the permanent repository 104a, e.g., by reading a “changes feed” using a search plug-in for metadata and an indexing tool for the full-text search.
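The index update described above can be illustrated with a minimal sketch. It assumes a sequence-numbered changes feed and a simple in-memory inverted index; the names ChangesFeed, SearchIndex, store, since, and refresh are hypothetical and are not taken from any particular database or search product. The sketch only adds terms and does not remove stale terms when a document is overwritten.

```python
class ChangesFeed:
    """Hypothetical stand-in for the repository's ordered log of stored documents."""

    def __init__(self):
        self._changes = []  # list of (sequence number, document id, text)

    def store(self, doc_id, text):
        # Each stored document gets the next sequence number.
        seq = len(self._changes) + 1
        self._changes.append((seq, doc_id, text))
        return seq

    def since(self, last_seq):
        # Return only the changes the indexer has not yet seen.
        return [change for change in self._changes if change[0] > last_seq]


class SearchIndex:
    """Hypothetical single index tied to exactly one repository."""

    def __init__(self):
        self.last_seq = 0
        self.terms = {}  # term -> set of document ids containing it

    def refresh(self, feed):
        # Read the unseen part of the changes feed and index each document.
        for seq, doc_id, text in feed.since(self.last_seq):
            for term in text.lower().split():
                self.terms.setdefault(term, set()).add(doc_id)
            self.last_seq = seq

    def search(self, term):
        return self.terms.get(term.lower(), set())


feed = ChangesFeed()
index = SearchIndex()
feed.store("doc1", "Quarterly sales report")
index.refresh(feed)
feed.store("doc2", "Annual sales summary")
index.refresh(feed)
```

Because the index remembers the last sequence number it processed, each refresh touches only newly stored documents, which mirrors the one-repository, one-index coupling of the FIG. 1 example.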
There currently are three main approaches to enabling multi-tenancy and to solving the need for preconfigured databases (e.g., in arrangements where relational databases and/or the like are used). These main approaches involve shared nothing, shared database, and shared table architectures. Unfortunately, these approaches do not fit the needs of complex, multi-tenant environments with heavy user bases because they tend to involve manual steps and tend to not scale well. Each of these approaches will be described in greater detail below.
First, in a shared nothing approach, one database is used for each tenant. Using one database for each tenant implies a need for some external scripting, e.g., if one tries to enable new databases for newly created tenants. The control of the data source, however, is outside the application itself and cannot dynamically scale because there is no knowledge of the demands and/or resources available. In a related vein, load balancing may not be possible, e.g., because the application cannot assign a heavy load tenant to a database instance that is capable of handling more traffic. Thus, every tenant is provided with exactly the same database in this scenario, regardless of whether there are a few users or thousands of users using a particular service. Although this approach could be improved by setting up database clusters, such an approach leads to more complexity and even further reduces the flexibility to react dynamically to changes of the data structure, database usage of the application, etc.
Second, in a shared database approach, one database is shared and different schema namespaces are used. This approach isolates the data from the tenants, but one still needs to create everything “from scratch” and outside the application itself. In most cases, this unfortunately leads to downtime, the reliance on and heavy utilization of external scripts, a database administrator taking these and/or other actions manually, etc. Indeed, the conventional computer science wisdom is that it is bad practice to undertake these steps automatically from within the application, e.g., by using a Hibernate connector or the like. As a result, this approach oftentimes brings with it heavy reliance on difficult-to-maintain techniques like reflection (e.g., using a tool like Reflection available from Attachmate), the use of abstraction frameworks like Hibernate (available from Red Hat), and/or the like, even though such techniques are not designed to be used in these ways.
In addition, in a shared database approach, it oftentimes is difficult to use a full-text index for every tenant, separated from the index of every other tenant, because the database oftentimes uses only one index per database. To achieve the desired functionality, one may need to further adapt the code, e.g., to enable full-text searches over specific datasets for the different tenants, while potentially staying aware of how each index is shared with other tenants (e.g., in order to avoid potentially “leaking” information across customers, etc.). There also exist the same or similar problems regarding the use of predefined schemas, the inability to react dynamically (including the inability to be downwardly compatible to changes), etc., without changing the whole data layer and/or database schema.
Third, an approach using shared tables (e.g., where different prefixes are used to distinguish between data of different tenants) might be an easy-to-implement approach, but could also be quite dangerous. When using a shared table approach, the data of all tenants is stored in the same table. Different prefixes may, for example, be used to distinguish between data of different tenants. Unfortunately, however, this approach may be problematic in terms of security because access rights in database systems usually can only be specified on a table level and not, for example, based on table rows. Data isolation therefore may possibly occur at the application level and, as a result, it could easily be the case that a tenant sees the data of another tenant. Additionally, this approach could make it difficult to allow tenant specific extensions to the database schema, as doing so likely would affect all tenants. Resource contention could also be a problem. Moreover, the pitfalls of the shared nothing and shared database approaches described above are likely to exist here, as well.
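The isolation risk of the shared table approach can be made concrete with a short sketch. It assumes a single in-memory mapping standing in for the shared table and a "tenant:" key prefix convention; the names shared_table, rows_for_tenant, and rows_unfiltered are hypothetical and chosen only for illustration.

```python
# Hypothetical shared table: all tenants' rows live in one mapping and are
# distinguished only by a key prefix such as "tenantA:".
shared_table = {
    "tenantA:doc1": "A's invoice",
    "tenantA:doc2": "A's contract",
    "tenantB:doc1": "B's payroll",
}


def rows_for_tenant(table, tenant_id):
    """Application-level isolation: keep only rows carrying the tenant's prefix."""
    prefix = tenant_id + ":"
    return {key: value for key, value in table.items() if key.startswith(prefix)}


def rows_unfiltered(table):
    """A code path that omits the prefix filter returns every tenant's rows."""
    return dict(table)
```

Nothing in the table itself prevents the second function from being called on behalf of a single tenant. The separation exists only as long as every query path remembers to apply the prefix filter, which is why a tenant could easily end up seeing another tenant's data.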
Other approaches have been tried. For example, in a paper entitled “ProRea-Live Database Migration for Multi-tenant RDBMS with Snapshot Isolation,” the authors describe an approach for the migration of multi-tenant databases, combining proactive and reactive database migration approaches. They review commonly used multi-tenancy models and compare them concerning database migration concepts. Yet this paper does not describe a bootstrapping process, nor does it introduce a new multi-tenancy model.
As another example, in a paper entitled “ElasTraS: An Elastic, Scalable, and Self-Managing Transactional Database for the Cloud,” the authors describe a distributed transaction system for multi-tenant environments using a shared database process and a process for migration of running transactions across database instances. It does not, however, cover the process of repository bootstrapping or configuration.
U.S. Pat. No. 8,122,055 (which is hereby incorporated by reference herein in its entirety) describes a mechanism for coordination of a multi-tenant environment. A shared database is used to store information about configuration and location of unshared tenant databases as well as cross tenant data. This patent in essence provides an extension of the well-known “shared nothing” multi-tenancy model summarized above.
U.S. Pat. No. 8,560,699 (which is hereby incorporated by reference herein in its entirety) describes a technique that uses a user-specific launch configuration to start new instances of a service (e.g., a database service) customized to fit the needs of a particular user. In its process, a launch configuration may be provided by the user or may be generated automatically by the system. This is a provisioning-centric approach covering new instances. It does not describe a way of bootstrapping new repositories without user interaction on an already running database instance, however.
It therefore will be appreciated that it would be desirable to solve one or more of the above-described and/or other problems. For example, it will be appreciated that it would be desirable to provide an approach for dynamically setting up and configuring data repository connectivity in a multi-tenant web application, on demand and at runtime.
An aspect of certain example embodiments relates to techniques for dynamically setting up and configuring data repository connectivity in a multi-tenant web application, on demand and at runtime, e.g., in a cloud computing and/or other highly distributed and scalable environment.
Another aspect of certain example embodiments relates to setting up and configuring data repositories while avoiding manual and/or external scripting approaches, and potentially in the absence of detailed knowledge about the requests of the client applications.
Another aspect of certain example embodiments relates to looking up the location of an already existing repository or, in case one does not exist, creating a new repository including all needed technical enhancements (such as, for example, search indexes, etc.), in order to provide a scalable, flexible, and fault tolerant arrangement suitable for use in a highly distributed and scalable environment such as what might be present in connection with a public and/or private cloud environment.
Certain example embodiments thus relate to techniques for dynamically bootstrapping repositories or databases for newly created tenants at runtime in scalable, distributed multi-tenant environments. In certain example embodiments, the bootstrapping is triggered dynamically the first time a client application tries to access a specific repository related to a newly created tenant at runtime, leading to a flexible approach for enabling tenant- and application-specific repositories with optional search index mapping.
The term “bootstrapping” is used herein and, as will be appreciated by those skilled in the art, it oftentimes is used generally to refer to the starting of a self-sustaining process that is supposed to proceed without external input. In this context, however, those skilled in the art will further understand that bootstrapping may relate to the basic configuration of repositories, indexes, tables, views, and/or other components, that might be used in a multi-tenant, potentially highly-distributed and scalable environment (such as a cloud computing environment), e.g., where an application (e.g., a web application) accesses a database or other repository.
Certain example embodiments create structured repositories (e.g., databases) for new tenants dynamically at runtime, with the repository being modeled in means other than the tabular relations used in relational databases such as a “Not only SQL” or NoSQL database. The creation of the repositories may be carried out by a web application or the like and, apart from or in addition to the creation of repositories, certain example embodiments may also involve the dynamic creation of a search index linked to the newly created repositories. Certain example embodiments may undertake like actions in connection with the deletion of a tenant (e.g., when a customer's contract with an application provider ends, etc.).
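The pairing of repository creation with search index creation, and the corresponding cleanup on tenant deletion, might be sketched as follows. This is an illustrative sketch only: the Backend class, its in-memory dictionaries, and the bootstrap and delete_tenant methods are hypothetical stand-ins for a document-oriented store and its search engine, not an actual implementation of the example embodiments.

```python
class Backend:
    """Hypothetical in-memory stand-in for a NoSQL store plus search engine."""

    def __init__(self):
        self.repositories = {}  # (tenant, application) -> dict of documents
        self.indexes = {}       # (tenant, application) -> set of indexed doc ids

    def bootstrap(self, tenant, application, with_index=True):
        """Create the repository (and optionally a linked index) if missing."""
        key = (tenant, application)
        if key not in self.repositories:
            self.repositories[key] = {}
            if with_index:
                self.indexes[key] = set()
        return key

    def delete_tenant(self, tenant):
        """Remove every repository and linked index that belongs to the tenant."""
        doomed = [key for key in self.repositories if key[0] == tenant]
        for key in doomed:
            del self.repositories[key]
            self.indexes.pop(key, None)  # the index is optional
        return doomed


backend = Backend()
backend.bootstrap("acme", "documents")
backend.bootstrap("acme", "reports", with_index=False)
backend.bootstrap("globex", "documents")
removed = backend.delete_tenant("acme")
```

Keying both mappings on the (tenant, application) pair, rather than on a concatenated name, avoids accidentally matching a tenant whose identifier happens to be a prefix of another tenant's identifier.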
In certain example embodiments, there is provided a method of managing a distributed, multi-tenant computing system comprising at least one processor and non-transitory storage media hosting a plurality of repositories designated for different respective computing system application-tenant combinations. A request for data to be obtained using a computing system application is received from a client application running on a client device, with the request being associated with a requesting computing system application-tenant combination that is based on a requesting tenant associated with the client application and the computing system application to be used in obtaining the data. A determination is made, using the computing system, as to whether the non-transitory storage media already stores a repository designated for the requesting computing system application-tenant combination. In response to a determination that there already is a repository designated for the requesting computing system application-tenant combination, the request for data is handled using the already existing repository designated for the requesting computing system application-tenant combination. In response to a determination that there is no existing repository designated for the requesting computing system application-tenant combination: a new repository designated for the requesting computing system application-tenant combination is dynamically and automatically created at runtime, without having to restart the computing system; the new repository is dynamically and automatically configured at runtime, without having to restart the computing system; and the request for data is handled using the new repository following said dynamic and automatic configuring.
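The lookup-or-create flow summarized above can be sketched in a few lines. The sketch assumes an in-memory registry keyed by the application-tenant combination and guarded by a lock; the names Repository, RepositoryManager, get_or_create, and handle_request are hypothetical, and the configure step is a placeholder for whatever bootstrapping (views, index mappings, and/or the like) a given implementation might perform.

```python
import threading


class Repository:
    """Hypothetical repository designated for one application-tenant combination."""

    def __init__(self, application, tenant):
        self.application = application
        self.tenant = tenant
        self.documents = {}
        self.configured = False

    def configure(self):
        # Placeholder for runtime bootstrapping of the new repository.
        self.configured = True


class RepositoryManager:
    def __init__(self):
        self._repositories = {}  # (application, tenant) -> Repository
        self._lock = threading.Lock()
        self.created = 0         # counts how many repositories were created

    def get_or_create(self, application, tenant):
        key = (application, tenant)
        with self._lock:  # avoid creating the same repository twice concurrently
            repo = self._repositories.get(key)
            if repo is None:
                # No repository exists yet: create and configure it at runtime.
                repo = Repository(application, tenant)
                repo.configure()
                self._repositories[key] = repo
                self.created += 1
            return repo

    def handle_request(self, application, tenant, doc_id):
        # The request is served from the existing or the newly created repository.
        repo = self.get_or_create(application, tenant)
        return repo.documents.get(doc_id)


manager = RepositoryManager()
first = manager.handle_request("crm", "acme", "d1")   # triggers creation
manager.get_or_create("crm", "acme").documents["d1"] = "contract"
second = manager.handle_request("crm", "acme", "d1")  # reuses the repository
```

In this sketch, the first request for a given combination pays the cost of creation and configuration, and later requests reuse the same repository without any restart of the surrounding process.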
In certain example embodiments, a distributed, multi-tenant computing system is provided. Processing resources include at least one processor, and are configured to enable a plurality of computing system applications to be performed. Non-transitory storage media hosts a plurality of repositories designated for different respective computing system application-tenant combinations. The computing system is configured to at least: receive, from a client application running on a client device, a request for data to be obtained using at least one of said computing system applications, the request being associated with a requesting computing system application-tenant combination corresponding to a requesting tenant associated with the client application and the computing system application(s) to be used in obtaining the data; determine whether the non-transitory storage media already stores a repository designated for the requesting computing system application-tenant combination; in response to a determination that there already is a repository designated for the requesting computing system application-tenant combination, handle the request for data using the already existing repository designated for the requesting computing system application-tenant combination; and in response to a determination that there is no existing repository designated for the requesting computing system application-tenant combination (a) dynamically and automatically create and configure, at runtime, a new repository designated for the requesting computing system application-tenant combination, without having to restart the computing system and without having to restart the computing system application(s) to be used in obtaining the data, and (b) handle the request for data using the new repository following said dynamic and automatic configuring.
In certain example embodiments, a distributed, multi-tenant computing system is provided and comprises processing resources including at least one processor. Tenant installations are backed by virtual and/or physical machines and are designated for respective tenants, with each said tenant installation supporting at least one application and at least one repository accessible by and/or to the respective tenant, and with each said repository being designated for a different application-tenant combination. A web application or service is configured to receive a request for a document from a client application. The processing resources cooperate to provide to the web application or service a response to the request for the document from the client application such that: when a determination is made that there is a repository already in existence for the specific combination of the tenant and the application involved in the request, that repository is used in responding to the request; and when a determination is made that there is not a repository already in existence for the specific combination of the tenant and the application involved in the request, a new repository is generated in cooperation with the processing resources dynamically and at computing system runtime, bootstrapping is performed for configuring the new repository dynamically and at computing system runtime, and that new repository is used in responding to the request.
Non-transitory computer readable storage mediums tangibly storing instructions for performing the above-summarized and/or other methods also are provided by certain example embodiments, as well as corresponding computer programs.
These features, aspects, advantages, and example embodiments may be used separately and/or applied in various combinations to achieve yet further embodiments of this invention.