Multi-tenancy technology refers to architectures in which a single instance of software runs on a server of a service provider and serves a plurality of client organizations (i.e., tenants), such as a large number of small and medium-sized enterprises. Multi-tenancy technology differs from the traditional service-providing technique, in which separate software instances or hardware systems are created on a server for different client organizations. In multi-tenancy technology, a software application is designed to virtually partition its data and configurations, so that each client organization operates on a customized virtual application instance. Multi-tenancy technology is attracting increasing attention because it can realize a large economy of scale, reduce the cost of software usage for client organizations, and increase the profits of the service provider.
In a multi-tenancy scenario, a single software application instance may support millions of tenants, and the number of tenants may vary at any moment. Therefore, in order to realize an economy of scale in a multi-tenancy scenario, the underlying database must be able to scale out by means of a clustering technique.
Database partitioning is a commonly used database scale-out technique, which has been implemented by database management systems such as DB2, SQL Server, etc. It supports clustering a plurality of physical machines/partitions and presents a single database management view to an application. FIG. 1 schematically illustrates an architecture of database partitioning. As shown, a database is partitioned into a plurality of database partitions, and said plurality of database partitions can be located on different machines. Data in the database is actually stored in the individual database partitions, and an access to the database by an application is routed to the corresponding database partitions by the partitioned database system. Such an architecture is easily scaled out by adding new database partitions.
The database partitioning technique usually distributes different records in a database table to different database partitions according to the values of one or more fields in the database table. For example, information about the clients whose post codes are less than 50000 is stored in a table of one partition, while information about the clients whose post codes are greater than or equal to 50000 is stored in a table of another partition, and a view generated by the union of the two tables provides information about all the clients to the application. The one or more fields used for distributing records to different database partitions are referred to as partition keys.
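The post code example above can be sketched as follows. This is a minimal illustration only, assuming two in-memory lists stand in for the two partition tables; the names `partition_low`, `partition_high`, `insert_client`, and `all_clients` are hypothetical and do not come from any particular database system.

```python
# Hypothetical in-memory stand-ins for the tables of two partitions.
partition_low = []   # clients whose post code is less than 50000
partition_high = []  # clients whose post code is 50000 or greater


def insert_client(record):
    """Route a record to a partition by its partition key (the post code)."""
    if record["post_code"] < 50000:
        partition_low.append(record)
    else:
        partition_high.append(record)


def all_clients():
    """Union view over both partitions, as seen by the application."""
    return partition_low + partition_high


insert_client({"name": "A", "post_code": 10001})
insert_client({"name": "B", "post_code": 50000})
insert_client({"name": "C", "post_code": 94105})

for client in all_clients():
    print(client["name"], client["post_code"])
```

The application only calls `all_clients()` and never needs to know which partition holds a given record.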
In order to distribute data to different database partitions as evenly as possible based on the values of the partition key, such as time, region, post code, etc., the database partitioning technique usually adopts hash partitioning, in which the hash value obtained by applying a hash function to the partition key determines the partition to which a record belongs. FIG. 2 shows an exemplary implementation of the hash partitioning method used in the database partitioning technique. As shown, a hash function performs a hash operation on a partition key to obtain a hash value within the range of, for example, 0-4095. A partition mapping table contains the correspondence between each hash value and a partition, for example, one of the partitions 1-4. In this way, the record to which each partition key value belongs is allocated to the corresponding partition through the hash operation and the partition mapping table. When a new partition is added, new mapping relationships between the partition key values and the respective partitions can be formed automatically by adding the partition number of the new partition to the partition mapping table.
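The two-step lookup described above (hash function, then partition mapping table) can be sketched as follows. This is an illustrative sketch under stated assumptions: `zlib.crc32` reduced modulo 4096 stands in for the unspecified hash function, and the mapping table is filled round-robin, which is only one possible way to populate it. The names `hash_value`, `build_partition_map`, and `partition_for` are hypothetical.

```python
import zlib

NUM_HASH_VALUES = 4096  # hash values fall in the range 0-4095


def hash_value(partition_key):
    """Map a partition key to a hash value in 0-4095 (CRC32 is an assumption)."""
    return zlib.crc32(str(partition_key).encode("utf-8")) % NUM_HASH_VALUES


def build_partition_map(partition_numbers):
    """Build a partition mapping table: index = hash value, entry = partition.

    Round-robin assignment is assumed here for illustration.
    """
    count = len(partition_numbers)
    return [partition_numbers[h % count] for h in range(NUM_HASH_VALUES)]


def partition_for(partition_key, partition_map):
    """Look up the partition that stores records with this partition key."""
    return partition_map[hash_value(partition_key)]


partition_map = build_partition_map([1, 2, 3, 4])
for key in ("10001", "50000", "94105"):
    print(key, "-> partition", partition_for(key, partition_map))
```

Because the lookup depends only on the key and the table, every record with the same partition key value is routed to the same partition.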
When the database partitioning technique is applied in a multi-tenancy scenario to scale out the database, the partitions should be assigned according to tenants, because data accesses in a multi-tenancy scenario are usually confined to a single tenant and cross-tenant data access is uncommon. That is, the data of one tenant is stored in only one partition, although the same partition can store the data of a plurality of tenants. Since different tenants are distinguished by their tenant IDs in a multi-tenancy scenario, a natural practice is to use the tenant ID as the partition key. In this way, the partition for storing the data of each tenant can be determined conveniently by hashing the tenant ID and looking up the partition mapping table.
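The property described above, that each tenant's data resides in exactly one partition while a partition may hold several tenants, can be checked with a small sketch. It assumes CRC32 as the hash function and a round-robin mapping from hash values to four partitions; the tenant IDs, row contents, and the name `partition_for_tenant` are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_HASH_VALUES = 4096
NUM_PARTITIONS = 4


def partition_for_tenant(tenant_id):
    """Hash the tenant ID, then map the hash value to a partition number."""
    h = zlib.crc32(tenant_id.encode("utf-8")) % NUM_HASH_VALUES
    return h % NUM_PARTITIONS  # round-robin mapping table, assumed


rows = [
    {"tenant_id": "tenant-a", "item": "invoice-1"},
    {"tenant_id": "tenant-b", "item": "invoice-1"},
    {"tenant_id": "tenant-a", "item": "invoice-2"},
    {"tenant_id": "tenant-c", "item": "invoice-1"},
    {"tenant_id": "tenant-b", "item": "invoice-2"},
]

# Store every row in the partition selected by its tenant ID.
partitions = defaultdict(list)
for row in rows:
    partitions[partition_for_tenant(row["tenant_id"])].append(row)

# Record, for each tenant, the set of partitions that hold its rows.
tenant_locations = defaultdict(set)
for number, stored_rows in partitions.items():
    for row in stored_rows:
        tenant_locations[row["tenant_id"]].add(number)

for tenant_id, numbers in sorted(tenant_locations.items()):
    print(tenant_id, "is stored in partition(s)", sorted(numbers))
```

Each tenant maps to a single partition because the partition is a pure function of the tenant ID.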
However, the problem of “availability” may arise from such a practice. That is, when a new machine/partition is added to the current database cluster, the existing correspondence between hash values and partition numbers in the partition mapping table may change automatically due to the addition of the new partition number. Therefore, the partitioned database system needs to re-distribute the data of the current tenants, which requires a very long down-time. FIG. 3 illustrates that the data of the current tenants needs to be re-distributed when a new partition is added in the case where tenant IDs are used directly as the partition key. As shown, in the current technique using tenant IDs as the partition key, the system uses the hash function to convert the value of a tenant ID into one of the hash values 0-4095. The hash value corresponds to a partition number through the partition mapping table. For example, when the partitioned database system has 2 partitions, the correspondence between the hash values and the partition numbers is as shown in the upper table of FIG. 3, in which the hash value 2 corresponds to the partition number 0. When two new partitions are added to the system, the correspondence between the hash values and the partition numbers is automatically modified as shown in the lower table of FIG. 3, in which the hash value 2 corresponds to the partition number 2. That is to say, because two partitions are newly added, the data of a tenant whose tenant ID hashes to 2 needs to be migrated from the partition 0 to the partition 2, and other tenants encounter a similar problem and also need to be migrated.
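The re-mapping effect described above can be reproduced with a short sketch. It assumes the partition mapping table is filled round-robin over the partition numbers, which is consistent with the hash value 2 example given above but is not confirmed as the actual table layout of FIG. 3; the name `build_partition_map` is hypothetical.

```python
NUM_HASH_VALUES = 4096  # hash values fall in the range 0-4095


def build_partition_map(num_partitions):
    """Partition mapping table: index = hash value, entry = partition number.

    Round-robin assignment over partitions 0..num_partitions-1 is assumed.
    """
    return [h % num_partitions for h in range(NUM_HASH_VALUES)]


old_map = build_partition_map(2)  # system with 2 partitions
new_map = build_partition_map(4)  # same system after adding 2 partitions

# A tenant whose ID hashes to 2 moves from partition 0 to partition 2.
print("hash value 2:", old_map[2], "->", new_map[2])

# Count the hash values whose partition changes; all tenants hashing to
# one of these values must have their data migrated.
moved = sum(1 for h in range(NUM_HASH_VALUES) if old_map[h] != new_map[h])
print("re-mapped hash values:", moved, "of", NUM_HASH_VALUES)  # 2048 of 4096
```

Under this assumed layout, half of all hash values are re-mapped when the cluster grows from 2 to 4 partitions, so roughly half of the tenants would need to be migrated.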
During the down-time for migrating the tenant data, none of the tenants can access their data. Therefore, the unavailable time of each tenant equals the down-time of the partitioned database system, which may be several hours or even tens of hours, and which grows with the number of tenants or the amount of data records. FIG. 4 shows how the system down-time increases with the number of tenants when the tenant IDs are used directly as the partition key. Such a situation is unacceptable.
Accordingly, there is a need in the art for a solution for applying the database partitioning technique in a multi-tenancy scenario, which solution can make use of the current database partitioning technique to distribute the data of the respective tenants into different partitions, so as to facilitate the scaling out of partitions, without introducing the availability problem when partitions are scaled out.