The present invention generally relates to database field, and in particular, to a database system and a method of optimizing cross-database query.
At present, in database field, if a query to be performed involves data tables with the same structure stored in a plurality of databases, two methods, for example, can be generally used. As one of the methods, query can be performed on each of the plurality of databases, and then the results of the query obtained from respective databases are aggregated together in application logic, followed by providing it to the query requester. However, this method involves complicated programming logics, and has low query performance.
As the other of the methods, a part of data in other databases of the plurality of databases can be synchronized periodically or in real-time in one database as a query object, and only this database will be used to response the cross-database queries. However, a problem of data redundancy may occur in this method, especially when a great deal of data needs to be synchronized, in which a target database is required to have considerable capacity. Additionally, it is hard to maintain data consistency in such a method as synchronizing data in a plurality of databases into one database. Furthermore, a complex synchronization mechanism or logic is required to implement synchronizing, and real-time support may be needed, all of which increase complexity of the system.
The above problems are especially serious in a SaaS environment. SaaS is the abbreviation of Software-as-a-Service, which is one mode of providing software through Internet. Using the mode, a user need not purchase a software product, but turns to rent the Web-based software from a service provider, to manage enterprise's business without maintaining the software. The user who rents software is called a “tenant”. The service provider will provide full management and maintenance for software, and also provide off-line operation and local data storage of software while providing internet application for the tenant, so that the tenant can use his rented software and service anywhere and at any time. For many small-size enterprises, Saas is the best way to use advanced techniques, which eliminates the need of purchasing, deploying and maintenance of infrastructure and applications for enterprises.
A large-size SaaS application may have lots of tenants and data, and generally uses a scaling out mechanism as the business increases. The so-called scaling out is to divide the data of the application, and to distribute data that should have been collectively stored onto different physical databases according to a certain rule.
FIG. 1 shows an exemplary SaaS database system using a scale out mechanism. In this SaaS database system, three underlying databases are used, and data is divided into the three underlying databases based on tenants. Hereinafter, a collection of data belonging to a tenant is referred to as “tenant data” of this tenant. In the SaaS database system shown in FIG. 1, tenant data of tenants T1, T2 and T5 are located in a database A, tenant data of tenants T3 and T4 are located in a database B, and tenant data of tenant T6 are located in a database C.
There are cross-tenant data access requirements in many typical SaaS applications at present. FIG. 2 shows several example cases requiring cross-tenant access. For example, in a case when both retailer and suppliers are tenants of the same SaaS application, the retailer may hope to see all or part of data of its suppliers when authorized by the suppliers. For example, a parent company may hope to see data of its subsidiary, and the subsidiary may hope to see data of its branches. In addition, a SaaS service provider such as an operator may need to access all tenants' data for some reasons so as to generate report or perform operations or the like.
In order to satisfy the above requirements, processing is generally performed as follows in the related art. For example, in the case of FIG. 1, it supposes that order tables SalesOrder as shown in FIG. 3 exist in the databases A and B, respectively. In the table SalesOrder shown in FIG. 3 as a simple example, each of the columns represents the following meanings, respectively. ID is identification for identifying a row, Tenant refers to the tenant to which data in the row belong, Name refers to name of goods involved in an order, Num refers to the number of goods involved in an order, and Price refers to price of goods involved in an order.
For example, if the tenant T1 wants to query data of all its orders as well as data of an order in which the number of goods involved in the order of tenant T3 in another database B is greater than 500, the tenant T1 operates on a Web-based software, and the Web-based software generates a SQL query statement as follows:
  SELECT * FROM SalesOrder WHERE Tenant= ‘T1’ OR(Tenant= ‘T3’ AND Num>500);
The SQL statement is transmitted to a request routing layer in FIG. 1 by the Web-based software, and the request routing layer transmits the above statement to respective databases A, B and C. The SQL statement is respectively performed in the databases A, B and C. For example, two rows of data with IDs 001 and 002 in dbA.SalesOrder are obtained by performing the SQL statement in the database A, one row of data with ID 002 in dbB.SalesOrder is obtained by performing the SQL statement in the database B, and a null set is obtained by performing the SQL statement in the database C. The result sets are respectively sent back to the request routing layer after each of the databases performs the above query. Then, the request routing layer aggregates the result set of the query returned from the databases A and B, and a final query result is obtained by getting a union set of these sets in the present example. Finally, the request routing layer returns the query result to the Web-based software to display the final result to the tenant T1. However, as described above, this method involves complicated programming logics, and needs each database to perform query, so the query performance is low.
Moreover, functions such as AVG or the like cannot be used in SQL query statement by using the related art. This is because, in each database, an average can be calculated for the data stored in the database by using the functions such as AVG or the like, and data stored in another database cannot be accounted. For example, in the case of FIG. 3, if it is desired to obtain average price of orders of both tenants T1 and T3, SQL statement is generated as follows: SELECT AVG (Price) FROM SalesOrder WHERE Tenant=‘T1’ OR Tenant=‘T3’. An average price of the order of the tenant T1 and an average price of the order of the tenant T3 are returned by respectively performing the statement in the databases A and B, but an average price of orders of both tenants T1 and T3 cannot be obtained. Thus, functions such as AVG cannot be used in this case. Herein, function AVG is just an example of functions, which can also include, but not be limited to, COUNT, MAX, MIN, SUM and self-defined functions and so on.
In recent years, a technique called as federated database has been developed. LAN, computer and mainframe existing as independent systems comprised in department or division, of many organizations, has respective databases. When an enterprise computing platform is built to interconnect an organization, heterogeneous database systems distributed in the organization should be combined into a federated database so as to provide access to data for multiple users. A software layer is provided by using a middle-ware or an environment such as distributed computer environment (DCE), through which a user can interoperate with various systems. Using the federated database technique, one SQL statement can be used to query data in a plurality of data sources. These data sources can be various, which can be either relational database or non-relational database, e.g., Excel of Microsoft, xml, etc.
Federated view is a view in the federated database, and its basic table is in remote data sources. The basic table is quoted in accordance with nickname instead of table name of data source in the federated view. Data are retrieved from the remote data sources when being queried from the federated view.
The federated database and the federated view technique can be applied to the SaaS environment. For example, a federated view (FV) can be built in an underlying database A in the SaaS database system to which the federated database technique is applied, which is a view federating tables with the same structure contained in a plurality of underlying databases. A requester can be provided with transparent cross-database query service by performing query using the federated view via the database in which the federated view is built.
In addition, there still exists a problem in the existing SaaS database system. Since the number of tenants are increasing, a service provider accommodates new tenants through scaling out (by increasing new databases) when a database cannot accommodate more tenant data of tenants. Generally, the load of a database accommodating more tenants is relatively heavy, while the load of a database accommodating fewer tenants is relatively light. Thus, a mechanism is further needed to balance loads of respective databases in a database system.