With the development of information technologies, the centralized construction and Internet-oriented transformation of operator IT systems, the increasingly integrated use of enterprise data, and the wide deployment of Internet communities (such as Facebook and Twitter) and Internet services (such as microblogging, reading, games, and e-commerce), a database system needs to process an increasingly large amount of data, and a conventional centralized database is increasingly unable to meet current requirements. Therefore, data processing is shifting from a centralized manner to a distributed manner.
A distributed database system generally includes multiple independent computer systems. Each computer may be placed independently at a separate location, and each computer has a complete copy of a database management system and its own local database that holds part of the data. The computers located in different places are connected by a network, so as to jointly form a complete, global, large database.
As one of the prominent distributed database technologies, a federated database provides a uniform user access interface, and shields the differences between sub-databases and partitions of a database as well as the differences between different databases; when accessing data by using a federated database system, a user finds it as easy as accessing a single, actually existing database. However, because cross-database processing is involved, a federated database has problems such as a low processing speed and a tendency toward lock contention and resource conflicts. Therefore, the key problem that a federated database, or a distributed database system similar to a federated database, needs to resolve is how to improve processing efficiency.
For the federated database, the key to improving processing efficiency is to optimize structured query language (SQL) statements and output an optimal execution plan. With a current SQL optimization solution for the federated database, data in a data table may sometimes have to be re-hashed or broadcast across the entire network. This causes heavy network traffic, and inserting the redistributed data into each data source consumes substantial processor and memory resources. As a result, the data processing speed is low, the concurrent processing performance of the system is reduced, and responses are delayed, so that usage requirements cannot be satisfied.
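For illustration only, the network cost of the two redistribution strategies mentioned above can be sketched as follows. This is a minimal model under stated assumptions, not the optimizer of any particular federated database: the function names `rehash_cost` and `broadcast_cost`, the integer join keys, and the `key % n` placement rule are all hypothetical and chosen only to make the row counts easy to follow.

```python
def rehash_cost(partitions, key_of):
    """Count rows that must leave their current data source when the table
    is redistributed so that each row lives on source (join_key % n).
    Assumption: join keys are non-negative integers."""
    n = len(partitions)
    moved = 0
    for source, rows in enumerate(partitions):
        for row in rows:
            if key_of(row) % n != source:
                moved += 1
    return moved


def broadcast_cost(partitions):
    """Count row transfers when every data source sends its local rows
    to each of the other n - 1 data sources."""
    n = len(partitions)
    total_rows = sum(len(rows) for rows in partitions)
    return total_rows * (n - 1)


# Hypothetical table of 12 rows (key, payload), range-partitioned
# across 3 data sources: keys 0-3, 4-7, and 8-11.
partitions = [
    [(k, "row%d" % k) for k in range(0, 4)],
    [(k, "row%d" % k) for k in range(4, 8)],
    [(k, "row%d" % k) for k in range(8, 12)],
]

print(rehash_cost(partitions, key_of=lambda row: row[0]))  # prints 6
print(broadcast_cost(partitions))                          # prints 24
```

In this toy model, re-hashing moves 6 of the 12 rows, while broadcasting causes 24 row transfers, and every received row must additionally be inserted at the receiving data source. Both costs grow with the table size and the number of data sources, which is the overhead the paragraph above describes.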