1. Field of the Invention
The present invention relates to a database system, data retrieval method, and storage medium and, more particularly, to a technique suitably used in a retrieval system that finds out desired data from a plurality of distributed databases.
2. Description of the Related Art
As computer performance has improved in recent years, large-scale computers such as single mainframes have been replaced by distributed systems built from a plurality of workstations or personal computers. A distributed system makes development and maintenance of the system relatively easy. The Internet is a well-known example of a distributed system.
On the Internet, a plurality of computers are distributed worldwide as servers or clients, and together construct a single, huge database (hereinafter abbreviated as DB). Text information, image information, and the like are registered in or read out from these DBs using various protocols. DBs tend to be distributed not only on the Internet but also in any system that deals with a huge volume of data.
Reading desired information out of such distributed DBs requires a great deal of time and labor, because all servers that manage these DBs must be searched. More specifically, since the user does not know where in the distributed DBs the information is located, he or she must access the servers allocated to these DBs in turn and repeat the search until the desired information is found.
It is impossible to retrieve required information from all the servers unless the user knows the locations (address information such as a URL: Uniform Resource Locator) of all DB servers. However, the distributed DB servers constantly register or delete data, and each DB server itself is constantly connected to or disconnected from the network. Hence, it is very hard for the user to keep track of all these changes and to retrieve accurate information.
In order to eliminate such inconvenience, address retrieval services called search engines are available in, e.g., the Internet. Each search engine collects URL information automatically or manually, and a required URL can be retrieved by inputting, e.g., a keyword. For example, if a search using a keyword “patent” is made, the URLs of servers relevant to “patent” are output.
However, the search engine can only retrieve the URL information of a DB server, but cannot search an RDBMS (relational DB management system) built in the server at that retrieved URL. Therefore, in order to search an RDBMS or the like, the user retrieves information of a desired server from the search engine, and then connects to the desired server on the basis of the retrieval result. Then, the user searches the DB for his or her required information using a DB retrieval method corresponding to that server.
Thus, conventionally, when the DBs that store various kinds of data are distributed, retrieving desired data requires much time and labor.
Furthermore, in an RDBMS, the maximum number of columns that can be held in one table is normally limited. Hence, in an RDBMS whose maximum number of columns is limited to 256, when a table having 257 or more columns is to be created, a plurality of tables (real tables) each including 256 columns or fewer are generated and related to one another, so that the database appears to be built as a single table (view).
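The column split described above can be sketched as follows. This is only an illustration in Python: the function name `split_columns`, the column names, and the choice to repeat one key column in every real table are assumptions, not details taken from any particular RDBMS.

```python
def split_columns(key, columns, limit=256):
    """Split a wide logical table into real tables of at most `limit` columns.

    Hypothetical helper: `key` is repeated in every real table so that the
    tables can later be related to one another through that column.
    """
    if limit < 2:
        raise ValueError("limit must leave room for the key and one data column")
    per_table = limit - 1  # one slot in each real table is reserved for the key
    return [
        [key] + columns[start:start + per_table]
        for start in range(0, len(columns), per_table)
    ]


# A logical table with one key column and 600 data columns (made-up names).
real_tables = split_columns("id", [f"col{i}" for i in range(1, 601)])
for number, table_columns in enumerate(real_tables, start=1):
    print(f"real table {number}: {len(table_columns)} columns")
```

Under these assumptions, every generated real table stays within the 256-column limit and shares the key column needed to relate it to the others.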
For example, single view X shown in FIG. 1 is made up of three real tables A, B, and C, which are related. More specifically, identical data is stored in key columns a1, b1, and c1 on real tables A, B, and C, and column x1 of view X is formed using these columns a1, b1, and c1 as joint keys, thus maintaining consistency among the three independent tables. That is, column x1 on view X is common to three columns a1, b1, and c1.
Also, columns a2, a3, and a4 on real table A correspond to columns x2, x3, and x4 on view X, columns b2, b3, b4, and b5 on real table B to columns x5, x6, x7, and x8 on view X, and columns c2 and c3 on real table C to columns x8 and x9 on view X, respectively. Paying attention to column x8 on view X, two columns, i.e., column b5 on real table B and column c2 on real table C, are related to this column. In other words, these columns b5 and c2 on real tables B and C store identical data.
An SQL statement for creating single view X from three real tables A, B, and C is as follows:
create view viewX (x1, x2, x3, x4, x5, x6, x7, x8, x9)
as select a1, a2, a3, a4, b2, b3, b4, b5, c3 from TableA, TableB, TableC
where a1=b1 and a1=c1 and b5=c2
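The view definition above can be tried out with a minimal sketch such as the following, which uses Python's standard `sqlite3` module with an in-memory database. The column types, the sample rows, and the helper names `build_view_x` and `load_sample_rows` are assumptions made for illustration. The view columns are named with `AS` aliases rather than a column list, which is equivalent here.

```python
import sqlite3


def build_view_x(conn):
    """Create real tables A, B, C and view X (hypothetical helper)."""
    conn.executescript("""
        CREATE TABLE TableA (a1 INTEGER, a2 TEXT, a3 TEXT, a4 TEXT);
        CREATE TABLE TableB (b1 INTEGER, b2 TEXT, b3 TEXT, b4 TEXT, b5 TEXT);
        CREATE TABLE TableC (c1 INTEGER, c2 TEXT, c3 TEXT);
        CREATE VIEW viewX AS
            SELECT a1 AS x1, a2 AS x2, a3 AS x3, a4 AS x4,
                   b2 AS x5, b3 AS x6, b4 AS x7, b5 AS x8, c3 AS x9
            FROM TableA, TableB, TableC
            WHERE a1 = b1 AND a1 = c1 AND b5 = c2;
    """)


def load_sample_rows(conn):
    """Insert made-up rows; key columns a1/b1/c1 and b5/c2 hold identical data."""
    conn.executemany("INSERT INTO TableA VALUES (?, ?, ?, ?)",
                     [(1, "a2-1", "a3-1", "a4-1"), (2, "a2-2", "a3-2", "a4-2")])
    conn.executemany("INSERT INTO TableB VALUES (?, ?, ?, ?, ?)",
                     [(1, "b2-1", "b3-1", "b4-1", "k1"),
                      (2, "b2-2", "b3-2", "b4-2", "k2")])
    conn.executemany("INSERT INTO TableC VALUES (?, ?, ?)",
                     [(1, "k1", "c3-1"), (2, "k2", "c3-2")])


conn = sqlite3.connect(":memory:")
build_view_x(conn)
load_sample_rows(conn)
for row in conn.execute("SELECT * FROM viewX ORDER BY x1"):
    print(row)
```

Each printed row has nine values, one per view column x1 to x9, assembled from the three real tables.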
However, when such a DB having a plurality of real tables A, B, and C is searched for given data, the following problem arises. In a conventional DB system, a search is made by calling all the related real tables, so all of real tables A, B, and C are searched irrespective of the real table in which the desired data is located, and the individual real tables are searched in turn in accordance with a search formula input by the user.
Assuming that the data to be retrieved pertains to columns x8 and x9 on view X, since column x8 on view X holds data common to columns b5 and c2 on real tables B and C, the actual search can be completed using only real table C, which corresponds to both columns x8 and x9, without using real table B. Since columns x8 and x9 on view X correspond to none of the columns on real table A, there is no need to search real table A in practice.
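The point can be checked with a small sketch, again using Python's `sqlite3` module. It assumes that the three real tables hold the same set of rows (every key in table C also appears in tables A and B, and b5 equals c2 for each key); under that assumption a search on x8 and x9 through the view returns the same rows as a search on real table C alone. The table contents and the helper names are made up for illustration.

```python
import sqlite3


def make_db():
    """Build tables A, B, C and view X with made-up, mutually consistent rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE TableA (a1 INTEGER, a2 TEXT, a3 TEXT, a4 TEXT);
        CREATE TABLE TableB (b1 INTEGER, b2 TEXT, b3 TEXT, b4 TEXT, b5 TEXT);
        CREATE TABLE TableC (c1 INTEGER, c2 TEXT, c3 TEXT);
        CREATE VIEW viewX AS
            SELECT a1 AS x1, a2 AS x2, a3 AS x3, a4 AS x4,
                   b2 AS x5, b3 AS x6, b4 AS x7, b5 AS x8, c3 AS x9
            FROM TableA, TableB, TableC
            WHERE a1 = b1 AND a1 = c1 AND b5 = c2;
        INSERT INTO TableA VALUES (1, 'a2-1', 'a3-1', 'a4-1');
        INSERT INTO TableA VALUES (2, 'a2-2', 'a3-2', 'a4-2');
        INSERT INTO TableB VALUES (1, 'b2-1', 'b3-1', 'b4-1', 'k1');
        INSERT INTO TableB VALUES (2, 'b2-2', 'b3-2', 'b4-2', 'k2');
        INSERT INTO TableC VALUES (1, 'k1', 'c3-1');
        INSERT INTO TableC VALUES (2, 'k2', 'c3-2');
    """)
    return conn


def search_via_view(conn, x8_value):
    """Conventional search: goes through the view, so all three tables are joined."""
    sql = "SELECT x8, x9 FROM viewX WHERE x8 = ? ORDER BY x9"
    return conn.execute(sql, (x8_value,)).fetchall()


def search_table_c_only(conn, x8_value):
    """Narrowed search: x8 and x9 are both available as c2 and c3 on table C."""
    sql = "SELECT c2, c3 FROM TableC WHERE c2 = ? ORDER BY c3"
    return conn.execute(sql, (x8_value,)).fetchall()


conn = make_db()
print(search_via_view(conn, "k1"))
print(search_table_c_only(conn, "k1"))
```

Both calls return the same rows for this data, while the second one touches a single real table instead of joining three.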
More specifically, the conventional DB system searches a broader range than necessary by joining more real tables than required. Such processing prolongs the DB search time and consumes more memory on the computer that forms the system than necessary, resulting in low search performance.
When the user searches the DB, all the real tables must be joined. However, since the number of columns on a view provided by an RDBMS is limited just as it is on a real table, a view wider than the physical limitation cannot be formed. Therefore, to observe contents that exceed the physical limitation, the contents must be presented to the user in units of real tables, or a customized application program that manages data in units of real tables must be prepared.