1. Field of the Invention
The present invention relates to a database system, data retrieval method, and storage medium and, more particularly, to a technique suitably used in a retrieval system that finds out desired data from a plurality of distributed databases.
2. Description of the Related Art
As the performance of computers becomes higher in recent years, a large-scale computer such as a single main frame or the like has been replaced by a distributed system built by a plurality of workstations or personal computers in recent years. The distributed system makes development and maintenance of the system relatively easy. As an example of the distributed system, the so-called Internet is known.
In the Internet, a plurality of computers are distributed worldwide as servers or clients, and construct a single, huge database (to be abbreviated as a DB hereinafter). Text information, image information, and the like are registered in these DBs or are read out therefrom using some protocols. Not only in such Internet, but also in a system that deals with a huge volume of data, DBs tend to be distributed.
When desired information is read out from such distributed DBs, it requires very much time and labor to search all servers that manage these DBs for required information. More specifically, since the user does not know the location of information to be read out in the distributed DBs, he or she must access servers allocated in correspondence with these DBs in turn and must repeat search until he or she finds desired information.
It is impossible to retrieve required information from all the servers unless the user knows the locations (address information such as URL: Uniform Resource Locator) of all DB servers. However, the distributed DB servers constantly register or delete data, and each DB server itself is constantly connected to or disconnected from the network. Hence, it is very hard for the user to recognize all these facts and to retrieve accurate information.
In order to eliminate such inconvenience, address retrieval services called search engines are available in, e.g., the Internet. Each search engine collects URL information automatically or manually, and a required URL can be retrieved by inputting, e.g., a keyword. For example, if a search using a keyword xe2x80x9cpatentxe2x80x9d is made, the URLs of servers relevant to xe2x80x9cpatentxe2x80x9d are output.
However, the search engine can only retrieve the URL information of a DB server, but cannot search an RDBMS (relational DB management system) built in the server at that retrieved URL. Therefore, in order to search an RDBMS or the like, the user retrieves information of a desired server from the search engine, and then connects to the desired server on the basis of the retrieval result. Then, the user searches the DB for his or her required information using a DB retrieval method corresponding to that server.
In this way, conventionally, upon acquiring desired data, when DBs that store various kinds of data are distributed, data retrieval requires much time and labor.
Furthermore, in the RDBMS, the maximum number of columns that can be held in one table is normally limited. Hence, in an RDBMS, the maximum number of columns of which is limited to 256, when a table having 257 or more columns is created, a plurality of tables (real tables) each including 256 columns or less are generated, and are related to apparently build a database as a single table (view).
For example, single view X shown in FIG. 1 is made up of three real tables A, B, and C, which are related. More specifically, identical data is stored in key columns a1, b1, and c1 on real tables A, B, and C, and column x1 of view X is formed using these columns a1, b1, and c1 as joint keys, thus maintaining consistency among the three independent tables. That is, column x1 on view X is common to three columns a1, b1, and c1.
Also, columns a2, a3, and a4 on real table A correspond to columns x2, x3, and x4 on view X, column b2, b3, b4, and b5 on real table B to columns x5, x6, x7, and x8 on view X, and columns c2 and c3 on real table C to columns x8 and x9 on view X, respectively. Paying attention to column x8 on view X, two columns, i.e., column b5 on real table B and column c2 on real table C are related to this column. In other words, these columns b5 and c2 on real tables B and C store identical data.
A protocol for creating single view X from three real tables A, B, and C is as follows:
create view viewX (x1, x2, x3, x4, x5, x6, x7, x8, x9)
as select a1, a2, a3, a4, b2, b3, b4, b5, c3 from TableA, TableB, TableC
where a1=b1 and a1=c1 and b5=c2
However, when such DB having a plurality of real tables A, B, and C is searched for given data, the following problem is posed. That is, in a conventional DB system, since a search is made by calling all the related real tables, all real tables A, B, and C are to be searched irrespective of real table in which desired data is located, and the individual real tables are searched in turn in accordance with a search formula input by the user.
Assuming that data to be retrieved pertains to columns x8 and x9 on view X, since column x8 on view X has data common to columns b5 and c2 on real tables B and C, actual search can be completed using only real table C that corresponds to both columns x8 and x9 without using real table B. Since columns x8 and x9 on view X correspond to none of the columns on real table A, there is no need for searching real time A in practice.
More specifically, in the conventional DB system, a broad range is searched by joining real tables more than required. Such processing prolongs the DB search time, and requires a more memory area of the computer that forms the system than required, resulting in low search performance.
When the user searches the DB, all the real tables must be joined. However, since the number of columns is also limited on a view provided by an RDBMS as in a real table, a long view cannot be formed beyond the physical limitation. Therefore, upon observing the contents of a view beyond the physical limitation, the contents must be presented to the user in units of real tables or by preparing a customized application program which manages data in units of real tables.
It is an object of the present invention to provide a mechanism that allows the user to easily search a DB system built by distributed DBs and their servers without requiring immediate connectivity to the distributed DB servers.
It is another object of the present invention to provide a mechanism which always recognizes information pertaining to each DB stored in distributed servers, and allows the user to retrieve the latest information at the time of search as a result without exerting any extra load on the user.
It is still another object of the present invention to provide a database system which can join tables at high speed with a minimum required memory capacity in a relational database system.
It is still another object of the present invention to create a long view beyond the physical limitation on a database.
In order to achieve the above objects, a database system built by distributing one or more databases and one or more first servers which search the databases for real data on a network, comprises metadata management means for collecting metadata which pertain to real data stored in the one or more databases from the one or more first servers, and managing the collected metadata, and metadata retrieval means for extracting metadata which matches a request from a user terminal connected to the network by search in response to the request.
Note that the metadata management means and metadata retrieval means may be located in one or more second servers different from the first servers.
According to another feature of the present invention, the metadata contains at least information indicating a location of the database or the first server, and contents of real data in the database.
According to still another feature of the present invention, the user terminal comprises means for inputting a retrieval request of the metadata, means for inputting a retrieval condition upon retrieving real data on the database using a retrieval result of the metadata supplied from the metadata retrieval means, and means for transferring the input retrieval condition to the first server indicated by the extracted metadata as a retrieval request.
According to still another feature of the present invention, the system further comprises means for providing a form for inputting the retrieval condition, which form can be commonly used irrespective of the retrieval result of the metadata retrieval means.
According to still another feature of the present invention, the first server comprises means for converting the retrieval request to the database transferred from the user terminal into a format concordant with the database to be accessed.
According to still another feature of the present invention, the first server comprises metadata saving means for creating and saving metadata that pertains to the database managed by that first server, and the second server comprises means for acquiring corresponding metadata when data stored in the metadata saving means has been updated.
Note that the second server may comprise means for acquiring data stored in the metadata saving means at a predetermined time interval.
A data retrieval method according to the present invention comprises the steps of: collecting metadata that pertain to real data stored in databases distributed on a network by a second server via first servers distributed on the network, and saving the collected metadata; extracting metadata that matches a request by search of the collected metadata; inputting a retrieval condition for the database on the basis of a retrieval result of the metadata; issuing a retrieval request of the real data to the first server indicated by the extracted metadata; and retrieving, by the first server, the real data from the corresponding database in accordance with the retrieval request.
A recording medium according to the present invention records a program for making a computer implement, in a database system built by distributing on a network one or more user terminals, one or more databases, one or more first servers for searching the databases for real data, and one or more second servers for collecting metadata which pertain to real data stored in the one or more databases from the one or more first servers and managing the collected metadata: a function of collecting the metadata of the distributed databases by the second server via the first servers; a function of extracting metadata which matches a retrieval request from a user by search of the collected metadata in response to the retrieval request; a function of inputting a retrieval condition for the databases at the user terminal on the basis of a retrieval result of the metadata, and issuing a retrieval request of the real data to the first server indicated by the extracted metadata; and a function of retrieving the real data by the first server in accordance with the retrieval request.
Since the present invention is comprised of the aforementioned technical means, when a retrieval request is issued to the second server that collects and manages metadata pertaining to one or more databases, all metadata that match the request are retrieved and presented to the user. Even when a plurality of databases and first servers that manage these databases are present on a network, when a retrieval request is issued to the second server, all metadata that match the retrieval request are retrieved independently of which databases metadata pertain to. For this reason, the user can obtain all data that match his or her retrieval request from one server as long as he or she knows only the location of the second server, even if he or she does not know the locations of the distributed databases or the first servers. In this way, the second server comprises means for providing metadata to other computers, and also provides a gateway for search common to all the databases by integrating the schemata of databases on the network.
According to another feature of the present invention, since metadata contains the location of at least the database or first server, and information that represents the contents of real data in the database, the user can detect the location of the database which stores real data that matches the retrieval request, or the first server that manages the database simultaneously with retrieval of metadata. Consequently, the user need only know the location of the second server, and need not be aware of the locations of the distributed databases and first servers for managing these databases.
According to still another feature of the present invention, since the user terminal comprises means for inputting a retrieval condition upon searching for real data using the retrieval result of metadata, and means for transferring the input retrieval condition to the first server as a retrieval request, the user, who has detected the location of the database that stores desired real data on the basis of the retrieval result of the metadata, can search for real data by inputting the retrieval condition.
According to still another feature of the present invention, since the first server comprises means for converting the retrieval request for the database transferred from the user terminal into a format concordant with the database to be accessed, the user need only create and issue a retrieval condition that fulfills a standard format without taking notice of the sites of individual distributed databases, thus building a multidatabase system of a plurality of different kinds of distributed databases.
According to still another feature of the present invention, since the second server comprises the function of retrieving metadata when the first server has updated metadata or at given time intervals, metadata collected at the second server allow the user to always obtain the latest information at the time of search as a result, thus flexibly coping with changes in the system.
In another aspect of a database system according to the present invention, a database system, which searches a plurality of tables joined by a relational database, comprises table extraction means for extracting one table including columns that store data to be retrieved from a plurality of tables, and column exclusion means for excluding columns of the table extracted by the table extraction means and columns on other tables which store the same data contents as data contents of the columns on the extracted table from columns to be extracted in subsequent processing, and the tables extracted in turn by the table extraction means are joined when the processing of the table extraction means and the processing of the column exclusion means have been repeated until all the columns including data to be retrieved are analyzed.
According to another feature of the present invention, the table extraction means extracts one table including a largest number of columns which store data to be retrieved from the plurality of tables.
According to still another feature of the present invention, the system further comprises metadata management means for collecting and managing metadata which pertain to joining of the plurality of tables, and wherein the table extraction means extracts the table on the basis of the metadata stored in the metadata management means.
According to still another feature of the present invention, the system further comprises retrieval means for retrieving objects in accordance with a retrieval key, and data is retrieved from the tables which are extracted in turn and joined by the table extraction means.
In another aspect of a data retrieval method of the present invention, a method of data retrieval from a database, processing for extracting a table and processing for excluding columns including identical data upon search by joining a plurality of tables by a relational database are repeated in such a manner that one table including columns that store data to be retrieved is extracted from the plurality of tables, columns which store the same data contents as data contents of columns on the extracted table of other tables are excluded, and another table is extracted from the remaining tables, and one or more tables extracted in turn at that time are joined.
According to another feature of the present invention, upon extracting one table from the plurality of tables, one table including a largest number of columns that store data to be retrieved is extracted.
According to still another feature of the present invention, data is retrieved from the one or more joined tables.
Another aspect of a recording medium of the present invention records a program for making a computer implement: means for extracting one table including columns that store data to be retrieved from a plurality of tables upon search by joining a plurality of tables by a relational database; means for excluding columns of the extracted table and columns on other tables which store the same data contents as data contents of the columns on the extracted table from columns to be extracted in subsequent processing; and means for joining the tables extracted in turn when the processing of the two means have been repeated until all the columns including data to be retrieved are analyzed.
According to another feature of the present invention, the program makes the computer further implement retrieval means for retrieving objects in accordance with a retrieval key from the tables extracted and joined by the table extraction means.
Since the present invention is comprised of the aforementioned technical means, when columns with identical data contents are present in different tables, the columns with the identical data contents are handled as the one that belongs to one of the tables (e.g., a table having the largest number of columns to be retrieved) during the process, and all tables that contain identical columns are not always joined. Also, a table that includes no columns to be retrieved is not extracted as tables to be joined. In this manner, an unnecessarily large number of tables can be prevented from being joined.
According to another feature of the present invention, since a plurality of tables are not independently managed but are systematically managed by collecting metadata pertaining to table joining, the user can see the plurality of tables as one table although they are merely joined ones when viewed from the database system. That is, a long view beyond the physical limitations, e.g., the number of columns, of a database can be formed.