The present invention relates to techniques for accessing databases in a computer network. More particularly, the present invention relates to improved techniques for permitting users of a client-server computer network to more efficiently access data in data repositories that are stored at the server computer.
In modern computer networks, there is often a need to work with large repositories of data. A typical data repository may contain thousands, millions, or even billions of records, each of which may contain any type of data (e.g., text, graphics, audio/video clips, typed data such as links among records, or any combination of computer-readable data). Note that the term "record" is employed herein in its general sense and represents, like its pre-computer counterpart, any collection of data that may be stored and tracked as a unit. Depending on the need of a particular software application program, a record may be as simple as a data structure for storing a single numerical value or may be as complex as any data structure that is computer-readable. An exemplary record, known as a trackpoint, may be found in a commonly owned, co-pending U.S. patent application Ser. No. 09/164,947, filed Oct. 1, 1998, now U.S. Pat. No. 6,212,549 and PCT/US98/20771 filed Oct. 1, 1998, both entitled "Trackpoint-Based Computer-Implemented Systems And Methods For Facilitating Collaborative Project Development And Communication," filed by inventors Eugene E. Bouchard, Varma Kunaparaju, Venkat R. Sriram, and Scott E. Stanelle and incorporated herein by reference.
To facilitate discussion, FIG. 1 illustrates, in a simplified manner, a data repository 100, which is shown to include six records 102, 104, 106, 108, 110, and 112. For the purposes of the example of FIG. 1, data repository 100 may represent a database for storing records of data pertaining to a project being developed collaboratively by many people, such as an automobile development project. To simplify the discussion, each of records 102-112 stores only text. As mentioned earlier, however, a record may include any combination of computer-readable data, and a typical data repository in reality may have any number of records.
Due to the size of the typical data repository, it is usually impractical to provide a copy of data repository 100 at each computer or terminal wishing to access the data repository. Accordingly, data repository 100 is typically associated with and accessible by a server computer 120, which is typically a fast computer endowed with sufficient processing, storage, cache, and I/O resources for working with a large data repository. Server computer 120 is connected to multiple client computers 122, 124, and 126 through some type of conventional computer network technology, which may be a network proprietary to a particular enterprise, a virtual private network (VPN) over a public network, or the public network itself (e.g., the internet). Although only one server computer and three client computers are depicted in FIG. 1 to simplify the discussion, a typical computer network may include any number of server computers and client computers.
Through the client computers 122, 124, and 126 and server computer 120, project participants (e.g., those indicated by reference numbers 130, 132, and 134) collaborating on a project may conduct a variety of operations vis-a-vis the data repository, including storing new data into data repository 100, modifying existing data in the records of data repository 100, and/or searching the records to obtain data that is of interest to a user. There are many ways in which a user may search for data contained in the records of data repository 100. By way of example, a user may wish to perform a full search through the content of the records to find the record or records that contain the data sought. As an example, a technician wishing to obtain the technical specification of piston rings may use a client computer to communicate to server computer 120 that he wishes to search for records that contain the term "piston ring" in their content.
A full search through a data repository is, however, a time-consuming and inefficient way to find information. This is because a full search often turns up many "false hits," since it returns all records that satisfy the search criteria, although many of the records found may only mention the specified search term (e.g., "piston ring") in passing and may have little or no relevance to the user. The user may then have to look through the large number of records referenced by the search result to weed out irrelevant records until he finds the information needed.
To permit the user to find information more efficiently, indices that cross-reference search keys with unique record IDs may be employed. The user may search the indices to find the identity of the records that contain the information he needs. By way of example, the metadata of a record, which comprises data fields characterizing the record, may be employed as search keys. The metadata themselves are typically supplied by the creator of a record to assist others in finding the record being created. Alternatively, the metadata may be computer-generated from the content of an existing record. One search key may represent, for example, keywords indicative of key concepts of a record. Another search key may represent, for example, the subject of the record.
Indices can be created for databases implemented in accordance with any database technology, including file-based repositories, relational database management systems (RDMS), object database management systems (ODMS), or any record/file keeping technology. Since the indices constitute a relatively small database compared to the server data repository, and since the process of index building itself already filters out marginal records that may have little relevance to the search keys being indexed, index searching is a more efficient way to find information.
Using the automobile development project to facilitate discussion, the records of data repository 100 may contain data, deadline information, comments, and discussions that may be relevant to the various project participants. The records in data repository 100 may be indexed with predefined search keys, e.g., keywords and subjects, to facilitate more efficient searching. Using indices 150, a user may conduct a keyword search for a particular keyword, such as "pistons," or a subject search for a particular subject, such as "transmission fluid viscosity," and be furnished by server computer 120 with a list containing record references to the records that exactly or approximately satisfy the search criteria (records 106 and 108, respectively, in this case). With the list of record references, the user may begin to review the actual contents of the records in the list to pinpoint the exact record needed.
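By way of illustration only, the cross-referencing of search keys with record IDs described above may be sketched as follows. This is a minimal sketch rather than the implementation of any particular database product: the metadata values, the function names build_indices and search_indices, and the use of exact case-insensitive matching are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical metadata for three records; the field values are illustrative
# only and are not taken from any actual data repository.
RECORD_METADATA = {
    106: {"keyword": ["pistons"], "subject": ["engine assembly"]},
    108: {"keyword": ["fluids"], "subject": ["transmission fluid viscosity"]},
    110: {"keyword": ["brakes"], "subject": ["brake testing"]},
}


def build_indices(metadata):
    """Map each (search key, value) pair to the set of record IDs carrying it."""
    indices = defaultdict(set)
    for record_id, fields in metadata.items():
        for search_key, values in fields.items():
            for value in values:
                indices[(search_key, value.lower())].add(record_id)
    return indices


def search_indices(indices, search_key, value):
    """Return the sorted record IDs whose metadata exactly matches the query."""
    return sorted(indices.get((search_key, value.lower()), set()))


indices = build_indices(RECORD_METADATA)
print(search_indices(indices, "keyword", "pistons"))
print(search_indices(indices, "subject", "transmission fluid viscosity"))
```

Because only the metadata is consulted, a lookup of this kind does not touch the content of the records themselves, which is the reason an index search is cheaper than a full search.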
Although the use of indices significantly improves the user's access to the records of data repository 100, there are disadvantages in the prior art's index searching technique when a computer network is involved. Consider, for example, the case in which user 130 wishes to search for records that have the term "piston" in their keyword field. To begin the search, user 130 at client computer 126 first requests a search form from server computer 120. The search form includes a keyword field into which user 130 may enter the term "piston." Client computer 126 then transmits the search form with the query to server computer 120. Using the supplied query, which is "piston" in the keyword field in this case, server computer 120 then searches through indices 150 for the records that satisfy this query. The result of the index search is then transmitted by server computer 120 to client computer 126 in the form of a list of record references. In this example, the list of record references contains record IDs for records 106, 110, 112, and 104, although a broad search like the above typically turns up many more records.
Upon receiving the search result, client computer 126 displays the search result to user 130. If too many records are returned in the list of record references or if the titles (or abstracts or whatever descriptive information is supplied with each record in the list of record references) suggest that the search needs further refinement, user 130 may refine the search by, for example, entering more detailed keywords (e.g., "piston weight"), using Boolean operators (e.g., keyword="piston" and "specification" and subject="piston machining"), or specifying a new search altogether using new keywords and/or different search keys. Again, client computer 126 sends the search form with the new query to server computer 120 to allow server computer 120 to obtain a new list of record references from indices 150 and to transmit the new search result to the user. The user may conduct many such index searches before a satisfactory search result is obtained.
At some point, user 130 selects one or more of the records presented in the list of record references for download. Client computer 126 sends the selection to server computer 120, which obtains the appropriate record or records from data repository 100 for download into client computer 126 via server computer 120.
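The prior art exchange described above may be sketched, in simplified form, as follows. The sketch is offered only to make the sequence of requests concrete: the class ServerComputer, its method names, the index contents, and the placeholder record contents are hypothetical, and the network itself is replaced by direct method calls.

```python
class ServerComputer:
    """Hypothetical stand-in for server computer 120; counts requests handled."""

    def __init__(self, indices, repository):
        self.indices = indices          # keyword -> set of record IDs
        self.repository = repository    # record ID -> record content
        self.requests_handled = 0

    def get_search_form(self):
        self.requests_handled += 1
        return {"keyword": ""}

    def search(self, form):
        self.requests_handled += 1
        return sorted(self.indices.get(form["keyword"], set()))

    def download(self, record_id):
        self.requests_handled += 1
        return self.repository[record_id]


# Illustrative data only; the record contents are placeholders.
server = ServerComputer(
    indices={"piston": {104, 106, 110, 112}, "piston weight": {106}},
    repository={n: f"content of record {n}" for n in (104, 106, 110, 112)},
)

form = server.get_search_form()              # request 1: obtain the search form
form["keyword"] = "piston"
broad_result = server.search(form)           # request 2: broad index search
form["keyword"] = "piston weight"
refined_result = server.search(form)         # request 3: refined index search
record = server.download(refined_result[0])  # request 4: record download

print(broad_result, refined_result, server.requests_handled)
```

In this sketch a session with a single refinement already requires four requests to the server, and each additional refinement adds one more.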
As can be appreciated from the example above, virtually every operation desired by the user results in a request from his client computer to server computer 120 for action. By way of example, every search by user 130 results in a new query form being sent to server computer 120 to allow server computer 120 to search through indices 150 for the records that satisfy the search query. Further, the requirement that server computer 120 perform every search for every client computer creates a processing bandwidth bottleneck at server computer 120.
The processing bandwidth bottleneck may be partially alleviated by increasing the processing and/or storage capacity of server computer 120 to allow it to perform searches faster. However, even if server computer 120 could instantaneously perform the searches requested by the users of the computer network, there may still be a significant amount of time latency between the time a user enters a search query and the time the user obtains the search result. This is because there is typically a non-trivial amount of time delay associated with transmitting the query from the client computers to server computer 120, and with transmitting the index search result from server computer 120 back to the client computers.
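The point may be illustrated with a simple arithmetic sketch. The model below assumes, purely for illustration, that each index search costs one round trip (two one-way transmissions) plus the server's search time; the function name and the delay figures are hypothetical and do not describe any measured network.

```python
def total_wait_ms(index_searches, one_way_delay_ms, server_search_time_ms):
    """Total time the user waits across a series of index searches.

    Each search costs one transmission of the query to the server, the
    server's search time, and one transmission of the result back.
    """
    per_search_ms = 2 * one_way_delay_ms + server_search_time_ms
    return index_searches * per_search_ms


# Hypothetical figures: five searches, 150 ms one-way delay.
with_slow_server = total_wait_ms(5, 150, 200)
with_instant_server = total_wait_ms(5, 150, 0)

print(with_slow_server, with_instant_server)
```

Under these assumed figures, reducing the server's search time to zero still leaves the user waiting on the transmission delay of every search.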
To a certain degree, this delay may be reduced by employing faster and more expensive networking technology, which increases the speed of data transmission between the client computers and server computer 120. As computer networks become more global, however, some time latency still exists, particularly when the geographic distance between the server computer and the client computer is substantial. Further, data transmission may take place over public telecommunication networks, whose transmission speed and delay are beyond the control of the individual enterprises (e.g., the automobile company in the previous example) that are connected to the public telecommunication networks. Because of this, no amount of internal upgrading of data communication equipment would alleviate the delays associated with index searching as it is currently implemented.
In view of the foregoing, there are desired improved techniques for performing index searches in computer networks.