1. Technical Field
The present invention relates to maintaining databases, and in particular to maintaining database records where the contents of individual records in a source database are changed from time to time.
2. Description of the Related Art
With most commercial databases, a new record is created each time new data is added to the database. For example, a newspaper database almost never removes or modifies a record relating to an existing article. Instead, when a new article is published or the same article is republished at a later time with modifications, a new record is added for that new article.
There are a few databases in which the data in individual records are modified from time to time. For example, the Derwent World Patent Index (WPI) and INPADOC patent databases and the Dun and Bradstreet (“DandB”) and Thomas Register business databases have one record relating to each particular subject. For example, the DandB and Thomas Register databases maintain a record for each company. The Derwent and INPADOC databases maintain records for each patent family.
New records are added for completely new items, e.g., new businesses or patent families, but when information needs to be updated in these databases, the data in the individual fields within a record are modified. For example, when a new patent issues in a previously reported patent family, new international patent classification codes are assigned, new cross-references are identified or new priority information is made public, or a patent family is re-assigned, a Derwent record will be updated to add the appropriate information to the necessary fields. Likewise, changes in address, officers, financial information or ownership may result in changes in fields within a record in a DandB financial database.
With such systems, the user may want to know how current the information is for a particular record. Therefore, the record or the database normally also includes a last modified date or some similar indication of the last time at which the record or the database was updated.
A common use for these types of databases is a current awareness search. For example, a scientific researcher may want to keep current on new patents, or a mail-order company may want to keep current on new addresses and companies in particular fields.
To conduct a current awareness search, a user of the database creates and saves a search. That search then is reapplied to the database periodically, e.g., weekly or monthly. The results of the search then are forwarded to the user for review.
Such current awareness searches normally search only the portion of the database that is new since the prior current awareness search, either by using a subset of the database that only includes the new information (such as a weekly update tape) or by filtering the results based on the last modified date information. Searching this way means that the search will pick up both new records and records which have been modified since the last current awareness search.
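The date-based filtering described above can be sketched as follows. This is only an illustrative sketch: the record layout (a dict with `id`, `last_modified` and `text` keys), the function name `current_awareness_hits` and the sample data are assumptions made for the example and do not come from any particular source database.

```python
from datetime import date

# Hypothetical helper: the name and record layout are assumptions for illustration.
def current_awareness_hits(records, matches, last_run):
    """Return records modified after last_run that satisfy the saved search."""
    return [r for r in records if r["last_modified"] > last_run and matches(r)]

# Made-up sample records standing in for a source database.
records = [
    {"id": "A1", "last_modified": date(2024, 1, 3), "text": "laser diode"},
    {"id": "A2", "last_modified": date(2024, 1, 10), "text": "laser welding"},
    {"id": "A3", "last_modified": date(2024, 1, 12), "text": "gear box"},
]

# The saved search here is a simple keyword test; the prior run was on 5 January.
hits = current_awareness_hits(records, lambda r: "laser" in r["text"], date(2024, 1, 5))
hit_ids = [r["id"] for r in hits]
```

Because the filter looks only at the last modified date, a record that was merely updated since the prior run is returned just like a brand new record.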
Unfortunately, this means that the user often sees essentially the same information repeatedly each time the current awareness search is run. To use the patent databases as an example, the database records are updated each time a new patent in the family is published or issued. A patent might be filed in 10 to 20 countries. Most countries outside of the United States publish the patent application at 18 months, and then again publish a notice when the patent is granted. This means an individual record may be updated 20 to 40 times for a particular family. The recipient of the current awareness search therefore will see essentially the same information with only minor changes 20 to 40 times.
Similarly, updates to business records may be as minor as changes in lower level officers, slight shifts in reported stock ownership or similar relatively minor pieces of information. Again, every time the information is updated, the user will see a new report.
In either situation, seeing essentially the same reports over and over is usually annoying. It is particularly annoying when the record is of no interest to the reviewer, but simply happens to be caught in the net of the terms used in the current awareness search.
In addition to the annoyance to the user, if a local database is being created to store results of these searches, it will become extremely large if each record is stored multiple times. A primary function of such a local database is to allow local searching of a smaller database. But a search against the local database will return a whole series of duplicative records containing what is overwhelmingly the same information, with minor changes from one record to the next.
The present invention avoids these problems in the prior art by using the unique record identifier or accession number that normally is present in these types of databases to limit repetition in what a reviewer sees and what is added to a local database.
Data may be received from source databases in a variety of ways. For example, a local database might be created by conducting a retrospective search of the entire source database, or an individual involved in the project for which the local database is being created may have a list of patents they have collected over time which they want included. It may then be maintained by running a current awareness search on a regular basis and by adding records which individuals involved in the project learn about through daily reading or interactions with others in their field.
Whatever the source of data, according to the present invention the first time data is added to the local database, at least one reviewer reviews each of the records and determines whether each record is of interest (accepts the record) or of no interest (rejects the record). The review may take place before or after the record is actually added to the database, depending on the situation, with the record deleted from the database if it is rejected in a subsequent review. A list of the record identifiers of all records reviewed, and a list of the record identifiers of all rejected records, then are maintained.
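One way to picture the bookkeeping for this initial review is sketched below. It is a minimal sketch under stated assumptions: the local database is modeled as a dict keyed by record identifier, the two lists are modeled as sets, and the name `record_decision` is hypothetical.

```python
# Hypothetical helper: the name and the data structures are assumptions for illustration.
def record_decision(rec, accepted, local_db, reviewed, rejected):
    """Store one reviewer decision and keep the local database consistent with it."""
    rid = rec["id"]
    reviewed.add(rid)              # every reviewed identifier is remembered
    if accepted:
        local_db[rid] = rec        # accepted records are kept (or added)
    else:
        rejected.add(rid)          # rejected identifiers go on the reject list
        local_db.pop(rid, None)    # and are deleted if they were added before review

local_db, reviewed, rejected = {}, set(), set()
record_decision({"id": "P1", "title": "Valve"}, True, local_db, reviewed, rejected)
record_decision({"id": "P2", "title": "Pump"}, False, local_db, reviewed, rejected)
```

After these two calls, both identifiers are on the reviewed list, only `P2` is on the reject list, and only `P1` is stored in the local database.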
On subsequent addition of data to the database, new records with record identifiers matching record identifiers of those on the previously reviewed list can automatically be processed without further interaction by a reviewer. If the record identifiers are on the rejected list, they will be ignored, i.e., not added to the database (or deleted from the database, if the records were all added before the review). If the record identifiers are not on the rejected list, then the record in the local database is updated with the revised information. In any case, the reviewer is not requested to accept or reject the updated versions of previously reviewed records again, but updated versions are available in the local database when local database users conduct a separate search of the local database.
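The handling of a later batch of data can be sketched as follows. This is an illustrative sketch only: the dict-based local database, the set-based lists and the name `process_batch` are assumptions, not part of the described system.

```python
# Hypothetical helper: the name and the data structures are assumptions for illustration.
def process_batch(incoming, local_db, reviewed, rejected):
    """Apply a batch of incoming records; return only those that need a reviewer."""
    needs_review = []
    for rec in incoming:
        rid = rec["id"]
        if rid in rejected:
            local_db.pop(rid, None)   # previously rejected: ignore (delete if present)
        elif rid in reviewed:
            local_db[rid] = rec       # previously accepted: update without review
        else:
            needs_review.append(rec)  # never seen before: pass to the reviewer
    return needs_review

# Made-up state: P1 was accepted earlier, P2 was rejected earlier, P3 is new.
local_db = {"P1": {"id": "P1", "title": "old"}}
reviewed = {"P1", "P2"}
rejected = {"P2"}
incoming = [
    {"id": "P1", "title": "new"},
    {"id": "P2", "title": "rejected earlier"},
    {"id": "P3", "title": "never seen"},
]
pending = process_batch(incoming, local_db, reviewed, rejected)
```

In this example the reviewer is shown only `P3`, while the stored copy of `P1` is replaced by its updated version and `P2` stays out of the local database.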
In some situations, multiple reviewers will review particular sets of data. If so, preferably a record will be added to the reject list only if all reviewers reject it, or if a specific reviewer rejects it, or if not all reviewers accept it, depending on what is appropriate for the particular situation and local database. For example, a database of reference materials for a development team might add patent records to the reject list only if all reviewers reject the record, while a database of records of patents which are to be assigned to a third party as part of a sale of a business might add patent records to the reject list if any reviewer rejected the record.
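The three alternatives for multiple reviewers can be expressed as a small policy function. The sketch below assumes that each reviewer's decision is a boolean (True for accept) and that at least one reviewer has voted; the policy names and the function name `goes_on_reject_list` are hypothetical labels chosen for the example.

```python
# Hypothetical helper: the policy names and function name are assumptions for illustration.
def goes_on_reject_list(votes, policy, key_reviewer=None):
    """votes maps reviewer name -> True (accept) or False (reject)."""
    if policy == "all_reject":       # reject only if every reviewer rejects
        return not any(votes.values())
    if policy == "any_reject":       # reject unless every reviewer accepts
        return not all(votes.values())
    if policy == "key_reviewer":     # reject if one specific reviewer rejects
        return not votes[key_reviewer]
    raise ValueError(f"unknown policy: {policy}")

votes = {"ann": True, "bob": False}
team_reference = goes_on_reject_list(votes, "all_reject")
business_sale = goes_on_reject_list(votes, "any_reject")
bob_decides = goes_on_reject_list(votes, "key_reviewer", key_reviewer="bob")
```

With one acceptance and one rejection, the development-team style policy keeps the record, while the stricter sale-of-business style policy and the single-reviewer policy both place it on the reject list.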
Preferably, the reject list can be blanked and restarted if desired. For example, a database of reference materials may be created for a development team. The scope or focus of the team then changes. A new retrospective search is run, and future current awareness searches are modified. In this situation, what records are relevant may change, so the reject list is no longer relevant, and should be restarted.
Sometimes a reviewer may actually want to be notified if certain information in a record changes. According to a further aspect of the invention, the system can be arranged to check certain fields in each record. Updated records then will be passed through to the reviewer if the information in the specific fields is changed. For example, a reviewer might not care about who the current assistant secretary is in a company in a DandB report, but might be very interested in knowing about updated information on sales and operating income. The system therefore could be set up to check the sales and operating income fields, and pass the record on to the reviewer if those change, but merely to update the record in the local database if other fields change. Again, the reviewer will receive only information that is either new or of particular interest, but still have quick access to current information in the local database, when desired.
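The check on specific fields can be sketched as a comparison between the stored record and the updated record. The field names, the sample values and the function name `significant_change` below are assumptions made for illustration only.

```python
# Hypothetical helper: the name and the field names are assumptions for illustration.
def significant_change(old, new, watched):
    """Return True if any watched field differs between the two record versions."""
    return any(old.get(field) != new.get(field) for field in watched)

watched = ["sales", "operating_income"]
stored = {"id": "C1", "sales": 100, "operating_income": 10, "assistant_secretary": "Lee"}

# Only a minor officer changed: update the local database silently.
minor = dict(stored, assistant_secretary="Kim")
# Sales changed: pass the updated record on to the reviewer.
major = dict(stored, sales=120)

notify_minor = significant_change(stored, minor, watched)
notify_major = significant_change(stored, major, watched)
```

In either case the local database would still be updated with the new version; the result only decides whether the reviewer is also shown the record.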
If significant fields are being checked, ideally an accept/reject decision list is maintained for each reviewer. Records which have been rejected by a particular reviewer then will not be sent to that reviewer even if significant fields are updated, but they will be sent to other reviewers who accepted those records.
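Per-reviewer decision lists might be modeled as shown below. This is a sketch under assumptions: each reviewer's list is a dict from record identifier to a boolean decision, and the name `reviewers_to_notify` is hypothetical.

```python
# Hypothetical helper: the name and the data structures are assumptions for illustration.
def reviewers_to_notify(record_id, decisions):
    """Return the reviewers who accepted record_id, in alphabetical order.

    decisions maps reviewer name -> {record id -> True (accept) / False (reject)}.
    """
    return sorted(
        name for name, decided in decisions.items() if decided.get(record_id) is True
    )

# Made-up decision lists: ann accepted C1, bob rejected it, cy never reviewed it.
decisions = {
    "ann": {"C1": True},
    "bob": {"C1": False},
    "cy": {},
}
recipients = reviewers_to_notify("C1", decisions)
```

When a significant field of `C1` changes, only the reviewer who accepted the record receives the update in this example.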
It will also be appreciated that the local database now contains only a single, current copy of records which a reviewer has accepted. This means that searches of the local database by the reviewer or some other user will only produce a single copy of the current information for each record. Moreover, the local database now only contains records which at the least have not been rejected by some reviewer, plus, possibly, new records from the most recent lot of data being added which have not yet been reviewed. Searches run against the local database therefore will have minimal amounts of spurious records returned due to flukes in the scope of the search.