The present invention relates, in general, to file-based databases, and more particularly to file-based databases which are highly available for both read and write access.
A database for factory automation in a multi-national environment must meet a combination of requirements more severe than those of any previous database application. Such a database must be available whenever a production line is in operation; since these production lines are located worldwide, this calls for essentially continuous availability. Users of the database are spread throughout the world and must be able to exchange information readily, so the database must be compatible with both local and wide area networks. At the same time, the database must provide tightly controlled access to sensitive design and manufacturing information. Many users must access the data simultaneously. The data and the relationships among data items are complex: typically each production line is used for several different process flows and for different design groups, each having a unique sequence which must be followed exactly. Each design group in turn may have a different process flow for each production line. Both production line engineering and design engineering must be able to change the design and manufacturing data whenever required, and many of these changes affect the parts built on that production line. A typical large company has 50 design groups, 17 production lines and between 10 and 20 flows per production line. The result is a great many opportunities for error. Error elimination is extremely important, since even small errors can result in large quantities of defective or scrapped production. Finally, design and manufacturing data is typically organized into files and groups of files rather than into a record or tuple structure as in commercially available databases.
One method previously used to fulfil some of these requirements was to regularly distribute copies of the master database to each user site and access those copies using a local computer system. This method provides high availability, since a site can use alternative computer systems if the primary computer system is unavailable. However, the distribution process itself is unwieldy and expensive. Changes to the master database often take weeks or months to be reflected in a user's local database. Since most of these changes are generated at the local site, there is always the risk of changes being lost in the updating process. With a large number of sites and users, data integrity is unmanageable. No mechanism is available to control changes in the remote database copies, nor is there any information available to tell whether a particular data item has been changed simultaneously at more than one site.
The distribution-related problems can be addressed by using a single central database which is accessed remotely through a computer network. However, the single central database depends entirely on the availability of a single computer system and the associated network links. If any component is unavailable, then the entire database is unavailable everywhere. Since operation of the production lines depends on availability of this database, lack of access can quickly cause shutdown of an entire production line. Continual revisions are made to the database information as part of the manufacturing process, requiring both read and write availability at all times. Consequently, switching to a backup database when the master database is not available is not sufficient. The system cannot allow writing to the backup database without endangering data integrity, since the master database would then no longer match the backup database. A central database simply cannot give the level of availability required.
Apart from availability, some requirements can be met by commercially available databases such as AURICLE or SQL. Typical databases of this kind are described in the book "An Introduction to Database Systems", by C. J. Date, Addison-Wesley Publishing Company, Inc., 1977, which material is incorporated herein by reference. This reference describes a data sublanguage called SEQUEL, which illustrates the functioning of a typical commercially available database. These databases can allow access through a network server which is compatible with local and wide area networks. They can allow concurrent transactions by multiple users. Concurrent transactions can be performed on different data as well as on the same data within the database. Finally, the database can be restored to its original form if a transaction does not complete successfully.
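The restore-on-failure behavior described above can be illustrated with a minimal sketch. It uses Python's standard sqlite3 module purely as a stand-in for a commercial database; the table name "flows", the helper names and the simulated failure are hypothetical and are not taken from the reference.

```python
import sqlite3

def make_db():
    """Create an in-memory database holding one hypothetical process-flow row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE flows (id INTEGER PRIMARY KEY, step TEXT)")
    conn.execute("INSERT INTO flows VALUES (1, 'etch')")
    conn.commit()
    return conn

def update_step(conn, flow_id, new_step, fail=False):
    """Update one row inside a transaction; return True if it committed."""
    try:
        # Used as a context manager, the connection commits when the block
        # exits normally and rolls back when an exception escapes it.
        with conn:
            conn.execute(
                "UPDATE flows SET step = ? WHERE id = ?", (new_step, flow_id)
            )
            if fail:
                # Assumption: stands in for any error before completion.
                raise RuntimeError("simulated failure before commit")
        return True
    except RuntimeError:
        return False

def current_step(conn, flow_id):
    """Read back the stored step for one flow."""
    row = conn.execute(
        "SELECT step FROM flows WHERE id = ?", (flow_id,)
    ).fetchone()
    return row[0]

if __name__ == "__main__":
    db = make_db()
    update_step(db, 1, "deposit", fail=True)   # transaction is rolled back
    print(current_step(db, 1))
    update_step(db, 1, "deposit")              # transaction commits
    print(current_step(db, 1))
```

In this sketch the failed update leaves the stored row exactly as it was before the transaction began, which is the property the paragraph above attributes to such databases.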
Yet other requirements are met by a version control system such as that described in an article entitled "RCS--A System for Version Control", by W. Tichy, published in electronic form and available through the Usenet computer network node "prep.ai.mit.edu", in the directory "/pub/gnu", which article is incorporated herein by reference. Version control systems such as RCS perform transactions on files rather than on tuples or records. They can maintain incremental copies of the data, that is, a trail of previous versions of the data and the changes from version to version. This capability allows a user to retrieve any earlier version of the data, even if different locations develop different versions. Typically the relationships between versions are very complex, allowing branches and parallel versions. The groups of files can be stored in a hierarchical directory structure, facilitating management of the files.
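The idea of keeping a trail of versions as incremental changes can be sketched as follows. This is an illustrative sketch only and does not reproduce the storage format or commands of RCS: it keeps a base copy of a file plus forward deltas, omits branches, and all names (make_delta, apply_delta, VersionedFile) are hypothetical. It uses the standard difflib module to compute the line-level changes.

```python
from difflib import SequenceMatcher

def make_delta(old, new):
    """Return the edits (start, end, replacement lines) turning old into new."""
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (i1, i2, new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"          # unchanged runs are not stored
    ]

def apply_delta(old, delta):
    """Rebuild the newer version from the older lines plus one delta."""
    out, pos = [], 0
    for start, end, replacement in delta:
        out.extend(old[pos:start])  # copy the unchanged lines before the edit
        out.extend(replacement)     # insert the new lines (may be empty)
        pos = end                   # skip the lines that were replaced
    out.extend(old[pos:])
    return out

class VersionedFile:
    """A base version plus a trail of deltas; version 0 is the base."""

    def __init__(self, lines):
        self.base = list(lines)
        self.deltas = []
        self._head = list(lines)

    def commit(self, lines):
        """Record a new version and return its version number."""
        lines = list(lines)
        self.deltas.append(make_delta(self._head, lines))
        self._head = lines
        return len(self.deltas)

    def checkout(self, version):
        """Reconstruct any earlier version by replaying deltas from the base."""
        lines = list(self.base)
        for delta in self.deltas[:version]:
            lines = apply_delta(lines, delta)
        return lines
```

Only the changes between successive versions are stored after the base copy, yet any earlier version can be reconstructed on request by replaying the recorded changes.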
There exists a need for a hybrid database which combines the features of commercial databases and version control systems. The hybrid database must ensure the validity and integrity of the files in the database, and yet must always be available for both reading and writing. The hybrid database must be compatible with computer networks, yet must provide a high degree of security for sensitive manufacturing information.