1. Field of the Invention
This invention relates to the field of computer file structuring systems.
2. Background Art
Data storage and manipulation are important functions of a computer system. A central processing unit (CPU) of a computer system cannot perform useful services unless data can be presented to it and received from it.
One way in which data can be organized for CPU access, including access for the purposes of finding, writing, reading and erasing part or all of that data, is in the form of a "file." A file is an apparently contiguous sequence of atomic components of data known as bytes.
A byte is the smallest unit of data which can be managed directly by the hardware of a computer. A byte is composed of bits, each of which holds the binary value of 0 or 1. A byte is typically defined as eight bits.
A unit of data within a file may consist of one or more bytes. The significance of the data units in a file depends upon the number of bytes in the unit, the placement of the data unit within the file, the scheme by which the resultant pattern of bits should be interpreted, and the value represented by that interpretation.
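By way of illustration only, the following minimal sketch shows how the significance of a data unit depends on the number of bytes taken and on the scheme used to interpret them. The four byte values and the variable names are assumptions chosen for the example and do not come from any particular file format.

```python
import struct

# Four bytes as they might appear at some position in a file (hypothetical values).
raw = bytes([0x41, 0x42, 0x43, 0x44])

# Interpretation 1: four one-byte ASCII characters.
as_text = raw.decode("ascii")

# Interpretation 2: one 32-bit unsigned integer, least significant byte first.
(as_u32_le,) = struct.unpack("<I", raw)

# Interpretation 3: one 32-bit unsigned integer, most significant byte first.
(as_u32_be,) = struct.unpack(">I", raw)

# Interpretation 4: two 16-bit unsigned integers, least significant byte first.
as_u16_pair = struct.unpack("<HH", raw)

print(as_text, hex(as_u32_le), hex(as_u32_be), [hex(v) for v in as_u16_pair])
```

The same bit pattern thus yields a text string, two different integers, or a pair of smaller integers, depending solely on the interpretation applied by the program.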
File management facilities of a computer system provide the means by which a file can be accessed by application programs as an abstract data stream. These file management facilities include the operating system of the computer, simple file input/output functions typically supplied with computer language assemblers and compilers, persistent storage device drivers and the hardware circuits and embedded control software of computer memory systems.
A file may be considered as a continuous extent of bytes presented by the underlying computer file management facilities. According to this abstraction, the file has an address at which it begins and an unbroken length of component bytes. This is the appearance of files as commonly presented for programmer access by computer file management facilities.
Computer files may be data files or executable files. This distinction is not absolute, because it is possible for a data file to enclose an executable file (in which case it is not immediately available for execution) or to enclose executable routines which can serve as sub-parts of programs. Nevertheless, for present purposes the general distinction can be used to identify data storage as the purpose of the present invention and of the prior art which relates to the present invention.
The organization of data within a file is known as the file format. File formats are not ordinarily managed by the file management facilities of a computer, but rather are defined by programmers and are specific to a given computer program and architecture. There is no universal file format which would make the data created by one computer or one program inherently accessible to any other computer or program. Data written by one program may not be understandable by another, and data written on one computer may not be readable on a different computer without reformatting of the data. The need for organization of data within files is a typical problem addressed by programmers. Various approaches have been made to the general problem of data storage and management, but current methods of data storage do not provide a consistent file format that can be used both for simple, direct file access and for sophisticated data management.
Prior art approaches include: (1) direct file access, (2) formatted data storage with limited data management, and (3) database management systems. In some cases the programmer must design all details of the file format and all procedures of the data management. In other cases database management systems (DBMS) provide data management but require the programmer to learn the DBMS interface and to obey rules regarding data placement and access.
The direct file access facilities of computers generally provide only the following:
1. Creating and naming new files and renaming existing files, so that files may be accessed by name.
2. Adding one or more bytes of new data to the end of a file.
3. Truncating the length of a file, thereby losing the bytes beyond the new length.
4. Overwriting the values of bytes in an existing file with new values.
5. Copying extents of an existing file, beginning at a position, locatable as an offset from the start of the file, and continuing for a specified length, into another file by means of methods 1, 2, 3 and 4, above, and with the assistance of method 6, below.
6. Erasing existing file names and contents.
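The six capabilities listed above may be sketched as follows, using the standard file functions of the Python language as one example of direct file access facilities. The file names and byte values are hypothetical, and the sketch is illustrative rather than a description of any particular system.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "data.bin")        # hypothetical file names
renamed = os.path.join(workdir, "renamed.bin")
copy_path = os.path.join(workdir, "extent.bin")

# 1. Create and name a new file, then rename it.
with open(path, "wb") as f:
    f.write(b"HELLO")
os.rename(path, renamed)

# 2. Add bytes to the end of the file.
with open(renamed, "ab") as f:
    f.write(b"WORLD")

# 3. Truncate the file to eight bytes; the bytes beyond that length are lost.
with open(renamed, "r+b") as f:
    f.truncate(8)

# 4. Overwrite the byte at offset 0 with a new value.
with open(renamed, "r+b") as f:
    f.seek(0)
    f.write(b"J")

# 5. Copy an extent (offset 1, length 4) into another file.
with open(renamed, "rb") as src, open(copy_path, "wb") as dst:
    src.seek(1)
    dst.write(src.read(4))

# Read both files back so the results can be inspected.
with open(renamed, "rb") as f:
    final_contents = f.read()
with open(copy_path, "rb") as f:
    extent_contents = f.read()

# 6. Erase the file names and contents.
os.remove(renamed)
os.remove(copy_path)
os.rmdir(workdir)
file_still_exists = os.path.exists(renamed)
```

None of these operations carries any knowledge of the file format; each addresses the file only as a named sequence of bytes.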
The above capabilities can be used in complex ways to alter files to achieve such purposes as overwriting of bytes at an existing position in the file, addition of bytes to the end of the file, and removal of bytes from a file (closing any gap caused by the removal). These actions can effect data storage and management under simple or complex file format schemes and they are in fact used in virtually all instances of data file management.
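As one example of such a combined use, removal of bytes from the interior of a file, with closure of the resulting gap, can be built from overwriting and truncation alone. The following is a minimal sketch under that assumption; the function name `remove_bytes` and its parameters are hypothetical and are introduced only for illustration.

```python
import os
import tempfile


def remove_bytes(path, offset, length, chunk_size=4096):
    """Delete `length` bytes at `offset`, shifting the tail down to close the gap."""
    size = os.path.getsize(path)
    if offset < 0 or length < 0 or offset + length > size:
        raise ValueError("extent lies outside the file")
    with open(path, "r+b") as f:
        read_pos = offset + length   # first byte that must survive
        write_pos = offset           # where that byte must end up
        while read_pos < size:
            f.seek(read_pos)
            chunk = f.read(min(chunk_size, size - read_pos))
            if not chunk:
                break
            # Overwrite earlier bytes with the bytes that follow the gap.
            f.seek(write_pos)
            f.write(chunk)
            read_pos += len(chunk)
            write_pos += len(chunk)
        # Truncate away the now-duplicated bytes at the end.
        f.truncate(size - length)


# Usage: remove the four bytes "3456" from a ten-byte file.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
with open(demo_path, "wb") as f:
    f.write(b"0123456789")
remove_bytes(demo_path, 3, 4, chunk_size=2)
with open(demo_path, "rb") as f:
    result = f.read()
os.remove(demo_path)
```

Because the write position always trails the read position, each chunk is read in full before any byte it contains is overwritten, so the shift can be performed in place.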
A programmer can use all of the file management capabilities listed above, but a suitable file format must be designed and data management routines implemented to suit the needs of a particular program. Since many of these needs are repetitive, standard solutions have been developed to address well-defined situations.
Well-known file formats that are suitable for the storage of loosely-structured data for the purpose of data transfer include SDF (system data format), which represents data units as ASCII strings terminated by linefeed characters; ANSI X.12 standard, which defines several layers of data units and sub-units by a system of in-stream specific-value separator bytes and type values which key to externally documented schemas; the telegraphy standard, which uses in-stream control values to describe the rough layout of data units within the stream; and the BASIC language convention of using comma-delimited ASCII data fields, generally in the context of a schema. In this category of file formats, emphasis is upon the containment and identification of data units independent of data management facilities; thus, access to data held in such formats remains a direct file access problem.
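The general character of this category can be illustrated by a minimal sketch of a reader for linefeed-terminated records containing comma-delimited ASCII fields that are interpreted against an external schema. The schema, the field names, and the sample data are assumptions made for the example; the sketch does not implement SDF, ANSI X.12, or any other named standard, and it does not handle quoted or escaped delimiters.

```python
def parse_records(data, schema):
    """Split a byte stream into records and fields, then apply a schema.

    `schema` is a list of (field_name, converter) pairs kept outside the data.
    """
    records = []
    for line in data.decode("ascii").split("\n"):
        if not line:
            continue  # skip the empty string after the final linefeed
        fields = line.split(",")
        if len(fields) != len(schema):
            raise ValueError("record does not match schema: %r" % line)
        record = {}
        for (name, convert), text in zip(schema, fields):
            record[name] = convert(text)
        records.append(record)
    return records


# Hypothetical schema and data stream.
SCHEMA = [("part", str), ("quantity", int), ("price", float)]
stream = b"bolt,100,0.25\nnut,250,0.1\n"

records = parse_records(stream, SCHEMA)
```

The delimiters only separate and identify the data units. Locating, inserting, or replacing a particular record still requires the program to read and rewrite bytes by direct file access.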
The mere separation and identification of data units is not sufficient for circumstances in which the use as well as the character of the data is well known. In such cases there is a need to combine file formats with behaviors specific to the structured data. The prior art attempts to accomplish this in formalized data management schemes.
Structured file access schemes that include limited data management capabilities are often employed to handle data intended for defined uses. Examples are graphic image files, which must be received, displayed and erased, but which rarely need to be indexed, filtered, or summarized. The file formats for graphic images, such as TIFF, PCX, and BMP, are generally acted on by code library procedures from any of numerous suppliers, and some or all of those procedures can be included in a program. Similar observations can be made about digital audio files, instrumentation data files, and any other files that have narrowly defined uses.
Highly structured file formats and sophisticated data management techniques have been developed in the prior art. These schemes are categorized as Data Base Management Systems (DBMS). These systems are characterized by large program size and rigid record schemas (field layouts). Such systems do not satisfy the needs of programs that require irregular data file structures and simple file access. More advanced systems, categorized as Object-Oriented Data Base Management Systems (OODBMS), are even larger in size and generally slower in execution, but many of them allow data to be stored as objects with definable behaviors.
A data management facility available under the commercial name KALA occupies a conceptual space between the low level of direct file access and the high level of database schemes. KALA appears to be an object-oriented persistent file store facility that leaves much of the data organization to the programs that employ the product.