1. Field of the Invention
The present invention relates to an apparatus and system for implementing variable-length file headers, and in particular to a file header that utilizes a varying number of parameters to store meta-data about the data stored in the file.
2. Description of Related Art
Electronic files have long been used to store data for computer applications. Although at the most basic level every electronic file is a collection of bits and bytes, the format of the data in an electronic file may vary greatly. For instance, a data file may contain a number of records that are all arranged in a predefined format. In the simplest case, the format is identical for each record. For example, a simple data file may contain records that each include an integer record number field, a date field, and a 2-character text field. In this case, every record is exactly the same length, because every record has exactly the same fields and each field has a predefined length.
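To make the fixed-length case concrete, the following minimal Python sketch assumes one possible byte layout for the three fields described above: a 4-byte little-endian record number, an 8-character ASCII date in YYYYMMDD form, and a 2-character ASCII text field. These widths and the names RECORD_FORMAT, pack_record and read_record are illustrative assumptions, not part of any particular file format. Because every record occupies the same 14 bytes, record n can be located directly at offset n × 14 without scanning the file.

```python
import struct
from datetime import date, datetime

# Hypothetical layout: 4-byte little-endian record number, 8-byte ASCII date
# (YYYYMMDD), 2-byte ASCII text field. "<" disables padding, so every record
# is exactly 14 bytes long.
RECORD_FORMAT = "<I8s2s"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # 14


def pack_record(number: int, when: date, text: str) -> bytes:
    """Encode one record into exactly RECORD_SIZE bytes (hypothetical helper)."""
    if len(text) != 2:
        raise ValueError("text field must be exactly 2 characters")
    return struct.pack(
        RECORD_FORMAT,
        number,
        when.strftime("%Y%m%d").encode("ascii"),
        text.encode("ascii"),
    )


def read_record(data: bytes, index: int) -> tuple[int, date, str]:
    """Decode record `index`; its position is simply index * RECORD_SIZE."""
    offset = index * RECORD_SIZE
    number, raw_date, raw_text = struct.unpack_from(RECORD_FORMAT, data, offset)
    when = datetime.strptime(raw_date.decode("ascii"), "%Y%m%d").date()
    return number, when, raw_text.decode("ascii")


if __name__ == "__main__":
    data = pack_record(1, date(2001, 2, 13), "NY") + pack_record(2, date(2001, 3, 1), "CA")
    print(read_record(data, 1))  # (2, datetime.date(2001, 3, 1), 'CA')
```

The same fixed offsets that make random access trivial are what prevent such a file from holding data, such as an image, that does not fit the record layout.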
Data files containing fixed-length records have limitations, however, as all of the data must be in the predefined format. Because the data has to be structured into fixed-format records, many types of data, such as bitmap images, cannot be stored in files composed of fixed-length records. Therefore, alternative file formats have been developed. One widely used method of structuring data in an electronic file is to store information about the data, the “meta-data”, in a file header section of the file, while storing the data itself in a data section of the file. The meta-data in the header section typically tells an application reading the electronic file how to read, interpret, process, or display the data stored in the data section. Typically, file formats that incorporate file headers have a predefined file header section at the beginning of the file followed by a variable-length data section. Because the meta-data is stored in a predefined format, an application can read and use the data relatively easily, simply by parsing the known header format to obtain the information needed to read, process, or interpret the data in the data section.
The use of file headers has made it possible for data that cannot be stored in fixed-length record formats to be stored in an electronic file in a format that can be used by many applications. For example, several different platform-independent formats have been developed for the storage of bitmap image data in electronic files. Most of these file formats consist of two sections, a file header and a binary image data section, although some formats may have additional sections in the file. The header may be separated from the image data by a special control character, or the header may be defined in such a way that the application reading the file can determine where the image data is stored within the file. The header section typically contains information about the image, while the image data section contains the actual image data. BMP (Windows bitmap), PCX (PC Paintbrush), and GIF (Graphics Interchange Format) are all examples of image file formats that utilize file headers.
Image file headers typically define the image size, number of colors, and other information needed by an application to display the image. FIG. 1 illustrates the structure of the file header used in BMP files. As shown, each field in the header is of a fixed length, and every field must be present in the correct order for an application reading the file to properly display or use the image data, even if some of the fields in the header do not have a value. In its commonly used form, a 14-byte file header followed by a 40-byte information header, the BMP file header is exactly 54 bytes long.
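The sketch below, in Python, reads the 54-byte BMP header described above using the standard struct module. It assumes the common layout of a 14-byte BITMAPFILEHEADER followed by a 40-byte BITMAPINFOHEADER, with all fields little-endian; other header versions (for example the longer V4 and V5 headers) are deliberately rejected rather than handled. The field names in BmpHeader are descriptive names chosen for this illustration rather than the official Windows names (bfType, bfOffBits, biWidth and so on), and parse_bmp_header is a hypothetical helper. Because every field sits at a fixed offset, the whole header can be unpacked in one step, and the pixel-offset field tells the reader where the image data begins.

```python
import struct
from typing import NamedTuple

# Assumed layout: 14-byte BITMAPFILEHEADER followed by the 40-byte
# BITMAPINFOHEADER, all little-endian, with no padding between fields.
BMP_HEADER_FORMAT = "<2sIHHIIiiHHIIiiII"
BMP_HEADER_SIZE = struct.calcsize(BMP_HEADER_FORMAT)  # 54


class BmpHeader(NamedTuple):
    # Descriptive (non-official) names, in on-disk order.
    signature: bytes        # b"BM"
    file_size: int
    reserved1: int
    reserved2: int
    pixel_offset: int       # byte offset of the image data within the file
    info_size: int          # 40 for BITMAPINFOHEADER
    width: int
    height: int             # negative means rows are stored top-down
    planes: int
    bits_per_pixel: int
    compression: int
    image_size: int
    x_pixels_per_meter: int
    y_pixels_per_meter: int
    colors_used: int
    colors_important: int


def parse_bmp_header(data: bytes) -> BmpHeader:
    """Unpack the fixed 54-byte header (hypothetical helper)."""
    if len(data) < BMP_HEADER_SIZE:
        raise ValueError("file too short for a BMP header")
    header = BmpHeader(*struct.unpack_from(BMP_HEADER_FORMAT, data, 0))
    if header.signature != b"BM":
        raise ValueError("not a BMP file")
    if header.info_size != 40:
        raise ValueError("only the 40-byte BITMAPINFOHEADER is handled here")
    return header


if __name__ == "__main__":
    # Header for a hypothetical 2x2, 24-bit image: each 6-byte row is padded
    # to 8 bytes, so the pixel data is 16 bytes and follows the header.
    raw = struct.pack(BMP_HEADER_FORMAT, b"BM", BMP_HEADER_SIZE + 16, 0, 0,
                      BMP_HEADER_SIZE, 40, 2, 2, 1, 24, 0, 16, 2835, 2835,
                      0, 0) + bytes(16)
    hdr = parse_bmp_header(raw)
    pixels = raw[hdr.pixel_offset:]
    print(hdr.width, hdr.height, hdr.bits_per_pixel, len(pixels))  # 2 2 24 16
```

Note that the reader hard-codes both the field order and the 40-byte information-header size; any change to that layout requires changing every such reader.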
File formats with these types of predefined, fixed-length file headers are limited in several ways. Only very particular, predetermined information or meta-data can be stored in the header. While some fields in the header may be reserved for future use, it is very difficult to change the file header after it has been defined and put into use, as every application that uses the fixed format must be updated if the format is changed.
Fixed file header formats of this type work well for data that does not require a large amount of meta-data, such as a simple bitmap image file. However, there are cases when it would be desirable to store varying amounts of diverse meta-data in a file header. One example of a situation where fixed-length predefined file headers are inadequate is described in co-pending U.S. patent application Ser. No. 09/782,620, entitled “Method and System for Extracting Information from RFQ documents and Compressing RFQ files into a Common File Format”, filed Feb. 13, 2001, which is hereby incorporated by reference. As described in that application, the current assignee has developed a method and system for converting numerous types of electronic documents into a common compressed file type, whereby a single viewing application can be used to view any document that has been converted to the common compressed file type.
Many different types of files can be converted into a single common file type using the disclosed method and system. Because of the wide variety of information that may be in the original documents, it is difficult to define a fixed-format file header that will capture all the information that may be desirable to store with the compressed data. Even if it were possible to define a fixed format that would adequately store meta-data for all currently known types of information, it is impossible to predict what additional types and amounts of information will be desirable to store in the future.
Thus, what is needed is a method and system for storing variable amounts and types of information in a file header.