1. Field of the Invention
The present invention relates to a data processing apparatus for generating an image file including a plurality of images obtained by reading a plurality of documents.
2. Description of the Related Art
The functions of scanners, printers, and multifunction peripheral (MFP) apparatuses integrated with a scanner function have been diversified. One such function reads a document with a scanner to generate a data file of the read image. The generated data file is stored in a memory card attached to the MFP apparatus, in an external storage medium such as a universal serial bus (USB) memory, or in an external storage apparatus such as a personal computer (PC).
File formats for a data file generated in this way include, for example, a Portable Document Format (PDF) file, a Joint Photographic Experts Group (JPEG) file, and a Tagged Image File Format (TIFF) file. In a PDF file, for example, a plurality of read images can be stored as one data file, so that a plurality of related images can be collected into one file and managed.
However, generation processing of a data file may be interrupted by a cancellation operation or the like during reading, and the data file may be stored in an incomplete state. In this case, there is a problem that, in some applications, the file is not restorable, so that, for example, the contents of the data file cannot be displayed.
Thus, a technique has been proposed in which, even if generation processing of a data file is interrupted partway, file generation processing ends only after a restorable file is generated.
For example, in a technique described in U.S. Pat. No. 6,982,811, whether file generation processing has been cancelled is checked every time read image data of one page is generated. When cancellation has been performed, a PDF file is generated that contains the pages up to the last page that has been established.
In other words, since interruption of file generation processing is determined after an image of one page is read, incomplete data can be prevented from being generated.
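The page-by-page determination described above can be sketched as follows. This is only a minimal illustration and not the implementation of the cited patent; the function name `read_pages_with_page_boundary_cancel` and the `cancel_requested` callback are hypothetical names introduced here. The sketch assumes that each page is available as one complete block of bytes, that is, that the apparatus can hold a full page in memory.

```python
from typing import Callable, Iterable, List


def read_pages_with_page_boundary_cancel(
    pages: Iterable[bytes],
    cancel_requested: Callable[[], bool],
) -> List[bytes]:
    """Collect fully read pages, checking for cancellation only between pages.

    Hypothetical helper: `pages` yields one complete page image at a time,
    and `cancel_requested` reports whether the user has asked to cancel.
    """
    completed: List[bytes] = []
    for page_image in pages:
        # The whole page is already in memory, so it can be committed as a unit.
        completed.append(page_image)
        # Cancellation is examined only after a page is complete,
        # so a partially read page is never committed.
        if cancel_requested():
            break
    return completed
```

Because the check happens only at a page boundary, every page in the returned list is complete, and a file built from that list contains no half-written page.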
In the above-described technique, cancellation of file generation processing is determined after an image of one page is input. In other words, it is assumed that the apparatus is provided with a memory that can store the image data of one page of a document.
However, some apparatuses do not have a memory capable of storing the image data of an entire page. In such an apparatus, the read image data is successively transmitted to an external storage medium, and the file is generated in the external storage medium.
Accordingly, when data file generation processing is interrupted while a page is being read, the data read up to that point in the page has already been added to the file in the external storage medium.
For example, when a PDF file is generated while a plurality of images read by a scanner is successively written into a USB memory attached to an MFP apparatus, if data file generation processing is interrupted in the middle of a page, the image of that page is cut off partway. When such a file is produced, the PDF file cannot be opened properly.
Further, as another case, when the capacity of a USB memory is used up during reading of documents, a writing error may occur and storage processing may end at that point. In this case as well, even if an attempt is made to display the generated image file, the file cannot be opened properly, as with the above-described problem.
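The successive-write situation can be illustrated with a small simulation. This is a sketch under stated assumptions, not a description of any actual apparatus: `stream_pages`, `chunk_size`, and `stop_after_bytes` are hypothetical names, an in-memory `io.BytesIO` stands in for the external storage medium, and a single byte limit models either a cancellation or a full medium.

```python
import io
from typing import Iterable, List, Optional


def stream_pages(
    pages: Iterable[bytes],
    sink: io.BytesIO,
    chunk_size: int,
    stop_after_bytes: Optional[int] = None,
) -> List[int]:
    """Write pages to `sink` chunk by chunk, without buffering a whole page.

    Returns the number of bytes written for each page that was started.
    `stop_after_bytes` (an assumption of this sketch) models an interruption:
    writing stops before the chunk that would exceed the limit.
    """
    written_per_page: List[int] = []
    total = 0
    for page in pages:
        written_per_page.append(0)
        for start in range(0, len(page), chunk_size):
            chunk = page[start:start + chunk_size]
            if stop_after_bytes is not None and total + len(chunk) > stop_after_bytes:
                # Interrupted: earlier chunks of this page are already in the sink.
                return written_per_page
            sink.write(chunk)
            total += len(chunk)
            written_per_page[-1] += len(chunk)
    return written_per_page
```

If two 10-byte pages are streamed in 4-byte chunks with a limit of 15 bytes, the first page is written in full while only the first 4 bytes of the second page reach the sink, which corresponds to the image cut off partway in the file.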
The above-described problem will be described in detail with reference to FIG. 8. FIG. 8 is a diagram illustrating one example of a document file configuration. As illustrated in FIG. 8, a document data file broadly includes a header 31, a catalog 32, a plurality of objects 33, and trailer information 35.
The header 31 contains header information that provides a key for identifying the type of the document. For example, in a PDF file, the header 31 contains a character string such as “%PDF-1.4”. The catalog 32 corresponds to a table of contents of the document. The catalog 32 is defined as a cross-reference table (referred to as Xref) storing the position of each object in the file.
The objects 33 correspond to page 1 through page n (n is an integer of 2 or more). Each object contains drawing objects such as a font to be used in the document, text, a graphic, and an image. The object includes a description of each page to be drawn. For example, when an image is contained in a page, the object includes information on the image data, its data size, width, height, and drawing position.
The Pages object 34 is a pointer to each page. The number of pages described in this Pages object 34 is recognized as the number of pages of the PDF file. For example, when the PDF file is displayed, images for that number of pages are provided for display.
The trailer information 35 specifies the number of entries in the Xref table and the start address of the cross-reference table. For a PDF file, these kinds of information are required. When the data is cut off partway, this information is missing and the PDF file cannot be displayed.
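To make the role of the trailer information concrete, the following sketch builds a very small PDF byte stream and then checks whether its tail still holds a usable trailer. It is an illustration only: `build_minimal_pdf` and `has_usable_trailer` are hypothetical helpers, the pages are empty, and the layout follows the general PDF file structure (header, objects, cross-reference table, trailer) rather than the exact arrangement of FIG. 8. The check is deliberately rough and is not a full PDF validator.

```python
def build_minimal_pdf(page_count: int) -> bytes:
    """Build a tiny PDF with `page_count` empty pages (illustrative only)."""
    kids = " ".join(f"{3 + i} 0 R" for i in range(page_count))
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        f"<< /Type /Pages /Kids [{kids}] /Count {page_count} >>".encode("ascii"),
    ]
    for _ in range(page_count):
        objects.append(b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>")

    out = bytearray(b"%PDF-1.4\n")  # header
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte position recorded for the Xref table
        out += f"{number} 0 obj\n".encode("ascii") + body + b"\nendobj\n"

    xref_start = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode("ascii")
    out += b"0000000000 65535 f \n"  # each Xref entry is exactly 20 bytes
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n".encode("ascii")
    # Trailer: number of Xref entries (/Size) and start address of the Xref table.
    out += f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n".encode("ascii")
    out += f"startxref\n{xref_start}\n%%EOF\n".encode("ascii")
    return bytes(out)


def has_usable_trailer(data: bytes) -> bool:
    """Rough check: the tail holds `startxref`, an offset to `xref`, and `%%EOF`."""
    tail = data[-1024:]
    pos = tail.rfind(b"startxref")
    if pos < 0 or b"%%EOF" not in tail[pos:]:
        return False
    fields = tail[pos + len(b"startxref"):].split()
    if not fields or not fields[0].isdigit():
        return False
    offset = int(fields[0])
    return data[offset:offset + 4] == b"xref"
```

For example, `has_usable_trailer(build_minimal_pdf(2))` holds for the complete byte stream, whereas the same stream truncated at its midpoint fails the check because the cross-reference table and the trailer were never written.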