The present invention relates generally to document processing methods and systems, and more specifically to a method and system for labeling a document with an arbitrary, image domain document label for document storage, manipulation, and retrieval.
Scanning documents for processing on a digital computer, such as a personal computer ("p.c."), a workstation, or other digital data processing resource, is now routine. Furthermore, remote document storage, manipulation, and retrieval are becoming more commonplace given the improving interfaces between computers and telecommunication devices such as fax machines. For example, a user can now "fax" a document to his computer for the purposes of storing the document on the computer, redistributing the document via the computer, etc. What ties these two document processes together is that both involve apparatus peripheral to the data processing resource. The present invention is concerned with facilitating the use of such peripheral apparatus, specifically with naming and referring to files stored on the data processing resource.
For purposes of the present discussion, the digital data processing resource such as the p.c., workstation, and the like will be referred to herein as a computer. Document as used herein shall be understood to mean a carrier, such as paper, for carrying markings, as well as the markings, if any, applied to the carrier. A file as used herein shall be understood to mean a collection of data, for example that representing a scanned image of a document, stored on or accessible to a computer. The term electronic representation of data will be used herein, although the representation of the data (i.e., data representation) may be electronic, magnetic, optical, or other appropriate representation. Furthermore, the data may be in analog or digital format. Finally, document storage, manipulation, and retrieval will be understood to represent all actions that a user may perform on a document and its electronic representation, including those requiring communication between a peripheral apparatus and the computer. For example, this includes document scanning and transmission to the computer from a "remote" scanner, retrieving a file from the computer, transferring a document from one computer to another computer, etc. These definitions will simplify the explanation herein of the background and details of the present invention, although it will be understood that their use should not be interpreted as limiting the spirit and scope of the present invention.
Fundamentally, in order to perform any task on a document requiring communication between a peripheral apparatus and the computer, the document must be represented by data, i.e., an electronic representation of the document must be generated. Typically, the generation of an electronic representation of a document will be performed by a document scanner, which generates a description of the on/off state of the picture elements ("pixels") comprising the image, and packages the representation as a file. The form of the electronic representation may, for example, be a bitmap of the document or a coded collection of data representing the document.
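By way of illustration only, the generation of a bitmap representation described above may be sketched as follows. The sketch assumes that the scanner delivers grayscale samples (0 = black, 255 = white), that a pixel is "on" when it is darker than an assumed threshold of 128, and that the bitmap is packaged in the raw (P4) PBM format, one common 1-bit-per-pixel file layout. The function names `binarize` and `pack_pbm` are hypothetical and do not describe any particular scanner.

```python
def binarize(gray, threshold=128):
    """Map each grayscale sample (0 = black, 255 = white) to an on/off pixel.

    A pixel is 'on' (1, i.e. marked) when it is darker than the threshold.
    The threshold value is an assumption chosen for illustration.
    """
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def pack_pbm(bits):
    """Package a 1-bit image as a raw (P4) PBM file image held in memory.

    Each row is packed most-significant-bit first, 8 pixels per byte,
    and padded with zero bits to a byte boundary, as P4 requires.
    """
    height = len(bits)
    width = len(bits[0]) if height else 0
    header = f"P4\n{width} {height}\n".encode("ascii")
    body = bytearray()
    for row in bits:
        for start in range(0, width, 8):
            byte = 0
            for i, bit in enumerate(row[start:start + 8]):
                byte |= bit << (7 - i)  # leftmost pixel goes in the high bit
            body.append(byte)
    return header + bytes(body)


# A 4 x 2 "scan": dark samples become 1, light samples become 0.
scan = [[0, 255, 200, 10],
        [255, 255, 0, 0]]
file_bytes = pack_pbm(binarize(scan))
```

Writing `file_bytes` to disk would yield a file that the computer stores and names like any other; the coded (non-bitmap) representations mentioned above would replace the packing step.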
Once an electronic representation of the document (hereafter referred to as an "electronic document") is generated, there must be a way of uniquely identifying it. This requirement is most commonly handled by the disk operating system resident on the computer. For convenience, virtually every disk operating system permits, and in fact requires, either the user or the computer to assign a file name to the file containing the electronic document for subsequent identification of the file. According to known document storage, manipulation, and retrieval systems, the user-selected file name must be in a format which is recognizable by the computer, for example encoded text such as EBCDIC or ASCII which may be entered from a keyboard.
Electronic documents transmitted to a computer for storage and/or processing from a peripheral device are typically named at the time of transmission to or receipt by the computer in association with the task of document storage. For example, a user may enter via a keyboard attached to the sending or receiving device an encoded text name for the electronic document. Alternatively, the sending or receiving device may automatically assign an encoded text name to the electronic document according to a preestablished rule for name assignment. Typically, the task of document storage involves establishing a destination for the file in a memory medium, such as a physical location on a magnetic disk, in RAM, etc., and a system identification ("system ID") of that destination. As part of the storage process, the disk operating system establishes and maintains a correspondence between the assigned file name and the system ID.
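The naming and storage process described above may be illustrated with the following minimal sketch. It assumes a hypothetical `FileDirectory` in which a dictionary stands in for the memory medium, an incrementing integer stands in for the system ID, and the preestablished naming rule is an assumed combination of the sending device's name and the time of receipt. None of these choices reflects any particular disk operating system.

```python
from datetime import datetime


class FileDirectory:
    """Toy directory mapping encoded-text file names to system IDs.

    All names and the ID scheme are hypothetical, for illustration only.
    """

    def __init__(self):
        self._name_to_id = {}  # file name -> system ID
        self._storage = {}     # system ID -> stored data (stands in for disk/RAM)
        self._next_id = 0

    @staticmethod
    def auto_name(device, when):
        """Assumed preestablished naming rule: device name plus timestamp."""
        return f"{device}_{when:%Y%m%d_%H%M%S}"

    def store(self, data, name=None, device="FAX1", when=None):
        """Store data under a user-supplied or rule-assigned name.

        Returns the (file name, system ID) pair the directory now maintains.
        """
        if name is None:
            name = self.auto_name(device, when or datetime.now())
        if name in self._name_to_id:
            raise FileExistsError(name)
        system_id = self._next_id  # allocate a destination
        self._next_id += 1
        self._storage[system_id] = data
        self._name_to_id[name] = system_id  # name <-> system ID correspondence
        return name, system_id

    def retrieve(self, name):
        """Resolve a file name to its system ID, then fetch the data."""
        return self._storage[self._name_to_id[name]]


directory = FileDirectory()
directory.store(b"...scan...", name="REPORT")                    # user-assigned
directory.store(b"...fax...", when=datetime(1993, 1, 2, 3, 4, 5))  # rule-assigned
```

In this sketch the second document receives the generic name `FAX1_19930102_030405`, which identifies its origin and time of receipt but says nothing about its contents.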
The file name, when assigned by the user, is often a mnemonic device or other label allowing a user to identify from the file name the general or specific contents of the file. When the file name is assigned by the system, it is most often a generic name such as the user's name, the name of the device from which the file was transmitted, the date and time of creation of the file, etc. Thus, a user is typically more likely to be able to identify the contents of a file when the user assigns the file name than when it is assigned by the system.
There are known systems that permit document retrieval using peripheral apparatus, such as a fax machine. One such system is disclosed in U.S. Pat. No. 4,893,333. According to this reference, a prestored document is identified for retrieval by way of indicia imparted on a form, for example so-called bar codes, fill-in check boxes, or fill-in fields. The idea of identifying a form absent such indicia by use of appropriate image processing software is also disclosed therein. Furthermore, performing certain operations (store, retrieve, forward, etc.) on documents by way of a peripheral device is provided when the document is capable of being identified by way of dual-tone multi-frequency (DTMF) telephone signals, as disclosed, for example, in U.S. Pat. No. 4,918,722, or in the User Handbook, Version 3.01, for the Xerox® FaxMaster 21™ software product.
One problem continually encountered in the art is that not all peripheral devices are accompanied by a keyboard allowing the user to enter an appropriate file name, for example when assigning a file name for file storage or when accessing prestored files. A typical stand-alone scanner comprises optical imaging components, software for processing images, and possibly paper document handling mechanisms. Typical facsimile devices include the above as well as a numerical keypad, but rarely include all of the keys of a full alphanumeric keyboard. In general, present peripheral apparatus limit the ability of the user to assign meaningful file names to files and to access previously stored files.
Furthermore, when identifying pre-stored and pre-named files by way of filling in check boxes or fill-in fields, at least one check box or fill-in field must be appropriately marked for each character in the file name. This leads to time-consuming and error-prone document identification. For example, if check boxes are employed to identify a file, a great many such check boxes must be provided to allow identification of alphanumeric file names. If fill-in fields are employed, the processing apparatus which identifies the document must ultimately perform character recognition on the indications in the fill-in fields.
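The scale of the check box approach can be estimated with a simple sketch. It assumes, for illustration only, a grid with one check box for every possible symbol at every character position, an alphabet of the 26 Latin capitals plus 10 digits, and a name length of 8 characters; under these assumptions the form requires 8 × 36 = 288 check boxes. The hypothetical `decode_grid` function also shows why marking errors arise: any position bearing more than one mark cannot be resolved.

```python
import string

# Assumed alphabet: 26 capital letters plus 10 digits = 36 symbols.
ALPHABET = string.ascii_uppercase + string.digits


def boxes_needed(max_len, alphabet=ALPHABET):
    """One check box per possible symbol at each character position."""
    return max_len * len(alphabet)


def decode_grid(marks, alphabet=ALPHABET):
    """Decode a grid of check boxes into a file name (hypothetical layout).

    marks[p][s] is True if the box for symbol s at position p is checked.
    The first position with no mark ends the name; a position with more
    than one mark is rejected as ambiguous.
    """
    name = []
    for p, row in enumerate(marks):
        checked = [s for s, marked in enumerate(row) if marked]
        if not checked:
            break
        if len(checked) > 1:
            raise ValueError(f"position {p}: {len(checked)} boxes marked")
        name.append(alphabet[checked[0]])
    return "".join(name)


total = boxes_needed(8)  # 8 positions x 36 symbols
```

Even under these generous assumptions, the user must locate and mark the correct box among 36 at each of up to 8 positions, with no means of entering a symbol outside the printed alphabet.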
Finally, virtually every system for establishing file names requires not only that the file name be in a format which is recognizable by the computer, but also that the character set used in the file name be the native character set of the computer. For example, it is generally not possible to assign a file name to a file using a foreign-language character set or graphics unless the processing apparatus is capable of recognizing the character set or graphics. This precludes such operations as assigning a file name composed of Kanji characters to a file when the computer is capable of recognizing only the Latin character set.