Computerized document processing includes scanning a document and converting the actual image of the document into an electronic image. The scanning process generates an electronic pixel representation of the image with a density of several hundred pixels per inch. Each pixel is represented by at least one unit of information indicating whether the pixel is associated with a `white` or a `black` area in the document. Pixel information may also cover colors other than `black` and `white`, and it may include grey scale information. The pixel image of the document may be stored and processed directly, or it may be converted into a compressed image that requires less space on a storage medium such as a storage disk in a computer. Images of documents are often processed through OCR (optical character recognition) so that their contents can be converted back to coded text.
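The two storage forms mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `pack_row` and `run_lengths` are hypothetical, the convention of 1 for `black` and 0 for `white` is chosen here for convenience, and run-length coding is used as one familiar example of image compression rather than as the scheme any particular system employs.

```python
def pack_row(bits):
    """Pack a row of 0/1 pixels into bytes, 8 pixels per byte, leftmost pixel in the high bit."""
    packed = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            packed[i // 8] |= 0x80 >> (i % 8)
    return bytes(packed)


def run_lengths(bits):
    """Encode a row as alternating run lengths, always starting with a white run (which may be 0)."""
    runs = []
    current, count = 0, 0  # assumed convention: rows start in `white` (0)
    for bit in bits:
        if bit == current:
            count += 1
        else:
            runs.append(count)
            current, count = bit, 1
    runs.append(count)
    return runs


# A 40-pixel row: 20 white pixels, 5 black pixels, 15 white pixels.
row = [0] * 20 + [1] * 5 + [0] * 15

print(pack_row(row).hex())  # 00000f8000
print(run_lengths(row))     # [20, 5, 15]
```

The packed form always needs one bit per pixel, whereas the run-length form needs one number per run, so rows with long uniform stretches, which are typical of text documents, shrink considerably.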
In image processing and character recognition, proper orientation of the image on the document to be processed ranges from advantageous to essential. One of the parameters to which image processing operations are sensitive is the skew of the image in the image field. The present invention provides for pre-processing of images to eliminate skew and other characteristics detrimental to many image processing operations. Besides de-skewing, the processes of the present invention provide for consistent registration, conversion of inverse type to normal type, elimination of dot shading, removal of random specks, elimination of horizontal and vertical lines, and protection of characters during line and dot removal.
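The sensitivity to skew noted above can be illustrated with a small Python sketch. It is not taken from the invention described here: it assumes a synthetic bilevel page whose text lines are solid black rows, and it uses the horizontal projection profile (black pixels counted per row), a common line-finding aid, as the example operation. The helper names `row_profile`, `skew`, and `sharpness` are hypothetical.

```python
def row_profile(image):
    """Count black pixels (1s) in each row of a bilevel image."""
    return [sum(row) for row in image]


def skew(image, columns_per_step):
    """Shift each column down by one pixel for every `columns_per_step` columns."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for x in range(width):
        shift = x // columns_per_step
        for y in range(height):
            src = y - shift
            if 0 <= src < height:
                out[y][x] = image[src][x]
    return out


def sharpness(profile):
    """Sum of squared row counts: large when black pixels concentrate in few rows."""
    return sum(v * v for v in profile)


WIDTH, HEIGHT = 40, 24
TEXT_ROWS = (4, 10, 16)  # assumed positions of three text lines

page = [[1 if y in TEXT_ROWS else 0 for _ in range(WIDTH)] for y in range(HEIGHT)]
skewed = skew(page, 10)  # slope of one pixel per ten columns

print(sharpness(row_profile(page)))    # 4800
print(sharpness(row_profile(skewed)))  # 1200
```

On the straight page each text line falls in a single row of the profile. After a slope of only one pixel in ten columns, the same black pixels are spread over four rows per line, so a process that looks for sharp peaks in the profile has far less to work with.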
Prior art processes require images to be converted into a pixel map. Pixel maps require large amounts of memory, and the complex processes needed to prepare images for other processes such as character recognition execute slowly on them, especially on byte-oriented processors.
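The memory cost of a pixel map can be estimated with simple arithmetic. The sketch below assumes a letter-size page (8.5 by 11 inches) scanned at 300 pixels per inch. Both figures are illustrative assumptions rather than values from the text, which says only "several hundred pixels per inch". The function name `pixel_map_bytes` is hypothetical.

```python
def pixel_map_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Estimate the size of an uncompressed pixel map, with each row padded to whole bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px


# Assumed page: 8.5 x 11 inches at 300 pixels per inch (2550 x 3300 pixels).
print(pixel_map_bytes(8.5, 11, 300, 1))  # 1052700 bytes at one bit per pixel
print(pixel_map_bytes(8.5, 11, 300, 8))  # 8415000 bytes at one byte per pixel
```

Under these assumptions a single page occupies about one megabyte when packed at one bit per pixel, and about eight megabytes when each pixel is held in its own byte for convenient access on a byte-oriented processor.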