1. Field of the Invention
The present invention relates to digital image processing, and particularly to an Arabic bank check analysis and zone extraction method.
2. Description of the Related Art
Page analysis and zone extraction are key areas of research in document image processing; they act as a bridge between document preprocessing and higher-level document understanding, such as logical page analysis and OCR. Bank check processing is an important application of document analysis and recognition. Nearly one hundred billion checks are processed worldwide every year, and most of them are still processed manually. Despite its apparent simplicity, a check is a complex document. It integrates images (the check layout), pre-printed components (logos, labels of data-entry fields, etc.), and handwritten components (courtesy amounts, legal amounts, signature, date, issuing place, etc.). These fields do not have fixed positions, and their structure varies by country and institution. Because of this complexity, check processing is considered an important research field. Arabic check processing has not been researched as thoroughly as check processing in other languages and poses its own challenges; as a result, it is less advanced than check processing systems for other languages.
Before the regions of interest in a check image can be recognized, the image must pass through several preprocessing stages, mainly binarization, skew correction, and extraction of the regions of interest. Researchers in Arabic check processing have adapted some of the preprocessing and check analysis techniques developed for checks in other languages.
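The text does not say which binarization method is used. As one common choice, Otsu's global threshold can be sketched in plain Python. Treat this as an illustrative assumption, not the method of any cited work: it assumes an 8-bit grayscale image stored as a list of rows with dark ink on light paper, and the function names are hypothetical. Skew correction is not shown.

```python
from typing import List

Gray = List[List[int]]  # 8-bit grayscale image, values 0..255 (assumed layout)


def otsu_threshold(gray: Gray) -> int:
    """Return the threshold t that maximizes Otsu's between-class variance,
    where one class is the pixels <= t and the other the pixels > t."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * n for i, n in enumerate(hist))
    best_t, best_between = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]          # pixel count of the dark class (<= t)
        sum_dark += t * hist[t]    # intensity sum of the dark class
        w_light = total - w_dark
        if w_dark == 0:
            continue
        if w_light == 0:
            break
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        # Between-class variance up to the constant factor 1/total^2,
        # which does not change where the maximum is.
        between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


def binarize(gray: Gray) -> List[List[int]]:
    """Map dark (ink) pixels to 1 and light (paper) pixels to 0."""
    t = otsu_threshold(gray)
    return [[1 if v <= t else 0 for v in row] for row in gray]
```

The strict `>` comparison keeps the lowest threshold among equally good candidates; any value between the two intensity clusters would separate them identically.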
Only a few researchers have addressed Arabic check analysis and zone extraction, and in some cases the process was computer-aided rather than fully automated. Known related art methods use mathematical morphology (MM) and the Hough transform (HT) to extract zones of interest from Arabic checks. In the MM technique, a horizontal filter uses a linear structuring element whose length is one-fourth of the image width, and a vertical filter uses a linear structuring element whose length is one-tenth of the image height. The combined result is used to extract the bounding box of the courtesy amount. Two additional filters, each with a length of an appropriate number of pixels, are then applied to the check image to extract the legal amount and date fields. The connected components of the remaining check image are obtained and color-labeled, and the legal amount is identified as the component having the maximum number of pixels of the same color. Prior knowledge of the position of the legal amount on the check is also utilized. In the HT technique, the courtesy amount is extracted by identifying its bounding rectangle. After this rectangle is removed, the Hough transform is applied to the remaining image to find the longest printed line, which is taken as the line associated with the legal amount, and an estimate of the height of the written script is used to extract the legal amount. The date field is identified as the first horizontal line at the top of the check image. Both techniques were tested on 1775 Arabic checks from the CENPARMI database. With the MM technique, extraction rates of 98%, 95%, and 97% are reported for the courtesy amount, legal amount, and date, respectively; with the HT technique, the corresponding rates are 98%, 95%, and 98%. A hybrid of the MM and HT techniques has also been used, in which broken lines detected by the HT technique are joined using MM with a separation threshold of 10 pixels.
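The MM filtering and connected-component steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the related-art implementation: it assumes an already binarized image stored as a list of rows with 1 for ink, uses 8-connectivity for labeling, and measures component size in pixels. The helper names (`open_with_line`, `horizontal_lines`, `vertical_lines`, `subtract`, `largest_component`) are hypothetical, and the two additional legal-amount and date filters are omitted because their lengths are not given. The sketch relies on the fact that a binary opening with a linear structuring element of length k keeps exactly the ink runs that are at least k pixels long in that direction.

```python
from collections import deque
from typing import Iterator, List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background (assumed layout)


def runs(line: List[int]) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) of each maximal run of 1s; end is exclusive."""
    start = None
    for i, v in enumerate(list(line) + [0]):  # trailing 0 closes a final run
        if v and start is None:
            start = i
        elif not v and start is not None:
            yield start, i
            start = None


def open_with_line(line: List[int], k: int) -> List[int]:
    """Binary opening of a 1-D signal with a linear structuring element of length k.

    The opening is the union of all length-k segments that fit inside the ink,
    so it keeps exactly the runs of at least k pixels and erases shorter ones.
    """
    out = [0] * len(line)
    for start, end in runs(line):
        if end - start >= k:
            out[start:end] = [1] * (end - start)
    return out


def horizontal_lines(img: Image) -> Image:
    """Horizontal filter: element length = one-fourth of the image width."""
    k = max(1, len(img[0]) // 4)
    return [open_with_line(row, k) for row in img]


def vertical_lines(img: Image) -> Image:
    """Vertical filter: element length = one-tenth of the image height."""
    k = max(1, len(img) // 10)
    cols = [open_with_line([row[x] for row in img], k) for x in range(len(img[0]))]
    return [[col[y] for col in cols] for y in range(len(img))]


def subtract(img: Image, mask: Image) -> Image:
    """Remove the pixels of mask (e.g. detected lines) from img."""
    return [[p & (1 - m) for p, m in zip(r, q)] for r, q in zip(img, mask)]


def largest_component(img: Image) -> List[Tuple[int, int]]:
    """Label 8-connected ink components; return the (y, x) pixels of the largest."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    best: List[Tuple[int, int]] = []
    for y0 in range(h):
        for x0 in range(w):
            if not img[y0][x0] or seen[y0][x0]:
                continue
            comp, queue = [], deque([(y0, x0)])
            seen[y0][x0] = True
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                comp.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(comp) > len(best):
                best = comp
    return best
```

For example, `largest_component(subtract(img, horizontal_lines(img)))` returns the biggest ink blob left once long horizontal lines are removed, which is the kind of candidate the MM approach above labels as the legal amount.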
Using the hybrid MM/HT technique, researchers reported extraction rates of 98.27%, 91.82%, and 99.63% for the courtesy amount, legal amount, and date fields, respectively. Nevertheless, there is still room for improvement in these extraction rates.
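The HT extraction step and the hybrid joining with a 10-pixel separation threshold, described above, can likewise be sketched. This is a hedged illustration, not the cited implementation: it assumes a binarized list-of-rows image, restricts the accumulator to angles within ±10 degrees of horizontal (an assumed range), rounds ρ to whole pixels, takes the accumulator peak (the line carrying the most ink pixels) as the longest printed line, and applies the separation threshold to a one-dimensional sample taken along that line rather than as a full two-dimensional morphological operation. The names `hough_peak`, `join_gaps`, and `longest_segment` are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background (assumed layout)


def runs(line: List[int]) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) of each maximal run of 1s; end is exclusive."""
    start = None
    for i, v in enumerate(list(line) + [0]):  # trailing 0 closes a final run
        if v and start is None:
            start = i
        elif not v and start is not None:
            yield start, i
            start = None


def hough_peak(img: Image, angles_deg: Iterable[int] = range(80, 101)) -> Tuple[int, int, int]:
    """Vote in a (rho, theta) accumulator and return (rho, theta, votes) of the peak.

    Lines use the normal form rho = x*cos(theta) + y*sin(theta); theta near
    90 degrees means a near-horizontal line. The angle range is an assumption.
    """
    trig = [(t, math.cos(math.radians(t)), math.sin(math.radians(t))) for t in angles_deg]
    acc: Counter = Counter()
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            if v:
                for t, cos_t, sin_t in trig:
                    acc[(round(x * cos_t + y * sin_t), t)] += 1
    if not acc:
        raise ValueError("image contains no ink pixels")
    (rho, theta), votes = acc.most_common(1)[0]
    return rho, theta, votes


def join_gaps(line: List[int], max_gap: int = 10) -> List[int]:
    """Fill background gaps of at most max_gap pixels between consecutive runs."""
    out = list(line)
    prev_end = None
    for start, end in runs(line):
        if prev_end is not None and start - prev_end <= max_gap:
            out[prev_end:start] = [1] * (start - prev_end)
        prev_end = end
    return out


def longest_segment(img: Image, rho: int, theta_deg: int, max_gap: int = 10) -> Tuple[int, int]:
    """Sample the image along the detected line, join broken pieces, and return
    the x-extent (start, end) of the longest resulting segment."""
    cos_t = math.cos(math.radians(theta_deg))
    sin_t = math.sin(math.radians(theta_deg))  # non-zero for near-horizontal theta
    h, w = len(img), len(img[0])
    samples = []
    for x in range(w):
        y = round((rho - x * cos_t) / sin_t)
        samples.append(img[y][x] if 0 <= y < h else 0)
    best = (0, 0)
    for start, end in runs(join_gaps(samples, max_gap)):
        if end - start > best[1] - best[0]:
            best = (start, end)
    return best
```

On an image from which the courtesy-amount rectangle has already been removed, `rho, theta, _ = hough_peak(img)` followed by `longest_segment(img, rho, theta)` gives the extent of the strongest near-horizontal printed line; passing `max_gap=0` disables the joining step, so the effect of the separation threshold can be compared directly.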
Thus, an Arabic bank check analysis and zone extraction method solving the aforementioned problems is desired.