1. Field of the Invention
A method of font recognition that includes recognizing Arabic and Farsi fonts using a nearest neighbor classifier, and a computer-implemented method and system using the same.
2. Background
In recent years, considerable improvement has been achieved in the area of Arabic text recognition, whereas optical font recognition (OFR) for Arabic texts has not been studied as extensively as optical character recognition (OCR), despite its importance in improving recognition accuracy. See Amor, N. B., & Amara, N. E. B, “A hybrid approach for Multifont Arabic Characters Recognition,” In Proceedings of the 5th WSEAS International Conference on Artificial Intelligence, Knowledge Engineering and Data Bases, vol. 2006, pp. 194-198, 2006; F. Slimane, S. Kanoun, H. El Abed, A. M. Alimi, R. Ingold, and J. Hennebert, “ICDAR 2011-Arabic Recognition Competition: Multi-font Multi-size Digitally Represented Text,” 2011 International Conference on Document Analysis and Recognition, pp. 1449-1453, September 2011; and M. Zahedi and S. Eslami, “Farsi/Arabic optical font recognition using SIFT features,” Procedia Computer Science, vol. 3, pp. 1055-1059, January 2011, each incorporated herein by reference in its entirety.
Optical Font Recognition (OFR) is the process of recognizing the font of a given text image. Identifying the font style involves determining the font typeface, size, weight, and slant of the printed text. Font recognition is useful for improving the text recognition phase in terms of both recognition accuracy and time. Recognizing the font before applying OCR makes it possible to use a mono-font recognition system, which yields better recognition rates (compared with an omni-font system) and shorter recognition time. In addition, recognizing the text font enables the system to reproduce not only the text but also the font and style of the examined document, saving time compared with manual editing, in which the writer must recover the fonts and styles of the text.
Each font can be characterized by the following attributes (See S. Öztürk, B. Sankur, and A. Abak, “Font clustering and classification in document images,” In EUPSICO 2000: European signal processing conference, pp. 881-884, 2000, incorporated herein by reference in its entirety):
Font family: the type of font like Tahoma, Traditional Arabic . . . etc.
Size: the size of characters.
Weight: It is the thickness of the character outlines relative to their height. It can be normal or bold.
Slant: Orientation of the letter main stroke. Letter can be Roman or Italic.
OFR can be combined with OCR using one of three approaches: a priori, a posteriori, and cooperative (hybrid). See H. Shi and T. Pavlidis, “Font recognition and contextual processing for more accurate text recognition,” Proceedings of the Fourth International Conference on Document Analysis and Recognition, vol. 1, pp. 39-44, 1997; A. Zramdini and R. Ingold, “Optical Font Recognition Using Typographical Features,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 8, pp. 877-882, 1998; and I. Chaker and M. Harti, “Recognition of Arabic Characters and Fonts,” International Journal of Engineering Science, vol. 2, no. 10, pp. 5959-5969, 2010, each incorporated herein by reference in its entirety. In the a priori approach, the font is identified before character recognition, whereas the a posteriori approach depends on the contents of the text to identify the font. A cooperative approach combines the a priori and a posteriori approaches. See A. Zramdini, “Study of optical font recognition based on global typographical features,” University of Fribourg, PhD thesis, 1995, incorporated herein by reference in its entirety.
The Arabic language is spoken and used in Arabic countries in addition to the majority of Islamic countries (e.g. Malaysia and Indonesia) that read and write Arabic scriptures. Moreover, some West African languages such as Hausa and non-Semitic languages like Malay, Farsi, and Urdu use Arabic characters for writing.
The Arabic alphabet consists of 28 characters. Due to the cursive nature of Arabic script, most of its characters adopt several shapes depending on their position within the word. Moreover, Arabic characters may take different shapes depending on the font. For the Arabic and Farsi languages, there are more than 450 fonts available. See F. Slimane, S. Kanoun, A. M. Alimi, R. Ingold, and J. Hennebert, “Gaussian Mixture Models for Arabic Font Recognition,” 2010 20th International Conference on Pattern Recognition, pp. 2174-2177, August 2010, incorporated herein by reference in its entirety. This vast variety of fonts makes recognizing the font type a challenging task. Font recognition may be an important preprocessing step in an Optical Character Recognition (OCR) system. In such a case, once the font type is recognized, a mono-font OCR is used.
OCR systems can be divided into two categories: mono-font and omni-font systems. Mono-font OCR systems have higher accuracy because they assume prior knowledge of the font used, whereas omni-font OCR systems can recognize characters of already trained fonts using a base of font models. Omni-font OCR systems have lower accuracy because they deal with documents written in a number of fonts.
The aim of OFR is to recognize the font based on features that are extracted from text images. Similar to other pattern recognition approaches, OFR consists of three main stages: preprocessing, features extraction, and classification. The preprocessing stage involves preparing the input image for subsequent stages by applying de-noising, normalizing, text segmentation, skew correction, and image-format conversion techniques of the input image. Then the pre-processed image is transformed into feature vectors in the feature extraction stage. This representation contains discrete information which is used in the classification stage to recognize the font styles. See X. Jiang, “Feature extraction for image recognition and computer vision,” In Computer Science and Information Technology, 2009. ICCSIT 2009. 2nd IEEE International Conference on, pp. 1-15, IEEE, 2009, incorporated herein by reference in its entirety.
The preprocessing stage includes several tasks that are initially performed to produce an enhanced version of the original image for feature extraction. See H. Izakian, S. A. Monadjemi, B. T. Ladani, and K. Zamanifar, “Multi-Font Farsi/Arabic Isolated Character Recognition Using Chain Codes,” World Academy of Science, Engineering and Technology, vol. 43, pp. 67-70, 2008, incorporated herein by reference in its entirety. Poor or low-resolution scanning can introduce undesirable artifacts such as noise and skew into document images. Since the feature extraction phase is typically sensitive to these properties, they can affect its performance and hence degrade the accuracy of the OFR system. See B. Bataineh, S. Norul, H. Sheikh, and K. Omar, “Arabic Calligraphy Recognition Based on Binarization methods and Degraded Images,” vol. 3, no. June, 2011, incorporated herein by reference in its entirety. Therefore, several enhancement operations, such as binarization, de-noising, skew correction, segmentation, and normalization, are needed prior to the feature extraction phase.
Binarization involves converting the text image from grayscale to a binary image. A binary image is a digital image that has only two intensity values (0 and 1) for each pixel, which are displayed as black (text) and white (background), respectively. Researchers commonly use a thresholding method for image binarization. See Y. Pourasad, H. Hassibi, and A. Ghorbani, “Farsi Font Recognition Using Holes of Letters and Horizontal Projection Profile,” Innovative Computing Technology, pp. 235-243, 2011; Y. Pourasad, H. Hassibi, and A. Ghorbani, “Farsi Font Recognition in Document Images Using PPH Features,” nobel.gen.tr, vol. 5, no. 3, pp. 17-20, 2011; and A. Borji and M. Hamidi, “Support Vector Machine for Persian Font Recognition,” Engineering and Technology, vol. 2, no. 3, pp. 10-13, 2007, each incorporated herein by reference in their entirety. The Otsu technique is commonly used to binarize the input image as it automatically estimates a suitable threshold level. Otsu's thresholding method is based on the shape of the histogram. See N. Otsu, “A threshold selection method from gray-level histograms,” IEEE Transactions on Systems, Man and Cybernetics, vol. 9, no. 1, pp. 62-66, January 1979, incorporated herein by reference in its entirety. This method assumes that the image has a bimodal histogram (foreground and background). It finds the threshold that minimizes the weighted sum of within-group variances for the two groups that result from separating the gray tones at the threshold.
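As a rough illustration of the criterion described above, the following Python sketch selects a global threshold from a gray-level histogram. It maximizes the between-class variance, which is equivalent to minimizing the weighted sum of within-group variances. The function names `otsu_threshold` and `binarize`, the 256 gray levels, and the rule that pixels at or below the threshold become 0 (text) are illustrative assumptions, not details taken from the cited references.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that best separates the histogram into two groups.

    `pixels` is a flat iterable of integer gray levels in [0, levels).
    Pixels with value <= t form the dark group, the rest form the bright group.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    total_sum = sum(i * h for i, h in enumerate(hist))

    best_t, best_between = 0, -1.0
    w0 = 0    # number of pixels in the dark group
    sum0 = 0  # sum of gray levels in the dark group
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one group is empty, so no valid split at this level
        mu0 = sum0 / w0
        mu1 = (total_sum - sum0) / w1
        # Maximizing the between-class variance is equivalent to
        # minimizing the weighted sum of within-class variances.
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_between:
            best_between, best_t = between, t
    return best_t


def binarize(image):
    """Map a 2-D list of gray levels to 0 (text) and 1 (background)."""
    t = otsu_threshold(p for row in image for p in row)
    return [[0 if p <= t else 1 for p in row] for row in image]
```

For a document image with dark text on a light background, `binarize(image)` returns a same-sized grid in which text pixels are 0 and background pixels are 1, matching the convention stated above.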
Bataineh et al. proposed a binarization method based on adaptive thresholding and a fixed window size. See Bataineh, Bilal, Siti N H S Abdullah, K. Omar, and M. Faidzul. “Adaptive Thresholding Methods for Documents Image binarization,” In Pattern Recognition, pp. 230-239. Springer Berlin Heidelberg, 2011, incorporated herein by reference in its entirety. They compared their proposed method with three other binarization methods (viz. Niblack, Sauvola, and Nick methods). See K. Khurshid, I. Siddiqi, C. Faure, and N. Vincent, “Comparison of Niblack inspired binarization methods for ancient documents,” In IS&T/SPIE Electronic Imaging, pp. 72470U-72470U. International Society for Optics and Photonics, 2009 and J. Sauvola, T. Seppanen, S. Haapakoski, and M. Pietikainen, “Adaptive document binarization,” In Document Analysis and Recognition, 1997., Proceedings of the Fourth International Conference on, vol. 1, pp. 147-152. IEEE, 1997, each incorporated herein by reference in their entirety. Their binarization formula is:
$$T_w = M_w - \frac{M_w^2 - \sigma_w}{(M_g + \sigma_w)(\sigma_{fix} + \sigma_w)},$$
where $T_w$ is the thresholding value, $M_w$ is the mean value of the window's pixels, $\sigma_w$ is the standard deviation of the window, and $M_g$ is the mean value of all pixels in the image. $\sigma_{fix}$ is a fixed standard deviation of the window, which is computed as follows:
$$\sigma_{fix} = \frac{\sigma_w - \sigma_{min}}{\sigma_{max} - \sigma_{min}},$$
where $\sigma_{max}$ and $\sigma_{min}$ are the maximum and minimum standard deviation values of all windows in the image, respectively. The proposed method reported higher performance than the three other methods. However, the need for prior window size setting is a drawback of this method. See B. Bataineh, S. N. H. S. Abdullah, and K. Omar, “An adaptive local binarization method for document images based on a novel thresholding method and dynamic windows,” Pattern Recognition Letters, vol. 32, no. 14, pp. 1805-1813, October 2011, incorporated herein by reference in its entirety. Other techniques binarized the image in the preprocessing stage without stating any details about the binarization technique used. See L. Hamami and D. Berkani, “Recognition System for Printed Multi-Font And Multi-Size Arabic Characters,” The Arabian Journal for Science and Engineering, vol. 27, no. 1, pp. 57-72, 2002, incorporated herein by reference in its entirety. Pourasad et al. used a threshold value of 1.4*K for binarizing the image, where K is the threshold value obtained from the Otsu global binarization method, whereas Khosravi and Kabir did not perform binarization, as they applied their feature extraction techniques directly on grayscale images. See H. Khosravi and E. Kabir, “Farsi font recognition based on Sobel-Roberts features,” Pattern Recognition Letters, vol. 31, no. 1, pp. 75-82, 2010, incorporated herein by reference in its entirety. Different binarization techniques are shown in more detail in the binarization method column in Table 1.
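The window threshold defined above can be evaluated directly once the window and image statistics are known. The Python sketch below is only an illustration under stated assumptions: the helper names `window_stats` and `adaptive_threshold` are hypothetical, the population standard deviation is assumed, and the guards for a zero denominator are additions that the cited work does not describe.

```python
from statistics import mean, pstdev


def window_stats(window):
    """Return (mean, population standard deviation) of a 2-D window of gray levels."""
    flat = [p for row in window for p in row]
    return mean(flat), pstdev(flat)


def adaptive_threshold(m_w, s_w, m_g, s_min, s_max):
    """Evaluate T_w = M_w - (M_w^2 - s_w) / ((M_g + s_w) * (s_fix + s_w)).

    m_w, s_w : mean and standard deviation of the current window
    m_g      : mean of all pixels in the image
    s_min, s_max : smallest and largest window standard deviation in the image
    """
    # Assumption: when all windows share one deviation, s_fix is taken as 0.
    s_fix = (s_w - s_min) / (s_max - s_min) if s_max > s_min else 0.0
    denom = (m_g + s_w) * (s_fix + s_w)
    if denom == 0:
        # Assumption: fall back to the window mean for a degenerate window.
        return m_w
    return m_w - (m_w ** 2 - s_w) / denom
```

In use, the image would first be divided into windows of the chosen fixed size, `window_stats` would be applied to each window to obtain the extreme deviations, and each window would then be thresholded at its own `adaptive_threshold` value.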
Noise is a natural phenomenon which may be introduced as a result of scanning, reproduction, or digitization of the original image. See Zhang, T. Y., and Ching Y. Suen, “A Fast Parallel Algorithm for Thinning Digital Patterns,” Communications of the ACM, vol. 27, no. 3, pp. 236-239, 1984. De-noising is needed to enhance the image, which results in improved features and recognition rates.
Few techniques were used for de-noising the images before applying Arabic font recognition (AFR). Those that were used mostly applied de-noising as part of edge detection and enhancement, using derivative-based operations such as the Canny edge detector and the Laplacian operator. In one case the median filter was used. Other cases assumed that the noise had already been removed from the images.
Hamami and Berkani used a median filter to remove the limited noise from the text images. With the median filter, each point in the image is replaced by the median value of its eight neighbors. Bataineh et al. applied a Laplacian filter to detect edges and remove noise. Chaker et al. removed unwanted noise during the edge detection phase using the Canny edge detector. See J. Canny, “A computational approach to edge detection.,” IEEE transactions on pattern analysis and machine intelligence, vol. 8, no. 6, pp. 679-98, June 1986, incorporated herein by reference in its entirety. This detector smoothes the image by convolving it with a Gaussian filter. Ben Amor et al. removed the noise in the preprocessing phase without stating the technique used. Pourasad et al. removed the noise and performed the necessary corrections manually by using photo-editing software. Moreover, Zahedi and Eslami assumed that their SIFT technique is robust against small amounts of noise. See M. Zahedi and S. Eslami, “Farsi/Arabic optical font recognition using SIFT features,” Procedia Computer Science, vol. 3, pp. 1055-1059, January 2011, incorporated herein by reference in its entirety.
Table 1 lists the de-noising technique used by each recognition technique. It is clear from the table that researchers commonly used the Laplacian filter for noise removal. Other techniques assumed that noise was removed at the preprocessing stage without stating the technique used. See I. S. Abuhaiba, “Arabic Font Recognition Using Decision Trees Built From Common Words,” Journal of Computing and Information Technology, vol. 13, no. 3, pp. 211-224, 2005 and S. Ben Moussa, A. Zahour, A. Benabdelhafid, and A. M. Alimi, “New features using fractal multi-dimensions for generalized Arabic font recognition,” Pattern Recognition Letters, vol. 31, no. 5, pp. 361-371, April 2010, each incorporated herein by reference in their entirety.
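A median filter of the kind mentioned above might be sketched as follows. This is a generic 3×3 median filter and not the cited authors' implementation. The function name `median_filter_3x3` is hypothetical, the window is assumed to contain the pixel together with its eight neighbors (the common formulation), and border pixels are simply left unchanged.

```python
from statistics import median


def median_filter_3x3(image):
    """Replace each interior pixel by the median of its 3x3 neighborhood.

    `image` is a 2-D list of gray levels. Border pixels are copied unchanged
    (an assumption made to keep the sketch short).
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [image[y + dy][x + dx]
                      for dy in (-1, 0, 1)
                      for dx in (-1, 0, 1)]
            out[y][x] = median(window)
    return out
```

An isolated bright or dark pixel surrounded by uniform neighbors is replaced by the neighborhood value, which is why this filter suits the limited salt-and-pepper noise found in scanned text.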
Image skew may be introduced during document scanning due to incorrect alignment of the scanned page and hence may cause serious problems for document analysis. See Cao, Y., Wang, S., & Li, H., “Skew Detection and Correction in Document Images Based on straight-line fitting,” Pattern Recognition Letters, 24(12), pp. 1871-1879, 2003. Therefore, most OFR techniques involve skew correction in the preprocessing stage. Skew correction is usually invoked by techniques that work at the block levels, or paragraph level, whereas most of the techniques that work at the character level did not use skew correction.
The Hough-based transform is used more often, although it has high time complexity and gives poor results when de-skewing images that include sparse text. See T. Saba, G. Sulong, and A. Rehman, “Document image analysis: issues, comparison of methods and remaining problems,” Artificial Intelligence Review, vol. 35, no. 2, pp. 101-118, November 2011 and Sun, C., & Si, D, “Skew and slant correction for document images using gradient direction,” Document Analysis and Recognition, 1997., Proceedings of the Fourth International Conference on. Vol. 1. IEEE, 1997, each incorporated herein by reference in their entirety. In addition, it is used at the paragraph level, which limits its application in AFR as different font sizes and styles may be used for different text lines or even words.
The Hough transform can be used for correcting skewed images. Each point (x, y) in the original image is mapped to all points (ρ, θ) in the Hough space that represent lines through (x, y), where ρ is the distance of the line from the origin and θ is its orientation. Peaks in the Hough space are then used to find the dominant lines and thus the skew. The difficulty in correcting the skew in images with sparse text is one limitation of the Hough transform technique. Moreover, it is language dependent.
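The voting idea behind Hough-based skew detection can be sketched in a few lines of Python. The sketch assumes the common normal parameterization ρ = x·cos θ + y·sin θ, in which a horizontal text line has θ = 90°, so the skew is θ − 90°. The function name `estimate_skew`, the ±10° search range, and the one-degree step are illustrative assumptions.

```python
import math


def estimate_skew(points, max_skew=10, step=1):
    """Return the skew angle in degrees whose Hough peak collects the most votes.

    `points` is an iterable of (x, y) coordinates of text pixels, with y
    growing downward as in image coordinates. A positive result means the
    text lines descend toward the right.
    """
    points = list(points)
    best_skew, best_votes = 0, -1
    for skew in range(-max_skew, max_skew + 1, step):
        theta = math.radians(90 + skew)   # orientation of the line normal
        c, s = math.cos(theta), math.sin(theta)
        votes = {}
        for x, y in points:
            rho = round(x * c + y * s)    # quantized distance from the origin
            votes[rho] = votes.get(rho, 0) + 1
        peak = max(votes.values())
        if peak > best_votes:
            best_votes, best_skew = peak, skew
    return best_skew
```

Once the skew is estimated, the image would be rotated by the opposite angle. With sparse text, few pixels fall on any one line, so the peaks become weak, which is the limitation noted above.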
The Singh technique is an additional method for skew correction. The Singh technique for skew detection and correction consists of three steps. See C. Singh, N. Bhatia, and A. Kaur, “Hough transform based fast skew detection and accurate skew correction methods,” Pattern Recognition, vol. 41, no. 12, pp. 3528-3546, 2008. The first step reduces the number of image pixels by using a modified form of block adjacent graph. The second step detects the skew by using the Hough transform. The third step corrects the skew by using both forward and inverse rotation algorithms. Ben Moussa et al. resolved skewing by using Box Counting Dimension (BCD) and Dilation Counting Dimension (DCD) features, which are invariant to rotation.
Skew corrections can be performed manually by using photo-editing software, or they can be avoided by using a scale invariant feature transform (SIFT) feature extraction technique, which is invariant to rotation. Other techniques assumed that the images had already been de-skewed and that the text lines were fairly horizontal.
The skew correction method column is shown in Table 1. Table 1 lists some of the techniques used in various approaches to OFR. Table 2 shows techniques that work at the character level and which do not use any skew correction technique—skewing at the character level is considered to be an intrinsic characteristic for each font and hence is needed in the feature extraction stage.
Segmentation involves dividing the input image into smaller components (sub-images). Segmentation is typically performed at one of four levels: lines, words, connected components, and characters. Character segmentation is the most difficult, particularly in Arabic text as it is cursive, and has significant effect on the recognition process.
To segment the image text into lines, it is common to use the horizontal projection method. With the horizontal projection method, peaks represent the writing lines, whereas valleys represent spaces between lines. The vertical projection method is normally used to extract the connected components of each line. In the vertical projection approach, histogram peaks are the main vertical parts of the connected components, whereas valleys are the spaces between those components.
A common method uses horizontal and vertical projections to segment lines, words/sub-words, and characters. This method is simple to implement and works well provided that the input images are of good quality, with little or no skew and tolerable levels of noise. In real documents, this may not be the case, resulting in wrong segmentation. Document skew may cause problems, as the projected text lines may no longer be separated by spaces and hence the technique will fail. A more robust technique splits the image into vertical strips and applies the segmentation to each strip. This modification was applied in M. Sarfraz, S. Mahmoud, and Z. Rasheed, “On Skew Estimation and Correction of Text,” In Computer Graphics, Imaging and Visualisation (CGIV '07), pp. 308-313, IEEE, 2007 and M. Tanvir Parvez and S. A. Mahmoud, “Arabic handwriting recognition using structural and syntactic pattern attributes,” Pattern Recognition, vol. 46, no. 1, pp. 141-154, January 2013, each incorporated herein by reference in their entirety. Another approach is to use large blobs to find the expected lines, then add smaller components to these lines for each strip, and then combine the strips of lines into full lines.
In one method to segment the input text images into characters, the horizontal histogram (projection) is used to detect the text lines. Then the connected components in each line are located using vertical projection. In order to segment the connected components into characters, the beginning and end of each character are determined based on a set of pre-defined rules. The beginning of the character (starting column) is the column whose vertical histogram is greater than a threshold value. The end of the character (final column) is the column that satisfies a number of other rules. First, its top and bottom lines must be greater and less than the junction line, respectively. A junction line is a line that has the highest number of black pixels. Second, the difference between the bottom and top lines must be less than or equal to a threshold. Third, the top line of this column must be above the top line of the starting column. Fourth, the number of vertical transitions must be equal to two. Finally, the vertical histogram must be less than another threshold. Such a large number of rules and thresholds is difficult to calibrate accurately, and the resulting values are usually valid only for a certain text quality.
A vertical histogram can also be used for character segmentation with some variations. In one such variation, the text image is first segmented into lines by using the pixel position of the highest block. Then, using the vertical histogram, the text line is segmented into characters. The beginning of the character is located through the vertical histogram by finding a point where the number of black pixels is greater than the number of black pixels of the previous points. This scanning continues until it finds a point that has a number of black pixels less than a certain ratio of the previous point. The main body of each character is considered to lie between that ending point and the beginning point.
By using vertical histogram, this algorithm continues locating the end of each character and the beginning of the following character by searching for a point where the number of black pixels is greater than a certain ratio of the number of black pixels of the previous points. FIG. 1 shows the start and end points of two different characters  and .
Vertical projection is usually used to detect white spaces between successive characters for non-cursive writing or between connected components. See R. G. Casey and E. Lecolinet, “A Survey of Methods and Strategies in Character Segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 7, pp. 690-706, 1996, incorporated herein by reference in its entirety. It should be noted that the character segmentation algorithms that use vertical projection fail in segmenting ligature characters (overlapped characters) like  and  as well as touching characters. FIG. 2 shows some Arabic ligature characters.
Some approaches assume that the image text is already segmented into words, whereas other approaches worked at the character level, hence avoiding the difficulties associated with character segmentation. Other AFR techniques may not need segmentation at all, depending on the feature extraction technique or the features used. See M. B. Imani, M. R. Keyvanpour, and R. Azmi, “Semi-supervised Persian font recognition,” Procedia Computer Science, vol. 3, pp. 336-342, January 2011, incorporated herein by reference in its entirety. Zahedi and Eslami used a Scale Invariant Feature Transform (SIFT) for font recognition at the paragraph level without the need for segmentation. Moreover, techniques that use global feature extraction can work at the paragraph level or need only segment the text into lines to construct blocks of text.
Table 1 shows the different published segmentation techniques. The Segmentation method column in Table 1 states the segmentation method used by each technique. This table shows that only a few techniques addressed segmentation at the character level, whereas other techniques are applied at the word level, the line level, or the paragraph level. Moreover, other techniques that use global feature extraction either segment the text into lines to construct blocks of text or work at the paragraph level.
Usually there are two categories of feature extraction techniques: local analysis and global analysis. Global features can be extracted easily from the whole text image or a block of texture, while local features are extracted from small units like characters and are more difficult to extract than global features. See H. Ahmed and S. Shukla, “Comparative Analysis of Global Feature Extraction Methods for Off-line Signature Recognition,” International Journal of Computer Applications, vol. 48, no. 23, pp. 15-19, July 2012, incorporated herein by reference in its entirety. Therefore, researchers utilizing global features usually normalize text images to generate a texture block that can be used in the features extraction phase. Researchers use normalization to make their techniques size invariant.
The normalization step was performed after image binarization. To construct text blocks, the spaces between words were removed first. Then, the incomplete lines were filled up. After that, a text block consisting of a number of lines (five lines) and of size 512×512 (96 dpi) is constructed for use in the feature extraction phase. Khosravi and Kabir normalized text lines with respect to their height, since same-size fonts share the same height independent of their font style. Next, they removed large whitespaces between words of the normalized lines. To construct a texture, the input line is segmented into several parts of 128 pixels each, which are concatenated from top to bottom into a 128×128 texture bitmap. This size was selected based on the height and width of a line in an A4 document with a 100 dpi resolution. One limitation of this method is that it constrains the font recognition to lines with width greater than 64 pixels. In addition, this technique will not work if more than one font is used in the same line. After locating the words in each line by vertical projection, Borji and Hamidi normalized the spaces between words by scaling them to a predefined length. If the document still contains spacing, they filled it up by repeating the first line to get an image of 300×300 size. This new image is in turn divided into 25 non-overlapping blocks. This technique suffers from the same limitations. Imani et al. applied a gridding approach to divide each texture of size 128×128 pixels into 16 sub-blocks of size 32×32 pixels each. Slimane et al., in contrast, normalized the word images to a height of 45 pixels to be compatible with the size of the window used in their feature extraction phase. Table 1 shows the normalization techniques used by researchers. The size of the constructed block is shown for the techniques that segmented the image into blocks to extract features.
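The gridding step, in which a 128×128 texture is divided into 16 sub-blocks of 32×32 pixels, can be illustrated with a short Python sketch. The function name `split_into_blocks` and the row-by-row ordering of the returned blocks are assumptions made for illustration only.

```python
def split_into_blocks(texture, block_size=32):
    """Split a 2-D texture into non-overlapping square blocks.

    Blocks are returned row by row, left to right. Any remainder that does
    not fill a whole block is ignored (an assumption of this sketch).
    """
    height, width = len(texture), len(texture[0])
    blocks = []
    for top in range(0, height - block_size + 1, block_size):
        for left in range(0, width - block_size + 1, block_size):
            block = [row[left:left + block_size]
                     for row in texture[top:top + block_size]]
            blocks.append(block)
    return blocks
```

Features would then be extracted from each sub-block separately, so that one texture yields several training or test samples.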
Thinning or skeletonization can also be used. Thinning/Skeletonization algorithms normally produce spurious tails, Zig-Zag lines and small loops. See S. Mahmoud, I. Abuhaiba, and R. Green, “Skeletonization of Arabic characters using clustering based skeletonization algorithm (CBSA),” Pattern Recognition, vol. 24, no. 5, pp. 453-464, 1991, incorporated herein by reference in its entirety. The skeleton of characters can be extracted by thinning the characters to one point thickness using a thinning method proposed by Haralick in R. M. Haralick, “A Comment on ‘A Fast Parallel Algorithm for Thinning Digital Patterns’,” Communications of the ACM, vol. 29, no. 3, pp. 239-242, 1986, incorporated herein by reference in its entirety. This method consists of two stages; In the first stage, the south-east boundary points and the north-west corner points are detected, while the north-west boundary points and the south-east corner points are detected in the second stage. This technique has several disadvantages as noise is amplified, some structures are destroyed, and some digital patterns may disappear. The thinning technique column in Table 1 lists the skeletonization technique used by each approach.
Only a few researchers addressed edge detection in their AFR systems. These techniques mainly used gradient operators like the Laplacian operator for edge detection. The edges of text can be detected by applying a Laplacian filter with a 3×3 kernel matrix. The Laplacian filter values and the final output of applying it on an image are shown in FIG. 3.
A skeleton procedure can be used, after applying the Canny operator, to obtain a thin edge of one-pixel width. The Canny edge detector first smoothes the image and then estimates the gradients of the image, where a large magnitude indicates an edge. The gradient array is further reduced by hysteresis, which tracks along the remaining pixels that have not been suppressed. Two thresholds (low and high) are used. A pixel is marked as an edge if its magnitude is greater than the high threshold. Moreover, any pixel that is connected to an edge pixel and has a value greater than the low threshold is also marked as an edge pixel. On the other hand, pixels that have values less than the low threshold are marked as non-edge pixels. Pixels with values between the low and high thresholds are set to zero unless a path from them to a pixel with a value greater than the high threshold is found. See R. P. Vignesh and R. Rajendran, “Performance and Analysis of Edge detection using FPGA Implementation,” 2012, incorporated herein by reference in its entirety. Table 1 lists the edge detection techniques used by each approach. The Laplacian filter is used by most researchers to detect edges in the preprocessing stage.
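A 3×3 Laplacian filter can be applied as in the following Python sketch. The kernel values shown in FIG. 3 are not reproduced in the text, so the widely used 4-neighbor Laplacian kernel is assumed here. The names `LAPLACIAN_KERNEL` and `apply_3x3` are hypothetical, and border pixels are set to zero for simplicity.

```python
# Assumed kernel: the common 4-neighbor Laplacian (not taken from FIG. 3).
LAPLACIAN_KERNEL = [
    [0, 1, 0],
    [1, -4, 1],
    [0, 1, 0],
]


def apply_3x3(image, kernel=LAPLACIAN_KERNEL):
    """Apply a 3x3 kernel to every interior pixel of a 2-D image.

    The kernel is symmetric, so correlation and convolution coincide.
    Border pixels of the output are left at zero.
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(kernel[j][i] * image[y + j - 1][x + i - 1]
                            for j in range(3)
                            for i in range(3))
    return out
```

The response is zero in uniform regions and changes sign across an intensity step, so edge pixels can be located where the absolute response is large.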
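The hysteresis step of the Canny detector can be sketched in Python as follows. The sketch follows the usual formulation: pixels above the high threshold seed the edges, and the edges then grow through 8-connected pixels that exceed the low threshold. The function name `hysteresis` and the use of 8-connectivity are assumptions. Smoothing, gradient estimation, and non-maximum suppression are assumed to have been done beforehand.

```python
from collections import deque


def hysteresis(magnitude, low, high):
    """Return a 2-D boolean edge map from a gradient-magnitude image.

    Pixels above `high` are edges. Pixels above `low` become edges only if
    they are connected to an edge pixel. Everything else is a non-edge.
    """
    h, w = len(magnitude), len(magnitude[0])
    edges = [[False] * w for _ in range(h)]
    queue = deque()
    # Seed with strong pixels.
    for y in range(h):
        for x in range(w):
            if magnitude[y][x] > high:
                edges[y][x] = True
                queue.append((y, x))
    # Grow edges through weak pixels connected to an existing edge pixel.
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w
                        and not edges[ny][nx]
                        and magnitude[ny][nx] > low):
                    edges[ny][nx] = True
                    queue.append((ny, nx))
    return edges
```

In the one-row example `[[9, 5, 5, 1, 5]]` with `low=3` and `high=8`, the two weak pixels next to the strong one are kept, while the last weak pixel is discarded because the value 1 breaks its connection to any strong pixel.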
TABLE 1

| Paper | Recognition Level | Binarization Method | De-noising Technique | Skewing Method | Segmentation Method | Thinning Technique | Edge Detection Technique | Image Normalization | Block Size |
|---|---|---|---|---|---|---|---|---|---|
| Gowely et al. (1990) | Character | - | - | - | Proposed | - | - | - | |
| Hamami et al. (2002) | Character | - | Median filter | - | Proposed | - | - | - | |
| Amor et al. (2006) | Character | - | - | - | Pre-Segmented | - | - | - | |
| Izakian et al. (2008) | Character | - | - | - | Pre-Segmented | Zhang et al. technique | - | - | |
| Chaker et al. (2010) | Character | - | Gaussian filter | - | Pre-Segmented | Homotopic thinning | Canny edge detector | - | |
| Abuhaiba (2005) | Word | - | - | - | Pre-Segmented | - | - | - | |
| Slimane et al. (2010) | Word | - | - | - | Pre-Segmented | - | - | Vertical | |
| Pourasad et al. (2011) | Line | Otsu | Manual | Manual | Projection | - | - | - | |
| Khosravi et al. (2010) | Line | - | - | - | Projection | - | - | Block construction | 128 × 128 |
| Bataineh et al. (2011) | Block | Adaptive thresholding | Laplacian filter | Singh et al. technique | Projection | - | Laplacian filter | Block construction | 512 × 512 |
| Bataineh et al. (2012) | Block | Otsu | Laplacian filter | Singh et al. technique | Pre-Segmented | - | Laplacian filter | Block construction | 512 × 512 |
| Zahedi et al. (2011) | Paragraph | - | - | - | Pre-Segmented | - | - | - | |
| Ben Moussa et al. (2010) | Line and Paragraph | - | - | - | - | - | - | - | |
| Imani et al. (2011) | Texture | - | - | - | - | - | | Block construction | 32 × 32 |
| Borji et al. (2007) | Texture | Otsu | - | - | Projection | - | - | Block construction | 100 × 100 |
Feature extraction is an important phase of AFR. Researchers have used many types of features. Gradient features, pixel regularity, edge regularity, Box Counting Dimension (BCD), Wavelet energy, Gabor features, and structural features like vertex angle, length holes, thickness ratio, perimeter, area, . . . etc. The used features are detailed below.
In one approach, Arabic characters and fonts are identified based on a dissimilarity index. They calculated the dissimilarity index based on its shape index as shown in FIG. 4. This index consists of Polar distance (di), Polar angle (θi), Vertex angle (ai+1), and Chord length parameters (Li) which were calculated from the polygonal representation of the character edges. After obtaining the shape index, the dissimilarity measure was calculated to recognize the character and font by comparing it against other models of characters and fonts in the database. The drawback of using polygonal approximation is its complexity, instability to geometric transformation, and little robustness.
In a second approach, the features are extracted based on the behavior of the edge pixels. This technique analyzes the texture of the binary image by representing the relations between adjacent pixels. After text normalization and edge detection are applied in the preprocessing stage, multiple statistical features are extracted. These features are generated from weights, homogeneity, pixel regularity, edge regularity, edge direction features, and optionally correlation. To extract such features, the Edge Direction Matrix (EDM) statistical technique was used. The EDM technique represents the relationship between each edge pixel and its two neighboring pixels by applying an eight-neighbor kernel matrix, as shown in FIG. 5(a). The direction angle between the scoped pixel and each of its eight neighboring pixels was then calculated, as shown in FIG. 5(b). Two levels of relationships were used: first-order and second-order. In the first-order relationship (also called EDM1), a value between 0 and 315 degrees is stored, which represents the directional angle between the scoped pixel and each neighboring pixel. The number of occurrences of each value in EDM1 is then counted. FIG. 5(a) shows the relationship between an edge pixel and its two neighboring pixels.
In the second-order relationship (EDM2), only one representation was used for each pixel. The relationship priority was determined by arranging the EDM1 values in descending order. The most important (highest-order) relationship was then kept, and the others were ignored. Finally, EDM2 was filled by counting the retained relationships and storing the counts in the corresponding cells of EDM2, as illustrated in FIG. 6(b).
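The first-order counting step can be illustrated with a minimal sketch. It assumes a binary edge image stored as nested lists and a particular assignment of the eight angles (0 to 315 degrees in 45-degree steps) to neighbor offsets; the function name `edm1` and the angle layout are illustrative assumptions, not taken from the cited work.

```python
# Assumed mapping from neighbor offset (row, column) to direction angle in degrees.
# Rows grow downward, so "up" is a negative row offset.
NEIGHBOR_ANGLES = {
    (0, 1): 0, (-1, 1): 45, (-1, 0): 90, (-1, -1): 135,
    (0, -1): 180, (1, -1): 225, (1, 0): 270, (1, 1): 315,
}

def edm1(edge_image):
    """Count, for each angle, how often an edge pixel has an edge neighbor there."""
    rows, cols = len(edge_image), len(edge_image[0])
    counts = {angle: 0 for angle in NEIGHBOR_ANGLES.values()}
    for r in range(rows):
        for c in range(cols):
            if not edge_image[r][c]:
                continue  # only edge pixels act as the scoped pixel
            for (dr, dc), angle in NEIGHBOR_ANGLES.items():
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and edge_image[nr][nc]:
                    counts[angle] += 1
    return counts

# A short horizontal stroke: three edge pixels in one row.
stroke = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
counts = edm1(stroke)
```

For the horizontal stroke, only the 0-degree and 180-degree entries are non-zero, which is the kind of directional signature the statistical features are built on.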
In another approach, Arabic font recognition can be performed using a feature extraction algorithm applied to words. After the words were located in the preprocessing stage, 48 features were extracted from them. Sixteen features were extracted using the horizontal projection of the word image:

$$h(y) = \sum_{x} I(x, y), \quad y = 0, 1, \ldots, N-1,$$

where the sum runs over the pixels of row y and N is the word height after normalization. The 1-D Walsh discrete transform of the horizontal projection h(y) was then used to find 16 Walsh coefficients:

$$w(u) = \frac{1}{N} \sum_{y=0}^{N-1} h(y) \prod_{i=0}^{n-1} (-1)^{b_i(y)\, b_{n-1-i}(u)},$$

where $N = 2^n$ and $b_k(z)$ is the k-th bit in the binary representation of z. In addition, other features were used (viz. 7 invariant moments, width, height, thinness ratio, perimeter, area, x and y coordinates of the area center, aspect ratio, and direction of the axis of the least second moment).
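As a minimal sketch of the two formulas above, the following computes the horizontal projection of a small binary image and its 1-D Walsh coefficients. The function names and the 4-row example image are assumptions for illustration; the cited work uses a normalized word height that yields 16 coefficients.

```python
def horizontal_projection(image):
    """h(y): sum of the pixel values in row y."""
    return [sum(row) for row in image]

def bit(z, k):
    """b_k(z): the k-th bit of the binary representation of z."""
    return (z >> k) & 1

def walsh_1d(h):
    """1-D Walsh transform of h, whose length N must be a power of two."""
    N = len(h)
    n = N.bit_length() - 1
    if N < 1 or (1 << n) != N:
        raise ValueError("length must be a power of two")
    coeffs = []
    for u in range(N):
        total = 0
        for y in range(N):
            # The product of (-1)^(b_i(y) * b_{n-1-i}(u)) equals (-1)^(sum of exponents).
            exponent = sum(bit(y, i) * bit(u, n - 1 - i) for i in range(n))
            total += h[y] * (-1) ** exponent
        coeffs.append(total / N)
    return coeffs

word = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
h = horizontal_projection(word)
w = walsh_1d(h)
```

The coefficient w(0) is the mean of the projection, and a constant projection produces zeros for every other coefficient.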
Arabic fonts can be recognized by a technique in which fifty-one features are used in a Gaussian Mixture Model (GMM), with a sliding window used for feature extraction. The sliding window technique helps in extracting features without the need to segment the words into characters. The features used were the number of connected black and white components, the ratio between them, the vertical position of the smallest black component, the sum of the perimeters of all components divided by the perimeter of the analysis window, compactness, gravity, the log of the baseline position, the vertical position of the baseline, the number of extrema in the vertical and horizontal projections, and the vertical and horizontal projections after resizing the window.

Ben Moussa et al. used a fractal dimension approach for font recognition. To estimate the fractal dimension, they used two estimation methods: Box Counting Dimension (BCD) and Dilation Counting Dimension (DCD). BCD is used to cover the texture distribution in two-dimensional images, while DCD is used to cover vision aspects. They used BCD and DCD with different box sizes and radii. BCD with sizes 15 and 20, and DCD with radii 15 and 20, were the extracted features.
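For readers unfamiliar with box counting, the following is a minimal sketch of a textbook box-counting dimension estimate on a binary image. It is not the exact procedure of Ben Moussa et al.: the function names, the box sizes, and the least-squares fit are assumptions chosen for illustration.

```python
import math

def box_count(image, size):
    """Number of size-by-size boxes that contain at least one foreground pixel."""
    rows, cols = len(image), len(image[0])
    count = 0
    for r0 in range(0, rows, size):
        for c0 in range(0, cols, size):
            if any(image[r][c]
                   for r in range(r0, min(r0 + size, rows))
                   for c in range(c0, min(c0 + size, cols))):
                count += 1
    return count

def box_counting_dimension(image, sizes):
    """Least-squares slope of log N(s) against log(1/s); image must be non-empty."""
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(image, s)) for s in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

square = [[1] * 16 for _ in range(16)]               # filled region
line = [[1] * 16] + [[0] * 16 for _ in range(15)]    # single stroke
```

A filled region yields a dimension of about 2 and a single straight stroke about 1; text textures fall in between, which is what makes the estimate usable as a font feature.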
In one approach, Arabic and Farsi fonts can be recognized using Sobel-Roberts features (SRF). These features combine Sobel and Roberts gradients in 16 directions to represent the directional information of the texture. The Sobel operator uses the information of the 8 neighbors to obtain the horizontal and vertical gradients, while the Roberts operator uses the information of the 4 neighbors to obtain the diagonal gradients. To extract these features, text blocks of size 128×128 were constructed. Each input block was then divided into 16 sub-blocks (a 4×4 grid) of size 32×32 each. For each pixel in each sub-block, the gradient values were computed using the Sobel operator, and both the gradient phase and magnitude were extracted. The phase was then quantized into 16 angles from 0 to 30π/16. This results in 16 features, corresponding to the 16 phases, for each sub-block, and 256 (16×16) features for the whole block. Similarly, the Roberts operator was applied to give 256 additional features. The Sobel and Roberts features were then concatenated to form a 512-feature vector for each text block. Because the Sobel and Roberts features differ in range, each set was normalized separately to unit magnitude before concatenation; the concatenated result, called the Sobel-Roberts features (SRF), was then normalized to unit magnitude as well. One disadvantage of this technique is that it cannot recognize the fonts in a line that contains more than one font. See Y. Pourasad, H. Hassibi, and A. Ghorbani, “Farsi Font Face Recognition in Letter Level,” Procedia Technology, vol. 1, pp. 378-384, January 2012, incorporated herein by reference in its entirety.
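The Sobel half of this feature vector can be sketched as follows. The sketch assumes that each pixel adds its gradient magnitude to the bin of its nearest quantized phase, which the description above does not state explicitly; the function name and parameters are likewise illustrative. The Roberts half would apply the same binning to the Roberts gradients to give the other 256 values.

```python
import math

def sobel_phase_features(block, grid=4, bins=16):
    """Magnitude-weighted phase histogram per sub-block, normalized to unit length."""
    n = len(block)
    sub = n // grid
    features = [0.0] * (grid * grid * bins)
    step = 2 * math.pi / bins
    for r in range(1, n - 1):
        for c in range(1, n - 1):
            # 3x3 Sobel responses from the 8 neighbors.
            gx = (block[r - 1][c + 1] + 2 * block[r][c + 1] + block[r + 1][c + 1]
                  - block[r - 1][c - 1] - 2 * block[r][c - 1] - block[r + 1][c - 1])
            gy = (block[r + 1][c - 1] + 2 * block[r + 1][c] + block[r + 1][c + 1]
                  - block[r - 1][c - 1] - 2 * block[r - 1][c] - block[r - 1][c + 1])
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            phase = math.atan2(gy, gx) % (2 * math.pi)
            phase_bin = int(round(phase / step)) % bins   # nearest of the 16 angles
            cell = min(r // sub, grid - 1) * grid + min(c // sub, grid - 1)
            features[cell * bins + phase_bin] += magnitude
    norm = math.sqrt(sum(f * f for f in features))
    return [f / norm for f in features] if norm else features

# A 128x128 block with one vertical edge: dark left half, bright right half.
block = [[1 if c >= 64 else 0 for c in range(128)] for _ in range(128)]
features = sobel_phase_features(block)
```

With the defaults, a 128×128 block gives 16 sub-blocks of 16 bins each, that is 256 values, matching the count in the description.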
In M. Zahedi and S. Eslami, “Farsi/Arabic optical font recognition using SIFT features,” Procedia Computer Science, vol. 3, pp. 1055-1059, January 2011, researchers used the scale invariant feature transform (SIFT) to recognize Farsi fonts. The main function of SIFT is to detect and describe key points of objects in images, which are then used to identify the objects. See D. G. Lowe, “Object Recognition From Local Scale-Invariant Features,” in Proceedings of the Seventh IEEE International Conference on Computer Vision, pp. 1150-1157 vol. 2, 1999, incorporated herein by reference in its entirety. The key advantage of this method is its robustness to mild distortions, noise, and changes in illumination and image scale. To extract features (key points) using the SIFT method, a staged filtering approach was used. In the first stage, a Gaussian scale-space function filters out a set of key locations and scales that are recognizable in different views of the same object. Then, to locate stable key points, the difference of Gaussian (DoG) function was calculated by finding the difference between two images, one at k times the scale of the other. This stage identifies key locations by looking for the extreme points that result from applying DoG. Poorly located and low-contrast points on the edges were not used in the next filtering stage. The derived SIFT points were then stored and indexed in the database. Computation time, especially for large datasets, is one drawback of this technique, so the authors proposed using Speeded Up Robust Features (SURF), which is inspired by SIFT and requires less computation time.
A feature extraction technique can be based on wavelets. To obtain a feature vector from each sub-block (the text image was divided into 16 sub-blocks of size 32×32 each), a combination of wavelet energy and wavelet packet energy features was used. The wavelet energy is the sum of the squares of the detail wavelet coefficients in the vertical, horizontal, and diagonal directions. The wavelet energies for an image of size N×N in the horizontal, vertical, and diagonal directions at the i-th level were calculated respectively as follows:
$$E_i^h = \sum_{x=1}^{N} \sum_{y=1}^{N} \left( H_i(x, y) \right)^2$$

$$E_i^v = \sum_{x=1}^{N} \sum_{y=1}^{N} \left( V_i(x, y) \right)^2$$

$$E_i^d = \sum_{x=1}^{N} \sum_{y=1}^{N} \left( D_i(x, y) \right)^2$$
The wavelet energies at all levels, $(E_i^h, E_i^v, E_i^d)$ for $i = 1, 2, \ldots, K$, where $K$ is the total number of wavelet decomposition levels, form the wavelet energy feature vector. After decomposing the high-frequency components, the wavelet packet transform constructs a tree-structured multiband extension of the wavelet transform. The average energy was calculated after decomposing the image and extracting the related wavelet packet coefficients as follows:
$$E = \frac{1}{N \times N} \sum_{i=1}^{N} \sum_{j=1}^{N} \left[ s(i, j) \right]^2,$$

where s(i, j) is the wavelet coefficient of a feature sub-image in the N×N window centered at pixel (i, j).
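The level-wise energies can be illustrated with a minimal sketch that uses the Haar wavelet. The choice of Haar, the function names, and the convention for which detail band is called horizontal are assumptions for illustration, since the description does not name the wavelet used.

```python
def haar_level(image):
    """One 2-D Haar level on a square image with even side length.

    Returns the approximation band and the energies (E_h, E_v, E_d)
    of the three detail bands at this level.
    """
    half = len(image) // 2
    approx = [[0.0] * half for _ in range(half)]
    e_h = e_v = e_d = 0.0
    for i in range(half):
        for j in range(half):
            a, b = image[2 * i][2 * j], image[2 * i][2 * j + 1]
            c, d = image[2 * i + 1][2 * j], image[2 * i + 1][2 * j + 1]
            approx[i][j] = (a + b + c + d) / 2.0
            h = (a + b - c - d) / 2.0   # differences between rows (horizontal edges)
            v = (a - b + c - d) / 2.0   # differences between columns (vertical edges)
            g = (a - b - c + d) / 2.0   # diagonal differences
            e_h += h * h
            e_v += v * v
            e_d += g * g
    return approx, (e_h, e_v, e_d)

def wavelet_energy_features(image, levels):
    """Concatenate (E_h, E_v, E_d) over the requested decomposition levels."""
    features = []
    current = image
    for _ in range(levels):
        current, energies = haar_level(current)
        features.extend(energies)
    return features

# Horizontal stripes: all the detail energy lands in the horizontal band of level 1.
stripes = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0]]
energies = wavelet_energy_features(stripes, levels=2)
```

The image side must be divisible by two at every requested level, which holds for the 32×32 sub-blocks mentioned above for up to five levels.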
Features can be extracted based on texture analysis by using multichannel Gabor filtering and gray scale co-occurrence matrices. For example, twenty-four Gabor channels can be used. To extract features, all 24 filters were applied to each block (9 non-overlapping blocks for each image). Another image was then derived by taking the maximum of these filter responses per pixel. To represent texture features, the mean and standard deviation of the channel output images (over each block) were chosen, which formed a 50-dimensional feature vector extracted from each block.
To recognize Farsi fonts and sizes, two types of features can be used. For example, one feature can be related to the letters' holes, whereas the second is related to the horizontal projection profile. To obtain the first feature, a bounding box of each hole was constructed after extracting the holes of the document text. A histogram of the box sizes was then obtained and used as a feature. The second type of feature was extracted from the horizontal projection profile of the text lines. These features consisted of the height of the text line, the distance between the top of the text line and the baseline, the distance between the bottom of the text line and the baseline, and the location of the second or third maximum of the horizontal projection profile relative to the location of the baseline. Table 2 lists features used in several references.
TABLE 2
| Paper | Features |
|---|---|
| Abuhaiba (2005) | Width, height, thinness ratio, perimeter, area, x and y coordinates of area center, aspect ratio, invariant moments (7), direction of axis of least second moment, Walsh coefficients (16), and horizontal projection features (16). |
| Borji et al. (2007) | Mean and standard deviation of 24 Gabor channels (8 orientations with 3 wavelengths). |
| Chaker et al. (2010) | Polar distance, polar angle, vertex angle, and chord length polygonal attributes of character edges. |
| Ben Moussa et al. (2010) | Box Counting Dimension (BCD) with two sizes: 15 and 20, and Dilation Counting Dimension (DCD) with two radii: 15 and 20. |
| Pourasad et al. (2010); Pourasad et al. (2011) | One feature is related to letters' holes; the other features, which are related to the horizontal projection profile, are the height of the text line, the distance between the top of the text line and the baseline, the distance between the bottom of the text line and the baseline, and the location of the second or third maximum of the horizontal projection profile relative to the location of the baseline. |
| Slimane et al. (2010) | The number of connected black and white components, the ratio between them, the vertical position of the smallest black component, the sum of the perimeters of all components divided by the perimeter of the analysis window, compactness, gravity, the log of the baseline position, the vertical position of the baseline, the number of extrema in the vertical and horizontal projections, and the vertical and horizontal projections after resizing the window used for feature extraction. |
| Khosravi et al. (2010) | A combination of Sobel and Roberts gradients in 16 directions. |
| Bataineh et al. (2011); Bataineh et al. (2012) | Weights, homogeneity, pixel regularity, edge regularity, edge direction, and correlation. |
| Zahedi et al. (2011) | Key points. |
| Imani et al. (2011) | Wavelet energy and wavelet packet energy. |
Font recognition is the final phase of an AFR system. The features produced by the feature extraction phase are passed to the recognizer to identify the font type, style, etc.
Researchers have used different feature types in the feature extraction phase, different numbers of fonts in the training and testing phases, and different databases. These differences, especially in the data used, make it inappropriate to compare the reported identification rates directly. The variety of data is explained by the lack of a benchmarking database for Arabic font recognition. Researchers have also differed in the classification technique used, which includes K-nearest neighbor, decision trees, neural networks, support vector machines, and Gaussian mixtures, among others.
In I. Chaker and M. Harti, “Recognition of Arabic Characters and Fonts,” International Journal of Engineering Science, vol. 2, no. 10, pp. 5959-5969, 2010, Chaker et al. recognized the font type against other font models in the database using the polar distance, polar angle, vertex angle, and chord length polygonal attributes of the character edges. By finding the minimum dissimilarity measure, each character was classified into one of ten fonts. A 100% recognition rate was reported for this technique on a dataset of 10 Arabic fonts with 360 characters for testing. The small size of the dataset is a limitation of this technique. Moreover, complexity, stability, and robustness are problems with polygonal approximation methods. See I. Debled-Rennesson, “Multiorder polygonal approximation of digital curves,” Electronic Letters on Computer Vision and Image Analysis, vol. 5, no. 2, pp. 98-110, 2005, incorporated herein by reference in its entirety. Furthermore, recognizing fonts that are morphologically similar, such as Arabic Transparent and Simplified Arabic, is a more challenging task and may result in lower recognition rates.
In B. Bataineh, S. Norul, H. Sheikh, and K. Omar, “Arabic Calligraphy Recognition Based on Binarization methods and Degraded Images,” vol. 3, no. June, 2011, Bataineh et al. proposed a technique to recognize Arabic calligraphy fonts based on 22 statistical features (viz. weights, homogeneity, pixel regularity, edge regularity, edge direction, and correlation). To identify one of the seven Arabic calligraphy types, they used a back-propagation neural network (BPNN) with 22, 18, and 7 nodes in the input, hidden, and output layers, respectively. To evaluate the proposed technique, two types of experiments were conducted. The first experiment compared the performance of text normalization based on the proposed binarization method with three other methods (viz. the Niblack, Sauvola, and Nick methods), while the second experiment evaluated the effectiveness of the texture features and the accuracy of the recognition phase. A dataset of fourteen degraded Arabic document images was used for the experiments. The first experiment reported higher performance for the proposed binarization method (92.8%) than for the other three methods, while the accuracy rate of the second experiment was 43.7%. The problem with the proposed method is the need to set the window size in advance. Moreover, an accuracy of 43.7% is too low, and the dataset is limited.

Bataineh et al. also proposed a technique to classify Arabic calligraphy into one of seven fonts using weights, homogeneity, pixel regularity, edge regularity, and edge direction features. To evaluate this technique, they compared it with the Gray Level Co-occurrence Matrix (GLCM) technique developed by Haralick et al. using Bayes network, Multilayer Network, and Decision Tree classifiers. See R. M. Haralick, K. Shanmugam, and I. Dinstein, “Textural Features for Image Classification,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 3, no. 6, pp. 610-621, November 1973, incorporated herein by reference in its entirety. These experiments were conducted on a dataset of seven fonts consisting of 420 samples for training and 280 samples for testing. The reported experimental results showed that this method obtained a higher rate (95.34%) with the Multilayer Network classifier than the GLCM technique (87.14%) with the same classifier. Moreover, the proposed technique reported accuracies of 92.47% and 97.85% using the Bayes network and Decision Tree classifiers, respectively, whereas the GLCM technique reported 77.85% and 85.71% using the same classifiers. Their database of 700 samples for seven fonts is limited.
In Bataineh, B., Abdullah, S. N. H. S., & Omar, K., “A novel statistical feature extraction method for textual images: Optical font recognition,” Expert Systems with Applications, vol. 39, no. 5, pp. 5470-5477, April 2012, Bataineh et al. tested their feature extraction method, which is based on the relationship between edge pixels in the image, using five different classifiers. The classifiers used were decision table rules, artificial immune systems (AIS), multilayer neural networks, decision trees, and Bayesian networks. Based on the experimental results, the decision tree classifier was chosen as the best classifier to use with the proposed technique. To evaluate this method, a comparison with the gray-level co-occurrence matrix (GLCM) method was reported on a dataset consisting of seven fonts and 100 image samples for each font. Using a decision tree, the proposed method obtained a higher rate (98.01%) than the GLCM method (86.11%).
A decision tree classifier can be used to classify the samples into one of three fonts. For example, using 48 features with 72,000 samples for training and 36,000 samples for testing, a recognition rate of 90.8% has been reported. See I. S. Abuhaiba, “Arabic Font Recognition Using Decision Trees Built From Common Words,” Journal of Computing and Information Technology, vol. 13, no. 3, pp. 211-224, 2005. The number of fonts is limited and the recognition rate is not suitable for practical applications.
In F. Slimane, S. Kanoun, A. M. Alimi, R. Ingold, and J. Hennebert, “Gaussian Mixture Models for Arabic Font Recognition,” 2010 20th International Conference on Pattern Recognition, pp. 2174-2177, August 2010, Slimane et al. used a Gaussian Mixture Model (GMM) with fifty-one features. To extract the features, a sliding window technique was used. They used the Expectation-Maximization (EM) algorithm with 2048 Gaussian mixtures. To evaluate their approach, they used a dataset consisting of 10 fonts and 10 sizes from the APTI database. See F. Slimane, R. Ingold, S. Kanoun, A. Alimi, and J. Hennebert, “A New Arabic Printed Text Image Database and Evaluation Protocols,” In 10th International Conference on Document Analysis and Recognition, pp. 946-950, 2009, incorporated herein by reference in its entirety. A total of 100,000 training and 100,000 testing samples were used in the experiments (1,000 samples for each font and size combination). With 2048 mixtures, a 99.1% recognition rate was reported. Shifting the window by one pixel at a time to extract features is time consuming.
Using the BCD and DCD methods to estimate the fractal dimensions, Ben Moussa et al. used a K-nearest neighbor classifier. To evaluate the proposed technique, two experiments were conducted: one for recognizing Arabic fonts and the other for recognizing Latin fonts. A dataset consisting of 1000 block images of ten fonts and three sizes was used for the first experiment, and a 96.6% recognition rate was reported. For recognizing Latin fonts, a database of 800 block images was used, and a 99.3% recognition rate was obtained.
In Y. Pourasad, H. Hassibi, and A. Ghorbani, “Farsi Font Recognition Using Holes of Letters and Horizontal Projection Profile,” Innovative Computing Technology, pp. 235-243, 2011, Pourasad et al. used the horizontal projection profile and the holes of letters on seven fonts and seven sizes. Two datasets of 490 and 110 images were used in the experiments. They reported a 93.7% recognition rate. The database size is limited and the recognition rate is not suitable for practical applications.

Alternatively, a multi-layer perceptron (MLP) classifier with 40 hidden neurons can be used to identify the font of the text lines based on Sobel and Roberts features (SRF). This technique requires much less computation time (3.78 ms) than an 8-channel Gabor technique (78 ms). A database consisting of 500 document images (20,000 line images) and ten Farsi fonts with sizes of 11-16 was used. After comparing the features with Gabor filters, the authors claimed that the new features are fifty times faster than an 8-channel Gabor filter. Using the new features, a 94.16% recognition rate was realized, a 14% improvement over the 8-channel Gabor filter (80%). A recognition rate of 94.16% is low for practical applications. In addition, this technique cannot recognize the font types in lines that contain more than one font type.
In M. Zahedi and S. Eslami, “Farsi/Arabic optical font recognition using SIFT features,” Procedia Computer Science, vol. 3, pp. 1055-1059, January 2011, Zahedi and Eslami proposed another technique to recognize Farsi fonts by using the scale invariant feature transform (SIFT) method. They recognized the fonts based on the similarity between objects in the tested images and the extracted key points. See D. G. Lowe, “Object Recognition From Local Scale-Invariant Features,” In Proceedings of the Seventh IEEE International Conference on Computer Vision, pp. 1150-1157 vol. 2, 1999, incorporated herein by reference in its entirety. To recognize the fonts in a test image, the features (key points) are extracted from the image and compared to a database of extracted key points to find the best set of matched key points. These points were used to find the best match from the database by using a nearest neighbor classifier. A least-squares-based method was used in the model verification stage to verify each group of features. The least-squares solution was then performed again on the residual points to filter out outlier points. A match was identified as a correct recognition if a set of three or more points agreed on the model's parameters. They evaluated their technique on a dataset of 75 document images for testing, covering 20 font types, and claimed to achieve a 100% recognition rate. Their database of 75 text images is limited in size. Furthermore, testing on fonts that are morphologically similar, such as Arabic Transparent and Simplified Arabic, is more challenging than testing on their selected fonts. Moreover, the computation time, especially for large datasets, is another drawback of this technique, which is why they proposed using Speeded Up Robust Features (SURF), which is inspired by SIFT and has a lower computation time.
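The nearest neighbor matching step can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it assumes short numeric descriptors, Euclidean distance, and a plain vote over the matched key points in place of the least-squares verification stage; the function names and font labels are hypothetical.

```python
import math
from collections import Counter

def nearest_neighbor(query, database):
    """Return (label, distance) of the stored descriptor closest to `query`."""
    best_label, best_distance = None, math.inf
    for label, descriptor in database:
        distance = math.dist(query, descriptor)   # Euclidean distance
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label, best_distance

def recognize_font(key_points, database):
    """Each key point votes for the font of its nearest stored descriptor."""
    votes = Counter(nearest_neighbor(kp, database)[0] for kp in key_points)
    return votes.most_common(1)[0][0]

# Toy 2-D descriptors; real SIFT descriptors have many more dimensions.
database = [
    ("font_a", (0.0, 0.0)),
    ("font_a", (0.1, 0.0)),
    ("font_b", (1.0, 1.0)),
]
test_key_points = [(0.05, 0.0), (0.9, 1.0), (0.0, 0.1)]
predicted = recognize_font(test_key_points, database)
```

The linear scan over the database also makes the cost issue visible: every key point of every test image is compared against every stored descriptor, which is why computation time grows quickly with dataset size.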
In M. B. Imani, M. R. Keyvanpour, and R. Azmi, “Semi-supervised Persian font recognition,” Procedia Computer Science, vol. 3, pp. 336-342, January 2011, Imani et al. used SVM, RBFNN, and KNN classifiers in a majority vote approach to classify data into reliable and unreliable classes. With this approach, unlabeled data is labeled if two of the three classifiers agree on one font type. However, if each classifier predicts a different label, the data remains unlabeled and unreliable. This process is repeated iteratively by retraining the algorithm with the newly labeled data and using it to classify the unreliable data. SVM and RBF classifiers were then used to classify the test data by using the labeled reliable data that resulted from the previous steps. A 95% recognition rate was reported. See F. Slimane, S. Kanoun, J. Hennebert, A. M. Alimi, and R. Ingold, “A study on font-family and font-size recognition applied to Arabic word images at ultra-low resolution,” Pattern Recognition Letters, vol. 34, no. 2, pp. 209-218, January 2013, incorporated herein by reference in its entirety.

In A. Borji and M. Hamidi, “Support Vector Machine for Persian Font Recognition,” Engineering and Technology, vol. 2, no. 3, pp. 10-13, 2007, Borji and Hamidi proposed a method to extract 50 features that represent the texture of the text. They used global texture analysis and Gabor filters for feature extraction. Two classifiers were then applied: weighted Euclidean distance and SVM. To evaluate their technique, a dataset of seven fonts and four styles was used. The reported average recognition rates were 85% with weighted Euclidean distance and 82% with SVM. The recognition rates are too low for practical applications and the number of fonts and styles is limited. Table 3 shows the dataset used by each technique in addition to the reported recognition rates and the classifier used.
TABLE 3
| Paper | Language | Fonts | Training dataset | Testing dataset | Recognition rate | Classifier |
|---|---|---|---|---|---|---|
| Abuhaiba (2005) | Arabic | 3 | 72,000 word images | 36,000 word images | 90.8% | Decision tree |
| Borji et al. (2007) | Persian | 7 | | | 82% (SVM), 85% (WED) | SVM and WED |
| Chaker et al. (2010) | Arabic | 10 | | 360 characters | 100% | - |
| Ben Moussa et al. (2010) | Arabic | 10 | 500 block images | 500 block images | 96.6% | K-nearest neighbor |
| Slimane et al. (2010) | Arabic | 10 | 100,000 word images | 100,000 word images | 99.1% | Gaussian Model |
| Khosravi et al. (2010) | Farsi | 10 | 15,000 line images | 5,000 line images | 94.16% | MLP |
| Bataineh et al. (2011) | Arabic | 7 | | 14 images | 43.7% | BPNN |
| Bataineh et al. (2011) | Arabic | 7 | 420 block images | 280 block images | 97.85% | Decision tree |
| Zahedi et al. (2011) | Farsi/Arabic | 20 | 20 paragraph images | Testing: 75 images; Validation: 1400 images | 100% | K-nearest neighbor |
| Pourasad et al. (2011) | Farsi | 7 | 245 images | 600 images | 93.7% | - |
| Imani et al. (2011) | Persian | 10 | 4500 block images | 500 block images | 95% | SVM, RBFNN, KNN |
| Bataineh et al. (2012) | Arabic | 7 | 700 images | 700 images | 98.008% | Decision tree |
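The two-of-three agreement rule used in the semi-supervised approach of Imani et al. (listed in Table 3 with the SVM, RBFNN, and KNN classifiers) can be sketched as below. The classifiers are represented by plain callables that stand in for trained models, and the function names and font labels are hypothetical; the iterative retraining loop is omitted.

```python
from collections import Counter

def majority_label(predictions):
    """Return the label predicted by at least two voters, or None if none agree."""
    label, votes = Counter(predictions).most_common(1)[0]
    return label if votes >= 2 else None

def split_by_agreement(samples, classifiers):
    """Separate samples into reliably labeled pairs and still-unlabeled samples."""
    reliable, unreliable = [], []
    for sample in samples:
        label = majority_label([classify(sample) for classify in classifiers])
        if label is None:
            unreliable.append(sample)
        else:
            reliable.append((sample, label))
    return reliable, unreliable

# Stand-in classifiers that threshold a single number (placeholders for trained models).
classifiers = [
    lambda x: "font_a" if x < 5 else "font_b",
    lambda x: "font_a" if x < 6 else "font_b",
    lambda x: "font_c",
]
reliable, unreliable = split_by_agreement([1, 9], classifiers)
```

In the full procedure, the reliable pairs would be added to the training data, the classifiers retrained, and the rule applied again to the samples that remain unlabeled.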
An Arabic font database is required to test methods of Arabic font recognition. Therefore, the databases used in Arabic and Farsi font identification are reviewed and presented here. Also presented here is the design and implementation of the King Fahd University Arabic Font Database (KAFD). The text in the King Fahd University Arabic Font Database is collected from different subjects, such as history, medicine, sport, and politics. The database covers twenty Arabic fonts and comprises 1,181,835 text images. It is a multi-resolution, multi-font, multi-size, and multi-style text database, and it contains text at the page and line levels.
The KAFD consists of texts printed in different fonts, sizes, weights, and slants. There are more than 450 fonts for Arabic and Farsi. This variety of fonts makes the task of font recognition more challenging, and the challenge is compounded by the lack of a database that contains a large number of fonts. Building a database that contains many fonts is therefore important for omni-font recognition.
In a benchmarking database, each font should be available in several sizes, weights, and slants. The reason is that most real-life documents may have more than one size in the same paragraph or page and may have more than one style in the same line. Therefore, the numbers of fonts, sizes, and styles are important for a benchmarking database for omni-font character recognition.
Since there is no benchmarking Arabic font database, researchers used their own datasets. These datasets are limited in the number of fonts, styles, and scanning resolutions. Such limitations in the datasets resulted in the limitations of the outcomes of the research. The KAFD database addresses this limitation by introducing a multi-font, multi-style, multi-resolution text database.
The databases used by researchers for Arabic/Farsi font identification are developed by them and are normally not available to other researchers. Moreover, some of these databases are limited in the number of fonts or the size.
The two main databases that are freely available and contain more fonts are the APTI and ALPH-REGIM databases. The details of these databases follow.
The Arabic Printed Text Image (APTI) database is a synthesized multi-font, multi-size, and multi-style database. It is a word-level database in which each text image consists of only one word. The APTI database was created with a lexicon of 113,284 Arabic words. It consists of 10 fonts, 10 sizes (6, 7, 8, 9, 10, 12, 14, 16, 18, and 24 points), and four styles (Plain, Bold, Italic, and a combination of Bold and Italic). Its images are low resolution (72 dots per inch), and it contains 45,313,600 word images. This dataset consists of six sets, five of which are available to researchers. Table 4 lists the ten fonts used in the APTI database, and samples of the database are shown in FIG. 7. The APTI dataset was used by Slimane et al.
The ALPH-REGIM database is a paragraph-level database created by Ben Moussa et al. It consists of more than 5000 text images of 14 Arabic fonts with a resolution of 200 dpi, containing both printed and handwritten scripts for the Arabic and Latin languages. Fourteen fonts were used with Arabic printed texts and eight with Latin texts. The fourteen Arabic fonts are listed in Table 5. FIG. 8 shows samples of the ALPH-REGIM database. In contrast to the APTI database, some of the fonts used in this database, such as Ahsa and Dammam, are not common in Arabic documents. In addition, this database lacks the ground truth of the images.
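The stated image count is consistent with rendering every lexicon word once in each font, size, and style combination, as the following check shows:

```latex
\underbrace{113{,}284}_{\text{words}} \times \underbrace{10}_{\text{fonts}} \times \underbrace{10}_{\text{sizes}} \times \underbrace{4}_{\text{styles}}
  = 113{,}284 \times 400 = 45{,}313{,}600 \text{ word images}
```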
TABLE 4
1. Deco Type Thuluth
2. Andalus
3. Deco Type Naskh
4. Arabic Transparent
5. Diwani Letter
6. Simplified Arabic
7. M Unicode Sara
8. Advertising Bold
9. Traditional Arabic
10. Tahoma

TABLE 5
1. Deco Type Thuluth
2. Andalus
3. Deco Type Naskh
4. Arabic Transparent
5. Diwani Letter
6. Kharj
7. Al-Hada
8. Dammam
9. Buryidah
10. Koufi
11. Badr
12. Ahsa
13. Hijaz
14. Khoubar
Based on the overview of the available Arabic multi-font databases, their main limitations are summarized as follows:
1. The number of fonts in the available databases is limited.
2. Only one resolution is used.
3. No page-level database is available.
4. The text in the APTI database is identical for each font.
5. ALPH-REGIM lacks the ground truth of the text, which is essential for document analysis and classification.
6. ALPH-REGIM is a single-size database.
7. ALPH-REGIM does not contain different styles of each font.
8. APTI consists of synthesized text.
9. The fonts of the ALPH-REGIM database are rarely used in books, magazines, etc.
10. The 6-point and 7-point sizes in the APTI database are rarely used in Arabic documents.
The KAFD database is available in different resolutions (200 dpi, 300 dpi, and 600 dpi) and in two forms (Page and Line). The developed database consists of twenty fonts as listed in Table 6. Each font in this dataset contains unique text. For each font, ten font sizes are prepared: 8, 9, 10, 11, 12, 14, 16, 18, 20, and 24 points. For each font size, four font styles are prepared: Normal, Bold, Italic, and a combination of Bold and Italic. The KAFD database is organized into three sets: Training, Testing, and Validation sets.
TABLE 6
No. | Font
1 | AGA Kaleelah Regular
2 | Akhbar
3 | Al-Qairwan
4 | Al-Mohand
5 | Arabic Typesetting
6 | Arabswell
7 | Arial
8 | Arial Unicode MS
9 | Arabic Transparent
10 | Deco Type Naskh
11 | Courier New
12 | Diwani Letter
13 | FreeHand
14 | M Unicode Sara
15 | Microsoft Uighur
16 | Motken Unicode Hor
17 | Segore UI
18 | Simplified Arabic
19 | Times New Roman
20 | Traditional Arabic

In order to generate the KAFD database, the following five stages were conducted:
1. Text collection
2. Printing
3. Scanning
4. Segmenting
5. Ground truth generation and validation.
In the text collection stage, Arabic texts are collected from different subject areas such as Islamic studies, medicine, science, and history. The collected texts cover all the shapes of Arabic characters. In addition, they contain names, Quranic text, places and cities, and numbers.
The Arabic text used for each font in this database is different (unique) from the texts used for the other fonts. In addition, the Training, Testing, and Validation sets are disjoint.
After collecting the texts, the twenty fonts were constructed as follows:
1. The most frequent fonts in Arabic books, magazines, letters, theses, etc. were selected.
2. Each font consists of ten sizes (8, 9, 10, 11, 12, 14, 16, 18, 20, and 24 points). The sizes were selected based on the most used sizes in books, magazines, letters, theses, etc.
3. For each size, four font styles are used (viz. Normal, Bold, Italic, and Bold Italic). These styles are almost all the styles that are used in Arabic documents.
4. For each font style, three categories of pages are constructed (Training, Testing, and Validation sets).
5. The number of printed pages in each category is as follows:
   1. Training:
      a. Sizes (8, 9, 10, 11, 12): between 6 pages and 13 pages, based on the font size.
      b. Sizes (14, 16, 18, 20, 24): 12 pages.
   2. Testing:
      a. Sizes (8, 9, 10, 11, 12): between 2 pages and 6 pages, based on the font size.
      b. Sizes (14, 16, 18, 20, 24): 4 pages.
   3. Validation:
      a. Sizes (8, 9, 10, 11, 12): between 2 pages and 6 pages, based on the font size.
      b. Sizes (14, 16, 18, 20, 24): 4 pages.
The above sizes and styles cover the most frequently used fonts in Arabic documents, books, magazines, etc. FIG. 9 shows the structure of the developed Arabic fonts dataset. The database consists of three resolutions (200 dpi, 300 dpi, and 600 dpi). For each resolution, text images at the page and line levels are available.
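The structure described above can be enumerated with a short sketch. The sizes, styles, resolutions, and levels are taken from the text; the font indices 1 to 20 are a placeholder assumption standing in for the font names, and the variable names are illustrative only.

```python
from itertools import product

# Values taken from the description above.
NUM_FONTS = 20
SIZES_PT = [8, 9, 10, 11, 12, 14, 16, 18, 20, 24]
STYLES = ["Normal", "Bold", "Italic", "Bold Italic"]
RESOLUTIONS_DPI = [200, 300, 600]
LEVELS = ["page", "line"]

# Font indices 1..20 stand in for the font names (placeholder assumption).
configurations = list(
    product(range(1, NUM_FONTS + 1), SIZES_PT, STYLES, RESOLUTIONS_DPI, LEVELS)
)

# Size/style combinations available for a single font.
combos_per_font = len(SIZES_PT) * len(STYLES)

print(combos_per_font)        # 40
print(len(configurations))    # 4800
```

Each tuple identifies one font, size, style, resolution, and level combination; the number of images within a combination varies with the amount of text, as the tables report.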
The Arabic fonts database is printed using an HP LaserJet 600 M601 printer at a print resolution of 1200×1200 dpi. The database consists of 14,452 printed pages in total, as shown in Table 7.
TABLE 7
SN | Font | Number of printed pages
1 | Freehand | 728
2 | Courier New | 704
3 | Arabic Transparent | 728
4 | Al-Qairwan | 724
5 | Traditional Arabic | 721
6 | Deco Type Naskh | 735
7 | Microsoft Uighur | 699
8 | Times New Roman | 715
9 | Arial Unicode MS | 735
10 | Simplified Arabic | 738
11 | Arabic Typesetting | 716
12 | Arial | 703
13 | AGA Kaleelah | 737
14 | Al-Mohand | 721
15 | Diwani Letter | 720
16 | Segore UI | 708
17 | Arabswell | 730
18 | Motken Unicode Hor | 724
19 | M Unicode Sara | 730
20 | Tahoma | 736
Total number of printed pages | | 14,452
As stated previously, ten pages of size 8 in each font (6 Training, 2 Testing, and 2 Validation) are printed. The same text is used to print the other sizes, so the number of pages increases with the font size. Twenty pages (12 Training, 4 Testing, and 4 Validation) are printed for text sizes larger than 12 points. Therefore, the total number of printed pages is 14,452, as shown in Table 7.
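The page counts above can be checked with a small worked example. It assumes that the stated page counts apply per font style (an assumption that matches the 40 and 80 pages per font reported for most fonts in Table 8); the per-font values are copied from Table 7.

```python
STYLES_PER_SIZE = 4  # Normal, Bold, Italic, Bold Italic

# Pages printed per category, as stated in the text (assumed to be per style).
pages_size_8 = {"Training": 6, "Testing": 2, "Validation": 2}
pages_above_12 = {"Training": 12, "Testing": 4, "Validation": 4}

per_font_size_8 = sum(pages_size_8.values()) * STYLES_PER_SIZE      # 10 pages x 4 styles
per_font_above_12 = sum(pages_above_12.values()) * STYLES_PER_SIZE  # 20 pages x 4 styles

# Printed pages per font, copied from Table 7.
table_7_pages = [
    728, 704, 728, 724, 721, 735, 699, 715, 735, 738,
    716, 703, 737, 721, 720, 708, 730, 724, 730, 736,
]
total_printed = sum(table_7_pages)

print(per_font_size_8, per_font_above_12, total_printed)  # 40 80 14452
```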
The printed pages of the Arabic fonts database are scanned with a Ricoh IS760D scanner. Pages are scanned in grayscale at resolutions of 200 dpi, 300 dpi, and 600 dpi. Each page is scanned and saved as a “tif” image file with a name that reflects the image font type, size, style, resolution, and page number (and line number for the line-level database). This process resulted in 43,356 page-level images for all resolutions (14,452 page images per resolution). Table 8 shows the number of page images for each font size in three resolutions (200 dpi, 300 dpi, and 600 dpi).
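As an illustration of such a naming scheme, the following sketch builds a file name from the attributes listed above. The exact pattern used for KAFD is not given in the text, so the function name, field order, separators, and zero padding are all hypothetical.

```python
def image_file_name(font, size_pt, style, dpi, page, line=None):
    """Build a hypothetical image file name.

    The real KAFD naming pattern is not specified in the text; this only
    shows how font, size, style, resolution, page number, and (for the
    line-level database) line number could be encoded in one name.
    """
    parts = [
        font.replace(" ", ""),
        f"{size_pt:02d}pt",
        style.replace(" ", ""),
        f"{dpi}dpi",
        f"p{page:03d}",
    ]
    if line is not None:
        # Line-level images carry an extra line-number field.
        parts.append(f"l{line:03d}")
    return "_".join(parts) + ".tif"


print(image_file_name("Times New Roman", 8, "Bold Italic", 300, 5))
print(image_file_name("Arial", 12, "Normal", 600, 3, line=12))
```

Encoding every attribute in the name lets a page or line image be matched to its ground truth file without a separate index.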
All database pages are segmented into lines, and ground truth files are built for each page and each line. Segmentation enables researchers to use the Arabic fonts database at the line level in addition to the page level. This stage resulted in 1,138,479 line images (379,493 line images per resolution). Table 9 shows the number of line images for each font size in three resolutions (200 dpi, 300 dpi, and 600 dpi).
The truth values of the page and line images of the database (KAFD) are kept in text files. The truth files are named similarly to the corresponding page and line images. Table 10 shows the number of letters in each font.
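The text does not state how the pages were segmented into lines. One common approach, shown here only as a minimal sketch and not as the method used for KAFD, is a horizontal projection profile: count the dark pixels in each row and treat runs of rows containing ink as text lines. The function name, the threshold values, and the list-of-rows image representation are assumptions.

```python
def segment_lines(image, ink_threshold=128, min_ink=1):
    """Split a grayscale page into text lines by horizontal projection.

    image: list of rows, each a list of gray values (0 = black, 255 = white).
    Returns (top, bottom) row ranges, with bottom exclusive.
    """
    lines = []
    start = None
    for y, row in enumerate(image):
        # Number of dark ("ink") pixels in this row.
        ink = sum(1 for px in row if px < ink_threshold)
        if ink >= min_ink and start is None:
            start = y                      # first inked row of a new line
        elif ink < min_ink and start is not None:
            lines.append((start, y))       # blank row closes the current line
            start = None
    if start is not None:
        lines.append((start, len(image)))  # line runs to the bottom edge
    return lines


W, B = 255, 0
page = [
    [W, W, W],
    [W, B, W],
    [B, B, W],
    [W, W, W],
    [W, B, B],
    [W, W, W],
]
print(segment_lines(page))  # [(1, 3), (4, 5)]
```

Real scanned pages usually need noise handling and skew correction before such a profile is reliable, which this sketch leaves out.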
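TABLE 8
S.N | Font | Size 08 | Size 09 | Size 10 | Size 11 | Size 12 | Size 14 | Size 16 | Size 18 | Size 20 | Size 24 | Total | Number of resolutions | Total
1 | Freehand | 40 | 56 | 64 | 76 | 92 | 80 | 80 | 80 | 80 | 80 | 728 | 3 | 2,184
2 | Courier New | 40 | 52 | 56 | 72 | 84 | 80 | 80 | 80 | 80 | 80 | 704 | 3 | 2,112
3 | Arabic Transparent | 40 | 56 | 64 | 76 | 92 | 80 | 80 | 80 | 80 | 80 | 728 | 3 | 2,184
4 | Al-Qairwan | 40 | 56 | 64 | 76 | 88 | 80 | 80 | 80 | 80 | 80 | 724 | 3 | 2,172
5 | Traditional Arabic | 44 | 55 | 63 | 75 | 84 | 80 | 80 | 80 | 80 | 80 | 721 | 3 | 2,163
6 | Deco Type Naskh | 40 | 56 | 64 | 77 | 96 | 82 | 80 | 80 | 80 | 80 | 735 | 3 | 2,205
7 | Microsoft Uighur | 40 | 53 | 57 | 72 | 78 | 79 | 80 | 80 | 80 | 80 | 699 | 3 | 2,097
8 | Times New Roman | 42 | 55 | 60 | 75 | 81 | 82 | 80 | 80 | 80 | 80 | 715 | 3 | 2,145
9 | Arial Unicode MS | 43 | 58 | 67 | 80 | 87 | 80 | 80 | 80 | 80 | 80 | 735 | 3 | 2,205
10 | Simplified Arabic | 46 | 56 | 66 | 76 | 92 | 82 | 80 | 80 | 80 | 80 | 738 | 3 | 2,214
11 | Arabic Typesetting | 40 | 56 | 60 | 76 | 84 | 80 | 80 | 80 | 80 | 80 | 716 | 3 | 2,148
12 | Arial | 40 | 53 | 60 | 70 | 80 | 80 | 80 | 80 | 80 | 80 | 703 | 3 | 2,109
13 | AGA Kaleelah | 40 | 56 | 69 | 80 | 95 | 79 | 78 | 80 | 80 | 80 | 737 | 3 | 2,211
14 | Al-Mohand | 40 | 53 | 64 | 76 | 88 | 80 | 80 | 80 | 80 | 80 | 721 | 3 | 2,163
15 | Diwani Letter | 40 | 56 | 60 | 76 | 88 | 80 | 80 | 80 | 80 | 80 | 720 | 3 | 2,160
16 | Segore UI | 42 | 52 | 60 | 72 | 82 | 80 | 80 | 80 | 80 | 80 | 708 | 3 | 2,124
17 | Arabswell | 40 | 56 | 62 | 80 | 92 | 80 | 80 | 80 | 80 | 80 | 730 | 3 | 2,190
18 | Motken Unicode Hor | 40 | 56 | 64 | 76 | 88 | 80 | 80 | 80 | 80 | 80 | 724 | 3 | 2,172
19 | M Unicode Sara | 40 | 56 | 68 | 76 | 90 | 80 | 80 | 80 | 80 | 80 | 730 | 3 | 2,190
20 | Tahoma | 46 | 56 | 66 | 78 | 90 | 80 | 80 | 80 | 80 | 80 | 736 | 3 | 2,208
Total | | 823 | 1103 | 1258 | 1515 | 1751 | 1604 | 1598 | 1600 | 1600 | 1600 | 14452 | 3 | 43,356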
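TABLE 9
S.N | Font | Size 08 | Size 09 | Size 10 | Size 11 | Size 12 | Size 14 | Size 16 | Size 18 | Size 20 | Size 24 | Total | Number of resolutions | Total
1 | Freehand | 2,212 | 2,310 | 2,352 | 2,396 | 2,481 | 1,735 | 1,492 | 1,342 | 1,176 | 1,028 | 18,524 | 3 | 55,572
2 | Courier New | 2,513 | 2,604 | 2,728 | 2,864 | 2,940 | 2,239 | 1,988 | 1,768 | 1,588 | 1,349 | 22,581 | 3 | 67,743
3 | Arabic Transparent | 2,824 | 3,024 | 3,118 | 3,214 | 3,330 | 2,318 | 1,998 | 1,760 | 1,594 | 1,360 | 24,540 | 3 | 73,620
4 | Al-Qairwan | 2,054 | 2,180 | 2,252 | 2,318 | 2,377 | 1,672 | 1,496 | 1,336 | 1,186 | 952 | 17,821 | 3 | 53,463
5 | Traditional Arabic | 1,996 | 2,034 | 2,114 | 2,140 | 2,188 | 1,658 | 1,436 | 1,304 | 1,181 | 988 | 17,039 | 3 | 51,117
6 | Deco Type Naskh | 1,578 | 1,656 | 1,704 | 1,756 | 1,812 | 1,326 | 1,120 | 960 | 876 | 712 | 13,500 | 3 | 40,500
7 | Microsoft Uighur | 1,898 | 1,930 | 2,010 | 2,091 | 2,119 | 1,642 | 1,466 | 1,301 | 1,191 | 972 | 16,620 | 3 | 49,860
8 | Times New Roman | 5,564 | 2,645 | 2,734 | 2,758 | 2,844 | 2,157 | 1,892 | 1,734 | 1,536 | 1,288 | 22,152 | 3 | 66,456
9 | Arial Unicode MS | 2,598 | 2,742 | 2,776 | 2,812 | 2,797 | 1,972 | 1,716 | 1,514 | 1,356 | 1,120 | 21,403 | 3 | 64,209
10 | Simplified Arabic | 1,886 | 1,976 | 2,026 | 2,075 | 2,132 | 1,533 | 1,339 | 1,178 | 1,096 | 881 | 16,122 | 3 | 48,366
11 | Arabic Typesetting | 2,560 | 2,768 | 2,765 | 2,742 | 2,946 | 2,140 | 1,916 | 1,710 | 1,574 | 1,281 | 22,402 | 3 | 67,206
12 | Arial | 2,467 | 2,530 | 2,664 | 2,702 | 2,791 | 2,140 | 1,906 | 1,726 | 1,524 | 1,264 | 21,714 | 3 | 65,142
13 | AGA Kaleelah | 2,120 | 2,226 | 2,282 | 2,364 | 2,507 | 1,788 | 1,515 | 1,438 | 1,272 | 1,045 | 18,557 | 3 | 55,671
14 | Al-Mohand | 2,060 | 2,181 | 2,299 | 2,369 | 2,451 | 1,755 | 1,586 | 1,438 | 1,272 | 1,045 | 18,456 | 3 | 55,368
15 | Diwani Letter | 1,602 | 1,690 | 1,738 | 1,796 | 1,854 | 1,276 | 1,112 | 1,036 | 872 | 716 | 13,692 | 3 | 41,076
16 | Segore UI | 2,238 | 2,293 | 2,383 | 2,458 | 2,550 | 1,970 | 1,728 | 1,563 | 1,418 | 1,191 | 19,792 | 3 | 59,376
17 | Arabswell | 1,806 | 1,906 | 1,926 | 1,988 | 2,062 | 1,432 | 1,270 | 1,108 | 1,020 | 872 | 15,390 | 3 | 46,170
18 | Motken Unicode Hor | 2,134 | 2,216 | 2,262 | 2,315 | 2,342 | 1,696 | 1,480 | 1,340 | 1,156 | 1,014 | 17,955 | 3 | 53,865
19 | M Unicode Sara | 2,170 | 2,300 | 2,352 | 2,410 | 2,492 | 1,764 | 1,520 | 1,342 | 1,190 | 1,036 | 18,576 | 3 | 55,728
20 | Tahoma | 2,682 | 2,750 | 2,832 | 2,918 | 3,056 | 2,150 | 1,826 | 1,670 | 1,504 | 1,269 | 22,657 | 3 | 67,971
Total | | 43,960 | 45,961 | 47,317 | 48,486 | 50,071 | 36,363 | 31,802 | 28,568 | 25,582 | 21,383 | 379,493 | 3 | 1,138,479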
TABLE 10
(Number of letters. The per-font rows are not legible in this copy; the totals over all fonts are listed per letter.)
Letter | Total
Hamza | 181,616
Alif | 7,430,176
Hamza-Under-Alif | 443,704
Hamza-Above-Alif | 1,051,836
Tild-Above-Alif | 46,472
Baa | 1,709,316
Taaa-Closed | 1,414,224
Taaa | 2,124,642
Thaa | 337,752
Jiim | 715,236
Haaa | 864,076
Xaa | 433,348
Daal | 1,378,180
Thaal | 451,324
Taa | 2,107,664
Zaay | 269,512
Siin | 1,039,736
Shiin | 421,048
Saad | 454,282
Daad | 304,810
Thaaa | 479,282
Taa | 122,416
Ayn | 1,737,314
Ghayn | 233,134
Faa | 1,377,064
Gaaf | 1,036,638
Kaaf | 970,496
Laam | 5,800,618
Miim | 3,078,986
Nuun | 2,538,808
Haa | 1,637,250
Waaw | 2,774,178
Hamza-Above-Waaw | 60,594
Alif-Broken | 438,730
Yaa | 3,441,954
Hamza-Above-Alif Broken | 193,432
The statistics of the KAFD database follow below.
The Arabic font database consists of 43,356 page images. The page-level database is presented in three resolutions (200 dpi, 300 dpi, and 600 dpi), 20 fonts (Table 6), 10 sizes (8, 9, 10, 11, 12, 14, 16, 18, 20, and 24 points), and 4 styles (Normal, Bold, Italic, and a combination of Bold and Italic). Table 8 shows the number of page images for each font. Table 10 shows the total number of page images, line images, and characters of the KAFD database. FIG. 10 shows a page-level image of the KAFD database.
TABLE 10
 | Number of page images | Number of line images | Number of characters
Per resolution | 14,452 | 379,493 | 49,099,848
Number of resolutions | 3 | 3 | 3
Total | 43,356 | 1,138,479 | 147,299,544
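The totals in this table follow from multiplying each count by the three resolutions. A short check, assuming the first row gives per-resolution counts (as the text states for page and line images) and using illustrative variable names:

```python
RESOLUTIONS = 3  # 200 dpi, 300 dpi, and 600 dpi

# Counts for a single resolution, copied from the table above.
per_resolution = {
    "page_images": 14_452,
    "line_images": 379_493,
    "characters": 49_099_848,
}

# Each total is the per-resolution count times the number of resolutions.
totals = {name: count * RESOLUTIONS for name, count in per_resolution.items()}

# Page and line images together give the overall number of text images.
total_images = totals["page_images"] + totals["line_images"]

print(totals)        # {'page_images': 43356, 'line_images': 1138479, 'characters': 147299544}
print(total_images)  # 1181835
```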
The Arabic font database consists of 1,138,479 line images. This part of the database is presented in three resolutions (200 dpi, 300 dpi, and 600 dpi), 20 fonts (Table 6), 10 sizes (8, 9, 10, 11, 12, 14, 16, 18, 20, and 24 points), and 4 styles (Normal, Bold, Italic, and a combination of Bold and Italic). Table 9 shows the number of line images for each font. Table 10 shows the total number of page images, line images, and characters of the KAFD database. FIG. 11 shows samples of line-level images of the KAFD database.
The APTI database is used for comparison with the KAFD database. The APTI database is a word-level database consisting of 10 fonts, 10 sizes, and 4 font styles. Table 11 shows a comparison between the two databases. The APTI database consists of only 10 fonts, while the KAFD database consists of 20 fonts. APTI has one resolution (72 dpi), whereas the KAFD database is scanned at three resolutions (200 dpi, 300 dpi, and 600 dpi). APTI is available only at the word level, while the KAFD database is available in two forms (page and line). APTI contains more images than KAFD because its images are of single words, whereas KAFD images are of pages and lines. Finally, APTI text images contain synthesized text, whereas the KAFD database consists of scanned real text.
TABLE 11
Evaluation criteria | KAFD | APTI
Number of fonts | 20 | 10
Number of sizes | 10 | 10
Number of styles | 4 | 4
Resolutions | 200 dpi, 300 dpi, 600 dpi | 72 dpi
Database levels | Page, Line | Word
Total number of images | 1,181,835 (Page and line images) | 45,313,600 (Word images)
Number of characters | 147,299,544 | 259,312,000
Scanning method | Scanner | Synthesized
The lack of a benchmark multi-font Arabic database makes the task of developing Arabic font and text recognition more difficult. Furthermore, comparing the accuracy of the techniques developed by researchers without a benchmark database is inappropriate. A review of the multi-font Arabic databases is presented, followed by a description of the KAFD database. KAFD is a free database available to researchers in three resolutions (200 dpi, 300 dpi, and 600 dpi) and at two levels (page level and line level). It is a multi-font, multi-size, and multi-style database consisting of 20 fonts, 10 sizes, and 4 font styles, with 1,181,835 text images in total.