1. Field of the Invention
The present invention generally relates to the computer composition of the Nastaliq script in which Urdu scripture, including that of the Punjabi, Sindhi and Pushto languages, is written, and of the similar Ruka'ah script of the Arabic and Persian languages. More particularly, the invention relates to the decomposition of Urdu scripture into constituent elements defined in accordance with pattern recognition principles, and to the digitization and storage of those elements in a computer memory. Upon command, scripture is composed in Nastaliq or similar Ruka'ah script in conformance with a set of rules governing the selection and combination of elements stored in the computer memory.
2. Description of the Prior Art
The mechanization of true Nastaliq and Ruka'ah script of the Urdu group of languages has always remained a formidable problem. Unlike languages written in Roman characters, in which individual characters retain their whole shape and are not joined with neighboring characters in a word, characters of the Urdu group of languages can take on a variety of cursive shapes when joined with each other. These cursive shapes, called shoshas, may either comprise a representative part of a whole character shape or be altogether different in shape from the character they represent.
Shoshas join with each other and with whole characters to form a connected body of scripture called a ligature. The selection of a particular shosha of a character depends upon the neighboring characters in a ligature. Urdu characters, however, do not appear on a single horizontal line; each has a vertical placement of its own, and the vertical positioning of a shosha depends upon the neighboring elements in the ligature. Additionally, the widths of characters and shoshas differ very widely, so their horizontal positioning must be carefully manipulated: when these elements are juxtaposed to compose the Nastaliq or the similar Ruka'ah script, the ligature bodies thus formed should have no undesirable jagged joints and should appear to be drawn by a continuous ductus of a calligraphic pen.
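The dependence of shosha selection and placement on neighboring elements can be pictured with a minimal sketch. It is an illustration only, not the method of the invention: the glyph names, widths and rise values are hypothetical, and the shape of each character is chosen from the following character alone, whereas real Nastaliq composition considers more context.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Glyph:
    name: str   # hypothetical identifier of a whole character or shosha
    width: int  # horizontal extent, arbitrary units
    rise: int   # how far the preceding element sits above this one at the joint

# Hypothetical shape table: (character, following character) -> glyph.
# None means "last in the ligature"; "*" is a fallback for any other neighbor.
SHAPES = {
    ("beh", "jeem"): Glyph("beh.before_jeem", 3, 2),
    ("beh", "*"): Glyph("beh.generic", 4, 1),
    ("beh", None): Glyph("beh.whole", 9, 0),
    ("jeem", "*"): Glyph("jeem.generic", 5, 3),
    ("jeem", None): Glyph("jeem.whole", 7, 5),
}

def select(char, following):
    """Pick the shape of `char` according to the character that follows it."""
    glyph = SHAPES.get((char, following))
    return glyph if glyph is not None else SHAPES[(char, "*")]

def compose(chars):
    """Return (glyph name, x, y) for each element of one ligature.

    x is the left edge of the element; the script runs right to left, so x
    decreases. y is measured upward from the final element, which sits at 0.
    """
    glyphs = [select(c, chars[i + 1] if i + 1 < len(chars) else None)
              for i, c in enumerate(chars)]
    # Vertical placement: work backward from the last element, raising each
    # earlier element by the rise demanded by the element that follows it.
    ys = [0] * len(glyphs)
    for i in range(len(glyphs) - 2, -1, -1):
        ys[i] = ys[i + 1] + glyphs[i + 1].rise
    # Horizontal placement: widths differ per glyph, so advance the pen by each.
    placed, pen = [], 0
    for glyph, y in zip(glyphs, ys):
        pen -= glyph.width
        placed.append((glyph.name, pen, y))
    return placed
```

Calling `compose(["beh", "jeem"])` in this sketch selects a different shape for the first character than `compose(["beh"])` does, and places it above the final element rather than on a common line.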
In true Nastaliq and Ruka'ah script, the characters and shoshas have a varying thickness along their body lengths as shown in FIG. 3. This further complicates ligature formation and requires matching the thickness of the shoshas at the joints to obtain an elegant ligature.
In complex ligatures, diacritic marks (generally diamond-shaped marks used to distinguish characters which are similar in form) frequently need to be repositioned to prevent them from clashing with each other or with the body of the ligature. Horizontal spacing between ligatures and characters is variable and is determined by a separate set of rules. Horizontal overlap between ligatures, and even between words, is common; it saves space and is unambiguous to the reader.
Vowel sounds in the script are represented by a second set of diacritic marks called Araab. These diacritic marks, if used, occur either above or below their associated shosha or character depending upon the type of Araab.
All of the above complexities have historically obstructed the machine composition of Nastaliq or Ruka'ah script in its true elegance and style. Although various attempts have been made at the mechanization of the script, all of them have distorted Nastaliq and Ruka'ah script to varying extents to suit technological limitations. Initially, only two shapes were introduced for most of the characters: the whole character and a single shosha (to be used when the character occurs at the beginning or in the middle of a ligature). This not only seriously distorted the script but also left many combinations impossible to implement.
A system of differential space was introduced whereby all characters and shoshas were made to have one of two possible widths. But this simplification deforms the script, since shoshas and characters are of many different widths. The varying thickness along the body length of characters and shoshas was reduced so that the joints could be simplified, resulting in degradation of the script. Moreover, the shoshas and the characters were designed so that the vertical levels of the joints were fixed for ease of ligature formation. Shoshas were widened and their joints were arranged to take place on a single horizontal line to prevent diacritic marks from clashing with each other or with the ligature body. All these simplifications had a drastic effect on the elegance of Nastaliq script, and the script came to resemble what is known as the Naskh style.
Many writers of the script continue to write manually in the Nastaliq style because of its ease and elegance but are forced to read printed material in the Naskh style owing to the lack of Nastaliq printing. Currently, some of the severe distortions of the script have been resolved by the advent of computers and optical printing techniques. The number of available shosha shapes has been increased and the shapes stored in the computer's memory, relieving typists of having to select character shapes.
But the emulation of the true Nastaliq style could not be achieved until recently, and the Naskh style appears to have become the de facto standard in Arabic and Persian communities. In Pakistan and India, however, the hand-calligraphed Nastaliq script is preferred over the typeset Naskh script, which causes almost all the Urdu literature in these countries to be first written out by hand and thereafter printed by conventional means.
In 1981, Jamil and Saiyid disclosed a scheme to mechanize Nastaliq script. With the cooperation of the Monotype Company of England, they calligraphed, digitized, and stored in a computer memory a dictionary of the sixteen thousand most commonly used fully formed Urdu ligatures. These stored ligatures were used to form words, which were subsequently printed with a laser printer. Although a significant breakthrough for printing Nastaliq script, the Jamil-Monotype machine is limited to printing only those words whose ligatures are already stored in the computer's memory. This not only requires a mammoth memory but also means that words whose ligatures are not stored in the memory cannot be printed. This problem is especially pronounced in the printing of proper nouns and remains open-ended. Also, the process is inordinately expensive and suitable only for huge printing jobs such as newspapers and mass-circulation magazines. The technique described hereinafter resolves all of the above complexities and prints true Nastaliq with very economical use of computational effort and memory.
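The coverage limitation of a stored dictionary of whole ligatures, as described above for the prior art, can be illustrated with a short sketch. The ligature labels are hypothetical transliterated placeholders and the stored set is a toy stand-in for a dictionary of thousands of entries; nothing here reproduces the actual Jamil-Monotype system.

```python
def check_printable(word_ligatures, stored_ligatures):
    """Report whether a word can be printed from a whole-ligature dictionary.

    word_ligatures: the ligatures making up one word, in order.
    stored_ligatures: the set of fully formed ligatures held in memory.
    Returns (printable, missing), where missing lists the absent ligatures.
    """
    missing = [lig for lig in word_ligatures if lig not in stored_ligatures]
    return (not missing, missing)

# Toy dictionary of stored ligatures (hypothetical labels).
STORED = {"ki", "ta", "b", "ka", "r"}
```

In this sketch a word built only from stored ligatures is printable, while a word containing any ligature outside the dictionary, as a proper noun might, is not, however large the dictionary is made.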