Most of the world's written languages, such as those using the Latin, Cyrillic, or Greek scripts, are written from left to right (LTR). Others, such as Arabic, Hebrew, Urdu, and Farsi (Persian), are written from right to left (RTL). When a text includes both LTR segments and RTL segments, each segment should be written in its own direction, thus forming a bidirectional text, also known as “BIDI”. A computer system with BIDI support can display texts in different languages on the same page, even if the languages have different text directionalities.
However, BIDI rules are very complex, and the rules implemented by different software are usually not unified. Indeed, a single script can contain two or more kinds of text having different writing directions; texts having different writing directions can embed one another, even at multiple levels; and a BIDI document can contain special text such as dates, numbers, and formulae.
BIDI data stored on legacy systems (e.g., mainframe systems) used to be in “visual” layout: the data were stored in memory in the same order in which they are shown on displays (usually terminals or printers). This had the advantage that no special processing was needed to format the data for presentation, since the data were already in presentation form. Since the data only existed on the same platform, it did not matter what form was used. With the advent of processing power closer to end users, personal computer systems now mainly store BIDI data in “logical” layout. This means that the data are stored in memory in the order in which they are typed, not in the order in which they are displayed. This has the advantage that BIDI data can be processed like non-BIDI data (i.e., searching, sorting, and parsing can be done using the same modules used with non-BIDI data). In order to display the BIDI data, the system must render the data for presentation, which is usually done using BIDI layout engines (for text environments) or BIDI layout engines embedded in fonts (for graphical environments). Since the data only exist on the personal computer, it does not generally matter what form is used.
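The processing advantage of logical layout can be illustrated with a minimal sketch. It assumes a common documentation convention in which uppercase letters stand in for RTL characters and lowercase letters for LTR characters; the buffer contents and the helper name `contains` are hypothetical and chosen only for illustration.

```python
# Assumed convention: UPPERCASE stands in for RTL characters,
# lowercase for LTR characters. Both buffers hold the same text.
logical_buffer = "title: HEBREW TEXT"  # stored in the order typed
visual_buffer = "title: TXET WERBEH"   # stored as shown on an LTR display


def contains(buffer: str, term: str) -> bool:
    """Plain substring search, the same module used for non-BIDI data."""
    return term in buffer


# The search term is given in the order a user would type it.
found_in_logical = contains(logical_buffer, "HEBREW")       # True
found_in_visual = contains(visual_buffer, "HEBREW")         # False
# In the visual buffer the term only matches once it is reversed.
found_in_reversed = contains(visual_buffer, "HEBREW"[::-1])  # True
```

In this sketch the unmodified search works on the logical buffer, whereas the visual buffer needs BIDI-specific handling before the same search can succeed.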
Certain text processing algorithms, such as search and sort algorithms, differ according to the text type and orientation. The text orientation, also called “base direction”, “global orientation”, “writing order”, “reading order”, or “paragraph orientation”, determines the side of the screen, window, page, or field where the rendering engine starts laying out directional segments. Subsequent segments progress in the direction of the global orientation. If a bidirectional text has been created in storage with the intent to be presented in a right-to-left global orientation, and is instead rendered with a left-to-right global orientation, the relative order of the different segments (and of the punctuation) gets mixed up and the text does not make sense. The text orientation therefore determines the direction in which words flow when written or displayed, and may be either LTR or RTL.
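The effect of the global orientation on segment order can be sketched as follows. This is a deliberately simplified illustration, not the Unicode Bidirectional Algorithm: it assumes that uppercase words stand in for RTL text and lowercase words for LTR text, it ignores punctuation and numbers, and the names `is_rtl` and `to_visual` are hypothetical.

```python
from itertools import groupby


def is_rtl(word: str) -> bool:
    # Assumed convention: an all-uppercase word stands in for RTL script.
    return word.isupper()


def to_visual(logical: str, base: str) -> str:
    """Return the characters in screen order, leftmost cell first."""
    words = logical.split()
    # Group consecutive words of the same direction into directional segments.
    runs = [(rtl, " ".join(group)) for rtl, group in groupby(words, key=is_rtl)]
    # Characters inside an RTL segment run right to left on screen.
    shown = [text[::-1] if rtl else text for rtl, text in runs]
    # With an RTL base direction, segments are laid out starting from the right.
    if base == "RTL":
        shown.reverse()
    return " ".join(shown)


logical = "HEBREW TEXT english text MORE"
print(to_visual(logical, "RTL"))  # EROM english text TXET WERBEH
print(to_visual(logical, "LTR"))  # TXET WERBEH english text EROM
```

Read from the right, the first output presents the segments in the intended order. The second output keeps every segment internally readable but places the segments in a different relative order, which is the mix-up described above.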
Text type and text orientation are generally defined manually by the system administrator or application user for bidirectional data exchanges between different systems relying on different BIDI layouts. To define the text type and orientation, a GUI may be added to the application to allow users to input them. Adding such a GUI may require changing the parameters of the application's methods (APIs) to carry the text type and orientation attributes, which can be costly and cumbersome. Further, in certain situations, manual configuration is not even possible (e.g., there are many sources and it is difficult to configure the BIDI layout for each of them, or all source text is received from a specific queue). In such situations, the text may be corrupted.
Text type and orientation are also required for text processing such as text display, text search, and text sorting. If the text type and orientation of a bidirectional text are not known, the text processing cannot be performed correctly.
FIG. 1A shows a table T1 representing an exemplary memory buffer storing a text according to the visual text type and the RTL text orientation. With the visual text type and the RTL orientation, the text is displayed in the same order as the characters are stored.
The memory buffer of table T1 may be displayed in the stored character sequence because the text type is visual and the orientation is RTL. However, when the same memory buffer is displayed in a logical environment, the text may be corrupted and displayed according to the sequence represented in table T2.
Indeed, the segment “English text” appears with its letters in reverse order in the representation of table T2, because the environment cannot detect the text type and orientation, which leads to an incorrect segment display.
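The kind of corruption described for tables T1 and T2 can be reproduced with a simplified sketch. The sample buffer below is hypothetical and is not the content of table T1; uppercase words are assumed to stand in for RTL text, and the function names `show_visual_rtl` and `show_as_logical_rtl` are illustrative only. The logical renderer is a rough approximation that ignores punctuation and numbers.

```python
from itertools import groupby


def show_visual_rtl(buffer: str) -> str:
    """Visual RTL: buffer[0] occupies the rightmost screen cell.

    Returns the screen content, leftmost cell first.
    """
    return buffer[::-1]


def show_as_logical_rtl(buffer: str) -> str:
    """Simplified logical renderer with an RTL base direction.

    Assumed convention: all-uppercase words stand in for RTL script.
    Returns the screen content, leftmost cell first.
    """
    runs = [(rtl, " ".join(group))
            for rtl, group in groupby(buffer.split(), key=str.isupper)]
    # RTL segments are mirrored on screen; LTR segments are shown as stored.
    shown = [text[::-1] if rtl else text for rtl, text in runs]
    shown.reverse()  # RTL base: the first segment starts at the right edge
    return " ".join(shown)


# Hypothetical visual RTL buffer: the LTR segment is stored back to front
# so that it reads correctly when cells are filled from the right.
visual_rtl_buffer = "HEBREW TEXT txet hsilgne"

print(show_visual_rtl(visual_rtl_buffer))      # english text TXET WERBEH
print(show_as_logical_rtl(visual_rtl_buffer))  # txet hsilgne TXET WERBEH
```

When the buffer is treated as visual RTL, the LTR segment reads correctly on screen. When the same bytes are treated as logical data, the LTR segment is shown exactly as stored and therefore appears with its letters reversed.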