With ever-increasing access to the Internet by different communities around the world, the Internet contains an enormous number of webpages in different languages. The growing number of users accessing or using web translation tools reflects the need to provide users with adequate, high-quality translations.
A parallel text is a text placed alongside its translation. Parallel text alignment is the identification of the corresponding fragments (such as sentences or portions thereof) of a source text in a translated text.
One conventional method of aligning texts is to apply heuristic rules, such as aligning sentences based on punctuation marks and the positions of the sentences. Such a method may not be sufficiently precise, for example, because an original sentence may be translated into two sentences, and because the position of the original sentence within the original text does not necessarily reflect the position of the corresponding translated sentence within the translated text.
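The limitation of such heuristics can be illustrated with a minimal sketch. The function names `split_sentences` and `align_by_position`, the punctuation rule, and the example sentences are assumptions made for illustration only; they are not taken from any cited reference.

```python
import re


def split_sentences(text):
    """Split text after '.', '!' or '?' followed by whitespace (a crude heuristic)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def align_by_position(source_text, target_text):
    """Pair the i-th source sentence with the i-th target sentence.

    Returns the list of pairs and the difference in sentence counts
    (source minus target); a nonzero difference signals that a purely
    positional alignment cannot be correct everywhere.
    """
    src = split_sentences(source_text)
    tgt = split_sentences(target_text)
    return list(zip(src, tgt)), len(src) - len(tgt)


# Hypothetical example: the first source sentence is translated as two sentences.
source = "I came home because it was late. I slept."
target = "Je suis rentré. Il était tard. J'ai dormi."
pairs, surplus = align_by_position(source, target)
for s, t in pairs:
    print(s, "<->", t)
```

In this example the second pair matches "I slept." with "Il était tard.", which is not its translation, because one source sentence became two target sentences and shifted every later position by one.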
Another conventional method is the use of a pre-constructed translation dictionary. However, constructing a translation dictionary is expensive and is intensive in both time and computational resources.
The article translated as "Black Cat in a Dark Room or Can we Automate Search of Translation Equivalents in a Parallel Corpus of Texts" (M. N. Mihaylov, Philological Compilation, Smolensk, 2002, pp. 181-188) discloses a method of finding equivalents of words in parallel texts using the co-occurrence of two words, one in a first language and one in a second language, in known equivalent fragments.
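The general idea of co-occurrence in known equivalent fragments can be sketched as follows. This is an illustrative sketch only: the Dice coefficient used for scoring, the whitespace tokenization, the function names, and the example fragments are assumptions, not details disclosed in the cited article.

```python
from collections import Counter
from itertools import product


def cooccurrence_scores(fragment_pairs):
    """Score (source word, target word) pairs by how often they share a fragment pair.

    fragment_pairs: iterable of (source_fragment, target_fragment) strings that
    are already known to be equivalent. Scoring uses the Dice coefficient,
    which is an assumption chosen for this sketch.
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in fragment_pairs:
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_count.update(src_words)   # fragments containing each source word
        tgt_count.update(tgt_words)   # fragments containing each target word
        pair_count.update(product(src_words, tgt_words))
    return {
        (s, t): 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in pair_count.items()
    }


def best_equivalent(word, scores):
    """Return the target word with the highest score for a source word, or None."""
    candidates = {t: v for (s, t), v in scores.items() if s == word}
    return max(candidates, key=candidates.get) if candidates else None


# Hypothetical aligned fragments (English / French).
fragments = [
    ("the cat sleeps", "le chat dort"),
    ("the dog sleeps", "le chien dort"),
    ("the cat eats", "le chat mange"),
]
scores = cooccurrence_scores(fragments)
print(best_equivalent("cat", scores))
```

Words that always appear together in equivalent fragments, and nowhere else, receive the highest score, while frequent function words such as "le" are penalized because they also occur in fragments that lack the source word.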
U.S. Pat. No. 9,047,275 discloses computer-implemented systems and methods that align fragments of a first text with corresponding fragments of a second text, which is a translation of the first text. One preferred embodiment preliminarily divides the first and second texts into fragments; generates a hypothesis about the correspondence between the fragments of the first and second texts; performs a lexico-morphological analysis of the fragments using linguistic descriptions; performs a syntactic analysis of the fragments using linguistic descriptions and generates syntactic structures for the fragments; generates semantic structures for the fragments; and estimates the degree of correspondence between the semantic structures.
US2015/0278197 discloses a system and method for creating a comparable corpus by: obtaining a set of source documents containing text; constructing language-independent semantic structures for at least one sentence of each of the texts in the source documents; determining universal similarity measures for groups of the source documents by comparing the constructed language-independent semantic structures of the texts in the source documents; identifying sets of similar documents based on the determined universal similarity measures for the groups of the source documents; and creating the comparable corpus based on the identified sets of similar documents.