Pronoun-dropping (or pro-drop) languages, such as Chinese, Arabic, and Romance languages like Italian, are extremely important source languages in today's machine translation markets. A pronoun that acts as the subject of a sentence is explicit in English, but in pro-drop languages it is often left implicit, or dropped, because native speakers can infer it from the context.
A missing subject in a translation from a pro-drop language into English, or into any other non-pro-drop language, seriously hurts readability and hence reduces user acceptance of, and satisfaction with, any translation product. In addition, pro-drop phenomena are frequent in Twitter and Facebook posts, mobile-style short messages, and other informal texts.
Existing approaches to the problem of recovering dropped pronouns are based on monolingual techniques and fall under the broad category of restoring empty categories in text. Empty categories are placeholders in a parse tree that account for irregularities of a language. However, these monolingual approaches to pronoun recovery suffer from poor performance.