1. Field of the Invention
The present invention generally relates to a method and apparatus for identifying side information available to statistical machine translation (SMT) systems and, more particularly, to identifying side information available to SMT systems within an enterprise setting (or in a social networking site such as Facebook®), and to using this information to improve SMT systems.
2. Description of the Related Art with Regard to the Current Invention
Individual-level side information sources are largely based on each user's social/professional network within an enterprise and on personal information such as age, language skills, etc. Enterprises can include traditional companies. In addition, social/professional networking sites (e.g., Facebook® or E-harmony®) can be considered enterprises in which the users communicate (in a multimodal fashion) under the umbrella of these sites and provide personal information to the sites, and in which all the activities of a user or user-pair are stored in the sites' databases.
Side information includes all the multimedia forms of activity and communication used. For example, a first type of side information may be instant messaging chat history with each individual in a contact list for an activity, as well as the people on the contact list. Another example of side information is e-mail history (exchanged by two or more users) and topic history between two or more users relating to a given topic. Furthermore, side information can include voice-mail (e.g., messages between two users, or messages between multiple users relating to the same topic), meeting transcriptions/seminars attended by the two users, and shared membership in a particular activity or interest group between the two users.
Next, group-level side information can include groups of users belonging to the same organization, department, or interest group (e.g., soccer).
There are fundamental differences between Machine Translation (MT) and speech recognition. For example, MT conventionally relies on alignments between a source language and a target language; with side information, these alignments can be improved. No such source/target language alignment exists in speech recognition or other natural language processing (NLP) fields.
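As a concrete illustration of the source/target alignments referred to above, the following is a minimal sketch of IBM Model 1-style expectation-maximization training of word-translation probabilities on a toy parallel corpus. The corpus, function name, and iteration count are illustrative assumptions, not taken from the present description; side information would simply supply additional parallel or quasi-parallel sentence pairs to such an estimator.

```python
from collections import defaultdict

def train_ibm_model1(parallel_corpus, iterations=10):
    """Estimate word-translation probabilities t(f, e) ~ P(f | e)
    with IBM Model 1 expectation-maximization (a standard technique,
    used here only to illustrate source/target alignment)."""
    # Uniform initialization: every co-occurring pair starts equally likely.
    t = defaultdict(lambda: 1.0)
    for _ in range(iterations):
        count = defaultdict(float)   # expected pair counts (E-step)
        total = defaultdict(float)   # expected counts per target word
        for src, tgt in parallel_corpus:
            for f in src:
                norm = sum(t[(f, e)] for e in tgt)
                for e in tgt:
                    delta = t[(f, e)] / norm  # soft alignment responsibility
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalize expected counts into probabilities.
        t = defaultdict(float,
                        {pair: c / total[pair[1]] for pair, c in count.items()})
    return t

# Toy German-English parallel corpus (illustrative only).
corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]
t = train_ibm_model1(corpus)
```

After training, the probability for the pair ("das", "the") dominates the alternatives for "the", showing how co-occurrence statistics alone recover word alignments from parallel data.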
Translation is, by its nature, used between two or more people or parties, as well as in one-way scenarios. As such, parallel or quasi-parallel data is available as side information. At any given time, there are two streams of data on each user's side: what the user is typing and what is being translated from the other user.
Conventionally, dialog histories in speech applications are limited to the specific sessions in which a user is interacting with the system (e.g., booking a flight). A conventional speech/NLP system would not remember what the user did a month or a year ago. In a chat between two users in an enterprise setting, the dependency on history information can easily extend beyond days and weeks, and that context can still be helpful to translation. Thus, the differences between machine translation and speech recognition are highlighted: the side information for speech recognition differs from the side information for machine translation, as the two address different problems.
Conventional MT combines roughly ten or more features to obtain a combined total score at any point in the search (speech recognition typically uses fixed weights for its language model (LM) and acoustic model (AM)). Tuning the feature weights per user-pair/topic in translation has a significant impact, as compared to tuning the LM and AM weights in speech recognition, which are largely fixed. Text resources such as instant messaging, e-mails, etc. can only be used to adapt the LM in speech recognition and not the AM, whereas these text resources can be used to adapt and/or update both the alignment and language models, as well as other models used in conventional translation.
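The log-linear feature combination described above can be sketched as follows. The feature names, log-domain scores, and weight values are all hypothetical, chosen only to show how tuning weights per user-pair/topic changes the combined score (and hence the hypothesis ranking) relative to a fixed baseline.

```python
# Hypothetical log-domain feature scores for one translation hypothesis.
features = {
    "translation_model": -2.3,
    "language_model": -1.7,
    "distortion": -0.4,
    "word_penalty": -0.5,
}

# Baseline weights vs. weights tuned for a specific user pair or topic
# (all values illustrative, not from the original text).
baseline_weights = {name: 1.0 for name in features}
tuned_weights = {
    "translation_model": 0.9,
    "language_model": 1.4,
    "distortion": 0.6,
    "word_penalty": 0.8,
}

def combined_score(features, weights):
    """Log-linear combination: weighted sum of per-feature log scores."""
    return sum(weights[name] * score for name, score in features.items())

baseline = combined_score(features, baseline_weights)
tuned = combined_score(features, tuned_weights)
```

With fixed unit weights the combined score is simply the sum of the feature scores; a tuned weight vector re-weights the same features, which is where per-user-pair/topic adaptation enters the search.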