The present invention relates to a template for receiving, characterizing and storing data, and in particular for translating data into different languages and/or different media formats according to such a template.
As the market for commerce of consumer goods expands internationally, so does the need for advertising in a multiplicity of languages and formats. Today, with the advent of electronically mediated commerce, consumer markets are no longer defined by countries or regions but rather by a specific and targeted sector which can be composed of consumers of many different nationalities. As such, advertisers are forced to provide advertisements in formats and languages suitable to a heterogeneous mix of consumers.
One such international consumer market is emerging with the creation of the European Union. With the advent of unrestricted trade, advertisers are now confronted with the task of producing a multiplicity of printed and electronic ads that suit the specific languages and formats recognized by the different nationalities comprising this new multilingual market. Presently, the production of such ads can be time-consuming and expensive, and so multilingual ads are limited to big-budget advertising campaigns.
The rapid growth and worldwide acceptance of the Internet and the World Wide Web are responsible for another fast-growing international consumer market.
As the Internet grows, many Web sites are becoming accessible internationally, with many more computer users accessing these Web sites, seeking information and/or commercial products. However, the increased connectivity between users in different countries has also exposed the problem of communication between such users. Simply providing the communication channel, such as the World Wide Web, is not sufficient to guarantee communication. Users must also be able to understand each other in terms of the human language used for the Web page. Although English is currently the dominant language on the World Wide Web, many different Web browsers are now available for displaying Web pages in different languages. However, creating a Web page in many different human languages is currently a difficult and time-consuming task when performed manually. Thus, there is a need for automation of the translation of Web pages into different human languages.
One attempt to meet this need for translation has been to create automatic translation software for human languages. Such software receives information from an electronic document such as an advertisement in one human language, and then attempts to automatically translate the document into a different human language. The drawback of such software is that it tries to provide maximum flexibility by receiving any type of language data, which renders automatic translation of the data far more difficult. Human languages are complex, with a good deal of information being understandable only in context and without rigid structural rules. Thus, translations provided by currently available automatic translation software must be examined carefully for errors in the translation which arise from irregularities of human language.
Such automatic translation software would be far more accurate, and would perform more reliably without such a need for careful examination, if the human language data could be provided in a more limited format. Frequently, data presented as an advertisement, for example on a Web page, is limited in terms of the vocabulary and subject matter discussed, and as such would be relatively easier to translate if these limitations were recognized by the software. Unfortunately, there is no currently available software which is able both to recognize and to exploit these limitations in order to provide a more accurate translation of the data into a different human language.
In addition, translation of data into different media formats, such as facsimiles, electronic mail (e-mail), voice messages and the like, is also currently difficult to perform automatically. For example, currently the text of an advertisement cannot easily be translated into a voice message which could be provided to a user through the telephone. Similarly, a user cannot submit data through a telephone call to an automated service, and then have this data sent as a facsimile or as an e-mail message. Thus, no software is currently available which can translate data automatically into different media formats.
Such translations into different media formats would be highly useful for disseminating advertisements, for example, in which the type of language data is likely to be highly restricted. For example, currently a user can place an advertisement in a newspaper in a single human language by calling the newspaper and giving the details over the telephone. The advertisement then appears in a single media format, the newspaper. The user cannot easily have the advertisement translated into different languages, nor can the user have the advertisement translated into multiple media formats. Thus, the translation of data into different media formats and into different human languages cannot currently be performed automatically by available software.
There is thus a need for, and it would be useful to have, software for automatic translation of data presented in a fixed format into different languages and into different media formats.
The present invention is of a method for automatically translating data into different human languages and into different media formats. The method of the present invention uses a template for decomposing the data into at least one data element, predetermined according to a human language subject area. Each such subject area has a limited vocabulary and contains a limited number of concepts. The data is then entered, manipulated and stored according to the template. Since the structure of the data is either predetermined or processed according to subject area, the data is relatively easy to translate into different human languages according to such a limited vocabulary. The data is also relatively easy to translate into different media formats, such as facsimile, e-mail and voice messages, for example. Thus, the method of the present invention easily and efficiently translates data into different human languages and into different media formats.
Although the term “translation” is used herein, it should be understood that the translation could also be performed as a conversion, by storing the information according to generic codes in the database, such as Unicode for example, and then by converting the generic code to human language data in the desired human language and media format.
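By way of non-limiting illustration only, the conversion described above could be sketched as follows. The code table, the code identifiers such as `C001` and the function name `render` are hypothetical and are assumed here solely for the purpose of the example. The sketch stores language-independent codes and converts them to words only when output is requested.

```python
# Hypothetical code table: each generic code maps to one word per language.
# The codes and words are illustrative assumptions, not part of the method.
CODE_TABLE = {
    "C001": {"en": "car", "fr": "voiture", "de": "Auto"},
    "C002": {"en": "red", "fr": "rouge", "de": "rot"},
}


def render(codes, language):
    """Convert stored generic codes to words in the requested language."""
    return [CODE_TABLE[code][language] for code in codes]


# The stored record holds only codes, so it is independent of any language.
stored = ["C002", "C001"]
english = render(stored, "en")
french = render(stored, "fr")
```

In such a sketch, supporting a further language would require only a further entry per code, without altering the stored information.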
According to the present invention there is provided a method for automatically translating human language data of a subject area according to a template, the steps of the method being performed by a data processor, the method comprising the steps of: (a) subdividing the subject area into at least one data element to at least partially form the template; (b) identifying information in the human language data corresponding to the at least one data element; and (c) translating the information in the at least one data element according to the template to form translated information.
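As a minimal sketch of steps (a) to (c), and without limiting the method in any way, the following example assumes a hypothetical subject area of used-vehicle advertisements, a hypothetical two-element template and a small English-to-French vocabulary. The names `DataElement`, `TEMPLATE`, `identify` and `translate` are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class DataElement:
    name: str
    vocabulary: dict  # source-language word -> target-language word


# (a) Subdivide the subject area into data elements to form the template.
TEMPLATE = [
    DataElement("item", {"car": "voiture", "truck": "camion"}),
    DataElement("color", {"red": "rouge", "blue": "bleu"}),
]


def identify(text, template):
    """(b) Identify the information corresponding to each data element."""
    words = text.lower().split()
    found = {}
    for element in template:
        for word in words:
            if word in element.vocabulary:
                found[element.name] = word
                break  # keep the first matching word for this element
    return found


def translate(found, template):
    """(c) Translate each element's information using its own vocabulary."""
    by_name = {element.name: element for element in template}
    return {name: by_name[name].vocabulary[word] for name, word in found.items()}


info = identify("Red car for sale", TEMPLATE)
translated = translate(info, TEMPLATE)
```

Because each data element carries its own limited vocabulary, the translation in this sketch reduces to a lookup rather than a general analysis of the sentence.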
Preferably, the method further includes the step of: (d) storing the information in the at least one data element. More preferably, step (a) further comprises the steps of: (i) determining an associated vocabulary for the at least one data element according to the subject area; and (ii) determining an associated concept base for the template according to the subject area. Most preferably, step (c) is performed by at least translating the information in the at least one data element from a first human language to a second human language according to the vocabulary and the concept base.
Preferably, the concept base determines a role for each word of the vocabulary, such that the word has a limited set of definitions for the template. More preferably, the information is interpreted to be associated with the at least one data element according to the role. Most preferably, the method further includes the step of: (d) displaying the information according to the at least one data element to determine if the association between the information and the at least one data element is correct. Even more preferably, the method further includes the step of: (e) searching the information according to a data type selected from the group consisting of the data element, the role and the concept base.
Preferably, the information is defined as belonging to the at least one data element according to a fixed format for entering the information. Also preferably, the information is stored as a non-word symbol, such that the step of translation includes a step of conversion of the non-word symbol to a word. More preferably, the method further includes the step of: (d) generating an output of the translated information. Most preferably, step (c) is additionally performed by translating the information from a first media format into a second media format according to the template, such that step (d) is performed by displaying the translated information in the second media format. Preferably, the second media format is selected from the group consisting of a Web page, an electronic mail (e-mail) message, a facsimile transmission and a voice message. Preferably, step (c) is performed by at least translating the information from a first human language into a second human language according to the template, such that step (d) is performed by displaying the translated information in the second human language.
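The translation of the same stored information into more than one media format could, for example, be sketched as below. This is an illustrative assumption only: the function names `to_web_page` and `to_email`, the field names and the recipient address are hypothetical, and facsimile and voice output are omitted because they would depend on external equipment or services.

```python
from email.message import EmailMessage
from html import escape


def to_web_page(fields):
    """Render the translated data elements as a simple HTML list."""
    rows = "".join(
        f"<li>{escape(name)}: {escape(value)}</li>" for name, value in fields.items()
    )
    return f"<html><body><ul>{rows}</ul></body></html>"


def to_email(fields, recipient):
    """Render the same data elements as a plain-text e-mail message."""
    message = EmailMessage()
    message["To"] = recipient
    message["Subject"] = "Advertisement"
    message.set_content("\n".join(f"{name}: {value}" for name, value in fields.items()))
    return message


# Hypothetical translated information, already organized by data element.
fields = {"item": "voiture", "color": "rouge"}
page = to_web_page(fields)
mail = to_email(fields, "reader@example.com")
```

In this sketch each media format is a separate rendering of the same data elements, so that adding a format does not require the information to be entered again.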
According to another embodiment of the present invention, there is provided a method for automatically translating human language data entered by a user to form translated information according to a template, the steps of the method being performed by a data processor, the method comprising the steps of: (a) entering the human language data contained in a subject area by the user according to an entry format; (b) subdividing the subject area into at least one data element to at least partially form the template; (c) identifying information in the human language data corresponding to the at least one data element; and (d) translating the information in the at least one data element according to the template to form the translated information.
Preferably, the entry format is a fixed format, such that the information corresponding to the at least one data element is entered in a fixed location of the entry format, and such that step (c) is performed by identifying the information according to the fixed location of the entry format.
Also preferably, the human language data is entered in the entry format as a type of data selected from the group consisting of: vocal data, printed data and electronic data.
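For a fixed entry format, identification by location could be sketched as follows, purely by way of example. The field order, the semicolon separator and the name `parse_fixed` are assumptions made for the illustration and are not prescribed by the method.

```python
# Hypothetical fixed entry format: one data element per position.
FIELD_ORDER = ["item", "color", "price"]


def parse_fixed(entry, separator=";"):
    """Assign each value to a data element according to its fixed location."""
    values = [part.strip() for part in entry.split(separator)]
    if len(values) != len(FIELD_ORDER):
        raise ValueError(
            f"expected {len(FIELD_ORDER)} fields, received {len(values)}"
        )
    return dict(zip(FIELD_ORDER, values))


record = parse_fixed("car; red; 5000")
```

Since the location alone determines the data element, no interpretation of the words themselves is needed at the identification step in this sketch.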
According to another preferred embodiment of the present invention, step (b) further comprises the steps of: (i) determining an associated concept base for the template according to the subject area; and (ii) determining an associated vocabulary for the at least one data element according to the subject area. Preferably, the concept base determines a role for each word of the vocabulary, such that the word has a limited set of definitions for the template.
According to a preferred embodiment of the present invention, the entry format is a free format, such that the information corresponding to the at least one data element is entered in substantially any location of the entry format, and such that step (c) is performed by interpreting the information according to the role. Preferably, the method further includes the step of: (d) displaying the information identified according to the at least one data element to the user for determining if the association between the information and the at least one data element is correct. More preferably, the information is displayed in an output format selected from the group consisting of: vocal data, printed data and electronic data.
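For a free entry format, interpretation according to role could be sketched as follows. The sketch assumes a hypothetical concept base in which each vocabulary word has exactly one role, and the names `CONCEPT_BASE` and `interpret_free` are illustrative only. A practical concept base could of course be considerably richer.

```python
import re

# Hypothetical concept base: each vocabulary word is assigned a single role,
# and the role names the data element with which the word is associated.
CONCEPT_BASE = {
    "car": "item",
    "truck": "item",
    "red": "color",
    "blue": "color",
}


def interpret_free(text):
    """Associate words with data elements by role, wherever they appear."""
    record = {}
    for word in re.findall(r"[a-z]+", text.lower()):
        role = CONCEPT_BASE.get(word)
        if role is not None and role not in record:
            record[role] = word  # keep the first word found for each role
    return record


# The same information entered in two different word orders.
first = interpret_free("Selling my red truck")
second = interpret_free("Truck, red, for sale")
```

Both entries yield the same association in this sketch, which could then be displayed to the user for confirmation as described above.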
Hereinafter, the term “Web browser” refers to any software program which can display text, graphics, or both, from Web pages on World Wide Web sites. Hereinafter, the term “Web page” refers to any document written in a mark-up language including, but not limited to, HTML (hypertext mark-up language) or VRML (virtual reality modeling language), dynamic HTML, XML (extensible mark-up language) or related computer languages thereof, as well as to any collection of such documents reachable through one specific Internet address or at one specific World Wide Web site, or any document obtainable through a particular URL (Uniform Resource Locator). Hereinafter, the term “Web site” refers to at least one Web page, and preferably a plurality of Web pages, virtually connected to form a coherent group.
Hereinafter, the term “computer” includes, but is not limited to, personal computers (PC) having an operating system such as DOS, Windows™, OS/2™, Linux or BeOS; Macintosh™ computers; computers having JAVA™-OS as the operating system; and graphical workstations such as the computers of Sun Microsystems™ and Silicon Graphics™, and other computers having some version of the UNIX operating system such as AIX™ or SOLARIS™ of Sun Microsystems™; or any other known and available operating system. Hereinafter, the term “Windows™” includes but is not limited to Windows95™, Windows 3.x™ in which “x” is an integer such as “1”, Windows NT™, Windows98™, Windows CE™ and any upgraded versions of these operating systems by Microsoft Inc. (Seattle, Wash., USA).
Hereinafter, the phrase “display a Web page” includes all actions necessary to render at least a portion of the information on the Web page available to the computer user. As such, the phrase includes, but is not limited to, the static visual display of static graphical information, the audible production of audio information, the animated visual display of animation and the visual display of video stream data.
Hereinafter, the term “user” is the person who operates the Web browser or other GUI interface and navigates through the system of the present invention.
Hereinafter, the term “human language” refers to natural language.
The steps of the method of the present invention could be described as instructions being performed by a data processor, such that the present invention could be implemented as hardware, firmware, software or a combination thereof.