For decades, certain institutions, such as the United Nations, or large international corporations with offices around the world have conducted business in multiple languages. When physical meetings at the UN are convened, delegates in an auditorium in view of other delegates speak in their native languages. Interpreters are present and interpret what is said into the languages of the other delegates.
As a delegate speaks, the interpreter renders what the delegate says in another language. To make the meeting as dynamic, interactive and productive as possible, the interpretation is ideally done “simultaneously.” This means that specially trained individuals listen to a delegate speak in one language and interpret as rapidly as possible over the flow of words, in what is called simultaneous interpretation. Experienced interpreters can reduce the delay to several seconds or even less, depending on the languages involved.
Delegates in such a meeting are equipped with microphones and headphones to hear interpretations. The interpreters can be isolated in soundproof booths and are also equipped with microphones and headsets. A venue is specially wired and controls are provided to delegates, interpreters and moderators that allow for selection of audio inputs and outputs and control of audio mixing electronics. These semi-manual systems are extremely complex and costly.
In contrast to meeting physically at an auditorium, increasingly, organizations (including companies but also governments, non-profits, various regulatory, rule-making and standards bodies) convene their “meetings” using conference call technology as a way to avoid the time and expense of travel. Delegates or employees can participate from their home locations over a telephone or internet connection.
In the market, there is a desire to promote and conduct multi-lingual meetings via conference call, either replacing or extending the “in-person” meeting with a “virtual” meeting that includes participants connected from remote locations. A traditional conference call allows all participants to enter into a common bridge and hear each other as in a “party line” telephone call. If participants on the bridge speak different languages and cannot understand a common language, communication is quickly made impossible under this “party line” model.
In existing simplified models, in a conference call where participants speak different languages, the use of “consecutive” interpretation is often contemplated. In this mode of operation, an interpreter is included as an additional participant in the conference on the bridge and an agreement is reached to allow the interpreter time to interpret. When, for example, a delegate speaks in Spanish, she pauses after one or two sentences, and the interpreter repeats what she said in English. The delegate then resumes speaking briefly, and the process iterates for the duration of the entire conversation. When a delegate speaks in English, the interpreter waits for the delegate to pause, and then repeats what was said in Spanish. Everybody hears all of the Spanish and English utterances. This mode of interpretation is used, for example, when a sports figure is a guest on a television show and only a handful of questions will be presented to the athlete.
This approach is very slow and tedious, and makes the dialogue much less dynamic. While operating this system in two languages is a significant challenge, it becomes completely unwieldy when multiple languages are involved.
A further complication is that, unlike the in-person meeting at the United Nations, where the participants and interpreters can be positioned so they can see each other, over-the-phone meetings rely almost exclusively on audio cues. Participants need to somehow be able to glean the tone and demeanor of the speaker and to interrupt one another. They must avoid overrunning the interpreter or cutting each other off. This can be virtually impossible.
What is needed is a conference call capability that allows for simultaneous interpretation in two or more languages, without burdening delegates or interpreters with additional language constraints. Participants need to have a sense of context, so that they can yield when someone else wants to talk. The management of the conference must be left primarily to automated systems, so that the participants can focus on the topic of discussion and the interpreter(s) can devote their full concentration to their language duties.
Mr. David P. Frankel, the originator of the current invention, is an expert in the field of audio conferencing. In 2006, Frankel invented a new multi-fidelity conferencing bridge that gives participants improved clarity and accuracy by allowing narrowband and wideband technology to coexist on the same bridge, whether participants connect over Voice-over-Internet Protocol (VoIP) technology or the public switched telephone network (PSTN). Users who dial into the bridge with a wideband-enabled tool are not forced down into the lower-fidelity narrowband. Ultimately, this technology was patented as U.S. Pat. No. 7,986,644 (“Frankel I”). The content of Frankel I is hereby incorporated fully by reference as part of this application.
The next year, in 2007, Frankel improved conferencing bridges by inventing a new identity-based conferencing system in which a bridge is capable of recognizing the identity of individual users, for example by recognizing a phone source identification number associated with the user. Through automatic recognition, the burden on the user of the conferencing system is alleviated, as the user is able to access the system with less authentication information. This technology was patented as U.S. Pat. No. 7,343,908 (“Frankel II”). The content of Frankel II is hereby incorporated fully by reference.
The same year, in 2007, the application for a different invention, described in U.S. Pat. No. 8,041,018 (“Wald”), was filed. Wald describes a conference bridge, shown as FIG. 1 taken from the prior art, built around a main language circle to which each of the participants (P1 to P7) is connected. As part of this system, all of the interpreters (L1 to L3) must interpret between the main language and one other non-main language.
Wald is extremely limited in that it is rooted in the use of a single main language to which all participants, and all other languages, must connect. This invention is not applicable to complex systems. For example, the United Nations operates with six official languages, and the European Union has twenty-four; Wald simply cannot be used by these users. In these institutions, not all of the meetings take place in all of the official languages but, to be practical, any system deployed in such a multi-lingual environment must be more versatile in that it must be able to accommodate numerous active languages and must provide for various styles of interpretation.
In some cases, UN interpreters work in only one direction; in other cases they work bi-directionally (i.e., they interpret back and forth between two languages). Relay interpretation in these institutions is also part of the standard operating procedure. The requirement that one language be designated as “primary” or “main” or “base” or be used by all of the interpreters is not acceptable. Wald would not be appropriate as there is no willingness to designate certain participants as “second tier” just because they do not speak the “primary language” of the meeting; in fact, it is critical in diplomatic conversations that the different languages all be treated equally. The Wald system requires that the interpreters (L1 to L3) in FIG. 1 all speak a common “main” language. The system fails unless L1, L2 and L3 speak the same language. Further, relay interpretation will not necessarily go through just a single language; there might be relay of Arabic to French to English, but also Chinese to English to Spanish. What is needed is a system of interpretation that can be used by any group or institution that is structured to allow for a wide diversity of use without the need for a ‘main’ language.
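The difference between the two requirements can be illustrated with a minimal sketch. It assumes each interpreter is modeled simply as a pair of working languages; the function name `common_language` and both rosters are hypothetical and are not taken from Wald or from any Frankel patent.

```python
def common_language(interpreters):
    """Return the set of languages shared by every interpreter (empty if none)."""
    shared = None
    for pair in interpreters:
        # Intersect each interpreter's language pair with the running result.
        shared = set(pair) if shared is None else shared & set(pair)
    return shared or set()


# Hypothetical roster in which every interpreter works through English,
# which is the kind of roster a "main language" model requires.
MAIN_LANGUAGE_ROSTER = [
    ("English", "Japanese"),
    ("English", "Mandarin"),
    ("English", "French"),
]

# Hypothetical roster built from the relay chains mentioned above:
# Arabic to French to English, and Chinese to English to Spanish.
RELAY_ROSTER = [
    ("Arabic", "French"),
    ("French", "English"),
    ("Chinese", "English"),
    ("English", "Spanish"),
]

print(common_language(MAIN_LANGUAGE_ROSTER))
print(common_language(RELAY_ROSTER))
```

Under these assumptions, the first roster shares a single language while the relay roster shares none, so a system that insists on a common “main” language could not accept the second roster.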
Partly to overcome some of the problems and limitations of Wald, in 2011, Frankel invented a new technology for conferencing bridges that allowed for the management of interpretation for users calling into the bridge and speaking different languages. The system as described used a floor control and acted as a large dispatcher of the flow of speech between the different users and interpreters connected to the bridge. This technology was patented as U.S. Pat. No. 8,175,244 (“Frankel III”).
The system shown at FIG. 2 is taken from the prior art, namely from Frankel III, which is fully incorporated herein by reference. Frankel III describes how a series (1, 2, 3 . . . N) of users, also called delegates 10, are connected via a network 30 to a conference server 40 where the different signals are processed. A plurality of interpreters (1, 2, 3 . . . N) 20 are also connected through the network 30 to the same server 40, where data is also processed. FIGS. 3 and 4, taken from Frankel III, show the flow of audio and/or video exchange when delegate 2 (D2) speaks, as shown at FIG. 3, and when delegate 4 (D4) speaks, as shown at FIG. 4. In these figures, four different bridges 48a, b, c, d are shown, and on each the conversation takes place in a different language (Mandarin 48a, Japanese 48b, English 48c and French 48d). In this model, there is no “main” bridge or “main” language as described in Wald.
FIGS. 3 and 4 of Frankel III use arrowheads to show how the data (voice and potentially image) are transmitted by the server 40. The Frankel III system shown at FIGS. 2, 3 and 4 allows for consecutive or simultaneous interpretation. This flexible and modular system allows multiple users, both delegates and interpreters, to connect to a server 40 in a wide range of configurations as needed by a client. To help understand the technology, consider that the Microsoft® Corporation might wish to connect in a single conference call the four main design teams of its international units, located in Houston (English), Tokyo (Japanese), Taipei (Mandarin) and Montreal (French). The limited technology of Wald forces the organizers to define English as a “main” language and then requires the help of three very specific interpreters: a Japanese/English, a Mandarin/English and a French/English interpreter.
Frankel III is much more flexible, as described at FIGS. 2, 3 and 4. First, there is no requirement for any of the four languages to be defined as “main.” Four different bridges would be set up in the system and linked together using three different interpreters. Because the system is enabled once all four bridges are linked by any combination of interpreters, the system owner and operator has a wider and simpler choice of interpreters. As shown at FIG. 4, a first set of acceptable interpreters includes: (a) Japanese/Mandarin, (b) English/Japanese and (c) French/English. In fact, other sets of interpreters could also be used, such as: (a) Mandarin/French, (b) English/Japanese and (c) French/English; or even (a) Mandarin/English, (b) Japanese/English and (c) French paired with any one of the other three languages (for example, English/French). This is only one of the numerous advantages of Frankel III over the prior art.
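The condition that all four bridges be linked can be expressed as a connectivity check on a small graph. The following is a minimal sketch, not the Frankel III implementation: it assumes every interpreter works bi-directionally between two languages, and the function name `links_all_bridges` and the constant names are hypothetical.

```python
from collections import deque


def links_all_bridges(languages, interpreters):
    """Return True if the interpreter pairs connect every language bridge."""
    # Build an undirected graph: one node per bridge, one edge per interpreter.
    neighbours = {lang: set() for lang in languages}
    for a, b in interpreters:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Breadth-first search from an arbitrary bridge.
    start = next(iter(languages))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbours[current] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen == set(languages)


LANGUAGES = ["Mandarin", "Japanese", "English", "French"]

# The set of interpreters described for FIG. 4.
FIG4_SET = [("Japanese", "Mandarin"), ("English", "Japanese"), ("French", "English")]

# An alternative set that also links all four bridges.
ALT_SET = [("Mandarin", "French"), ("English", "Japanese"), ("French", "English")]

print(links_all_bridges(LANGUAGES, FIG4_SET))
print(links_all_bridges(LANGUAGES, ALT_SET))
```

With four bridges, three interpreters are the minimum that can link them all, and a set of three works only if no bridge is left without an interpreter reaching it.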
While Frankel III is described at FIGS. 2, 3 and 4 and incorporated as part of the disclosure of this invention, it can be further improved. In the example shown at FIG. 4, the words of the Japanese delegate D4 are first interpreted by SI2 into English, and then in turn the words are interpreted by SI1 into French. Even with simultaneous interpretation, the delay of the first interpretation is added to the delay of the second interpretation, resulting in a cumulative time lag. In a conversational setting as shown at FIG. 4, the delegates listening to the Mandarin and English bridges 48a, 48c will hear a feed delayed by 2-3 seconds, while the delegates listening to the French bridge 48d will hear a feed delayed by 4-6 seconds. The lag between two different bridges renders active participation difficult. What is needed is a system and method that improves on Frankel III, maintaining the numerous advantages of that system while improving the participants’ overall experience.
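The accumulation of delay across relayed interpretations can be worked through with a short sketch. It assumes that each interpretation hop adds the same 2-3 seconds and that delays simply add along the shortest relay path; both are simplifications made for illustration, and the function names `relay_hops` and `delay_ranges` are hypothetical.

```python
from collections import deque


def relay_hops(source, interpreters):
    """Count interpretation hops from the source bridge to each reachable bridge."""
    neighbours = {}
    for a, b in interpreters:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # Breadth-first search gives the fewest hops to every bridge.
    hops = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for nxt in neighbours.get(current, ()):
            if nxt not in hops:
                hops[nxt] = hops[current] + 1
                queue.append(nxt)
    return hops


def delay_ranges(source, interpreters, per_hop=(2, 3)):
    """Return a (low, high) delay in seconds for each bridge, assuming additive delay."""
    low, high = per_hop
    return {
        lang: (n * low, n * high)
        for lang, n in relay_hops(source, interpreters).items()
    }


# Interpreter set described for FIG. 4, with the Japanese delegate D4 speaking.
FIG4_SET = [("Japanese", "Mandarin"), ("English", "Japanese"), ("French", "English")]

delays = delay_ranges("Japanese", FIG4_SET)
for language, (low, high) in sorted(delays.items()):
    print(f"{language}: {low}-{high} s")
```

Under these assumptions, the Mandarin and English bridges sit one hop from the speaker and the French bridge sits two hops away, which reproduces the 2-3 second and 4-6 second figures given above.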