This invention is related to the field of multi-lingual and multi-standard communication between independent computers and, in particular, to automatic conversion between different standards.
One of the fastest growing applications on the Internet is the world-wide web (WWW). The WWW is a collection of networked computers which exchange pages of hyper-text using the TCP/IP protocol. These pages may contain combinations of text, images and sounds, each of which may be either dynamic or static. Hyper-text is also called hyper-media or hyper-links. In addition, these pages may provide various methods of data input, for example, fill-in forms. In the context of the WWW, the pages are also called documents. The computers may be roughly divided into two main classes, clients and servers. The pages are usually downloaded from the servers by a client, using a specialized program called a “browser”. In some cases, the client enters data onto a page and transmits this data to a server. This data is usually used to find new pages for the client to download. As an alternative to storing pages on a server, it is becoming common practice to generate WWW pages on-the-fly at the server, using special programs.
There exist several additional classes of computers, including search engines, which provide a list of pages on servers which relate to a particular search; proxy servers, which broker communications between clients and servers, for example, by locally storing frequently read pages; and gateways, which connect whole networks to the Internet. A rapidly growing subset of the “client” class of computer is the network computer, a specialized computer designed especially for connection to the Internet. This sub-class also includes Internet telephones and Internet TVs, none of which are general-purpose computers and all of which have their Internet support hard-wired rather than programmed in software.
One of the greatest obstacles to the continued expansion of the WWW is the multilingual aspect of the data transmitted, which is compounded by the language limitations of users. Currently, most of the pages in the WWW are written in English and most of the browsers and the servers are designed mainly for use with the English language. This situation is equivalent to having a telephone system which can only transmit words in English and a TV system which can only transmit programs in English.
Multi-lingual computer applications are known, for example multi-lingual word processors and even multi-lingual operating systems. However, unlike the Internet, in a computer application the system developer enforces a single standard of language representation and handling. In the Internet, there is no single system developer and it is not possible to enforce a single standard worldwide. Furthermore, there may be multiple standards in a single country. For example, in Japan there are three common character code set encodings for the Japanese language; in Israel, there are several common character sets and three different standards for display and input of textual information. There also exist many variants of the display standards in Israel. It should be appreciated that for many aspects of multi-lingual language support there is no common denominator between the different standards.
The Internet publication “The Multilingual World Wide Web”, written by Gavin Nicol in November 1994, and currently found at the URL: “http://www.sil.org:80/sgml/nicolmultwww.html”, describes four main failure modes of multi-lingual computer applications and discusses their relevance to the WWW. The first failure mode is related to data representation, i.e., how textual data is represented and how individual characters are encoded. As noted above, there are three such encoding standards in Japan and several in Israel. Further, the same character code may be used for different glyphs depending on the language and on the character set.
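To make the data-representation problem concrete, the following minimal sketch encodes one Japanese word under the three common Japanese encodings, using the `iso2022_jp`, `shift_jis` and `euc_jp` codecs of Python's standard library. The word is only an illustration; the point is that identical text yields three different byte sequences, so a receiver must know which standard was used before it can interpret the bytes.

```python
# The same Japanese word encoded under the three common Japanese standards.
word = "日本語"  # "Japanese language"

encodings = ("iso2022_jp", "shift_jis", "euc_jp")
encoded = {enc: word.encode(enc) for enc in encodings}

for enc, raw in encoded.items():
    # Print the raw bytes in hex so the differences are visible.
    print(f"{enc:>10}: {raw.hex(' ')}")

# All three byte sequences differ, although each decodes back to the same text.
assert len(set(encoded.values())) == 3
assert all(raw.decode(enc) == word for enc, raw in encoded.items())
```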
The second failure mode is related to data manipulation, where a given program cannot manipulate multi-lingual data. Some browsers do not support fonts which require more than 8 bits for encoding. Unicode, for example, requires 16 bits. None of the leading browsers are designed to support variable width (in bits) character codes.
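The width issue can be illustrated the same way. The sketch below, an illustration only, measures how many bits a Latin letter, a Hebrew letter and a Japanese hiragana character occupy in UTF-8 (variable width) and in UTF-16 (16-bit units for these characters), and shows that an 8-bit code such as Latin-1 cannot represent the Japanese character at all.

```python
# Sample characters: Latin, Hebrew (U+05D0 ALEF) and Japanese hiragana (U+3042 A).
samples = {"Latin A": "A", "Hebrew alef": "\u05d0", "Hiragana a": "\u3042"}

def encoded_bits(ch: str, encoding: str) -> int:
    """Number of bits the character occupies in the given encoding."""
    return 8 * len(ch.encode(encoding))

for label, ch in samples.items():
    # UTF-8 grows from 8 to 24 bits; UTF-16 (little-endian, no BOM) stays at 16.
    print(f"{label:12} utf-8: {encoded_bits(ch, 'utf-8'):2} bits, "
          f"utf-16: {encoded_bits(ch, 'utf-16-le'):2} bits")

try:
    "\u3042".encode("latin-1")
except UnicodeEncodeError:
    print("hiragana a cannot be represented in 8-bit Latin-1")
```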
The third failure mode is data display. It should be noted that in many languages, such as Arabic, the glyph form of a letter is dependent on the surrounding letters. This requires various display algorithms. In addition, the number of languages and fonts in the world is much greater than the number usually stored in a client computer, especially if it is a specialized network computer. Also, when using some browsers it is not possible to simultaneously display more than one language at a time (in addition to English).
The fourth failure mode is related to data input. One issue is keyboard mapping: assuming that a browser supports the font of the language used by the server, how should the browser map keystrokes to the individual glyphs? Many languages, such as Russian, require more than the standard 26 letters of English. Another issue is support for bi-directional data input. Some languages, for example, Hebrew and Arabic, are written from right to left (RTL) rather than from left to right (LTR), as English is. Other, oriental, languages are written in a vertical orientation.
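The keyboard-mapping problem can be pictured as a lookup table from physical keys to glyphs. The minimal sketch below assumes the common Israeli Hebrew layout on a US QWERTY keyboard and lists only the letter keys; the table and the function name `map_keystrokes` are illustrative and do not describe any particular browser.

```python
# Assumed mapping from US QWERTY keys to the letters they produce on the
# Israeli Hebrew layout (letter keys only, including the five final forms).
QWERTY_TO_HEBREW = {
    "t": "א", "c": "ב", "d": "ג", "s": "ד", "v": "ה", "u": "ו", "z": "ז",
    "j": "ח", "y": "ט", "h": "י", "f": "כ", "l": "ך", "k": "ל", "n": "מ",
    "o": "ם", "b": "נ", "i": "ן", "x": "ס", "g": "ע", "p": "פ", ";": "ף",
    "m": "צ", ".": "ץ", "e": "ק", "r": "ר", "a": "ש", ",": "ת",
}

def map_keystrokes(keys: str, table: dict = QWERTY_TO_HEBREW) -> str:
    """Translate keystrokes to glyphs; unmapped keys (digits, space) pass through."""
    return "".join(table.get(k, k) for k in keys.lower())

print(map_keystrokes("akuo"))  # the keys a, k, u, o spell "shalom"
```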
There are several problems unique to bi-directional languages. Even when the language is written RTL, numbers are (usually, but not in all “standards”) written LTR. In addition, the text may be stored in a “logical” manner, where the first stored letter is usually the rightmost letter. Alternatively, the text may be stored in a “visual” manner, where the first stored letter is the leftmost letter, which in a multi-line text is located in the middle of the text. Thus, visually stored data is displayed LTR (with an appropriate font), while logically stored data must be displayed on a letter-by-letter basis: LTR letters are displayed one way and RTL letters are displayed in another way. It is common practice to mix visual and logical representations in a single WWW page. This is particularly true for input. The input is most conveniently made using a logical representation, even though the data may be stored using a visual representation.
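The relation between logical and visual storage can be sketched for a single line of an RTL paragraph. This is a deliberately simplified illustration, not the Unicode Bidirectional Algorithm (UAX #9): it assumes that only ASCII letters and digits are LTR and that everything else, including spaces, follows the RTL paragraph direction. One consequence is that two Latin words separated by a space are reordered word by word, which a full bidi implementation would not do.

```python
import re

# Simplifying assumption: ASCII letters and digits are LTR; every other character
# (Hebrew letters, spaces, punctuation) follows the RTL paragraph direction.
_LTR_RUN = re.compile(r"[A-Za-z0-9]+")

def logical_to_visual(line: str) -> str:
    """Convert one line of an RTL paragraph from logical order to visual
    (left-to-right storage) order under the simplifying assumption above."""
    # Reversing the whole line puts the RTL characters in visual order...
    reversed_line = line[::-1]
    # ...but also reverses each LTR run, so flip those runs back.
    return _LTR_RUN.sub(lambda m: m.group(0)[::-1], reversed_line)

# The mapping is its own inverse, so the same function converts visual to logical.
visual_to_logical = logical_to_visual

print(logical_to_visual("שלום 123"))  # digits stay LTR at the left edge
```

Because the transformation is an involution, a converter could use the same routine in both directions, for example to normalize visually stored pages to logical order before indexing them for search.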
The above problems are compounded when viewed in the context of the WWW. One example of such a problem relates to search engines. Search engines automatically assimilate the contents of many WWW pages and allow a client to search these pages using various methods. If a page is stored using a visual representation, a search using keywords entered using a logical representation will not find the page. Of course, if the character set encoding is different, the page will not be found either. Another example, also relating to search engines, arises in languages where there is more than one legal way to spell a word. This is common in various dialects of English, but in Thai, there is a lexical equivalence between various orderings of certain three-letter groups. Since search engines are inherently global, enforcing a single standard is practically impossible.
Another example of a compound problem is the use of multiple standards and/or languages in a single WWW page. Another compound problem is translating between units of measurement and ways of writing dates and times. For example, “1/6/1999” represents Jan. 6, 1999 in the US and Jun. 1, 1999 in Europe.
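The date example can be made precise with a small sketch. The locale names `US` and `EU` and the two format strings are assumptions chosen for illustration; `datetime.strptime` is the standard Python parser. The same string parses to two different dates depending on the assumed convention, and conversion swaps day and month.

```python
from datetime import date, datetime

# Hypothetical per-locale patterns; a real converter would need many more.
DATE_FORMATS = {"US": "%m/%d/%Y", "EU": "%d/%m/%Y"}

def parse_date(text: str, locale: str) -> date:
    """Interpret a numeric date according to the given locale's convention."""
    return datetime.strptime(text, DATE_FORMATS[locale]).date()

def convert_date(text: str, src: str, dst: str) -> str:
    """Rewrite a numeric date from the src convention to the dst convention."""
    d = parse_date(text, src)
    # Build the output without zero padding, matching the style of the input.
    first, second = {"US": (d.month, d.day), "EU": (d.day, d.month)}[dst]
    return f"{first}/{second}/{d.year}"

print(parse_date("1/6/1999", "US"), parse_date("1/6/1999", "EU"))
print(convert_date("1/6/1999", "US", "EU"))
```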
To make matters worse, even the standard language of WWW pages, HTML (HyperText Markup Language), is not uniform around the world.
As a direct result of these problems, the “global village” has not yet arrived. One pointed example can be seen in Israel. At the time of this writing, Israel is one of the world's industrial leaders in most Internet applications. However, the penetration of the Internet into the public sector is substantially retarded as compared to the US, even though a higher percentage of households in Israel own a computer with a modem than in the US.
An obvious solution would be to adapt the clients and servers in the Internet so that they support multiple languages. In particular, automatic WWW page generators would also have to be modified. In addition, such adaptation would probably require modifications to development environments. The amount of work required for this type of adaptation is enormous, since all existing browser software and/or hardware would have to be adapted, a single standard would have to be enforced and all new applications would be limited by having to support a great number of languages and standards. This would be contrary to the concept of network computers: providing only the minimal hardware and software for surfing the WWW. For this reason, among others, most “multi-lingual” solutions support only one language in addition to English. In many cases, the languages supported are not the two which are desired.
In an attempt to solve the problem of multi-lingual searching, a web site has been constructed in which a client enters search terms in one language (Hebrew) and the search engine translates the words to English and applies the translated words to one of a limited number of existing search engines. The input is entered using Latin characters, which the web site maps to Hebrew characters after the input process is finished.
In yet another attempt, a web site has been created in which a JavaScript code segment is included in a WWW page, which displays a virtual keyboard in the desired language and allows a user to click on keys. Each click adds a letter to a text object. The user's input is directed only to that web site, for use by the programs therein, and cannot be used to communicate with other web sites.
Several solutions for the problem of display of multi-lingual pages have been suggested and/or tried. The Microsoft Internet Explorer version 3.01, Hebrew version, uses meta-tags in the WWW page to indicate whether a text object uses visual encoding or logical encoding. This information is used to drive display algorithms for the text object.
In the above referenced WWW publication and in “Summary of K12 activities in Japan”, by Kunio Goto and Masaya Nakayama, URL “http://k12.jain.ad.jp/inet95.html”, a conversion server is suggested for use in Japan. The server is suggested for use as a proxy server and it replaces character codes from one standard set with codes from another set. This replacement is on a letter-by-letter basis.
In one system, “Internet with an Accent”, published by Accent Software Ltd., Israel, multilingual pages are developed using a special development environment provided with the package. The pages are then stored in a special format. The client must either be provided with a special browser or with a plug-in to his existing browser. This package has the capability of automatically displaying pages in one of several languages based on the setting at the client. However, this package only works if both the client and the developer use the “Accent” package.
The common denominator to all of the above solutions is that they require changes to at least one, and usually at least two, of the client, the server and/or the development environment. As a direct result, the accessibility of advanced and newly developed features (for non-multi-lingual applications) is retarded. In addition, the above solutions are not easily portable to newly developed systems.
It is an object of some embodiments of the present invention to allow data to be exchanged between substantially any client and substantially any server in substantially any language or standard, without requiring any changes to be made in the server, client or even in a development environment in which the data is generated. Preferably, the server and the client communicate using a WWW protocol.
It is a particular aspect of some embodiments of the present invention to provide a solution to the input of multi-lingual information.
An automatic converter, in accordance with a preferred embodiment of the invention, is integrated into a client-server relationship as a (hidden) proxy. When the client downloads information from the server, the converter converts the information to a standard usable by the client. When the client enters input data to the server, the converter converts the input data to a standard usable by the server.
Thus, instead of the multi-lingual support increasing the complexity of the computer software on the client, the support is provided by the network itself, i.e., “the network is the computer”.
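For the character-encoding part of the standard, the two directions of such a proxy can be sketched as follows. The class name `Converter` and its methods are hypothetical; the sketch converts only message bodies and leaves out HTTP framing, headers and non-text content. Downstream, any character the client's encoding lacks is emitted as an HTML numeric character reference via Python's `xmlcharrefreplace` error handler, which is one possible policy among several.

```python
class Converter:
    """Transcodes bodies between the server's and the client's character encodings.

    Minimal sketch: a real proxy would also handle HTTP framing, headers such as
    Content-Type, and non-text content, none of which is shown here.
    """

    def __init__(self, server_charset: str, client_charset: str):
        self.server_charset = server_charset
        self.client_charset = client_charset

    def downstream(self, body: bytes) -> bytes:
        """Server to client. Characters the client's encoding lacks become HTML
        numeric character references instead of raising an error."""
        text = body.decode(self.server_charset)
        return text.encode(self.client_charset, errors="xmlcharrefreplace")

    def upstream(self, body: bytes) -> bytes:
        """Client to server."""
        return body.decode(self.client_charset).encode(self.server_charset)


conv = Converter(server_charset="euc_jp", client_charset="utf-8")
print(conv.downstream("日本".encode("euc_jp")))
```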
In a preferred embodiment of the invention, the converter automatically determines the standard used by the server for a WWW page. Preferably, the converter automatically detects the language of at least a portion of the page.
Alternatively or additionally, the converter automatically detects the standard used by the client. Preferably the converter automatically detects the language used by the client. Alternatively, the client sends information to the converter regarding the client's capabilities. In one preferred embodiment of the invention, the converter queries the client regarding the client's capabilities. The client may respond automatically, or a user at the client may respond instead.
In a preferred embodiment of the invention, the client is provided with a manual override for the standard used by the server and/or the client. This feature is especially useful if the automatic standard detection does not properly detect the standard. Preferably, even if the automatic converter cannot pinpoint the precise standard used by the client and/or the server, the converter does attempt to narrow the possibilities. It should be appreciated however that automatic detection of standards may be adversely affected by the existence of mistakes in the WWW page, such as spelling mistakes.
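A minimal sketch of character-set detection with a manual override follows. The order of checks (user override, then a charset declared in a `<meta>` tag, then the ISO-2022-JP escape sequence, then trial decoding) and the candidate list are assumptions for illustration. Trial decoding only narrows the possibilities: pure ASCII, for instance, decodes cleanly under every candidate, so the order of `candidates` matters.

```python
import re
from typing import Optional

# Simplified pattern for <meta charset=...> and <meta ... content="...; charset=...">.
_META_CHARSET = re.compile(rb"""<meta[^>]+charset=["']?([A-Za-z0-9_\-]+)""", re.IGNORECASE)

def detect_charset(raw: bytes, override: Optional[str] = None,
                   candidates=("utf-8", "euc_jp", "shift_jis")) -> Optional[str]:
    if override:                       # a manual override from the user wins
        return override.lower()
    m = _META_CHARSET.search(raw)      # a charset declared in the page itself
    if m:
        return m.group(1).decode("ascii").lower()
    if b"\x1b$" in raw:                # ISO-2022-JP switches to JIS X 0208 with ESC $
        return "iso2022_jp"
    for enc in candidates:             # otherwise: first candidate that decodes cleanly
        try:
            raw.decode(enc)
            return enc
        except UnicodeDecodeError:
            pass
    return None                        # no candidate fits; ask the user
```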
It is a particular aspect of some embodiments of the present invention to provide a seamless connection between a client and a server, in which all data from the server is converted into data usable by the client and all data from the client is converted into data usable by the server.
In a preferred embodiment of the invention, data from the server which cannot normally be displayed on the client is converted, by the automatic converter, into image files for display on the client. Preferably, text data for which there is no available font on the client is converted into image data, for example GIF format data. Preferably, the text data is converted into a plurality of images. In a preferred embodiment of the invention, small groups of words are converted into a single GIF file, such that resizing of an object containing the text data is facilitated. Preferably, the number of words in a group is inversely related to the font size. Alternatively, each group consists of a single word, to enhance caching.
In a preferred embodiment of the invention, information relating to the content and/or the format of the converted text is encoded into the name of the image file. Thus, the name of an image file may include an indication that the word is “the” and that it is underlined. Encoding the information in this manner increases the efficiency of cache systems.
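The grouping rule can be sketched without the rendering step. Here the number of words per group is `budget // font_size`, floored at one word; the constant `budget` and the function name `group_words` are hypothetical, chosen only to show an inverse relation between font size and group size. Each returned group would then be rendered as one image.

```python
def group_words(text: str, font_size: int, budget: int = 48) -> list:
    """Split text into groups, each to be rendered as one image.

    Words per group is inversely related to font size: budget // font_size,
    but never fewer than one. `budget` is a hypothetical tuning constant.
    """
    words = text.split()
    per_group = max(1, budget // font_size)
    return [words[i:i + per_group] for i in range(0, len(words), per_group)]

print(group_words("a b c d e f g", font_size=16))  # three words per group
print(group_words("a b", font_size=72))            # large font: one word per group
```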
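One possible naming scheme is sketched below; the format `<word>_<size>_<flags>.gif` is a hypothetical choice, not a defined standard. Non-ASCII words are percent-encoded with `urllib.parse.quote` so the name is safe in a URL. Because the same word in the same format always yields the same name, every page using that word can hit the same cached image.

```python
from urllib.parse import quote

def image_name(word: str, size: int, bold: bool = False, underline: bool = False) -> str:
    """Build a file name that encodes a word's content and format (hypothetical scheme)."""
    flags = ("b" if bold else "") + ("u" if underline else "")
    return f"{quote(word, safe='')}_{size}_{flags or 'n'}.gif"

print(image_name("the", 12, underline=True))
print(image_name("שלום", 12))  # Hebrew is percent-encoded as UTF-8
```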
Another aspect of some preferred embodiments of the invention relates to replacing input objects which are not supported by the client with custom-made Java applets. In these embodiments, the converter replaces the definitions of input objects, in the pages sent by the server, with calls to Java applets. Preferably, the automatic converter parses the pages to determine the input objects and replaces the input objects which have no support at the client with Java applets. When the page is displayed by the client, the client is provided with a “new” input object, which supports the standards and/or language. This is most convenient for the client and does not require support by the client's browser. Preferably, these Java applets are cached at the client for future pages, so that the applets need not be downloaded anew with each page. Optionally, the new input object is compatible with the server's standards. Alternatively, the converter converts the data entered using the input object to a standard supported by the server.
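The replacement step can be sketched with regular expressions. The applet class name `HebrewInput.class`, its size and its `field`/`lang` parameters are hypothetical; a production converter would use a real HTML parser, and this sketch only catches inputs with an explicit `type="text"` attribute. The original field name is preserved so that the data entered can still be submitted under the name the server expects.

```python
import html
import re

# Simplified patterns for a sketch; a production converter would use an HTML parser.
_TEXT_INPUT = re.compile(r"""<input\b[^>]*\btype=["']?text["']?[^>]*>""", re.IGNORECASE)
_NAME_ATTR = re.compile(r"""\bname=["']?([^"'\s>]+)""", re.IGNORECASE)

def replace_text_inputs(page: str, applet_class: str = "HebrewInput.class",
                        lang: str = "he") -> str:
    """Replace each <input type="text"> with an applet call that keeps the field name."""
    def to_applet(match):
        name = _NAME_ATTR.search(match.group(0))
        field = html.escape(name.group(1) if name else "", quote=True)
        return (f'<applet code="{applet_class}" width="300" height="30">'
                f'<param name="field" value="{field}">'
                f'<param name="lang" value="{lang}"></applet>')
    return _TEXT_INPUT.sub(to_applet, page)

print(replace_text_inputs('<form><input type="text" name="q"></form>'))
```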
In a preferred embodiment of the invention, the Java applets which are used for input in a particular language render individual keystrokes as character glyphs even if the particular font required by the language is not supported by the client. Preferably the applets also provide other services, such as bidirectional input, letter fusion and even spell checking.
Another aspect of the present invention relates to controlling the viewing of copyrighted information. In a preferred embodiment of the invention, copyrighted information is provided through a conversion server, which server encodes the information so that it is difficult to copy using data manipulation programs, yet easy to assimilate using human senses, once displayed. Preferably, the display process is also protected so copying the displayed information is also difficult.
Typically, consumers must either pay to view copyrighted information or they are forced to view advertisement information along with the copyrighted information. However, once data is available in a computer readable format, there is a danger that an infringer will copy the data, remove any advertisements and redistribute the data himself, for his own enjoyment and/or profit.
Thus, in a preferred embodiment of the invention, an automatic converter brokers information between a client and an information provider, while providing and presenting the information to the client in a form which is not easily copied.
In a preferred embodiment of the invention, the client is provided with a program, preferably a Java applet, which temporally modulates the information, so that only small parts of the information are displayed at any instant. Thus, even though a human can integrate the displayed information, a snapshot of the displayed data will contain only part of it. Some examples of temporal modulation include displaying the data in a running strip and intermixing advertisements with the copyrighted information. Thus, after viewing the data, the client will not have in his possession a file containing only and all the displayed information.
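The running-strip form of temporal modulation reduces to producing a sequence of narrow frames. The sketch below, with the hypothetical name `running_strip`, only generates the frames; the client program would display each one for a short interval, and that timing loop is not shown. Any single frame contains at most `width` characters of the text.

```python
from typing import Iterator

def running_strip(text: str, width: int) -> Iterator[str]:
    """Yield successive frames of a ticker; each frame shows only `width` characters."""
    for start in range(max(1, len(text) - width + 1)):
        yield text[start:start + width]

for frame in running_strip("abcdef", 3):
    print(frame)
```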
In a preferred embodiment of the invention, client programs are authenticated. Preferably the authentication uses a key-code system in which the server sends a key to the client and the client is expected to respond with a code which is a (secret) function of the key. Preferably, a different key-code combination is used in each communication by the server. Preferably, the transmitted data is encrypted, to reduce the possibility of it being intercepted by a copyright pirate.
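The key-code exchange can be sketched with a keyed hash as the secret function. HMAC-SHA256 is a swapped-in choice here; the text does not specify the function, and `SHARED_SECRET` and the function names are hypothetical. A fresh random key is drawn for each communication, as the text prefers, and the server compares codes in constant time.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret shared by the server and the authentic client program.
SHARED_SECRET = b"hypothetical-secret-embedded-in-client-program"

def new_key() -> bytes:
    """Server side: a fresh random key (challenge) for each communication."""
    return secrets.token_bytes(16)

def response_code(key: bytes, secret: bytes = SHARED_SECRET) -> str:
    """Client side: the code is a keyed function (HMAC-SHA256) of the server's key."""
    return hmac.new(secret, key, hashlib.sha256).hexdigest()

def verify(key: bytes, code: str, secret: bytes = SHARED_SECRET) -> bool:
    """Server side: constant-time comparison against the expected code."""
    return hmac.compare_digest(response_code(key, secret), code)

key = new_key()
print(verify(key, response_code(key)))
```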
There is therefore provided in accordance with a preferred embodiment of the invention, a method for transferring information between a server and a client, through a converter, comprising:
analyzing at least a portion of the information by said converter, to determine a standard used by said server to encode the information in the portion; and
replacing at least a portion of the analyzed information with other information, which other information uses a second standard,
wherein analyzing comprises parsing the information on a syntactic level and wherein said information comprises at least one Internet hypertext document. Preferably, the standard comprises a language. Alternatively or additionally, the standard comprises a standard for an RTL language. Alternatively or additionally, replacing comprises replacing only a portion of the analyzed information.
There is also provided in accordance with a preferred embodiment of the invention, a method for transferring information between a server and a client, through a converter, which information includes at least one input object, comprising receiving said information by said converter from said server; replacing said input object with another input object; and transmitting the information after said replacing, wherein said information comprises at least one Internet hypertext document.
Preferably, the input object is a text object. Alternatively or additionally, said another input object is of a type supported by said client. Alternatively or additionally, said input object is of a type not supported by said client. Alternatively, said input object is of a type supported by said client.
In a preferred embodiment of the invention, replacing comprises replacing said input object responsive to a known difference in standards between said client and said server. Preferably, said another input object is not included in a toolkit portion of said client. Alternatively or additionally, said another input object is a call to a program. Preferably, said program is a Java applet.
In a preferred embodiment of the invention, the method comprises replacing a second input object with a Java applet, wherein said second input object is supported by the client.
There is also provided in accordance with a preferred embodiment of the invention, a method for transferring information between a server and a client, through a converter, comprising analyzing at least a portion of the information by said converter, to determine a standard used by said server to encode the information in the portion; and determining at least one portion of said information not supported by said client, wherein said information comprises at least one Internet hypertext document.
Preferably, the method comprises replacing said at least one portion with a portion supported by the client. Alternatively or additionally, said standard comprises a language. Alternatively or additionally, said standard comprises a standard for an RTL language.
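Determining which portions a client cannot support can be sketched, for the character-set aspect only, by trying to encode each character in the client's encoding. The function name `unsupported_runs` is hypothetical, and the client's encoding stands in for its capabilities; font availability is not modeled. Each returned run is a candidate for replacement, for example by image data.

```python
def unsupported_runs(text: str, client_charset: str) -> list:
    """Return maximal runs of characters that the client's encoding cannot represent."""
    runs, current = [], []
    for ch in text:
        try:
            ch.encode(client_charset)
            if current:                      # a supported character ends the run
                runs.append("".join(current))
                current = []
        except UnicodeEncodeError:
            current.append(ch)
    if current:
        runs.append("".join(current))
    return runs

print(unsupported_runs("abc שלום def", "latin-1"))
```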
There is also provided in accordance with a further preferred embodiment of the present invention, a method for transferring information between a server and a client, through a converter, comprising:
selecting an output portion of said information, which information is designated for the client and comprises at least one Internet hypertext document, which portion has a particular appearance when displayed by a client compatible with the output portion; and
replacing the output portion with other data, having a similar outward appearance as the output portion, when the other data is displayed by the client for which the information is designated.
Preferably, said other data is image data. Alternatively or additionally, said output portion is textual data. Alternatively or additionally, selecting an output portion comprises selecting an output portion not compatible with said client. Alternatively or additionally, said other data comprises a reference to a data file and wherein said reference encodes at least a portion of the content of said output portion. Alternatively or additionally, said other data is generated on-the-fly. Preferably, said other data is generated at the converter.
There is also provided in accordance with a preferred embodiment of the present invention, a method for transferring information between a server and a client, through a converter, each of said client and said server using different standards to encode said information, comprising:
receiving data from said client;
changing said received data from a known standard of the client to a known standard of the server; and
transmitting said changed data to said server, wherein said data and said changed data comprise at least one Internet hypertext document.
Preferably, said standards differ in language. Alternatively or additionally, said standards differ in logical/visual representation of an RTL language. Alternatively or additionally, said standards differ in character set encoding. Alternatively or additionally, said standards differ in character bit width.
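For form data, changing the client's character-set encoding to the server's can be sketched with the standard `urllib.parse` functions: the body is decoded using the client's charset and re-encoded using the server's. The function name `reencode_form` is hypothetical, and only the `application/x-www-form-urlencoded` format is handled.

```python
from urllib.parse import parse_qsl, urlencode

def reencode_form(body: str, client_charset: str, server_charset: str) -> str:
    """Re-encode an application/x-www-form-urlencoded body for the server's charset."""
    # Decode percent-escapes using the client's encoding; fail loudly on bad bytes.
    pairs = parse_qsl(body, keep_blank_values=True, encoding=client_charset,
                      errors="strict")
    # Re-escape the same text using the server's encoding.
    return urlencode(pairs, encoding=server_charset)

# The word for "Japan" submitted in UTF-8, forwarded to an EUC-JP server.
print(reencode_form("q=%E6%97%A5%E6%9C%AC", "utf-8", "euc_jp"))
```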
There is also provided in accordance with a preferred embodiment of the invention a method for transferring information between a server and a client, through a converter, comprising:
detecting a text portion of said information by said converter, which portion comprises ordered characters having a meaning in a first representation and which information comprises at least one Internet hypertext document; and
changing the order of at least some of said characters, such that the text portion has the same meaning in a second representation.
Preferably, changing the order comprises changing the order of the characters responsive to a known display method used by said client.
There is also provided in accordance with a preferred embodiment of the invention, apparatus for brokering the transmission of information between a server and a client, comprising:
a connection to said client;
a connection to said server; and
a converter which receives an Internet hypertext document from the server through the connection to the server, adds a control to the document and transmits the document to the client through the connection to the client,
wherein said control is operable to allow a user to enter configuration information for said converter.
Preferably, said control is operable to download a data entry form from said converter.
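Adding such a control can be sketched as inserting a link to the converter's configuration form just before the closing `</body>` tag. The address `/converter/config`, the link text and the CSS class are hypothetical. A callable replacement is used so that the original closing tag is kept as written and no characters in the inserted markup are treated as regex escapes.

```python
import re

_BODY_CLOSE = re.compile(r"</body\s*>", re.IGNORECASE)

def add_config_control(page: str, config_url: str = "/converter/config") -> str:
    """Insert a link to the converter's configuration form just before </body>.

    `config_url` is a hypothetical address served by the converter itself.
    """
    control = f'<div class="converter-control"><a href="{config_url}">Language settings</a></div>'
    if _BODY_CLOSE.search(page):
        # Keep the original closing tag (and its case) after the inserted control.
        return _BODY_CLOSE.sub(lambda m: control + m.group(0), page, count=1)
    return page + control  # page without </body>: append at the end

print(add_config_control("<html><body>x</body></html>"))
```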
There is also provided in accordance with a preferred embodiment of the invention, apparatus for brokering the transmission of information between a server and a client, comprising:
a connection to said client;
a connection to said server; and
a converter which receives an Internet hypertext document from the server through the connection to the server, changes an object in the document and transmits the changed document to the client through the connection to the client.
Preferably, said converter adds an object to the document. Alternatively or additionally, said converter removes an object from the document. Alternatively or additionally, said converter replaces an object on the page with another object. Alternatively or additionally, said object comprises an object to be displayed by said client. Alternatively, said object comprises an object which accepts input at said client.
In a preferred embodiment of the present invention, the apparatus comprises a server. Preferably, the server and the converter are comprised in a single computer. Alternatively or additionally, the server and the converter operate as a single process.
There is also provided in accordance with a preferred embodiment of the invention, a method for controlling the viewing of copyrighted information, transmitted from a data source to a client, on the Internet, comprising:
transmitting the information from the data source to a server, wherein said information is in a format viewable by the client;
converting the information, at the server, to an encoded form;
transmitting the encoded form of the information to the client; and
decoding and displaying, at the client, of the encoded information, wherein said encoding and decoding makes said information less available to copying by said client.
Preferably, the format of said information is a format used on the Internet. Preferably, said format is an HTML format.
Alternatively or additionally, displaying comprises temporally modulating the display of the information.
Alternatively or additionally, decoding comprises decoding by a server-provided program. Preferably, the server-provided program requires a live connection with said server. Alternatively or additionally, the server-provided program is downloaded from the server. Alternatively or additionally, the method includes authenticating the server-provided program to the server. Alternatively or additionally, converting comprises converting said information to a form unusable by said client without said server-provided program.
In a preferred embodiment of the invention, converting comprises encrypting. Alternatively or additionally, converting the information comprises converting only a portion of the information.