With the development of computerized information sources such as remote networks, users of data processing systems are able to connect to other servers and networks so as to retrieve massive amounts of electronic information that cannot be accessed through traditional electronic media. This kind of information medium is increasingly replacing traditional means of information distribution such as newspapers and even television.
In communication, different computer networks are connected with each other through a “gateway”, which handles the data transmission and message transformation from the sending network to the receiving network. A gateway is a device used to connect different networks (that use different communication protocols) so that information can be transmitted from one network to another. The gateway also transforms the format of the information so that it is compatible with the protocol adopted by the other network.
The Internet is one of the remote networks in general use today. It originated from the ARPAnet of the United States and is now the largest open computer network, consisting of numerous interconnected networks. The Internet uses the TCP/IP protocol suite, where TCP/IP is the abbreviation for “Transmission Control Protocol/Internet Protocol”. This software protocol was developed by the United States Department of Defense for communication between computers. The Internet can be described as a distributed computer network consisting of interconnected computers that run network protocols enabling users to share network information. Because of the general requirement to share information, remote networks such as the Internet have developed into “open” networks. In an open system, developers can design software applications for specific operations or services almost without restriction. For details about the nodes, objects and links of the Internet, readers can refer to the textbook “Mastering the Internet” by G. H. Cady et al., published by Sybex, Alameda, Calif., USA.
The electronic information exchanged among computers is usually formatted as hypertext. Hypertext is an established technology that couples text, images, audio and actions into a complex, nonlinear network and permits users to “browse” or “surf” by relevant title. The term “hypertext” was coined in the 1960s to refer to computer files whose nonlinear structure is distinct from the linear structure of books, films and language.
On the other hand, the more recent term “hypermedia” is nearly synonymous with “hypertext” but emphasizes the non-text constituents of hypertext, such as animation, audio and video. Hypermedia images, audio, video or any combination of them can be integrated into an information storage and retrieval system. Hypermedia and hypertext, especially the interactive kinds that users can select and control, are structured so that working and learning with them parallel human thought. That is, they permit users to establish associations between topics instead of moving sequentially from one topic to the next. Hypermedia and hypertext are linked in such a way that users can jump from one topic to another relevant one during a search. Hyperlinks are embedded in hypermedia or hypertext files and lead to other hypermedia or hypertext files when “clicked” by the reader.
The World Wide Web (WWW), or “Web”, is the multimedia information retrieval system of the Internet. The Web is the most common method of transferring data on the network. Other methods include FTP (File Transfer Protocol) and Gopher, but they are not as common as the Web. On the Web, clients access the services provided by Web servers using the Hypertext Transfer Protocol (HTTP). HTTP is a well-known application protocol that gives users access to various kinds of files (e.g. text, graphics, images, audio, video) by means of a standard page description language called Hypertext Markup Language (HTML). HTML provides the underlying file format and allows developers to specify links to other servers and files. A link that denotes the network path to another Web server is specified as a Uniform Resource Locator (URL), which has a dedicated syntax for specifying network paths.
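As a rough illustration of how links to other servers are carried inside an HTML file, the following minimal sketch collects the URL of every link in a small page using the HTML parser from the Python standard library. The page content, the host names and the class name `LinkCollector` are invented for illustration only.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical page; "somehost" and "otherhost" are placeholder host names.
page = """<html><body>
<p>See the <a href="http://somehost/somedirectory">directory</a> and
<a href="http://otherhost/index.html">another server</a>.</p>
</body></html>"""

collector = LinkCollector()
collector.feed(page)
print(collector.links)
```

A browser performs a comparable step when it renders a page: each collected URL becomes a clickable link that leads to another file or server.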
A URL is typically formatted as: http://somehost/somedirectory?parameters . . . in which “somehost” is the host name part of the URL and “somedirectory” is the directory in which the web page indicated by the URL can be found. The usual way to resolve a URL into the actual address of a Web server is by using a domain name server. On the Internet or an intranet, a domain name server converts the host name into an actual network address. One example is the Domain Name System (DNS) of the Internet. The procedure in which a Web client sends a host name to the domain name server and receives an address in return is called resolution. Under TCP/IP, the domain name server resolves the host name into a list of one or more IP addresses, which is returned to the Web client that intends to issue the HTTP request. Each IP address identifies a server that can process the request sent by the browser.
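The decomposition of a URL and the subsequent host-name lookup can be sketched as follows, using only the Python standard library. The URL and its parameters are placeholders, and the helper name `resolve` is hypothetical. The lookup requires a working resolver, so it is only defined here and not called.

```python
import socket
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL in the shape described above; "somehost" and
# "somedirectory" are placeholders, not real addresses.
url = "http://somehost/somedirectory?topic=patents&lang=en"

parts = urlsplit(url)
host = parts.hostname            # host name part of the URL
path = parts.path                # directory part of the URL
params = parse_qs(parts.query)   # parameters after the "?"


def resolve(hostname, port=80):
    """Ask the system resolver (typically DNS) for the IP addresses of a host."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({info[4][0] for info in infos})


print(host, path, params)
```

Calling `resolve` with a real host name would return the list of IP addresses from which the client picks a server for its HTTP request.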
The WWW, which employs the hypertext protocol, follows a client/server architecture. On the client side is the browser software, which sends requests to a Web server and then interprets, displays or plays the hypertext information and other multimedia files that the server returns.
As thousands upon thousands of companies, universities, government offices, museums and municipal authorities publish their homepages on the Web, the Internet has grown into a very valuable information source. Even a beginner is able to browse thousands of web pages or newsgroups after a little practice. Hence Internet access and the corresponding markets are growing rapidly.
Today the World Wide Web holds a huge amount of information, which is organized into hypertext pages. The number of pages is so large that it is impossible for people to follow them site by site and link by link even within a single site such as IBM's, let alone across the whole Web. It is therefore very important for users to be able to find the information of interest on the World Wide Web efficiently, rapidly and conveniently.
Currently, there are many kinds of information retrieval facilities. One of them is the directory service. In a directory service, web sites/pages are classified into categories, and each category is further divided into sub-categories. The categories are organized hierarchically and presented as web pages with links to their sub-categories. By exploring layer by layer, users eventually discover a set of relevant web sites/pages. With the number of web sites/pages increasing explosively, directories have grown to huge sizes. For example, Yahoo! currently has 25,000 categories that hold over half a million sites.
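The hierarchical organization of a directory service can be pictured as a tree of categories. The following minimal sketch assumes a nested-dictionary layout, and the category names, site URLs and the function name `browse` are all invented for illustration. It shows how exploring layer by layer leads to a set of sites.

```python
# Toy directory; every category name and site URL here is invented.
directory = {
    "Computers": {
        "subcategories": {
            "Internet": {
                "subcategories": {},
                "sites": ["http://example.com/web-history"],
            },
        },
        "sites": ["http://example.org/computing"],
    },
    "Arts": {
        "subcategories": {},
        "sites": ["http://example.net/museums"],
    },
}


def browse(directory, path):
    """Follow a list of category names from the top level and return the node reached."""
    node = {"subcategories": directory, "sites": []}
    for name in path:
        node = node["subcategories"][name]
    return node


node = browse(directory, ["Computers", "Internet"])
print(node["sites"])
```

Each step of `browse` corresponds to one click on a sub-category link, which is why large directories take many steps to explore.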
Another kind of information retrieval facility is the search engine. A search engine indexes web sites/pages by key words. A user inputs query words, that is, key words acceptable to the search engine, and then receives the related web sites/pages. More and more people prefer querying search engines by key words to exploring directories layer by layer. Category entries may also be indexed for searching. For example, in Yahoo! both category entries and web sites are presented as search results.
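Key word indexing is commonly implemented with an inverted index, which maps each word to the pages that contain it. The sketch below is a simplified illustration under that assumption and does not describe any particular search engine. The page texts, URLs and function names are invented.

```python
from collections import defaultdict

# Toy page collection; URLs and texts are invented for illustration.
pages = {
    "http://example.com/a": "patent search engine for the web",
    "http://example.com/b": "web directory of museum sites",
    "http://example.com/c": "search the patent directory",
}


def build_index(pages):
    """Map every key word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the pages that contain every query word (AND semantics)."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(pages)
print(sorted(search(index, "patent search")))
```

Because the index is keyed by the words of the indexed language, a query word in a different language finds nothing unless it is translated first.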
Computer application software such as free-of-charge or low-cost Internet search engines lets people easily reach Web sites by querying, so as to obtain information on the corresponding topics from those sites. Directories and search engines help users surfing the Internet so much that they are recognized as “portal sites” to the Internet. Famous directories/search engines include Yahoo!, AltaVista, Excite, etc. Sina (www.sina.com.cn) is a very famous Chinese portal site. However, most web pages are written in English and also cataloged/indexed in English. Users who cannot read English cannot enjoy this convenience.
To solve this problem, some web page translation software can statically or dynamically translate web pages from one language into another, such as from English into Chinese or from German into English. Some software also preserves the HTML tag structure so that users can follow links marked in their native language without difficulty. Moreover, users can also utilize the directories, since the names of the categories are dynamically translated as well.
Even so, Internet users still cannot query and retrieve in their native language when it differs from the language that the search engines use for indexing. The reason is that, although web page translation software does exist, there is no translation of query words from the native language into the languages that the search engines use.
Some query translation prototypes/solutions partly solve the problem. They provide predefined forms to collect input from users, translate the user input into a language that the search engines can accept, and construct query commands, which are then sent to particular search engines. However, they have the following drawbacks:
Firstly, many Internet search engines support a combinatorial query and/or a second query to narrow down the scope. In the first case, the word string to be matched as a search condition is logically combined with other search conditions, such as the categories or dates of publication to be retrieved. In the second case, the second search is performed on the result returned by the first search. However, these functions are lost when users use such query translation software.
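The two narrowing mechanisms can be sketched over a toy document collection as follows. The `Doc` record, its fields, the sample documents and the function names `combined_query` and `refine` are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Doc:
    title: str
    category: str
    published: date


# Invented sample documents.
docs = [
    Doc("Patent search basics", "Law", date(1998, 5, 1)),
    Doc("Search engine design", "Computers", date(1999, 3, 15)),
    Doc("Web search for patent data", "Computers", date(1997, 11, 2)),
    Doc("Museum web guide", "Arts", date(1999, 1, 20)),
]


def matches(doc, word):
    """True if the word occurs in the document title (case-insensitive)."""
    return word.lower() in doc.title.lower().split()


def combined_query(docs, word, category=None, since=None):
    """Combinatorial query: word match AND optional category AND optional date."""
    return [
        d for d in docs
        if matches(d, word)
        and (category is None or d.category == category)
        and (since is None or d.published >= since)
    ]


def refine(results, word):
    """Second query: search only within the results of an earlier search."""
    return [d for d in results if matches(d, word)]


first = combined_query(docs, "search", category="Computers")
second = refine(first, "patent")
print([d.title for d in second])
```

A fixed translation form that accepts only a word string has no place for the `category` and `since` conditions, and no way to pass an earlier result set to `refine`.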
Secondly, the ambiguity of translation cannot be properly handled. When query words are translated from one language into another, the mapping is often one-to-many. The problem is further complicated by the presence of polysemous words (words with multiple meanings) in many languages, especially in Chinese. For example, the Chinese word 腔 (pinyin: qiang1) has four meanings, and each meaning may correspond to several English translations. The first meaning can be translated into the English words “cavity” and “chamber”. The second meaning can be translated into the English word “speech”. The third meaning can be translated into the English words “tune” and “pitch”. The fourth meaning can be translated into the English words “accent” and “tone”. There are many such polysemous words in Chinese.
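The one-to-many mapping can be made concrete with a small lookup table that groups translations by meaning. The sketch below uses the four meanings listed above. The table layout and the function names are hypothetical, and the key 腔 is assumed to be the Chinese word that the example refers to.

```python
# Translations grouped by meaning, as listed in the text above.
# Assumption: the key is the word the example refers to (pinyin qiang1).
SENSES = {
    "腔": [
        ["cavity", "chamber"],   # first meaning
        ["speech"],              # second meaning
        ["tune", "pitch"],       # third meaning
        ["accent", "tone"],      # fourth meaning
    ],
}


def candidates(word, senses=SENSES):
    """Return every possible translation of a word, across all meanings."""
    return [t for sense in senses.get(word, []) for t in sense]


def translations_for_sense(word, sense_index, senses=SENSES):
    """Return only the translations belonging to one chosen meaning."""
    return senses[word][sense_index]


print(candidates("腔"))
print(translations_for_sense("腔", 2))
```

A single query word thus expands into seven candidate translations, and sending all of them to a search engine would mix results from unrelated meanings unless one meaning is selected.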
The query translation server for Internet search engines described by this invention can eliminate the above drawbacks.