1. Technical Field
The present invention relates to the field of information processing in a distributed network. More particularly, the present invention relates to the transcoding of web-based content and enterprise content from one format to another. Still more particularly, the present invention relates to a system and method for increasing the functionality of transcoders by the inclusion of semantic features.
2. Description of Related Art
Transcoding is the process of transforming the format and representation of content. Computers, of course, are not inherently intelligent; they need to be told exactly what things are, how they are related, and how to deal with them. Clearly, a simple way for computers to communicate and readily exchange data is needed. Transcoding is a key element in this exchange. In practice, this means that enterprise data and applications, or web-based data, can be leveraged by multiple users with multiple devices in a manner that is seamless across the network and tailored to the specific user and device. The content, therefore, may be delivered across a wide range of networks.
There is a vast repository of data available on the Web. There is an even larger repository of business information, such as enterprise data, available on legacy systems. There is a substantial opportunity to extend the use of enterprise data by transcoding the formats, that is, bringing them out from behind legacy protocols. As enterprises expand into new e-business markets, and as their workforces become more mobile and widespread, easy access to legacy data becomes even more critical.
There are three main types of transcoding:
Data or syntactical transcoding is the conversion of data from one format to another, typically from a format that is not web-friendly to one that is (e.g., Advanced Function Presentation (AFP) to Scalable Vector Graphics (SVG), extensible markup language (XML) to hypertext markup language (HTML), etc.). Data transcoding can be used to take advantage of new web formats, such as SVG and extensible HTML (XHTML). Other examples of syntactical transcoding include converting portable document format (PDF) documents to HTML or XML. For example, many portal web sites provide a map service for users to locate specific addresses across the United States (e.g., Yahoo, a product of Yahoo!, Inc., 3420 Central Expressway, Santa Clara, Calif. 95051). Today, the original vector-based map data is converted to a graphics interchange format (GIF) file before being sent to the client browser. Once SVG becomes a standard format supported by browsers, a transcoder can be used to convert the map data into SVG on the fly, preserving the quality and flexibility of the original data. Users will then be able to manipulate that vector graphics data locally on the client without having to access the server again. Data transcoding can also be used to aggregate content for presentation to the user in a convenient and accessible manner. Again, portals such as Yahoo are a good example. They allow their customers to configure their own home pages to provide a wide range of tailored information, from news and weather to favorite links and email status. The formats of the original data from this wide variety of sources must be converted to a browser-friendly format before being sent to the client. Reformatting of content is necessary in order to achieve universal access because devices utilize different markup languages to render content. For example, many wireless phones use the Wireless Markup Language (WML) to render content instead of HTML. Furthermore, the PalmPilot uses Compressed Markup Language (CML), its own variant of HTML.
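By way of illustration only, syntactical transcoding of the XML-to-HTML kind mentioned above might be sketched as follows. The registry `SYNTACTICAL_TRANSCODERS`, the functions `xml_to_html` and `transcode`, and the flat one-level XML layout are all assumptions made for this example; they are not drawn from any particular prior art transcoder.

```python
import xml.etree.ElementTree as ET
from html import escape


def xml_to_html(xml_text: str) -> str:
    """Render each child element of the root as a labelled HTML paragraph."""
    root = ET.fromstring(xml_text)
    rows = [
        f"<p><b>{escape(child.tag)}</b>: {escape(child.text or '')}</p>"
        for child in root
    ]
    return "<html><body>" + "".join(rows) + "</body></html>"


# Hypothetical registry keyed by (source format, target format).
SYNTACTICAL_TRANSCODERS = {("xml", "html"): xml_to_html}


def transcode(content: str, source: str, target: str) -> str:
    """Look up the function for the requested conversion and apply it."""
    function = SYNTACTICAL_TRANSCODERS.get((source, target))
    if function is None:
        raise LookupError(f"no transcoder from {source} to {target}")
    return function(content)


page = transcode(
    "<weather><city>Austin</city><temp>31</temp></weather>", "xml", "html"
)
```

Keying the registry on the pair of formats keeps each conversion independent, so a further conversion (for example, PDF to HTML) could be registered without touching the existing ones.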
Device transcoding is the conversion of web formatted data (e.g., HTML) to a format more suitable for display on certain devices, typically by filtering the data to be displayed on less capable clients in the Pervasive Computing (PvC) space. There are many reasons why content must be filtered, transformed, or reformatted to enable it to be universally accessed by devices. Filtering is typically required whenever resource constraints prohibit the storing or timely transmission of content. Because devices differ in the amount of memory that is accessible, they vary greatly with regard to how much content they are capable of storing. As a result, it may be necessary to filter out some content, such as large GIF and JPEG (joint photographic experts group) images, for devices with limited memory. Similarly, for devices connected by narrow bandwidth channels (e.g., 8 kbps), it may not be possible to deliver content that contains a large number of images in a timely fashion. Transformation is typically required to achieve universal access because many devices are only able to render a limited number of content representations. For example, the PalmPilot, available from 3COM Corporation, 5400 Bayfront Plaza, Santa Clara, Calif. 95052-8145, is only capable of rendering images in Palm bitmap format. Therefore, content that exists in GIF or JPEG must be transformed to Palm bitmap to be rendered. Other examples of transformation include the scaling of an image to enable it to completely fit on a display, and the rendering of text as synthesized speech for voice-driven car browsers.
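The filtering and transformation steps described above may be sketched, under stated assumptions, as follows. The `Item` and `DeviceProfile` types, the size limit, and the media type label `image/x-palm-bitmap` are hypothetical, and the transformation here only relabels the item; an actual transcoder would re-encode the image data.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Item:
    name: str
    media_type: str  # e.g. "text/html" or "image/gif"
    size_bytes: int


@dataclass(frozen=True)
class DeviceProfile:
    max_item_bytes: int         # largest item the device can store
    image_type: Optional[str]   # the only image type rendered, or None


def device_transcode(items: List[Item], profile: DeviceProfile) -> List[Item]:
    result = []
    for item in items:
        # Filtering: drop content the device cannot store.
        if item.size_bytes > profile.max_item_bytes:
            continue
        if item.media_type.startswith("image/"):
            if profile.image_type is None:
                continue  # device renders no images at all
            if item.media_type != profile.image_type:
                # Transformation (placeholder): relabel only; a real
                # transcoder would convert the pixels to the new format.
                item = replace(item, media_type=profile.image_type)
        result.append(item)
    return result


handheld = DeviceProfile(max_item_bytes=64_000, image_type="image/x-palm-bitmap")
content = [
    Item("page", "text/html", 4_000),
    Item("logo", "image/gif", 2_000),
    Item("photo", "image/jpeg", 500_000),
]
delivered = device_transcode(content, handheld)
```

In this sketch the large JPEG is filtered out because it exceeds the assumed memory limit, while the small GIF is kept and marked for conversion to the device's image format.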
Protocol transcoding is the transcoding of data which is typically broadcast using a non-HTTP protocol (e.g., 3270 (a class of terminals known as Display Devices, normally used to talk to IBM mainframes), 5150, etc.) into Hypertext Transfer Protocol (HTTP) format, to be displayed using a normal browser. Examples include the implementation of a 3270 terminal, a customer information control system (CICS, an IBM communications system now used for database handling) client, etc. (3270, 5150 and CICS are all registered trademarks of the IBM Corporation). This requires transcoding in both directions, because the user input must be converted back into the original protocol. Where bandwidth is an issue, for example, in a slow wireless connection, it may be necessary to transmit the web content in a protocol with suitable compression. The content would then have to be transcoded twice, once at each end of the wireless connection.
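The two-way nature of protocol transcoding can be illustrated with a deliberately simplified sketch. The "legacy record" below is a toy format invented for this example (name=value pairs separated by semicolons); it merely stands in for a terminal data stream, which is considerably more involved. The function names are likewise hypothetical.

```python
from html import escape
from urllib.parse import parse_qs


def legacy_to_html(record: str) -> str:
    """Outbound direction: present a toy legacy record as an HTML form."""
    inputs = []
    for pair in record.split(";"):
        if not pair:
            continue
        name, _, value = pair.partition("=")
        inputs.append(
            f'<label>{escape(name)} '
            f'<input name="{escape(name)}" value="{escape(value)}"></label>'
        )
    return '<form method="post">' + "".join(inputs) + "</form>"


def form_to_legacy(body: str) -> str:
    """Inbound direction: turn a URL-encoded form body back into the record."""
    parsed = parse_qs(body, keep_blank_values=True)
    return ";".join(f"{name}={values[0]}" for name, values in parsed.items())


screen = legacy_to_html("ACCT=1234;NAME=SMITH")
reply = form_to_legacy("ACCT=1234&NAME=JONES")
```

The browser sees only HTML over HTTP, and the legacy application sees only its own record format; the transcoder sits between the two and converts in each direction.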
FIG. 1 depicts a prior art relationship between transcoding proxy 100, client 120 and web server 110. Initially, client 120, including an Internet browser, makes a request for a uniform resource locator (URL). Transcoding proxy 100 passes the request to web server 110 and intercepts the returned request results intended for client 120. Transcoding proxy 100 converts the requested documents to a form compatible with client 120 prior to returning the requested documents to client 120. In the depicted figure, transcoding proxy 100 includes syntactical transcoder 102, which further includes a plurality of transcoder functions A1 to AN, for converting individual data formats from one format to another. Also included in transcoding proxy 100 is device transcoder 104, which further includes a plurality of transcoder functions C1 to CN, for converting web formatted data to a format more suitable for displaying on certain devices. Finally, included in transcoding proxy 100 is protocol transcoder 106, which further includes a plurality of transcoder functions B1 to BN, for converting data which is typically broadcast using a non-HTTP protocol into HTTP format, to be displayed using a normal browser. Syntactical transcoder 102, device transcoder 104, and protocol transcoder 106 select the proper transcoder function by comparing syntactical, device, and protocol preferences supplied by client 120 with the returned requested document from web server 110. Transcoding proxy 100 reformats the requested document according to the preference data supplied by client 120 and transfers the document to client 120.
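The selection step, in which each transcoder compares the client's preferences with the returned document, may be sketched as follows. This is a minimal model under stated assumptions: the `Document` fields, the `TranscodingProxy` class, and its `register` and `respond` methods are hypothetical names, and the registered function only relabels the document rather than converting it.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Mapping, Tuple


@dataclass(frozen=True)
class Document:
    body: str
    data_format: str   # compared by the syntactical transcoder
    device_class: str  # compared by the device transcoder
    protocol: str      # compared by the protocol transcoder


Transcoder = Callable[[Document], Document]


class TranscodingProxy:
    ATTRIBUTES = ("data_format", "device_class", "protocol")

    def __init__(self) -> None:
        # One table of transcoder functions per transcoder,
        # keyed by (current value, preferred value).
        self.tables: Dict[str, Dict[Tuple[str, str], Transcoder]] = {
            attribute: {} for attribute in self.ATTRIBUTES
        }

    def register(self, attribute: str, current: str, preferred: str,
                 function: Transcoder) -> None:
        self.tables[attribute][(current, preferred)] = function

    def respond(self, document: Document,
                preferences: Mapping[str, str]) -> Document:
        for attribute in self.ATTRIBUTES:
            current = getattr(document, attribute)
            preferred = preferences.get(attribute, current)
            if current == preferred:
                continue  # nothing to convert for this attribute
            function = self.tables[attribute].get((current, preferred))
            if function is None:
                raise LookupError(
                    f"no {attribute} function for {current} -> {preferred}"
                )
            document = function(document)
        return document


proxy = TranscodingProxy()
# Placeholder function: relabels the format without converting the body.
proxy.register("data_format", "xml", "html",
               lambda d: replace(d, data_format="html"))

returned = Document("<a/>", "xml", "desktop", "http")
reformatted = proxy.respond(returned, {"data_format": "html"})
```

A missing table entry raises an error in this sketch; the multi-proxy arrangement of FIG. 2 addresses exactly that case.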
It is often the case that a single transformation of the data is not enough. For example, a portal transcoding application which has to supply a customized home page to a user who owns several web accessing devices, including a desktop computer and a palm computer, would first use data transcoding to convert the original source data into a presentation-neutral format. Then, based on the type of device from which the user requested the information, a second transformation, this time device transcoding, would be made to tailor the web content to the target device. XML is an important technology for this type of transcoding. XML grammars are presentation-neutral, which makes them excellent candidates for intermediate formats between legacy data and the final presentation form.
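The two-step transformation described above, with XML as the intermediate format, might look as follows in outline. The element names (`portal`, `item`), the device labels, and the 20-character truncation for the handheld device are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET
from html import escape


def to_neutral_xml(record: dict) -> str:
    """Step 1 (data transcoding): source data to presentation-neutral XML."""
    root = ET.Element("portal")
    for key, value in record.items():
        ET.SubElement(root, "item", name=key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def render_for_device(neutral_xml: str, device: str) -> str:
    """Step 2 (device transcoding): neutral XML to a device-specific form."""
    items = [(e.get("name"), e.text or "") for e in ET.fromstring(neutral_xml)]
    if device == "desktop":
        rows = "".join(
            f"<tr><th>{escape(name)}</th><td>{escape(value)}</td></tr>"
            for name, value in items
        )
        return f"<table>{rows}</table>"
    if device == "handheld":
        # Small display: plain text, values truncated to 20 characters.
        return "\n".join(f"{name}: {value[:20]}" for name, value in items)
    raise ValueError(f"unknown device: {device}")


neutral = to_neutral_xml({"weather": "Sunny", "mail": 3})
desktop_page = render_for_device(neutral, "desktop")
handheld_page = render_for_device(neutral, "handheld")
```

Because the intermediate document carries no presentation detail, supporting a further device requires only another rendering branch; the first step is unchanged.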
FIG. 2 depicts a prior art configuration between transcoding proxies 200, 210 and 220, client 240, and web server 250, for providing multiple transformations in order to properly format a document for a client. Transcoding proxies 200, 210 and 220 are identical in functionality to transcoding proxy 100 and, therefore, will not be discussed in detail again. In the depicted example, client 240 makes a request for a uniform resource locator (URL). Transcoding proxy 200 passes the request to web server 250 and intercepts the results for client 240. Transcoding proxy 200 attempts to convert the requested document to a form compatible with client 240.
However, in the depicted example, as syntactical transcoder 202, device transcoder 206, and protocol transcoder 204 attempt to select the proper transcoder function, it is determined that transcoding proxy 200 does not possess the proper transcoder function to effect a complete transformation. In comparing the client preferences to the format of the returned requested document, a necessary transcoder function is missing. Therefore, rather than returning an improperly formatted document to client 240, transcoding proxy 200 passes the document to one of transcoding proxies 210 and 220 for intermediate processing. In so doing, transcoding proxy 200 can return a properly formatted document to client 240.
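The hand-off between proxies can be modelled, purely as a sketch, by letting a proxy that lacks a direct transcoder function ask its peers for an intermediate conversion that it is then able to finish itself. The `Proxy` class, the format names, and the string-wrapping placeholder functions are hypothetical.

```python
from typing import Callable, Dict, Iterable, Tuple

Function = Callable[[str], str]


class Proxy:
    def __init__(self, name: str,
                 functions: Dict[Tuple[str, str], Function],
                 peers: Iterable["Proxy"] = ()) -> None:
        self.name = name
        self.functions = dict(functions)  # keyed by (source, target)
        self.peers = list(peers)

    def convert(self, body: str, source: str, target: str) -> str:
        direct = self.functions.get((source, target))
        if direct is not None:
            return direct(body)
        # No suitable function here: look for a peer whose output either
        # is the target format or can be finished by this proxy.
        for peer in self.peers:
            for (peer_source, intermediate), peer_function in peer.functions.items():
                if peer_source != source:
                    continue
                if intermediate == target:
                    return peer_function(body)
                finish = self.functions.get((intermediate, target))
                if finish is not None:
                    return finish(peer_function(body))
        raise LookupError(f"{self.name}: cannot convert {source} to {target}")


# Placeholder functions that only wrap the text to show the order applied.
proxy_210 = Proxy("210", {("afp", "xml"): lambda body: f"xml({body})"})
proxy_220 = Proxy("220", {})
proxy_200 = Proxy("200", {("xml", "html"): lambda body: f"html({body})"},
                  peers=[proxy_210, proxy_220])

result = proxy_200.convert("doc", "afp", "html")
```

Here the first proxy cannot convert the source format itself, so the peer performs the intermediate step and the first proxy completes the conversion before replying to the client.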
Transcoding occurs after the source data has been accessed (from a database, file server, etc.) but before the end user is able to access it. Precisely where the transcoding takes place depends upon the specific transcoding application. For example, the transcoding of financial data is likely to take place on or near the origin server for security reasons (i.e., inside the financial company's firewall). Device transcoding is more likely to take place as a last step before delivery to the user's web browser. In some cases, it may even be desirable to transcode data on the target client machine.
The term ‘reverse proxying’ refers to a setup where the proxy server is run in such a way that it appears to clients like a normal web server. That is, clients connect to it considering it to be the destination origin server and do not know that requests may be relayed further to another server, even through other proxy servers.
The word ‘reverse’ in reverse proxy refers to the inverted role of the proxy server. In the regular (i.e., forward) proxy scenario, the proxy server acts as a proxy for the client. The request is made on behalf of the client by the proxy server. However, in the reverse proxy scenario, the reverse proxy server acts as a proxy for the server. The proxy services requests on behalf of the server. While this may seem to be the same concept, merely expressed in two ways, the distinction becomes clear when considering the relationship of the proxy server to its client and origin server.
A forward proxy server, or a set of them, acts as a proxy to one or more clients. From the client's perspective, the proxy server is dedicated to servicing that client's needs, and all requests may be forwarded to the proxy server. A given client will use the same proxy server over a period of time, and the proxy configuration is dependent on the site where the client is running. Forward proxy servers are usually run by the client organization or an internet service provider, and are fairly close to the client. Conversely, a reverse proxy server represents one or a few origin servers. Typically, arbitrary servers cannot be accessed through a reverse proxy server. Only a predetermined set of files (those available from the origin server or servers that the reverse proxy is serving) is available from the reverse proxy server. A reverse proxy server is a designated proxy server for those specific servers. Furthermore, the designated proxy server is used by all clients for access to the specific site of the server being serviced. A reverse proxy server is usually run by the same organization that runs the main origin server for which the proxy is a reverse proxy.
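The distinction can be made concrete with a small routing sketch. The path prefixes and origin host names below are hypothetical, and no network traffic is performed; the functions only compute where a request would be relayed.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical fixed mapping: the only origin servers this reverse proxy serves.
ORIGINS = {
    "/catalog": "http://origin-a.internal.example",
    "/orders": "http://origin-b.internal.example",
}


def reverse_proxy_target(path: str) -> Optional[str]:
    """Relay only to the predetermined origin servers; otherwise refuse."""
    for prefix, origin in ORIGINS.items():
        if path == prefix or path.startswith(prefix + "/"):
            return origin + path
    return None  # not one of the served sites


def forward_proxy_target(absolute_url: str) -> str:
    """Relay to whatever server the client names in its request."""
    parts = urlsplit(absolute_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute HTTP URL: {absolute_url}")
    return absolute_url


served = reverse_proxy_target("/catalog/item/7")
refused = reverse_proxy_target("/elsewhere")
relayed = forward_proxy_target("http://any.example/page")
```

The client of the reverse proxy supplies only a path and never learns the origin host, whereas the client of the forward proxy names the destination server itself.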
The reverse proxy concept is useful in terms of transcoders because a transcoding proxy server most often works as a reverse proxy server. Therefore, transcoding servers dedicated to the dissemination of enterprise information over a network would take responsibility for converting the requested documents into a format compatible with the network.
Prior art transcoding servers can also transcode data from many origin servers, both within and outside the enterprise. However, prior art transcoding proxies have heretofore not integrated semantic information with syntactic format transcoding. While separate methods for natural language translation and syntactic transcoding exist, the broad area of semantic transcoding, and in particular transcoding that makes use of semantic information, has been overlooked by the prior art.