The Internet, and in particular the World-Wide Web (“WWW”), is a large collection of computers operated under a client-server computer network model. In a client-server computer network, a client computer requests information from a server computer. In response to the request, the server computer provides the requested information to the client computer. Client computers are typically operated by individuals. Server computers are typically operated by large information providers, such as commercial organizations, government entities and universities.
To ensure the interoperability of the potentially different computers and computer operating systems in a client-server computer network, various protocols are observed. For example, the Hypertext Transfer Protocol (“HTTP”) is used for transporting hypertext files over the Internet. In addition, the WWW observes a number of standards for organizing and presenting information, such as the Hypertext Markup Language (“HTML”) and the Extensible Markup Language (“XML”).
HTTP, in particular, supports a feature known as “dynamically-generated customized pages.” A dynamically-generated customized page comprises a set of information in a particular format. The same set of information can be presented in various ways, depending upon whether a particular format is desired, and supported, by the requesting client computer. For example, a first client computer may support the ability to present information in columns, while a second client computer may instead support the ability to present information in the form of a table. As a further example, the first client computer may be operated by a user in a Spanish-speaking locale, while the second client computer is operated by a user in an English-speaking locale. A server computer receiving an information request from the first client computer may respond with a dynamically-generated customized page presenting the requested information in a column format and in the Spanish language, while responding to a request from the second client computer with a dynamically-generated customized page that presents the requested information in English and in the form of a table. Thus, two different customized pages can be created to represent the same information.
Computer executable instructions are used to dynamically generate customized pages. U.S. Pat. No. 5,740,430, entitled “Method and Apparatus for Server-Independent Caching of Dynamically-Generated Customized Pages,” issued on Apr. 14, 1998, to Rosenberg, et al. (the “Caching Application”), discloses a method and apparatus to efficiently respond to a large number of requests for customized pages. In particular, the Caching Application discloses a method and apparatus for operating a client-server computer network such that a server computer dynamically generates and then stores customized pages requested from a client computer. Subsequent requests for previously generated customized pages are served from a cache in the server computer. Since previously generated customized pages need not be regenerated, computational overhead is reduced. The Caching Application is hereby incorporated by reference in its entirety.
Further, U.S. patent application Ser. No. 09/965,914, entitled “Method and System for Cache Management of Dynamically-Generated Content,” filed on Sep. 28, 2001 (the “Cache Management Application”), discloses a method and system for regeneration and file management of previously cached dynamically-generated content, such as that disclosed in the Caching Application. The Cache Management Application discloses a method and system for efficiently managing the caching and delivery of dynamically-generated content over a computer network. The Cache Management Application is also hereby incorporated by reference in its entirety.
The Internet standards that govern web interactions, both at the content level, such as HTML (a content language), and at the transfer level, such as HTTP (a transfer protocol), are derived from an ASCII (American Standard Code for Information Interchange)-based environment. When using only ASCII, language is primarily restricted to English or ASCII derivatives of western European languages. Therefore, most meta information associated with content that comes across a network in HTTP is intended to be ASCII. Meta information is typically encoded information transmitted along with the main data in a data transfer to provide additional information associated with the main data, such as creation date, authorship, formatting, locale information, language, etc. However, with the proliferation of Internet use, Internet content providers are faced with the need to support, among others, multi-lingual website visitors. The problem exists, however, that there is no clear way for a multi-lingual website visitor to announce to a content provider his or her language preference. In fact, the problem goes beyond determining a user's language preference and is a problem of determining a user's locale preferences. A user's locale can indicate not only a user's language preference, but also other locale-specific information, such as the user's time zone, which can be used to indicate relative time differences between the user and the content provider. For example, a time zone indicator could show whether the user's locale observes Daylight Saving Time, which can be important in doing time calculations for timing of events.
Further, it is important to content providers to be able to provide content to a website user in a format that is useful and familiar to the user. For example, the date/time format, currency format, monetary symbols, the use of dashes, commas and periods, etc., can vary greatly from locale to locale. Even within a locale, language and format variances can occur. For example, Spanish has two sorting orders and Chinese has five. To properly present data to a variety of multi-lingual users, content providers need to be able to determine the user's locale in order to serve locale-appropriate content and related meta information. Related U.S. patent application Ser. No. 09/931,228, entitled “Method and System for Determining a Network User's Locale,” filed on Aug. 16, 2001 (the “Locale Detection Application”), discloses one such method and system for automatically determining a network user's locale to provide locale-appropriate content to a user. The Locale Detection Application is hereby incorporated by reference in its entirety.
However, even with methods and systems for determining a network user's locale, such as that disclosed in the Locale Detection Application, problems still exist with regard to managing the caching of content that can then be delivered in a locale-specific manner. This problem is of particular importance with regard to dynamically-generated content. Dynamically-generated content comprises content that is relatively “on demand,” because it can change with a relatively high frequency. Unlike static content (e.g., an image such as a .gif file or a .jpg file), dynamically-generated content may not be fully assembled and ready to be delivered until requested by a user. For example, the front page of a news site for the Chicago Tribune or other major newspaper can comprise dynamically-generated content because the headlines, the weather, or some other aspect of the displayed content may be subject to frequent change. Thus, the content presented to a user may not be constructed until the time of the user's request.
Dynamically-generated content can be, in some instances, cached. This means that previously generated versions of the content can be stored in, for example, a database, for future access by a user. Thus, by caching dynamically-generated content, content delivery latencies, as well as excessive use of computational resources, can be avoided. This is because the same version of the dynamically-generated content may be generated once and then stored for subsequent access by multiple users. By not having to generate the same content for each user request for that content, the demand on a content provider's back-end server systems and databases can be substantially reduced. As a result, latencies between a user's request for the content and the delivery of the content to the user are similarly reduced.
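The generate-once, serve-many behavior described above can be illustrated with a minimal Python sketch. The class `PageCache`, the function `render_front_page`, and the in-memory dictionary store are hypothetical names chosen for illustration only; they are not taken from the Caching Application or the Cache Management Application, and a real system might store generated pages in a database or file system instead.

```python
from typing import Callable, Dict


class PageCache:
    """Hypothetical in-memory cache of generated pages, keyed by page name."""

    def __init__(self, generate: Callable[[str], str]) -> None:
        self._generate = generate
        self._store: Dict[str, str] = {}
        self.generation_count = 0  # how many times a page was actually built

    def get(self, page: str) -> str:
        # Generate only on a cache miss; later requests reuse the stored copy.
        if page not in self._store:
            self.generation_count += 1
            self._store[page] = self._generate(page)
        return self._store[page]


def render_front_page(page: str) -> str:
    # Stand-in for the database lookups and template assembly a real site needs.
    return f"<html><body>Content of {page}</body></html>"


cache = PageCache(render_front_page)
first = cache.get("front-page")   # generated and stored
second = cache.get("front-page")  # served from the cache
```

In this sketch the second request does not invoke the generator at all, which is the source of the reduced back-end load discussed above.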
Such latencies can occur because generating dynamically-generated content takes time. Often the various components comprising the requested content must be obtained from a database (or various databases) and then interpreted before being arranged together and delivered to a user. Although gathering and arranging the requested content may typically only take on the order of several hundred milliseconds, on a heavily-visited website, the cumulative demand of millions of users requesting the same content (and the content having to be generated individually for each user), can result in a very slow delivery of a content provider's home page (or any other page). In certain cases, such as for a newspaper website or other commercial media site, such delays can result in the death of the site because users will not want to wait long for content to be delivered in today's “must-have-it-now” environment. Further, the computational resources required to generate and deliver the same content for each user request can place a tremendous strain on the content provider's back-end server systems. Servers can thus be overworked and are much more likely to fail.
It is therefore typically preferable to cache dynamically-generated content whenever possible (i.e., until the content has undergone a change, at which point it can be regenerated and the new version, perhaps, cached). Load spikes and overburdening of server computers due to the regeneration of the same content each time it is requested by a user can be reduced or avoided. Adverse effects on a content provider's server(s) and on the network as a whole can generally be reduced or avoided. A content provider's back-end processes and their associated databases can thus be free to perform other tasks or otherwise enjoy the benefits of a decreased computational load.
The advantages that caching provides for dynamically-generated content in general can be extended to locale-sensitive content. In today's increasingly global climate, and particularly on a global computer network such as the Internet, content providers have an increased need to support multi-lingual websites. Consider, for example, a newspaper content provider's website. Different language versions of an online newspaper can be provided to users, depending on the user's locale. For example, in the Pacific Rim there exist many newspaper websites that provide an English language version of their content and also a second version in another (perhaps local) language. At least two languages are typically supported, and maybe more. Content providers of this type can face the problem of having to cache different versions of the same content, based on language. For example, if a dynamically-generated page is cached (e.g., in accordance with the teachings of the Cache Management Application), the locale of the user may be automatically detected (e.g., in accordance with the teachings of the Locale Detection Application) and a page dynamically generated such that the page contents are locale sensitive. The template generating such a page may be marked as cacheable (i.e., the generated content will be cached); however, the dynamically-generated page, if cached in accordance with the language of the current requestor, may not be an appropriately locale-based version for a subsequent visitor. Thus, for example, if a first visitor to a website requests content in Chinese and the content is subsequently cached as a Chinese version, a subsequent English-speaking visitor to the website would be served that same Chinese page (if cached content were being served).
Therefore, to provide locale-sensitive content that can be cached for delivery to subsequent visitors in a locale appropriate version, the content must be saved in such a way that it can be recalled in different versions for visitors from different locales.
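One way to picture this requirement is a cache whose key includes the visitor's locale in addition to the page name. The following Python sketch is illustrative only: the `LocaleCache` class, the `generate` function, the locale tags, and the sample translations are all assumptions introduced here, not details drawn from the referenced applications.

```python
from typing import Dict, Tuple

# Hypothetical per-locale text; a real site would pull this from content databases.
TRANSLATIONS = {"en_US": "Headlines", "zh_CN": "头条", "es_ES": "Titulares"}


def generate(page: str, locale: str) -> str:
    # Stand-in for locale-sensitive page generation.
    return f"{page}:{TRANSLATIONS[locale]}"


class LocaleCache:
    """Hypothetical cache keyed by (page, locale) rather than by page alone."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], str] = {}

    def get(self, page: str, locale: str) -> str:
        key = (page, locale)
        if key not in self._store:
            self._store[key] = generate(page, locale)
        return self._store[key]


cache = LocaleCache()
chinese = cache.get("front-page", "zh_CN")  # first visitor, Chinese locale
english = cache.get("front-page", "en_US")  # later visitor, English locale
```

Because the locale is part of the key, the English-locale visitor in this sketch is not served the previously cached Chinese version of the same page.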
Further, it is typically preferable to not cache highly volatile pages. Highly volatile dynamic pages are pages that are likely to change often and thus are preferably maintained as dynamic non-cached pages. Currently existing caching methods and systems perform what is called “cache-on-demand.” Cache-on-demand means that when a user requests certain dynamically-generated content, if that content (dynamically-generated page) is marked as cacheable, the system and method will cache the content after it is generated and prior to serving the content to the user. However, it may be desirable to identify highly-accessed dynamically-generated pages for pre-caching. Pre-caching means that the content is cached before it is requested by a user and maintained in a cache until it undergoes a change. The content can, after a change, be cached again as a new version. However, the same problems can arise with pre-caching as for cache-on-demand with respect to providing locale-sensitive content. In other words, to serve a user a locale-appropriate version of previously cached content, each cached locale-sensitive version of the content must in some way be associated with its respective locale and stored in a locale-sensitive manner such that the appropriate version of the content can be served from a cache to a requesting user.
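By way of a hedged illustration, pre-caching of locale-sensitive content can be sketched as populating a cache for every combination of a selected page and a supported locale before any request arrives. The `precache` and `generate` functions, the page names, and the locale tags below are hypothetical and are used only to make the idea concrete.

```python
from itertools import product
from typing import Dict, Iterable, Tuple


def generate(page: str, locale: str) -> str:
    # Stand-in for locale-sensitive page generation.
    return f"{page} [{locale}]"


def precache(pages: Iterable[str], locales: Iterable[str]) -> Dict[Tuple[str, str], str]:
    """Build every (page, locale) version ahead of any user request."""
    return {(page, loc): generate(page, loc) for page, loc in product(pages, locales)}


# Hypothetical list of highly-accessed pages and supported locales.
cache = precache(["front-page", "weather"], ["en_US", "es_ES"])
```

Each pre-cached entry carries its locale in the key, so a later request can be matched to the appropriate version without generating anything at request time.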
With regard to pre-caching and to cache-on-demand, the situation may also arise where cached content requires updating prior to a user requesting the content. A content provider implementing a system and method for caching dynamically-generated content may wish to automatically regenerate and re-cache such content prior to a user requesting the out-dated version of the content. The capability to automatically regenerate cached content on a locale-sensitive basis also requires a locale identifier of some sort to be associated with each locale-specific version of the cached content.
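The role of the locale identifier in automatic regeneration can be sketched as follows. This is a minimal Python illustration under stated assumptions: the revision counter, the `regenerate` function, and the dictionary-based cache are hypothetical constructs, not the mechanism of the Cache Management Application.

```python
from typing import Dict, Tuple

# Hypothetical revision counter standing in for the underlying content.
content = {"front-page": 1}


def generate(page: str, locale: str) -> str:
    return f"{page} rev {content[page]} [{locale}]"


cache: Dict[Tuple[str, str], str] = {
    ("front-page", "en_US"): generate("front-page", "en_US"),
    ("front-page", "zh_CN"): generate("front-page", "zh_CN"),
}


def regenerate(store: Dict[Tuple[str, str], str], page: str) -> int:
    """Rebuild every cached locale version of `page`; return how many were refreshed."""
    stale = [key for key in store if key[0] == page]
    for key in stale:
        # The locale stored in the key tells the generator which version to rebuild.
        store[key] = generate(*key)
    return len(stale)


content["front-page"] = 2                   # the content changes
refreshed = regenerate(cache, "front-page")  # refresh before any user asks
```

Without the locale recorded alongside each cached entry, the regeneration step in this sketch would have no way to know which locale-specific versions exist or how to rebuild them.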
Thus, for performance reasons, it may be necessary for a content provider to cache relatively static content available at its website. It would be most efficient to cache such content by locale. For example, many multi-lingual websites store multi-lingual content in content databases as Unicode data employing Unicode-compatible byte encodings (e.g., Universal Character Set Transformation Format, 8-bit (UTF-8), Universal Character Set, 2-byte (UCS-2), etc.). Depending on the nature of the content, the encoding translation process from Unicode to a website visitor's encoding preferences (which can be set at the user's web browser) can be very computer resource intensive. With locale-sensitive caching, the computing cost of encoding translation occurs only when a dynamic page is first requested (i.e., the first time the content is generated). All subsequent requests for the same content by visitors with the same locale preferences will receive the cached locale-specific content, so long as the content is cataloged correctly by locale.
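The point that encoding translation is paid for only once per locale can be shown with a short Python sketch. It assumes content held as Unicode text and uses Python's built-in `str.encode` as the translation step; the `serve` function, the call counter, and the sample text are hypothetical.

```python
from typing import Dict, Tuple

# Content stored as Unicode text (hypothetical sample).
UNICODE_CONTENT = {"greeting": "Feliz año nuevo"}

encode_calls = 0  # counts how often the translation step actually runs
_cache: Dict[Tuple[str, str], bytes] = {}


def serve(page: str, encoding: str) -> bytes:
    """Return the page in the visitor's preferred encoding, translating once."""
    global encode_calls
    key = (page, encoding)
    if key not in _cache:
        encode_calls += 1
        _cache[key] = UNICODE_CONTENT[page].encode(encoding)
    return _cache[key]


latin1 = serve("greeting", "iso-8859-1")  # translated and cached
again = serve("greeting", "iso-8859-1")   # served from cache, no translation
utf8 = serve("greeting", "utf-8")         # different encoding, translated once
```

Here three requests trigger only two translations, and the two encodings yield different byte sequences for the same text, which is why the cached copies must be cataloged separately.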
Based on proper user locale detection, a cached page's (cached content's) file specification (filing system identification) can have embedded locale information (e.g., language, territory, encoding, sort order, etc.) that associates the cached content with a specific locale. The server process selecting content for delivery to a user can retrieve the locale-specific cached content based on the detected locale of the user.
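A file specification with embedded locale information might, for example, take the following form. The layout shown (language_territory.encoding@sort-order as a directory name) is an assumption modeled loosely on common locale naming conventions; the `Locale` class, the `cache_path` function, and the `/cache` root are hypothetical.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Locale:
    """Hypothetical locale record with the fields named in the text."""
    language: str
    territory: str
    encoding: str
    sort_order: str


def cache_path(root: str, page: str, locale: Locale) -> PurePosixPath:
    # Embed the locale fields in a directory name so each version has its own file.
    tag = f"{locale.language}_{locale.territory}.{locale.encoding}@{locale.sort_order}"
    return PurePosixPath(root) / tag / f"{page}.html"


spanish = Locale("es", "ES", "ISO-8859-1", "traditional")
path = cache_path("/cache", "front-page", spanish)
```

A server process that has detected the same locale for a later visitor can rebuild the identical path and read the cached file directly.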
Successful multi-lingual websites thus must not only provide multi-lingual content and locale-sensitive navigation, but they must also support multiple simultaneous access to the website by clients operating in different locales. In short, correct transient locale synthesis is a first step in representing the requirements of each website visitor and in preparing a content provider's website experience in a culturally customized fashion. Each user (visitor to a website) may have different locale preferences which he or she would like to govern their web content consumption and navigation. A user's preferences may include language, data encoding, and monetary, numeric and/or time zone representations. A user's locale preferences can be either explicitly specified in a user profile, which can be acquired through, for example, a registration process, or implicitly embedded in content requests sent from users' agents (e.g., web browsers) to content servers (e.g., web servers). User locale preferences provide useful information to a content delivery system to properly format cultural or language-sensitive information, such as dates, times, and monetary and numeric information. Any caching system for such locale-sensitive content must be able to provide the same. These locale-sensitive preferences can be obtained by a method such as that disclosed in the Locale Detection Application.
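As one illustration of preferences implicitly embedded in a content request, a web browser commonly sends an HTTP Accept-Language header listing language tags with optional quality weights. The Python sketch below parses such a header and selects a supported language. It is a simplified sketch, not the method of the Locale Detection Application; the function names, the supported-language set, and the fallback rule are assumptions.

```python
from typing import List, Set, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Return (tag, quality) pairs, highest quality first."""
    prefs: List[Tuple[str, float]] = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        params = params.strip()
        quality = 1.0  # a tag without an explicit weight defaults to 1.0
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        prefs.append((tag.strip().lower(), quality))
    # sorted() is stable, so tags with equal weight keep their header order.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def pick_language(header: str, supported: Set[str], default: str) -> str:
    """Choose the best supported language, falling back to the primary subtag."""
    for tag, quality in parse_accept_language(header):
        if quality <= 0:
            continue
        if tag in supported:
            return tag
        primary = tag.split("-")[0]
        if primary in supported:
            return primary
    return default


choice = pick_language("es-MX,es;q=0.8,en;q=0.5", {"en", "es"}, "en")
```

The selected language could then serve as one component of the locale used to format content and to locate the matching cached version.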