Organizations around the world have found that electronic publishing technology improves productivity and decision-making processes among their employees, customers, suppliers, and the public by providing more timely access to critical information. Web-based technologies have emerged as the most credible alternative for implementing such systems at low cost.
In their earliest manifestation, Web-based publishing tools consisted for the most part of conversion programs that, for example, converted text-based documents generated by word processors, such as Microsoft Word or WordPerfect, into suitable Web browser formats such as Hypertext Markup Language (HTML). However, this process rapidly became more complex. In their simplest extension, Web tools facilitated the management of hyperlinks so that users could navigate through a series of related documents in a logical order.
The typical Web page is now designed to include graphics and even multimedia information. Indeed, in some instances, the Web is used to supplant traditional media such as radio and television. For example, Web sites such as washingtonpost.com, cnn.com, and usatoday.com consist of thousands of pages whose content changes dynamically. Furthermore, the pages within such Web sites contain sophisticated media effects such as moving graphics, sound, and video clips, implemented with plug-ins, Java code, and the like, to make them more appealing to viewers accustomed to traditional media such as television.
Creating and maintaining a large-scale, popular multimedia Web site involves many tasks: designing the site, organizing its content, developing it, and managing it on a daily basis. Typically, many people are involved in producing such sites. Those providing the content may be authors, graphic artists, or multimedia specialists who do not necessarily know how to program Web pages or understand exactly how a Web site is implemented. Conversely, those responsible for the technical implementation of the site may know how to program in HTML or Java but typically know little about how best to present content.
Certain advanced tools have emerged for managing Web-based applications. One such tool is the Netscape Application Server(trademark) (NAS) advanced application server software available from Netscape Communications Corporation. NAS (formerly known as KIVA Enterprise Server) allows developers to deploy Web-based application logic as a centralized entity separate from the Web servers that are responsible for assembling pages and transmitting them over a network to client browsers. The NAS server may also be used to manage transactions with back end databases transparently to client software. NAS thus frees Web site implementors from having to determine how to deploy application logic as JavaScript or CGI code running on the Web servers or within Java applets downloaded to Web browsers. The Web servers and Web browsers can therefore be programmed to implement presentation logic only, leaving the application logic to the application server. When the application logic must change, the presentation logic therefore need not be redeveloped.
In addition, NAS supports application partitioning, which is a form of distributing application logic among multiple servers. The components of a large-scale application can therefore be grouped to facilitate their execution in high-demand environments. For example, NAS provides features such as dynamic load balancing, in which page requests can be routed to the least loaded server. NAS also supports caching the results of transactions, such as database queries, so that subsequent requests for the same pages may be transparently redirected to the cached information.
The present invention is a distributed publishing system which includes a content server that acts as a highly scalable page generation and content management system. The content server manages page generation in a dynamic delivery environment. Pages are generated automatically, on demand, by keeping the form, or design, of the pages distinct from their content. When the pages are requested, the content server automatically joins the page content to the page design.
The content server generates and stores pages depending upon their expected use. For example, entire pages, or small page components referred to herein as “elements,” are locally cached in a file or database for later access. Whether a page is cached as a completed unit or as a set of elements depends upon whether its elements contain dynamic content. This permits efficient production of pages on demand.
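The choice described above, between caching a page as a completed unit and caching it as a set of elements, can be sketched in Python as follows. This is a minimal illustration, not the content server's implementation: the `Element` and `PageCache` classes, the `dynamic` flag, and the in-memory dictionaries (standing in for the file or database cache) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Element:
    """Hypothetical page component; `dynamic` marks output that varies per request."""
    name: str
    render: Callable[[dict], str]
    dynamic: bool = False


@dataclass
class PageCache:
    """Hypothetical two-granularity cache: whole pages or individual elements."""
    pages: Dict[str, str] = field(default_factory=dict)
    elements: Dict[str, str] = field(default_factory=dict)

    def serve(self, page_name: str, parts: List[Element], ctx: dict) -> str:
        if not any(e.dynamic for e in parts):
            # No dynamic content: cache and reuse the completed page.
            if page_name not in self.pages:
                self.pages[page_name] = "".join(e.render(ctx) for e in parts)
            return self.pages[page_name]
        # Dynamic content present: cache static elements, regenerate the rest.
        out = []
        for e in parts:
            if e.dynamic:
                out.append(e.render(ctx))
            else:
                if e.name not in self.elements:
                    self.elements[e.name] = e.render(ctx)
                out.append(self.elements[e.name])
        return "".join(out)


header = Element("header", lambda ctx: "<h1>News</h1>")
greeting = Element("greeting", lambda ctx: f"<p>Hi {ctx['user']}</p>", dynamic=True)
cache = PageCache()
print(cache.serve("home", [header, greeting], {"user": "ann"}))
```

Here the "home" page is never stored whole because it contains a per-user greeting, while its static header is rendered once and reused.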
More particularly, the content server identifies a page in a site catalog through a page name. Each page name has an associated template, or so-called root element; the same template can be associated with any number of page names.
Templates themselves are also treated as elements in an element catalog and identified by an element name. Each template typically includes instructions that describe how the content server should process certain contents of the page. Elements contain a set of instructions that determine how specific content is to be formed into pages or portions of pages. Elements can be relatively small, such as one producing a single “href,” or relatively large, comprising the instructions for an entire page. Elements generally fall into two categories, layout and logic. Layout elements describe where components are placed on a page, and logic elements describe the actual content or how to locate it. Elements can contain standard HTML, special tags such as XML tags, or both. The XML tags, which may be a form of server-side markup language, can be used to retrieve data from a content catalog according to instructions specified by the elements. Elements of either type can also specify conditional behavior, producing different results depending upon variables specific to the execution context.
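To make the relationship between page names, templates, and elements concrete, here is a small hypothetical Python sketch. The catalog contents, the `evaluate` and `render_page` helpers, and the representation of layout elements as lists of child element names and of logic elements as Python functions are assumptions made for illustration; the source describes elements written in HTML and server-side tags, not Python.

```python
from typing import Callable, Dict, Union

# Hypothetical content catalog; the real system retrieves content from a database.
CONTENT = {"headline": "Rates fall", "headline_fr": "Les taux baissent"}


def headline(ctx: dict) -> str:
    # Logic element with conditional behavior driven by a context variable.
    key = "headline_fr" if ctx.get("lang") == "fr" else "headline"
    return f"<h2>{CONTENT[key]}</h2>"


ElementBody = Union[str, list, Callable[[dict], str]]

ELEMENT_CATALOG: Dict[str, ElementBody] = {
    "nav": '<a href="/">Home</a>',      # small element producing a single href
    "headline": headline,               # logic element
    "FrontPage": ["nav", "headline"],   # layout element used as a template
}

# Site catalog: several page names may share one template (root element).
SITE_CATALOG = {"home": "FrontPage", "index": "FrontPage"}


def evaluate(name: str, ctx: dict) -> str:
    body = ELEMENT_CATALOG[name]
    if isinstance(body, str):           # literal HTML
        return body
    if isinstance(body, list):          # layout: evaluate children in order
        return "".join(evaluate(child, ctx) for child in body)
    return body(ctx)                    # logic: compute content from context


def render_page(page_name: str, ctx: dict) -> str:
    return evaluate(SITE_CATALOG[page_name], ctx)


print(render_page("index", {"lang": "fr"}))
```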
The content server runs in a distributed, multi-tiered environment such as Netscape Application Server(trademark). That is, the machines and software involved in the application are divided into three layers, or tiers, including (1) a client tier, where users interact with the content server via Web browsers that implement presentation logic; (2) a middle tier, consisting of a Web server, a Netscape Application Server(trademark), and the content server application code, which implements the application logic that generates the Web pages dynamically; and (3) a database tier, which may include one or more database servers that permit interaction with back end databases to access information that forms the basis of the application, e.g., Web page content.
The content server provides a variety of ways to increase page production performance by implementing caching at various levels of a hierarchy. This hierarchical caching is in addition to the standard NAS-based memory caching and thus permits tuning of the caching implementation in order to optimize delivery of elements based upon how dynamic particular page elements are.
The lowest level of caching is a type of result set caching. Result sets are formed from the data that directly results from, for example, a back end database query operation. At this level, the data is cached in its raw form by the content server after extraction from the database. Cached data is generally stored as an attachment to the table that was queried; however, it can be stored against any catalog or table in the database. Result set caching improves performance of the system as a whole by reducing the load on back end databases as well as the response time experienced by end users. This low-level caching is particularly advantageous when the same data is served repeatedly and is reformatted by the content server for different delivery targets.
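A minimal sketch of result set caching, assuming the raw rows of each query are filed under the name of the table that was queried and discarded when that table changes. The `ResultSetCache` class and its `invalidate` method are hypothetical, and the standard-library `sqlite3` module stands in for the back end database; the source does not specify an invalidation policy.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, List, Tuple


class ResultSetCache:
    """Hypothetical cache of raw query rows, filed under the queried table."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.by_table: Dict[str, Dict[Tuple, List[tuple]]] = defaultdict(dict)
        self.db_hits = 0  # number of queries that actually reached the database

    def query(self, table: str, sql: str, params: tuple = ()) -> List[tuple]:
        key = (sql, params)
        cached = self.by_table[table].get(key)
        if cached is not None:
            return cached  # served from cache; no database load
        self.db_hits += 1
        rows = self.conn.execute(sql, params).fetchall()
        self.by_table[table][key] = rows
        return rows

    def invalidate(self, table: str) -> None:
        # Call after writing to `table`: drops every result set filed under it.
        self.by_table.pop(table, None)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER, title TEXT)")
conn.execute("INSERT INTO articles VALUES (1, 'Rates fall')")
cache = ResultSetCache(conn)
sql = "SELECT title FROM articles WHERE id = ?"
cache.query("articles", sql, (1,))
cache.query("articles", sql, (1,))
print(cache.db_hits)
```

The second query is answered from the cache, so only one query reaches the database. Because the rows are kept raw, different elements can format the same result set for different delivery targets.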
At the next level of the hierarchy, the content server provides element caching. At this level, the aforementioned elements are cached in an executable form: the element source code is retrieved and validated, and an execution tree representation of the element is built. It is this execution tree representation that is cached. When a page needs to be rendered, the content server retrieves the associated executable elements from the cache to the extent possible.
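The element-caching step above (retrieve and validate the source, build an executable representation, cache that representation, and execute it per request) can be illustrated with a deliberately tiny template syntax. Everything here is an assumption for illustration: `{name}` placeholders stand in for the content server's tags, a flat list of nodes stands in for a full execution tree, and validation is only a brace check.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Text:
    value: str


@dataclass(frozen=True)
class Var:
    name: str


Node = Union[Text, Var]
_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def compile_element(source: str) -> List[Node]:
    """Validate element source and build its executable node list."""
    if source.count("{") != source.count("}"):
        raise ValueError("unbalanced braces in element source")
    nodes: List[Node] = []
    pos = 0
    for m in _PLACEHOLDER.finditer(source):
        if m.start() > pos:
            nodes.append(Text(source[pos:m.start()]))
        nodes.append(Var(m.group(1)))
        pos = m.end()
    if pos < len(source):
        nodes.append(Text(source[pos:]))
    return nodes


class ElementCache:
    """Hypothetical cache holding compiled elements, not their source text."""

    def __init__(self, sources: Dict[str, str]) -> None:
        self.sources = sources
        self.compiled: Dict[str, List[Node]] = {}
        self.compiles = 0

    def execute(self, name: str, variables: Dict[str, str]) -> str:
        nodes = self.compiled.get(name)
        if nodes is None:  # cache miss: validate and compile once
            nodes = compile_element(self.sources[name])
            self.compiled[name] = nodes
            self.compiles += 1
        # Execution binds the per-request variables into the cached form.
        return "".join(n.value if isinstance(n, Text) else variables[n.name]
                       for n in nodes)


elements = ElementCache({"greet": "<p>Hello {user}</p>"})
print(elements.execute("greet", {"user": "ann"}))
```

Subsequent executions of `greet` skip parsing and validation entirely and only bind new variable values.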
Any environment-specific variables for the particular viewer are then located, and the element is executed to render the page. More specifically, the content server also supports the concept of establishing a session for a given user. The content server maintains such a session as a set of unique context variables associated with the particular user or “connection.” The existence of such a session makes it possible to maintain state information across a stateless connection model. This means that pages can set session variables that are available during element execution, without setting cookies or otherwise requiring that parameters be passed along with page requests.
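Server-side session variables of the kind described above can be sketched as one dictionary of context variables per session. The `SessionStore` class, the `price_element` element, and the use of a random session identifier are hypothetical; how the identifier itself accompanies each request is not described in the source and is left out here.

```python
import uuid
from typing import Dict, Optional


class SessionStore:
    """Hypothetical server-side store of per-connection context variables."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Dict[str, str]] = {}

    def open(self) -> str:
        sid = uuid.uuid4().hex
        self._sessions[sid] = {}
        return sid

    def set(self, sid: str, name: str, value: str) -> None:
        self._sessions[sid][name] = value

    def context(self, sid: str,
                request_vars: Optional[Dict[str, str]] = None) -> Dict[str, str]:
        # Session variables, overridden by any variables sent with this request.
        ctx = dict(self._sessions[sid])
        ctx.update(request_vars or {})
        return ctx


def price_element(ctx: Dict[str, str]) -> str:
    # Hypothetical element reading a session variable set by an earlier page.
    return f"<p>Prices for {ctx.get('region', 'US')}</p>"


store = SessionStore()
sid = store.open()
store.set(sid, "region", "EU")  # set while an earlier page was rendered
print(price_element(store.context(sid)))
```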
Finally, the highest level supports completed page caching, in which complete pages are served by the content server on request. This extends the NAS native memory cache in that page data can be screened before it is served, and different versions of the same cached page may be returned based upon context criteria such as browser type, user privileges, and other user-specific environment variables.
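Serving different versions of the same completed page based on context criteria amounts to keying the page cache on those criteria as well as on the page name. In this hypothetical sketch, the key is a (page, browser type, privilege) tuple, and screening is modeled as a simple privilege check performed before the cache is consulted; the class name, the `restricted` set, and the `subscriber` privilege are illustrative assumptions.

```python
from typing import Callable, Dict, Set, Tuple

VariantKey = Tuple[str, str, str]  # (page name, browser type, privilege level)


class PageVariantCache:
    """Hypothetical completed-page cache holding one copy per context variant."""

    def __init__(self, render: Callable[[str, str, str], str],
                 restricted: Set[str]) -> None:
        self.render = render
        self.restricted = restricted  # pages that require "subscriber" privilege
        self.store: Dict[VariantKey, str] = {}

    def get(self, page: str, browser: str, privilege: str) -> str:
        # Screen the request before anything is served from the cache.
        if page in self.restricted and privilege != "subscriber":
            raise PermissionError(f"{page} requires subscriber privileges")
        key = (page, browser, privilege)
        if key not in self.store:
            self.store[key] = self.render(page, browser, privilege)
        return self.store[key]


def render(page: str, browser: str, privilege: str) -> str:
    # Stand-in for full page generation by the content server.
    return f"<html><!-- {page} for {browser}/{privilege} --></html>"


cache = PageVariantCache(render, restricted={"premium"})
print(cache.get("home", "mobile", "guest"))
```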
The hierarchical structure of caching within the content server thus enables a caching implementation that is optimized for the particular page dynamics. For example, some composed pages may contain highly dynamic data. Such pages are typically candidates for element or result set level caching. Element caching may be used, for example, for rapid retrieval of image-based components such as navigational bars or indices on a page. Such elements may be relatively static: they do not change so frequently as to require regeneration for each request, but they may still have versions for specific user populations. These elements can thus be treated as page parts and served back efficiently by the content server, while still allowing custom composition of other parts of the page as required.
In other instances, pages may be relatively static and therefore prime candidates for page level caching. Even then, cached pages may have properties associated with them so that a particular version of a page can be selected based upon user context.
Within NAS, multiple physical Web servers can be arranged in a cluster. Clustering provides advantages such as load balancing in high-demand environments. For example, requests for particular pages may be routed at the Internet Protocol (IP) level in a round-robin fashion among the Web servers. NAS also monitors the load on all servers and routes requests to balance that load. Therefore, even if a page request is directed to a particular server, NAS may route it to a less heavily loaded member of the cluster if that server is heavily loaded.
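The routing behavior described above, an IP-level round-robin choice that is overridden when the chosen server is heavily loaded, can be sketched as follows. This is not NAS's actual algorithm: the `Cluster` class, the fixed `threshold`, and the use of an in-flight request count as the load measure are assumptions for illustration.

```python
from itertools import cycle
from typing import Dict, List


class Cluster:
    """Hypothetical cluster router: round-robin with a load-based override."""

    def __init__(self, servers: List[str], threshold: int) -> None:
        self.load: Dict[str, int] = {s: 0 for s in servers}  # in-flight requests
        self._round_robin = cycle(servers)
        self.threshold = threshold

    def route(self) -> str:
        target = next(self._round_robin)  # the IP-level round-robin choice
        if self.load[target] >= self.threshold:
            # Chosen server is heavily loaded: use the least loaded member.
            target = min(self.load, key=self.load.get)
        self.load[target] += 1
        return target

    def finish(self, server: str) -> None:
        self.load[server] -= 1  # request completed


cluster = Cluster(["a", "b", "c"], threshold=2)
cluster.load["b"] = 5  # pretend server "b" is already busy
print([cluster.route() for _ in range(3)])
```

The second request would normally go to `b`, but because `b` is over the threshold it is redirected to the least loaded member, `c`.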
In the present invention, the content server distinguishes whether or not servers cache together. Using this so-called synchronous set caching, one member of a synchronous set of servers may be designated as a sync master. Synchronous set members share cache state information with the other members of the same set. Thus, when a member of the set retrieves a cached element, it can determine whether its local copy of the element is fresh. If not, the local copy is stale, and the sync master's copy must be retrieved and used instead. In this way, the benefits of caching across a load-balanced cluster can be achieved while freeing the application developer from having to ensure that fresh content is always used.
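One way to picture synchronous set caching is to assume the sync master keeps a version number for each cached element and that members compare their local copy's version against it on every read. The sketch below follows that assumption; `SyncMaster`, `SyncMember`, and the versioning scheme are hypothetical and not taken from the source, which says only that members share cache state and fall back to the sync master's copy when theirs is stale.

```python
from typing import Dict, Tuple

Entry = Tuple[int, str]  # (version, cached data)


class SyncMaster:
    """Hypothetical sync master holding the authoritative copy of each element."""

    def __init__(self) -> None:
        self.entries: Dict[str, Entry] = {}

    def publish(self, name: str, data: str) -> None:
        version = self.entries.get(name, (0, ""))[0] + 1
        self.entries[name] = (version, data)

    def version(self, name: str) -> int:
        return self.entries[name][0]


class SyncMember:
    """Hypothetical set member that checks freshness before using its copy."""

    def __init__(self, master: SyncMaster) -> None:
        self.master = master
        self.local: Dict[str, Entry] = {}
        self.refreshes = 0

    def get(self, name: str) -> str:
        current = self.master.version(name)          # shared cache state
        cached = self.local.get(name)
        if cached is None or cached[0] != current:   # missing or stale copy
            self.local[name] = self.master.entries[name]
            self.refreshes += 1
        return self.local[name][1]


master = SyncMaster()
master.publish("nav", "<nav>v1</nav>")
member = SyncMember(master)
print(member.get("nav"))
```

Only version numbers need to be compared on each read; element data is copied from the sync master only when the local copy is missing or stale.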