a. Field of the Invention
The present invention concerns building resource (such as Internet content for example) and attribute transition probability models and using such models to predict future resource and attribute transitions. The present invention also concerns the use of such resource and attribute transition probability models for pre-fetching resources, for editing a resource link topology, for building resource link topology templates, and for suggesting resources based on resource transitions by others (or "collaborative filtering"). In particular, the present invention may be used in an environment in which a client, which may be linked via a network (such as the Internet for example) with a server, accesses resources from the server.
b. Related Art
In recent decades, and in the past five to ten years in particular, computers have become interconnected by networks to an ever-increasing extent; initially via local area networks (or "LANs"), and more recently via LANs, wide area networks (or "WANs") and the Internet. The proliferation of networks, in conjunction with the increased availability of inexpensive data storage means, has afforded computer users unprecedented access to a wealth of data. Such data may be presented to a user (or "rendered") in the form of text, images, audio, video, etc.
The Internet is one means of inter-networking local area networks and individual computers. The popularity of the Internet has exploded in recent years. Many feel that this explosive growth was fueled by the ability to link resources (e.g., World Wide Web pages), for example via hyper-text links, so that users could seamlessly transition between various resources, even when such resources were stored at geographically remote resource servers. More specifically, Hyper-text markup language (or "HTML") permits documents to include hyper-text links. These hyper-text links, which are typically rendered in a text file as text in a different font or color, include network address information of related resources. More specifically, each hyper-text link has an associated uniform resource locator (or "URL"), which is an Internet address at which the linked resource is located. When a user activates a hyper-text link, for example by clicking a mouse when a displayed cursor coincides with the text associated with the hyper-text link, the related resource is accessed, downloaded, and rendered to the user. The related resource may be provided by the same resource server that provided a previously rendered resource, or by a geographically remote resource server. Such transiting from resource to resource, by activating hyper-text links for example, is commonly referred to as "surfing" (or "Internet surfing" or "World Wide Web surfing").
As stated above, resources may take on many forms, such as HTML pages, text, graphics, images, audio and video. Unfortunately, however, certain resources, such as video information for example, require a relatively large amount of data to be rendered by a machine. Compression algorithms, such as MPEG (Moving Picture Experts Group) encoding, have reduced the amount of data needed to render video. However, certain limitations remain which limit the speed with which resources can be communicated and rendered. For example, limitations in storage access time limit the speed with which a server can access a requested resource. Bandwidth limitations of communications paths between an end user (client) and the resource server limit the speed at which the resource can be communicated (or downloaded) to the client. In many cases, a client accesses the Internet via an Internet service provider (or "ISP"). The communications path between the client and its Internet service provider, typically a twisted copper wire pair telephone line, is usually the limiting factor with respect to communication bandwidth. Limitations in communications protocols used at input/output interfaces at the client may also limit the speed at which the resource can be communicated to the client. Finally, limitations in the processing speed of the processor(s) of the client may limit the speed with which the resource is rendered on an output peripheral, such as a video display monitor or a speaker for example.
The limitations in processing speed, storage access, and communications protocols used at input/output interfaces are, as a practical matter, insignificant for the communication and rendering of most types of data, particularly due to technical advances and the relatively low cost of replacing older technology. However, the bandwidth limitations of the physical communications paths, particularly between an end user (client) and its Internet service provider, represent the main obstacle to communicating and rendering data-intensive information. Although technology (e.g., coaxial cable, optical fiber, etc.) exists for permitting high bandwidth communication, the cost of deploying such high bandwidth communications paths to each and every client in a geographically diverse network is enormous.
Since limitations in the bandwidth of communications paths are unlikely to be solved in the near future, methods and apparatus are needed to overcome the problems caused by this bottleneck so that desired resources may be quickly rendered at a client location. Even if the bandwidth of communications paths is upgraded such that even real-time communication of video data is possible, historically, the appetite for resource data has often approached, and indeed exceeded, the then-existing means of communicating and rendering it. Thus, methods and apparatus are needed, and are likely to be needed in the future, to permit desired resources to be quickly rendered at a client location.
The concept of caching has been employed to overcome bottlenecks in accessing data. For example, cache memory has been used in computer systems in which a processor must access stored data or program instructions. A cache memory device is a small, fast memory which should contain the most frequently accessed data (or "words") from a larger, slower memory. Disk drive based memory affords large amounts of storage capacity at a relatively low cost. Data and program instructions needed by the processor are often stored on disk drive based memory even though access to disk drive memory is slow relative to the processing speed of modern microprocessors. A cost-effective prior art solution to this problem provided a cache memory between the processor and the disk memory system. The operating principle of the disk cache memory is the same as that of a central processing unit (or "CPU") cache. More specifically, the first time an instruction or data location is addressed, it must be accessed from the lower speed disk memory. During this initial access, the instruction or data is also stored in cache memory. Subsequent accesses to the same instruction or data are made via the faster cache memory, thereby minimizing access time and enhancing overall system performance. However, since the storage capacity of the cache is limited, and is typically much smaller than the storage capacity of the disk storage, the cache often becomes filled and some of its contents must be replaced (e.g., by a replacement or flushing algorithm) as new instructions or data are accessed from the disk storage. The cache is managed, in various ways, in an attempt to have it store the instructions or data most likely to be needed at a given time. When the cache is accessed and contains the requested data, a cache "hit" occurs. Otherwise, if the cache does not contain the requested data, a cache "miss" occurs.
Thus, the data stored in the cache are typically managed in an attempt to maximize the cache hit-to-miss ratio.
In the context of a problem addressed by the present invention, some client computers are provided with cache memory for storing previously accessed and rendered resources on the premise that a user will likely want to render such resources again. As discussed above, resources may require a relatively large amount of data, and cache memory is limited. Such resource caches are therefore typically managed in accordance with a simple "least recently used" (or "LRU") management algorithm. More specifically, resources retrieved and/or rendered by a client are time stamped. As the resource cache fills, the least recently used resources (those with the oldest time stamps) are discarded to make room for more recently retrieved and/or rendered resources.
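The least recently used management described above can be illustrated with a minimal sketch. It assumes a client cache whose capacity is counted in whole resources rather than bytes. `LRUResourceCache` and the `fetch` callback, which stands in for downloading from the resource server, are hypothetical names. The ordering of an `OrderedDict` takes the place of explicit time stamps. The sketch also tracks the hit rate, meaning hits over total accesses, which rises together with the cache hit-to-miss ratio.

```python
from collections import OrderedDict
from typing import Callable


class LRUResourceCache:
    """Hypothetical client-side resource cache with least-recently-used eviction.

    Capacity is counted in resources, not bytes (a simplifying assumption).
    """

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        # URL -> resource data, ordered from least to most recently used;
        # this ordering plays the role of the per-resource time stamps.
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def __contains__(self, url: str) -> bool:
        return url in self._entries

    def get(self, url: str, fetch: Callable[[str], bytes]) -> bytes:
        """Return the resource for url, calling fetch(url) only on a cache miss."""
        if url in self._entries:
            self.hits += 1
            self._entries.move_to_end(url)  # now the most recently used entry
            return self._entries[url]
        self.misses += 1
        data = fetch(url)  # slow path: retrieve from the resource server
        self._entries[url] = data
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # discard the least recently used
        return data

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

With a capacity of two and the request sequence `a, b, a, c, b`, only the second request for `a` is a hit. The cache ends up holding `c` and `b`. The sketch shows that the approach is reactive: a resource is cached only after it has already been requested once.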
Although client resource caches managed in accordance with the least recently used algorithm permit cached resources to be accessed quickly, such an approach is reactive; it caches only resources already requested and accessed. Further, this known caching method is only useful to the extent that the premise that rendered resources will likely be rendered again holds true.
In view of the foregoing, methods and systems for quickly rendering desired resources are needed. For example, the present inventors have recognized that methods and systems are needed for predicting which resource will be requested. Moreover, the present inventors have recognized that methods and systems are needed for prefetching the predicted resource, for example, during idle transmission and/or processing times.
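To make the idea of predicting the next requested resource more concrete, the following is a minimal sketch of one simple approach: a first-order predictor built from counts of observed resource-to-resource transitions. It is an illustrative assumption, not the method of the present invention. `TransitionPredictor` and its `threshold` parameter are hypothetical. The sketch estimates P(next | current) as a relative frequency. It returns a candidate for prefetching only when that estimate is high enough to justify using idle transmission time.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional


class TransitionPredictor:
    """Hypothetical first-order predictor of the next requested resource.

    Counts observed transitions (current URL -> next URL) and estimates
    P(next | current) as a relative frequency.
    """

    def __init__(self) -> None:
        self._counts: Dict[str, Counter] = defaultdict(Counter)

    def observe(self, session: Iterable[str]) -> None:
        """Record every consecutive pair of resources in one browsing session."""
        previous: Optional[str] = None
        for url in session:
            if previous is not None:
                self._counts[previous][url] += 1
            previous = url

    def probability(self, current: str, nxt: str) -> float:
        row = self._counts.get(current)  # .get() does not insert a new row
        if not row:
            return 0.0
        return row[nxt] / sum(row.values())

    def predict(self, current: str, threshold: float = 0.5) -> Optional[str]:
        """Return the most likely next resource if its probability meets threshold."""
        row = self._counts.get(current)
        if not row:
            return None
        url, count = row.most_common(1)[0]
        return url if count / sum(row.values()) >= threshold else None
```

Suppose the predictor observes the sessions `home, news, sports`, `home, news, weather` and `home, mail`. The estimated probability of moving from `home` to `news` is then 2/3, so `news` would be a prefetch candidate while `home` is displayed. A higher threshold trades fewer wasted prefetches for fewer correct ones.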
Limited bandwidth and the limitations of the least recently used caching method are not the only present roadblocks to a truly rich Internet experience. As discussed above, hyper-text links have been used to permit Internet users to quickly navigate through resources. However, human factors and aesthetic considerations place a practical limit on the number of hyper-text links on a given HTML page. In the past, the topology of an Internet site, as defined by the placement of hyper-text links, was determined based on the intuition of a human Internet site designer, often with less than desirable results. Thus, a tool for editing and designing the topology of a resource server site, such as an Internet site for example, is needed. The present inventors have recognized that methods and systems are needed to edit link topology based on resource or attribute transition probabilities.