1. Field of Invention
This invention is directed to a method and a system for managing the storage of documents. More particularly, this invention is directed to a method and a system for managing documents stored in a limited capacity storage device based on the content of the stored documents.
2. Description of Related Art
Information access plays a key role across an ever-expanding range of leisure and work activities. Laptop, hand-held and palm-top computers are being reinvented as “portable information appliances” with the promise of information access anytime and anywhere. However, users of these devices have become reliant on the continuous, high-speed, low-cost networks available to computers, and portable computers are rarely connected to such networks. One technique that makes portable computers less dependent upon networking is a cache.
A cache is generally defined as a fast access memory that stores a copy of frequently referenced information. A cache reduces the reliance of the computer on a connection to a network and can provide documents even when disconnected from a network. Cache management systems control the information that is stored in a cache.
An important part of a cache management system is the replacement policy. The replacement policy governs which items will be removed from the cache when the cache fills up, that is, when there is insufficient space in the cache to store a new item. A replacement policy requires inherently difficult decisions because each decision involves predicting the future. The accuracy of those predictions can only be measured after several requests for information have been provided to the cache, and is expressed as the efficiency of the replacement policy. The efficiency of a replacement policy for a cache is determined by the ratio of hits (items found in the cache) to misses (items missing from the cache).
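The relationship between a replacement policy and its hit-to-miss ratio can be illustrated with a minimal sketch. The least-recently-used (LRU) policy, the class name `LRUCache`, and the `fetch` callback are assumptions chosen for illustration only; the discussion above does not prescribe any particular policy.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache using an assumed least-recently-used policy."""

    def __init__(self, capacity):
        self.capacity = capacity      # maximum number of items (assumed >= 1)
        self.items = OrderedDict()    # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def get(self, key, fetch):
        """Return the item for key, calling fetch(key) on a miss."""
        if key in self.items:
            self.hits += 1
            self.items.move_to_end(key)       # mark as most recently used
            return self.items[key]
        self.misses += 1
        value = fetch(key)                    # e.g. a slow network download
        if len(self.items) >= self.capacity:
            self.items.popitem(last=False)    # replacement: evict the oldest
        self.items[key] = value
        return value

    def hit_to_miss_ratio(self):
        """Efficiency as defined above: hits divided by misses."""
        return self.hits / self.misses if self.misses else float("inf")


cache = LRUCache(capacity=2)
for name in ["a", "b", "a", "c", "b"]:
    cache.get(name, fetch=lambda k: f"contents of {k}")
```

In this run the policy evicts "b" to make room for "c", and the very next request is for "b", so the eviction turns out to be a wrong prediction that is visible only after the fact.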
The efficiency of a cache management system takes on greater importance for portable information appliances. When a computer is connected over a wireless network, and a cache miss occurs, it may not only take a relatively long time to locate and download the missing document, but the miss may also result in expensive fees for the network connection usage for the download. If the cache had the requested document in local storage, there would be no need to download the document from a network and expensive access charges would be avoided. Moreover, when a system is disconnected from a network, a cache miss may stop the user from continuing work. In an attempt to solve these and other problems with cache management systems, researchers have been investigating new replacement policies that better predict requests for documents or files.
One attempt to solve these problems involves augmenting a computational replacement policy with direct user interaction. This is based on the assumption that people often know which information is most important to keep in local storage. Two systems, TeleWeb and Mowgli, are mobile web browsers that allow users to lock documents into the local storage. TeleWeb and Mowgli are described in “TeleWeb: Loosely Connected Access to the World-Wide Web”, W. N. Schilit et al., Computer Networks and ISDN Systems, 28, pp. 1431-1444, 1996, and “Optimizing World-Wide Web for Weakly Connected Mobile Workstations: An Indirect Approach”, T. Alanko et al., Proceedings of the 2nd International Workshop on Services in Distributed and Networked Environments (SDNE '95), Jun. 5-6, 1995, respectively, and are incorporated herein by reference in their entireties. In TeleWeb, the storage contents are exposed to the user in a file-by-file listing, and users may pin or lock items, or delete items. One problem with this type of user control is that users tend to lock more than they delete. Therefore, the local storage becomes much less effective because it becomes filled with locked, but no longer relevant, documents.
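The effect of over-locking can be sketched as follows. This is a hypothetical illustration, not the TeleWeb or Mowgli implementation; the class `PinnableCache`, its methods, and the document names are assumptions introduced here.

```python
from collections import OrderedDict


class PinnableCache:
    """Hypothetical cache in which user-locked (pinned) items are never evicted."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()   # oldest entry first
        self.pinned = set()

    def pin(self, key):
        """Lock a stored document so the replacement policy cannot remove it."""
        if key in self.items:
            self.pinned.add(key)

    def put(self, key, value):
        """Store a document; return False if every slot holds a pinned item."""
        if key in self.items:
            self.items[key] = value
            self.items.move_to_end(key)
            return True
        if len(self.items) >= self.capacity:
            # Evict the oldest document that the user has not locked.
            victim = next((k for k in self.items if k not in self.pinned), None)
            if victim is None:
                return False         # cache is full of locked documents
            del self.items[victim]
        self.items[key] = value
        return True

    def evictable_slots(self):
        """Slots still available to the replacement policy."""
        return self.capacity - len(self.pinned)


cache = PinnableCache(capacity=2)
cache.put("old_report", "...")
cache.put("old_memo", "...")
cache.pin("old_report")
cache.pin("old_memo")
stored = cache.put("new_document", "...")
```

Once both slots are locked, the new document cannot be stored at all, which is the degraded state described above.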
Another approach taken by previous systems is to ask the user which documents are appropriate to discard and which documents are appropriate to keep. Such an approach is described in “How to Program Networked Portable Computers”, D. Goldberg et al., Proceedings of the Fourth Workshop on Workstation Operating Systems, pp. 30-33, October 1993, incorporated herein by reference in its entirety. When replacement is necessary, the system may, for example, use a pop-up dialog box to ask for suggestions on which file to remove from storage. A pop-up dialog box is shown in FIG. 1. FIG. 1 shows a Graphical User Interface (GUI) 10 displaying a list of documents 12. The GUI 10 allows the user to select a file to discard based on the title 14. The list may be sorted for the user according to the titles 14 of the documents, the sizes of the documents or the dates of last access. The appropriate sorting is invoked when the user clicks the heading for the appropriate column. One problem with this approach is that the user must explicitly designate each document that the user wishes to manage. For example, a user is forced to formulate a search query to retrieve the appropriate documents. Therefore, freeing up storage space of any significant amount requires multiple interactions. Furthermore, it is often difficult to determine a file's importance from only its file name, size or date of last access.
Some storage management systems try to predict file accesses by analyzing the history of file accesses. Two systems have been implemented that allow users to turn on recording of file accesses. The systems are described in “Detection and Exploitation of File Working Sets”, C. Tait et al., Proceedings of the 11th International Conference on Distributed Computing Systems, May 1991, pp. 2-9, and “Disconnected Operation in a Distributed File System”, James Kistler, (1993) Ph.D. Thesis, School of Computer Science on file with the Carnegie Mellon University Library, incorporated herein by reference in their entireties. Traces of file accesses are used by these systems to hoard or prefetch files into the local storage. More recently, a system described in “Intelligent File Hoarding for Mobile Computers”, C. Tait et al., MOBICOM 95, pp. 119-125, ACM, Inc., incorporated herein by reference in its entirety, extended this concept to a graphic interface that permits users to select which of a number of profiles to hoard. These profiles are created by observing patterns of file accesses to both application files and data files. However, this style of storage management only works well when users have recurring and predictable patterns of file accesses.
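The dependence of trace-based hoarding on recurring access patterns can be seen in a deliberately simplified sketch. The function `select_files_to_hoard`, the frequency-count heuristic, and the file names are assumptions made for illustration; they are not the algorithms of the cited systems.

```python
from collections import Counter


def select_files_to_hoard(trace, budget):
    """Pick up to `budget` files, preferring those accessed most often in the trace."""
    counts = Counter(trace)
    return [path for path, _ in counts.most_common(budget)]


# A recorded trace of file accesses (hypothetical file names).
trace = [
    "report.doc", "editor.exe", "report.doc",
    "notes.txt", "editor.exe", "report.doc",
]
hoard = select_files_to_hoard(trace, budget=2)
```

Here the hoard contains "report.doc" and "editor.exe". A file the user will need for the first time tomorrow never appears in the trace, so no heuristic of this kind can select it.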
A number of systems have used automatic techniques to present groups of related documents to a user. One technique characterizes clusters of documents with keywords and titles of representative documents in order to support browsing or full-text retrieval. This technique is described in “Scatter/Gather: A Cluster-Based Approach to Browsing Large Document Collections”, D. R. Cutting et al., Proceedings of the 15th Annual International ACM/SIGIR Conference, 1992, ACM, Inc., and “Reexamining the Cluster Hypothesis: Scatter/Gather on Retrieval Results”, M. A. Hearst et al., Proceedings of the 19th Annual International ACM/SIGIR Conference, Zurich 1996, incorporated herein by reference in their entireties.
Another system presents clusters of Web documents with keyword pairs based on their intra-cluster similarity, and allows the user to expand the contents of clusters. Such a system is described in “Automatically Organizing Bookmarks per Contents”, Y. S. Maarek et al., Computer Networks and ISDN Systems, 28, pp. 1321-1333, 1996, incorporated herein by reference in its entirety. However, these systems do not provide facilities for deleting, compressing, or taking other management actions on groups of documents.
None of the above systems works well for personal information appliances that provide access to information that cannot be neatly organized and that does not have a recurring access pattern.
A document management system is needed that augments the advantages of direct user interaction with a minimally invasive technique, does not fill up the cache with locked documents, and does not rely upon recurring access patterns.