This invention relates to data access systems which enable users to access data from remotely located data storage subsystems and, in particular, to an information access monitor which ascertains patterns of data access by the users and automatically compiles data for the users based upon the relevance of this information to the users' interests, as indicated by prior patterns of data access by the users.
1. Problem
In a multiuser computer environment, such as on a corporate network, information is generated and consumed by users in all parts of the organization. As part of this process, many individuals access information via the Internet and store the results of their searches in their personal data storage directories. Much of this information is of interest to others in the organization, yet it is not made available to these interested parties unless the searcher forwards a copy of the information to them. In addition, there is also information stored on computers within the organization that the searcher may not identify or be able to access. There presently is no single list of information sources, or an identification of the contents of these sources, available to the members of the organization. Thus, individuals searching for information replicate in part the prior search efforts of others and also do not necessarily disseminate the retrieved information to all those in the organization who have an interest in this information.
In large organizations, information is generated, distributed, stored and consumed in a manner that fails to ensure that all individuals who have an interest in this information receive copies of the information. Historically, organizations maintained a central library which was the repository of information of a general public nature. In addition, the organization concurrently maintained a corporate records department which stored and maintained the private corporate correspondence and trade secret documents. Thus, when an individual working in the organization desired to obtain information, the search was initially divided between these two types of information. The two libraries of information were cataloged by professional librarians and were relatively simple to search, generally with the assistance of the library staff. With regard to information generated within and by the organization, this information was typically propagated from the author to members of the author's department and to interested individuals in other departments via standard routing lists.
With the advent of computerized sources of information and the availability of electronic media via which the information can be obtained, this traditional library structure has lost its effectiveness. Individual members of an organization can search for information from diverse sources. The access to these sources is typically via the "Internet", which is a worldwide link of computers that communicate via commonly understood protocols. The Internet also functions as a repository of information published by many sources: libraries, corporations, universities, research institutions, organizations, governments, individuals, and the like. A variety of tools are available to the users to access this information from the Internet. However, a problem with the Internet is that the "search and locate" functions used to obtain information of interest to a user are non-trivial to execute. In particular, the Internet and the search engines used to locate information which is accessible via the Internet are somewhat eclectic at best. The users must expend a significant amount of time and effort to locate and retrieve information from the scattered sources of information. From the organization's point of view, this problem is exacerbated by the fact that numerous members of the organization are redundantly searching for information and storing identical information in their private directories on the organization's data storage subsystems. Thus, the entire information library function has devolved from the professionally run organizational libraries of the past to the distributed, disorganized and grossly inefficient electronic data storage procedures of the present.
Therefore, there presently is no automatic data indexing mechanism available to organizations to address the problem of individual storage of information of relevance to the organization and the absence of any correlation process to enable other members of the organization to benefit from the search efforts of their peers. The information stored on the organization's data storage subsystem is therefore ineffective, even though its availability and pertinence to the organization may be high.
2. Solution
The above described problems are solved and a technical advance achieved in the field by the Information Access Monitor (IAM) of the present invention. In the preferred embodiment of the invention, the information access monitor is part of a computer system comprising a plurality of interconnected processors. The information access monitor is located at the Internet gateway of the computer system's data communication network. Users access information that is pertinent to their work and to others in their organizations by means of the Internet. The information access monitor therefore functions to monitor information flows between the internal data communication network and the Internet to identify these information requests and responses. The information access monitor generates relevance indexes for these requests and responses and compiles a "corporate consciousness" of all data relevant to the organization. The information access monitor computes user/group profiles to identify information needs and interests within the organization and can then automatically associate users/groups with information of relevance. The users can be advised of information retrieved from the Internet by others via relevance indexes generated by the information access monitor or via "copy to" lists, or they can be directed to pertinent information in response to the user seeking information. The information access monitor thereby automatically creates "virtual bibliographies" which reflect topics of interest to the users of the computer system. These virtual bibliographies are continuously created and updated as needed by the users' actions in accessing information through the Internet gateway.
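The association of users with relevant information described above can be illustrated with a minimal sketch. The text does not state how relevance indexes or user profiles are computed, so the sketch assumes plain word counts as the relevance index and cosine similarity as the match measure. The names `AccessMonitor`, `observe`, `interested_users`, and the threshold value are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def term_vector(text):
    """Lowercased word counts; an assumed stand-in for a relevance index."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class AccessMonitor:
    """Hypothetical gateway-side monitor: builds profiles from observed responses."""

    def __init__(self):
        self.profiles = defaultdict(Counter)  # user -> accumulated term counts
        self.catalog = {}                     # source URL -> term vector

    def observe(self, user, url, response_text):
        """Record one response seen at the gateway for a given user."""
        vec = term_vector(response_text)
        self.catalog[url] = vec
        self.profiles[user].update(vec)

    def interested_users(self, url, threshold=0.3, exclude=None):
        """Users whose profile is similar enough to the cataloged item."""
        vec = self.catalog[url]
        return sorted(
            user for user, profile in self.profiles.items()
            if user != exclude and cosine(profile, vec) >= threshold
        )
```

In this sketch, calling `interested_users` for an item retrieved by one user yields the other users who could be advised of it, which corresponds loosely to the "copy to" lists mentioned above. A fuller treatment would weight terms and aggregate profiles by group; both are omitted here.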
In this environment, the monitoring of information accesses enables the computer system to avoid the redundant storage of information, since a single copy of a data file is stored on the computer system's data storage subsystem and is accessible by all users of the computer system. Alternatively, the information management function is implemented by recording the identity of the Internet information source so the information can be reaccessed without the need to retain a copy. It is only the information cataloging and relevance match data that must be created and manipulated to enable efficient access to the information stored on the data storage subsystem and external sources. This data management technique therefore has a significant impact on the amount of data stored in the data storage subsystem.
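The two storage alternatives described above (retain a single shared copy, or record only the identity of the source) can be sketched as follows. The text does not specify how duplicate files are detected, so this sketch assumes content is identified by its SHA-256 digest. The class `SingleCopyStore` and the `keep_copies` flag are hypothetical names introduced for illustration.

```python
import hashlib


class SingleCopyStore:
    """Hypothetical store that keeps at most one copy of each distinct file."""

    def __init__(self, keep_copies=True):
        self.keep_copies = keep_copies
        self.blobs = {}    # digest -> content (one entry per distinct content)
        self.catalog = {}  # source URL -> {"digest": str, "users": set of names}

    def record(self, user, url, content):
        """Catalog one retrieval; store the bytes only if not already held."""
        digest = hashlib.sha256(content).hexdigest()
        if self.keep_copies:
            # setdefault leaves an existing copy untouched, so repeated
            # retrievals of identical content add no storage.
            self.blobs.setdefault(digest, content)
        entry = self.catalog.setdefault(url, {"digest": digest, "users": set()})
        entry["digest"] = digest  # reflect the most recently seen content
        entry["users"].add(user)
        return digest
```

With `keep_copies=False` only the catalog grows, which models the alternative in which the source identity is recorded and the information is reaccessed on demand.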
The automated cataloging of information creates a dynamic and adaptable information storage and retrieval system, since the information storage and retrieval patterns are determined by the users of the computer system. As the users' interests change over time, the information access monitor changes the information dissemination procedure in synchronization with the users' interests. The result of this architecture is that the computer system can efficiently support multiple forms of information retrieval. Documents can be retrieved from the single copy of the document stored in the data storage subsystem. Relevance searches can be executed on the indexing information created by the information access monitor and stored in the computer system. Furthermore, event based data can be stored to record a temporal record of the information retrieval and access activity in the computer system. Additional features that can be supported include document annotation, wherein a user can annotate and forward retrieved documents to other individuals. These capabilities are nowhere found in existing data storage and management systems.