The present invention relates in general to the transfer of data in computer networks and more specifically to a system for publishing, organizing, accessing and distributing information in a computer network.
Accessing information, and publishing information for others to access or obtain, are important features of computer networks. However, although the trend is to make information access and publishing easy for users of computers and computer networks, many of the mechanisms available are not easy for an average user to master. For example, publishing a web document not only requires a user to have some knowledge about where to publish, and to what audience to publish, but the user may have to publish the document to several “sites” or locations to make the document readily available to a desired number of users. This is the case, for example, when a company uses the company's network, or intranet, that has different web sites associated with different departments, regions, etc.
The lack of structure or organization of web pages, and documents, on networks can be both good and bad. Lack of structure can allow easy publishing of documents without placing a burden on the publisher to comply with a predefined organization. This also lets each web site developer, online business, database, etc., create a customized organization that is best suited to the specific type of information. However, lack of structure and organization also makes it difficult for a user of the network to search efficiently for documents. Often a user has to perform many searches and access different websites and utilities to look for the document. This involves much typing and mouse (or other user input device) manipulation, is time-consuming and can be frustrating and counter-productive.
An analogy can be made to a newspaper which has an effective and well-known organization. A reader of the newspaper can quickly obtain information from the newspaper by going to a subject section such as “Business,” “Sports,” “Travel,” etc. The newspaper also provides an index or table of contents. Articles are organized in order of importance with “links” to other sections of related news, such as the continuing text of an article. However, to achieve this level of organization means that considerable time must be spent on editing, page layout, paste-up, etc. Also, writers, editors, and other people must work in a concerted effort to produce the organized information. The approach of computer networks has been to allow each writer/publisher to throw an article into a haphazard network “bin” and to rely on loose organization mechanisms such as keyword searching, folder organization, hyperlink organization or criteria organization.
An example of a web site structure that a typical company might provide on its internal intranet is to have different web sites for functions, or departments, such as “Human Resources,” “Marketing,” and “Finance.” If the company is large, there may be different regional offices, each having these functions. If, for example, the company has offices at locations in the U.S., Europe and Asia, this amounts to 9 different intranet sites for information (three departments at each of three regional offices). In addition, there will typically be a main site for each regional office, a main site for each department or organization, and a main site for the overall company.
Thus, we find 16 possible sites in all for the example discussed above. The typical organization for documents associated with these sites is to have the documents pointed at by links. The links can be organized into categories at each site's web page. Not only does this make publishing information extremely difficult when it is desired to make information available to more than one site, but any person interested in searching for and obtaining information may have to visit several sites. Also, the task of publishing documents to the various intranet sites is usually handled by a different person than the writer/publisher. Not only can this become a huge task, given the number of documents and sites, but mistakes in classification are likely.
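The count of 16 sites can be checked with a short sketch. The department and region names come from the example above; the function name `enumerate_sites` and the site label strings are hypothetical and serve only to illustrate the arithmetic.

```python
from itertools import product

# Departments and regional offices taken from the example in the text.
departments = ["Human Resources", "Marketing", "Finance"]
regions = ["U.S.", "Europe", "Asia"]


def enumerate_sites(departments, regions):
    """List every intranet site implied by the example (labels are illustrative)."""
    # One site per department at each regional office: 3 x 3 = 9.
    sites = [f"{dept} / {region}" for dept, region in product(departments, regions)]
    # One main site per regional office: 3.
    sites += [f"{region} main" for region in regions]
    # One main site per department: 3.
    sites += [f"{dept} main" for dept in departments]
    # One main site for the overall company: 1.
    sites.append("Company main")
    return sites


sites = enumerate_sites(departments, regions)
print(len(sites))  # 9 + 3 + 3 + 1 = 16
```

Adding a fourth regional office to this sketch would raise the total to 20 (12 + 4 + 3 + 1), which suggests how quickly the publishing burden grows with the size of the company.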
Some traditional methods for accessing documents in computer networks include keyword searching. This allows a user to make a relational query such as “movie review.” The search will return documents that include the term “movie review” somewhere in the document. The documents can be at any number of sites. A search can be further narrowed by, for example, including a relational term such as “AND” and the name of a movie reviewer. Also, a specific date, or period of time, can be specified. However, because of the huge volume of information on most intranets (and certainly the worldwide Internet) the number of documents that match basic keyword searches is very large. Unless the user is very familiar with the terminology, and type, of documents relating to the subject in which the user is interested, the user's keyword search will most likely turn up many documents in which the user is not interested. These must be further filtered by refining the query until the proper documents are identified. With this approach it is often impossible to obtain a list of only relevant documents in which the searcher is interested. The scope of the keyword search cannot be set by the user but is determined by the entity running the search engine and compiling the search engine database.
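The behavior described above can be illustrated with a minimal sketch of a phrase search narrowed by an “AND” term. This is not the implementation of any particular search engine; the function `keyword_search`, the sample documents and the reviewer name “Pat Lee” are all hypothetical.

```python
def keyword_search(documents, phrase, and_term=None):
    """Return ids of documents containing `phrase` and, if given, `and_term`.

    Matching is a case-insensitive substring test, which is an assumption
    made for this sketch rather than a description of a real engine.
    """
    phrase = phrase.lower()
    hits = []
    for doc_id, text in documents.items():
        lowered = text.lower()
        if phrase not in lowered:
            continue
        if and_term is not None and and_term.lower() not in lowered:
            continue
        hits.append(doc_id)
    return hits


# Hypothetical documents, possibly located at different sites.
documents = {
    "a": "Movie review: a new thriller, reviewed by Pat Lee.",
    "b": "Minutes: the committee discussed the movie review policy.",
    "c": "Quarterly finance report for the Europe office.",
}

broad = keyword_search(documents, "movie review")
narrow = keyword_search(documents, "movie review", and_term="Pat Lee")
print(broad)   # ['a', 'b']
print(narrow)  # ['a']
```

Document “b” matches the basic query even though it is not a movie review, which is the kind of unwanted result that the user must filter out by refining the query.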
Besides the large volume of information, another difficulty in obtaining desired documents is that documents are created and “published” to the networks with few, or no, restrictions as to their form and organization. In other words, a web page can be created and published by a user that includes text, images, etc. with an arbitrary organization. A document might or might not have a title, author's name, publication date, etc. The text of a document can be arranged in columns, paragraphs, one-liners separated by images or graphics, etc. Often a document may not have any short identifying features, or any way to tell where one field, such as the subject of the document, begins and ends, so that the subject may be indistinguishable from the body of the document, at least insofar as a keyword search is concerned.
One approach to overcome some of these problems is to hand-annotate documents found on networks. Typically, this is done after document creation (sometimes long after document creation) by a person who was involved in the creation of a document, web page, etc. Not only does this require substantial amounts of manual labor and time by persons having some skill and knowledge in the area to which the documents relate, but the attempt to organize and summarize aspects of the document can introduce mistakes, thereby compromising the accuracy of the searchable information.
Another approach is to use “folders” or sub-directories in programs such as email programs or web browsers. However, organizing information in this way is usually done manually by the viewer of the information (i.e., the email or web documents). There is no provision for publishing to a user's private organization as these folders are hidden from publishers. Where a public organizational hierarchy is implemented with folders, such a hierarchy often becomes large and complex, requiring much time to navigate. Also, this approach does not provide flexible security or access control.
Some web sites, such as www.yahoo.com, accumulate information such as documents, web pages, etc., from various sources and categorize, summarize and annotate the documents. This multicriteria organization defines categories which are presented to a user searching the Internet as a hierarchy of web pages. Each successive web page in the hierarchy (i.e., web pages progressively lower in the hierarchy) contains a new sub-category of selections that further narrow the category. At some point, the user decides that the category is the one desired and clicks a control. A collection of information that fits the category is then presented to the user.
However, this approach often requires that documents be interpreted and classified by a person other than the author so that errors can be introduced. Also, a considerable amount of work is required to do the classifying, write an abstract, etc. Another drawback is that the navigation through web pages can be slow. Also, a user does not have an awareness of the overall classification scheme being used. In other words, the user does not know how many sub-category levels there are in the hierarchy, or what types of classifications are used, until the user has done a substantial amount of investigating into the hierarchy “tree” classes.
Still other drawbacks of the prior art include the inability to index to individual pages, sections, or portions of a document. This means that text that would otherwise be maintained as a single document must be broken into several documents if it is desired to only allow certain groups to have access to different portions of the original text. Current network organizations do not provide a very flexible security and access system. Usually a website is restricted to users with a certain account or password. Each user wishing to access the site, and all of the site's documents, must enter the password. The use of passwords is difficult to maintain since accounts must be set up, users can forget the passwords, etc. Also, the granularity of password protection is very coarse, as an entire website is usually either open or closed to a particular user.
Thus, it is desirable to provide a computer network-based system that overcomes some or all of the problems in the prior art and provides an efficient system for publishing, organizing, accessing and distributing information in a computer network.