This invention relates to electronic information distribution systems and more particularly to a method for indexing, combining, managing and distributing information via the Internet.
The Internet and the World Wide Web make a huge quantity of information available to the public at large, but better mechanisms are needed for directing the most useful information to those persons who have the greatest need or desire for it.
Hierarchical directories, such as the widely used Yahoo.com directory, help users find Web pages of interest by classifying Web sites in accordance with a hierarchical subject-matter classification system. Typically, at the time a given Web site is registered with the directory, the operator of the Web site selects an existing category which is most nearly descriptive of the content being made available. Human reviewers employed by the directory service may review Web sites being registered to help ensure that each site meets minimum quality standards and has been properly classified. When a directory user specifies a category of interest, Web sites classified in that category are typically listed alphabetically by site name or title, with each item in the listing being specified by a short description either submitted by the registrant, written by the reviewer, or extracted from the indexed Web site, together with a hypertext link to the described site. Users can then review the listed Web site descriptions in a category of interest, and visit those sites which appear to be of the most interest.
Automated Internet search services use programs variously called “Web crawlers,” “spiders,” and “robots” to seek out and automatically index Web pages. These programs build keyword-based inverted file indices which are used to rapidly process queries from users which contain one or more terms or phrases, and return to the user listings of those indexed Web pages which contain those terms or phrases. Each item appearing in the search result listing is typically described using all or part of the “title” extracted automatically from the Web page, or a short representative section of the text if the page is untitled. Each listing further includes a hyperlink to the indexed page itself, allowing the user to click on a listed item of potential interest and thereby view the indexed page in its entirety. Frequently, pages in which the search terms or phrases most frequently appear are listed first.
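By way of illustration only, the keyword-based inverted index and frequency-ordered listing described above may be sketched as follows. The function names, the sample pages and the whitespace tokenization are assumptions made for this example and are not taken from any particular search service.

```python
from collections import defaultdict

def build_inverted_index(pages):
    """Map each lowercase term to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)
    return index

def search(index, pages, query):
    """Return URLs containing every query term, most frequent matches first."""
    terms = query.lower().split()
    if not terms:
        return []
    # Only pages that contain all of the query terms are listed.
    matches = set.intersection(*(index.get(t, set()) for t in terms))

    def score(url):
        words = pages[url].lower().split()
        return sum(words.count(t) for t in terms)

    # Pages in which the terms appear most often are listed first;
    # ties are broken by URL so the order is deterministic.
    return sorted(matches, key=lambda u: (-score(u), u))

# Hypothetical sample pages (URL -> page text).
pages = {
    "http://example.com/a": "citation index for web pages",
    "http://example.com/b": "web crawlers index web pages and web sites",
    "http://example.com/c": "hierarchical directory of sites",
}
index = build_inverted_index(pages)
results = search(index, pages, "web pages")
```

In this sketch the page at http://example.com/b is listed ahead of http://example.com/a because the query terms occur in it more often.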
It is an object of the invention to better direct information which is available via the Internet to those who need or desire it.
It is a further object of the invention to enable a user to locate desired information and to differentiate between different items of such information based on their attributes.
It is a still further object of the invention to direct a user's attention to one or more passages of particular interest within the content of a larger work which is available via the Internet.
In accordance with a principal feature of the invention, information which is available via the Internet is retrieved and analyzed to form items of metadata here called “citations” which comprise the combination of one or more Uniform Resource Locators (URLs) from which source information may be retrieved, an optional specification of one or more particular portions of that source information, and one or more attribute values which characterize the information specified by the citation. These citations are combined in a citation database which is made available to users via the Internet.
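For purposes of illustration, one possible in-memory representation of a citation is sketched below. The class name, field names, passage notation and sample values are hypothetical; the text above requires only that a citation combine one or more URLs, an optional specification of particular portions, and one or more attribute values.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

AttributeValue = Union[str, int, float, bool]

@dataclass
class Citation:
    # One or more URLs from which the source information may be retrieved.
    urls: List[str]
    # One or more attribute values characterizing the cited information.
    attributes: Dict[str, AttributeValue]
    # Optional specification of particular portions of the source
    # (the notation used for a portion is an assumption of this sketch).
    passages: List[str] = field(default_factory=list)

    def __post_init__(self):
        if not self.urls:
            raise ValueError("a citation needs at least one URL")
        if not self.attributes:
            raise ValueError("a citation needs at least one attribute value")

# A hypothetical citation database holding two citations.
citation_database = [
    Citation(
        urls=["http://example.com/article"],
        attributes={"category": "astronomy", "reading_level": 9},
        passages=["paragraph 3"],
    ),
    Citation(
        urls=["http://example.com/recipes", "http://mirror.example.com/recipes"],
        attributes={"category": "cooking"},
    ),
]
```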
Individual users employ a rendering program capable of receiving citations from the citation database, sorting and/or filtering the available citations based on criteria specified by the user, and retrieving information specified by selected citations for presentation to the user.
As contemplated by the invention, information specified by selected citations may be pre-fetched via the Internet and placed in local storage where it will be more rapidly available for immediate review by the user.
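The pre-fetching described above may be sketched, under stated assumptions, as a small cache that retrieves the content for selected citation URLs ahead of time and thereafter serves it from local storage. The class and function names are hypothetical, a dictionary stands in for local storage, and the retrieval function is passed in so that the example can run without a network connection.

```python
import urllib.request

def fetch_over_http(url):
    """Default retrieval: download the resource body over the Internet."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()

class PrefetchCache:
    """Holds pre-fetched content so it is immediately available for review."""

    def __init__(self, fetch=fetch_over_http):
        self._fetch = fetch
        self._store = {}  # stands in for local storage: URL -> content bytes

    def prefetch(self, urls):
        """Retrieve and store content for each URL not already held locally."""
        for url in urls:
            if url not in self._store:
                self._store[url] = self._fetch(url)

    def get(self, url):
        """Return content from local storage, fetching only on a miss."""
        if url not in self._store:
            self._store[url] = self._fetch(url)
        return self._store[url]

# Demonstration with a stand-in fetcher that records each retrieval.
fetch_log = []

def fake_fetch(url):
    fetch_log.append(url)
    return ("content of " + url).encode("utf-8")

cache = PrefetchCache(fetch=fake_fetch)
cache.prefetch(["http://example.com/a", "http://example.com/b"])
page = cache.get("http://example.com/a")  # served locally, no new retrieval
```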
In accordance with another feature of the invention, the attributes stored with each citation may specify subject matter categories, weighting values indicating literacy level, the presence of profanity, the popularity of the source site, rankings reflecting the approval or disapproval of other users, etc. The presence of these attributes allows the user to locally sort the retrieved citations in specified orders, or filter the citations for particular information based on attribute values contained in the citations, and to differentiate between information meeting selected criteria based on each item's perceived quality or usefulness.
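A minimal sketch of such local filtering and sorting follows. The attribute names (category, profanity, user_rating) and the sample values are assumptions chosen to mirror the kinds of attributes listed above; they are not a prescribed schema.

```python
# Hypothetical locally stored citations, each with a URL and attribute values.
citations = [
    {"url": "http://example.com/a",
     "attributes": {"category": "astronomy", "profanity": False, "user_rating": 3.2}},
    {"url": "http://example.com/b",
     "attributes": {"category": "astronomy", "profanity": True, "user_rating": 4.9}},
    {"url": "http://example.com/c",
     "attributes": {"category": "astronomy", "profanity": False, "user_rating": 4.5}},
    {"url": "http://example.com/d",
     "attributes": {"category": "cooking", "profanity": False, "user_rating": 5.0}},
]

def filter_citations(citations, predicate):
    """Keep only citations whose attributes satisfy the user's criteria."""
    return [c for c in citations if predicate(c["attributes"])]

def sort_citations(citations, attribute, descending=True):
    """Order citations by the value of one attribute."""
    return sorted(citations,
                  key=lambda c: c["attributes"][attribute],
                  reverse=descending)

# Criteria: astronomy citations without profanity, best rated first.
selected = filter_citations(
    citations,
    lambda a: a["category"] == "astronomy" and not a["profanity"],
)
ranked = sort_citations(selected, "user_rating")
ranked_urls = [c["url"] for c in ranked]
```

Here two citations satisfy the criteria, and the one with the higher user rating is presented first.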
In accordance with still another feature of the invention, each citation may include a “passage identification” which specifies a portion only of the total information identified by a given URL. The portion of the information defined by the passage identification may be selected by a human editor or by automatic processing of the information retrieved. By using the passage specification, the rendering program may affirmatively direct the user's attention to those passages of particular interest. The rendering program may present the total content of the information specified by a given citation's URL at the request of the user.
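The text above does not fix how a passage is identified. Purely as an assumption for illustration, the sketch below identifies a passage by a pair of character offsets into the retrieved content, and shows a hypothetical render function that presents the passage by default and the total content on request.

```python
def render(content, passage=None, full=False):
    """Return the identified passage, or the total content when requested.

    passage is a (start, end) pair of character offsets; this encoding is
    an assumption of the sketch, not something the description specifies.
    """
    if passage is None or full:
        return content
    start, end = passage
    return content[start:end]

# Hypothetical content retrieved from a citation's URL.
content = "Intro text. The key finding is here. Closing remarks."

# Passage offsets, as an editor or an automatic process might record them.
wanted = "The key finding is here."
start = content.index(wanted)
passage_id = (start, start + len(wanted))

highlight = render(content, passage_id)              # passage only
everything = render(content, passage_id, full=True)  # total content on request
```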
As contemplated by the invention, an analysis facility located remotely from end users may generate a comprehensive collection of citations describing information obtained from numerous Internet resources and store the collected citations in an Internet server. A user can then transmit a request to the server containing a first specification which characterizes information of interest to the user. This first specification may then be compared with the citations at the server and only those citations which satisfy the first specification are then downloaded to the end user, preferably in XML format. The downloaded citations are then stored on the user's client computer where they are available to a rendering program which executes on the client computer. The rendering program is adapted to filter and/or sort the locally stored citations in accordance with a user request containing a second and more refined specification, and to present information contained in the citation, fetched via the Internet from the resource specified by the citation, or both.
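The two-stage selection described above may be sketched as follows, with the server-side and client-side steps shown as plain functions in a single program. The XML element and attribute names, the function names, and the use of a category as the first specification and a minimum rating as the second are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical citations held at the server.
SERVER_CITATIONS = [
    {"url": "http://example.com/1", "category": "astronomy", "rating": 3.0},
    {"url": "http://example.com/2", "category": "astronomy", "rating": 4.5},
    {"url": "http://example.com/3", "category": "cooking", "rating": 5.0},
    {"url": "http://example.com/4", "category": "astronomy", "rating": 1.5},
]

def server_select(citations, category):
    """First specification: return matching citations serialized as XML."""
    root = ET.Element("citations")
    for c in citations:
        if c["category"] == category:
            item = ET.SubElement(root, "citation", url=c["url"])
            ET.SubElement(item, "attribute", name="category").text = c["category"]
            ET.SubElement(item, "attribute", name="rating").text = str(c["rating"])
    return ET.tostring(root, encoding="unicode")

def client_parse(xml_text):
    """Parse downloaded XML into citations stored on the client."""
    root = ET.fromstring(xml_text)
    local = []
    for item in root.findall("citation"):
        attrs = {a.get("name"): a.text for a in item.findall("attribute")}
        local.append({"url": item.get("url"), **attrs})
    return local

def client_refine(local, min_rating):
    """Second, more refined specification: filter and sort locally."""
    kept = [c for c in local if float(c["rating"]) >= min_rating]
    return sorted(kept, key=lambda c: float(c["rating"]), reverse=True)

downloaded_xml = server_select(SERVER_CITATIONS, "astronomy")
local_citations = client_parse(downloaded_xml)
refined = client_refine(local_citations, min_rating=2.5)
refined_urls = [c["url"] for c in refined]
```

Only the three astronomy citations are downloaded in this sketch. The refinement to a minimum rating then runs entirely on the client, without a further request to the server.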