This invention relates to the field of electronic content provision. More specifically, it relates to gathering related content from internet and intranet sources and providing access to same in response to user requests.
A huge quantity of information is being continuously created and made available via electronic communications systems. There is so much information that it is simply not possible for an individual person to read it all. On the other hand, it is imperative that certain items of information reach certain people. Much of the electronically-provided news information ages rapidly, such that it loses its relevancy in a matter of days, or even a matter of hours (e.g., stock market information). Each person has different needs for information, and requires access to a different subset of the available information. In light of the foregoing, there is clearly a need for a system and method for rapidly accessing categorized electronic information.
One difficulty in providing the information is that the information is being created in many different places. News articles about events in the world or business community, and articles written for newspapers, magazines and journals, can generally be obtained through various content providers, who frequently aggregate the information from a number of sources into single continuous electronic streams. No content provider today, however, provides access to all available information, so there is a trade-off between full access and complexity. Moreover, an individual user is frequently forced to subscribe to a host of services in order to obtain the information which is generated from different sources, in different countries, and in various languages. Subscribing to many services to some extent negates the benefits realized by the content aggregation by providers, since the user must then often filter through multiple copies of the same documents.
Internally, organizations face similar issues. Memos, announcements, documents of various kinds, and intranet web content are created at multiple locations throughout an organization, yet are generally not readily available to all members of the organization. Therefore, the process of collecting the information from all points of origin is a key issue, along with categorization and controlled dissemination of that information.
Another aspect of the problem is the actual matching process, comprising matching the collected and categorized content with an individual user's interests. For matching to work, an individual user must be able to express a diverse set of interests, not just one interest. A language of some kind is necessary to provide a medium for this expression of the user's interest. Further, a system is needed to capture the language and apply it to the items of information. Moreover, the language must embody some kind of high level semantic knowledge, since past word-search-based systems have fallen short of a satisfactory solution. The ability to express, capture and apply a person's interests or needs is a critical feature of the problem.
Finally, there is a need to deliver the information to people who have expressed an interest. The primary requisites for delivery are making sure that access to the information is convenient, even in dynamic situations, and making sure that delivery can occur quickly once the information becomes available. Moreover, people are increasingly mobile and have varied styles of working and of accessing and processing information. An effective delivery system will therefore require that the means of access be ubiquitous, that multiple means of access be available, and that delays in making the information available be minimized.
It is therefore an objective of the present invention to provide a system for gathering, categorizing, and delivering electronic content to users in response to user requests.
It is another objective of the invention to provide a system and method for gathering content from both inside (i.e., intranet) and outside (i.e., internet) sources and categorizing same for provision in response to customized user requests.
Yet another objective of the present invention is to provide a customer with the ability to embed user interest and delivery mechanisms into customer applications.
These and other objectives are realized by the present invention, which provides a system for collecting and categorizing metadata about content provided via the internet or intranet, regardless of the language in which the content was generated. The content of each document is assigned token IDs, which token IDs are the same for any given topic irrespective of the language in which the document is written. Filtering of a single-language document will generate a single output, whereas a multilingual document will be divided into language segments, with each segment being filtered by the appropriate language filter.
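By way of illustration only, the language-independent token assignment and per-segment filtering described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the vocabularies, the token ID values, the `Segment` type, and the functions `filter_segment` and `categorize` are all hypothetical, and the division of a document into language segments is assumed to have been performed upstream.

```python
from dataclasses import dataclass

# Hypothetical per-language filters. Each maps a surface term to a topic
# token ID; the same topic carries the same ID in every language.
LANGUAGE_FILTERS = {
    "en": {"merger": 101, "earnings": 102},
    "fr": {"fusion": 101, "bénéfices": 102},
}


@dataclass
class Segment:
    """A run of document text in one language (assumed pre-labeled)."""
    language: str
    text: str


def filter_segment(segment):
    """Apply the filter for the segment's language; return its token IDs."""
    vocab = LANGUAGE_FILTERS[segment.language]
    words = segment.text.lower().split()
    return {vocab[word] for word in words if word in vocab}


def categorize(segments):
    """Return one filter output per language segment.

    A single-language document is one segment and yields one output;
    a multilingual document yields one output per segment.
    """
    return [filter_segment(segment) for segment in segments]


english_doc = [Segment("en", "Merger announced today")]
french_doc = [Segment("fr", "Fusion annoncée aujourd'hui")]
mixed_doc = [Segment("en", "Earnings rise"), Segment("fr", "après la fusion")]

print(categorize(english_doc))
print(categorize(french_doc))
print(categorize(mixed_doc))
```

In this sketch the English and French documents yield the same token ID for the same topic, so a user interest expressed against token IDs would match either document without regard to its language.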