Advances in electronic storage technology have resulted in the creation of vast databases of documents stored in electronic form. These databases can be accessed from remote locations around the world. As a result, vast amounts of information are available to a wide variety of individuals. Moreover, information is not only stored in electronic form, but is also created in electronic form and disseminated throughout the world. Sources for the electronic creation of such information include news and periodicals, as well as radio, television and Internet services. All of this information is also made available to the world through computer networks, such as the World Wide Web, on a real-time basis. The problem with this proliferation of electronic information, however, is how any one individual may access information useful to that individual in a timely manner. In particular, the problem is how any one individual can receive individual pieces of information on a real-time basis (e.g., a stream of documents) and decide which pieces of information are useful to that individual.
Specifically, there are many search techniques for retrieving information from a database or data stream, such as Boolean word searches, typed information retrieval, and vector space based retrieval algorithms. Vector space based algorithms calculate a number that represents the similarity between any document in a database and a vector profile having a series of terms or phrases. Vector space based algorithms, while general and sophisticated, have several shortcomings. One of them is that the numeric vector space scores of documents against two different profiles are, in general, not directly comparable to each other. This is unsatisfactory for several reasons. First, from the point of view of an end user, it might be desirable to inspect the scores for a certain document in the context of several profiles. This could be done, for example, in order to evaluate the performance of the profiles in question so that they can be adjusted to improve their accuracy. Another use for comparable, or normalized, scores across profiles is to facilitate a multiple classification procedure. One way to implement a multiple classifier is to employ a score threshold for tags (classes, profiles). For this to be meaningful, the scores for different tags have to be comparable to each other.
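
The comparability problem described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a score is the unnormalized inner product between a document's term-frequency vector and a weighted term profile; the profiles, weights, documents, threshold, and function names below are all hypothetical and are not taken from any particular retrieval system.

```python
from collections import Counter


def term_vector(text):
    """Build a bag-of-words term-frequency vector for a piece of text."""
    return Counter(text.lower().split())


def raw_score(doc_vec, profile):
    """Unnormalized inner product of a document vector and a weighted profile.

    This scoring rule is an assumption made for illustration only.
    """
    return sum(weight * doc_vec.get(term, 0) for term, weight in profile.items())


# Hypothetical profiles: the second uses more terms and larger weights.
sports_profile = {"match": 1.0, "goal": 1.0, "team": 1.0}
finance_profile = {"stock": 5.0, "market": 5.0, "bond": 5.0,
                   "yield": 5.0, "rate": 5.0}

# The first document contains every term of the sports profile.
sports_doc = term_vector("the team scored a late goal to win the match")
# The second document contains only two of the five finance terms.
finance_doc = term_vector("the stock market dipped")

sports_score = raw_score(sports_doc, sports_profile)
finance_score = raw_score(finance_doc, finance_profile)

# A single hypothetical threshold applied to raw scores from both profiles.
THRESHOLD = 5.0
sports_tagged = sports_score >= THRESHOLD
finance_tagged = finance_score >= THRESHOLD

print(sports_score, finance_score, sports_tagged, finance_tagged)
```

In this sketch the document that matches every term of its profile receives the lower raw score, and a single shared threshold rejects it while accepting the partial match. This is the sense in which raw scores against different profiles are not directly comparable, and why a threshold-based multiple classifier calls for normalized scores.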