Curators of online library systems created for such entities as universities, businesses and/or government agencies typically seek to provide their users (e.g., personnel associated with such entities) with access to a wide variety of pieces of data from a wide variety of sources. The aim is to enable their users to obtain a very complete view of whatever subject they may research, including information from different sources to enable at least some degree of fact checking, and/or including opposing opinions to provide wider perspectives. In so doing, such curators often enter into licensing agreements with multiple providers of data in an effort to broaden the sources of data that become part of their online library system. However, curators may subsequently find themselves caught by surprise when such efforts introduce various functional problems into their library systems. More specifically, such efforts may unknowingly introduce instances in which seemingly broadly distributed data sets turn out to be duplicative or derivative versions of data that all stem from a single source, thereby creating functional bottlenecks that may impair the operation of the overall system.
It has been repeatedly said that the growing prevalence of the Internet has helped to “democratize” information. More specifically, any person with access to the Internet is able to put information out onto the Internet, and that information will become as easily accessible to anyone around the world as information also put out there by historians, scientists, government officials, news professionals, etc. Unfortunately, this same growing prevalence of the Internet has also given greater opportunities for those committing plagiarism to more easily obtain information from numerous sources, and to put out that same information under their own name, while giving no credit to those sources. Also, the vast number of scholastic, corporate, religious and/or governmental entities, as well as the vast number of individual persons, that now regularly put out information onto the Internet has created a situation in which those seeking information on almost any subject are often overwhelmed by the number of different apparent sources of information on that subject.
Still further, many individuals who put out pieces of information on the Internet are frequently not trained in, and/or are uninterested in, best practices in generating what they put out, including making use of original and/or contemporary pieces of information to the degree possible, and/or taking care to carefully delineate their presentations of fact from their opinions. As a result, the Internet has become a vast “free for all” environment in which there are many pieces of information that may not be properly attributed to sources, may present opinions as fact, and/or may be of highly questionable accuracy.
In entering into licensing agreements with various providers, curators of online library systems are often relying on those providers to employ some degree of curation, themselves, to separate out reliable information from questionable information, and to make original and/or contemporary pieces of information more readily available. In essence, curators of online library systems are seeking to make use of additional curation services performed by those providers such that the overall aggregate of the information that makes up such online library systems can be relied upon as meeting at least some minimum level of accuracy and completeness. Many such providers may, themselves, be publishers of new original pieces of information and/or new compilations of information assembled from other particular sources, and may have acquired a reputation for the accuracy and/or reliability of what they publish. Such reputations may be among the factors relied upon by curators of online library systems in selecting their providers. Again, to provide a breadth of sources, curators of online library systems may enter into licensing agreements with multiple competing providers of information. At the very least, they may seek to ensure that their users have access to an array of sources that is not limited by the choices of sources made by a single provider.
Unfortunately for such curators, there is often no way to test or evaluate the quality and/or variety of information that they have arranged to be provided to their users from the providers through such licensing. Thus, such curators often have little ability to verify whether they are succeeding in their efforts to provide a sufficient variety of sources of information to their users. As a result, such curators may remain oblivious to instances in which multiple providers with which they have entered into licenses are actually providing the very same information on at least a subset of subjects from the very same source. Where such a situation exists, aside from concerns that this may result in users having access to an all too limited selection of information concerning a particular subject, there is also the concern that such a single source of information may become the source of an information access bottleneck. More specifically, where each of multiple providers of information directs users to the same source of a particular piece of information, the storage, processing and/or network bandwidth capabilities of that one source may be insufficient to support accesses and queries made by so many users to that particular piece of information. This may lead to instances where that particular piece of information is not reliably accessible, and/or becomes accessible only after a considerable period of delay, thereby impairing the overall functionality of the online library system.
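By way of illustration only, the situation described above, in which multiple providers direct users to the same source of a particular piece of information, can be sketched as a simple grouping of catalog entries by their underlying source. The following Python sketch is a minimal example under stated assumptions: the function name `find_shared_sources`, the tuple layout of a catalog entry, and the provider and source names are all hypothetical, and the sketch assumes that the underlying source of each piece of information is already known, which, as noted above, is often not the case in practice.

```python
from collections import defaultdict


def find_shared_sources(catalog, min_providers=2):
    """Return pieces of information that several providers trace to one source.

    `catalog` is an iterable of (provider, item_id, source) tuples.
    This layout is a hypothetical one chosen for illustration.
    """
    providers_by_item = defaultdict(set)
    for provider, item_id, source in catalog:
        # Group providers under the (piece of information, source) pair.
        providers_by_item[(item_id, source)].add(provider)
    # Keep only the pairs reached through at least `min_providers` providers.
    return {
        key: sorted(providers)
        for key, providers in providers_by_item.items()
        if len(providers) >= min_providers
    }


# Hypothetical catalog entries: three providers point at one source.
catalog = [
    ("provider_a", "fault-report-001", "source_x"),
    ("provider_b", "fault-report-001", "source_x"),
    ("provider_c", "fault-report-001", "source_x"),
    ("provider_a", "market-brief-017", "provider_a"),
]

shared = find_shared_sources(catalog)
for (item_id, source), providers in shared.items():
    print(f"{item_id} from {source} is served via {len(providers)} providers")
```

In this sketch, each entry returned marks a piece of information whose single source may need to support the accesses of users arriving through every listed provider, and is therefore a candidate for the kind of access bottleneck described above.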
In some situations, a single piece of information on a subject becoming so widely sought after may be entirely unavoidable, because the number of original sources of information on that particular subject is highly limited. By way of example, on such topics as the monitoring of geological activity along earthquake faults and/or in the vicinity of volcanoes, it may be that the US Geological Survey (USGS) of the US Government is the only source of various instrument measurements in the field. Thus, a report put out by the USGS on the subject of particular activity associated with a particular fault line or volcano may be the only source of that information, and therefore, may become the one piece of information that is included by each of multiple providers from which a curator may license information. Knowledge of the existence of such unavoidable bottlenecks may enable curators of online library systems to take action to improve reliability of access and/or reduce occurrences of delay in access. However, such curators are unable to take such action if they are not made aware of the existence of such situations.
Curators may also remain oblivious to instances in which one provider has made available a data set concerning a particular subject that was created as an aggregation of other data sets concerning different aspects of the subject that are individually available through one or more other providers. There may also be situations in which pieces of information on a particular subject from two different sources (which may be available through two different providers) cover numerous different aspects of the particular subject, but overlap to a high degree in one particular aspect of the subject.
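By way of illustration only, the two situations described above can be sketched with simple set comparisons. The following Python sketch is a minimal example under stated assumptions: it assumes that each data set can be reduced to a set of record identifiers, optionally grouped by aspect of the subject, and it uses set containment and Jaccard similarity as stand-in measures of aggregation and overlap. The function names, the aspect labels and the record identifiers are all hypothetical.

```python
def containment(part, whole):
    """Fraction of the records in `part` that also appear in `whole`."""
    part, whole = set(part), set(whole)
    return len(part & whole) / len(part) if part else 0.0


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_by_aspect(first, second):
    """Compare two data sets aspect by aspect.

    Each argument maps an aspect label to a set of record identifiers.
    Only aspects present in both data sets are compared.
    """
    common = sorted(first.keys() & second.keys())
    return {aspect: jaccard(first[aspect], second[aspect]) for aspect in common}


# Hypothetical aggregation: one provider's data set contains all of another's.
aggregate = {"r1", "r2", "r3", "r4"}
component = {"r1", "r2"}
component_share = containment(component, aggregate)

# Hypothetical data sets that differ overall but overlap in one aspect.
source_x = {"history": {"r1", "r2"}, "seismic": {"r10", "r11", "r12"}}
source_y = {"economy": {"r5"}, "seismic": {"r10", "r11", "r12", "r13"}}
aspect_overlap = overlap_by_aspect(source_x, source_y)

print(component_share)
print(aspect_overlap)
```

In this sketch, a containment value near one suggests that one data set may be an aggregation that includes another, while a high similarity value for a single aspect flags the kind of localized overlap between two otherwise distinct sources described above. Any threshold for what counts as "high" would be a choice made by the curator.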