In a modern computerized enterprise environment, workers or other users require content from a wide variety of both public and internal resources. In particular, many tasks require access to information contained in content on internal resources, such as an intranet, or on external resources, such as the publicly available Wikipedia. A worker today is thus required to remember, or document, the location and type of content in each of the resources to which they have access. To accomplish a task requiring multiple pieces of content from different resources, a worker must locate all the resources containing the content and search all content in each resource to find the information they need. In many cases, the worker cannot complete a task if they cannot locate the content containing the required information and then successfully extract the information from the resource.
As an example, a customer support ticket might be created in a web-based support ticket system, while the account details might be stored in a network-based file share and a customer relationship management system. Processing the ticket and responding to the customer requires a worker to know the location of all three resources, and how to search and extract the information from each computer system.
This problem is further compounded by the number of resources in a modern enterprise and the amount of information in each system. Most data in a resource is irrelevant to a given worker's tasks or jobs, which makes finding the required information more complicated. In the previous example, a customer support representative may need to sift through information on a network-based file share that is only relevant to a sales team. Workers must search through thousands of pieces of content in dozens of repositories, a very time-consuming process that often ends with the worker unable to locate desired content and having to re-create pre-existing content.
Currently, then, a worker in an enterprise cannot easily locate all the content they may need from multiple internal or external resources to complete their tasks. These problems are symptomatic of a larger problem that almost all users within enterprise computing environments encounter. Namely, while there are a number of solutions that provide search capabilities to a user, these search systems are inadequate for a variety of reasons. In particular, these search systems may index some accessible content into a search index that the user can search across the given systems. However, the existing solutions that create such indices only serve to complicate the problems associated with searching content across distributed resources because they do not comprehensively cover these resources, or all relevant content on a given resource, and thus add yet another resource to an ever-growing list of information locations any given worker must search. This situation exists at least in part because a user may be under the misimpression that such a search system covers all content in all pertinent resources and may not seek out other resources or content.
In addition, these search solutions have other common weaknesses. Specifically, in most cases, these search systems require direct access to the resources to index their content, and they index all content on a resource by crawling and processing that content.
The requirement of integrating search systems with resources so that these search systems have direct access to those resources creates many challenges. To create an index of content for search, the search systems require either authentication and authorization to use an interface that is offered by the target system on which the content resides, or they require authentication and authorization for a “crawler” that visits content that can be accessed through the resource to create an index of that content for search. This type of configuration is beyond the skill set of most information technology (IT) teams.
In addition, only content from computer systems that support these types of integrations can be searched at all. If a resource does not offer such interfaces, the content for that resource may be inaccessible to search systems and thus may not be indexed and available for searching. Moreover, integrating with one resource does not typically reduce the amount of work required to integrate with any other resource. As the average company uses hundreds if not thousands of resources, this limitation is significant.
A search index built by “crawling” content is also limited. At least one reason for the limited usefulness of such indices is that an index built by processing the content is based upon the contents (e.g., the data or information) of the indexed content. Thus, a search system that utilizes an index of this type is confined to determining the importance of content to users based on the contents of the indexed content. In particular, these search systems may be keyword-based, where the importance of content is determined based on the frequency of occurrence of search terms in the contents of the content. As may be imagined, this measure of content importance results in many irrelevant documents being returned as responsive to a search.
As an illustration, suppose a company has a product that it has been producing for many years under the same name. A search for content based on the name of that product may result in many older documents containing the name of the product being returned in response to the search, as the measure of importance of the content is based on the index created from the contents of the content and the search is applied to all indexed content. This content may, however, be highly irrelevant to the user. The user must then personally filter the search results based on the importance of the returned documents to that user.
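The keyword-based ranking described above can be illustrated with a minimal sketch. The sketch below is an assumption made for illustration only and does not represent any particular search system: the document identifiers, the sample text, the product name "Widget", and the helper functions `tokenize`, `build_index`, and `keyword_search` are all hypothetical. It scores each document solely by how often the search terms occur in its contents.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each document id to the term frequencies of its contents."""
    return {doc_id: Counter(tokenize(text)) for doc_id, text in documents.items()}


def keyword_search(index, query):
    """Rank documents by how often the query terms occur in their contents."""
    terms = tokenize(query)
    scores = {
        doc_id: sum(counts[term] for term in terms)
        for doc_id, counts in index.items()
    }
    # Keep only matching documents; highest score first, ties broken by id.
    matches = [doc_id for doc_id, score in scores.items() if score > 0]
    return sorted(matches, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical content: an old brochure that repeats the product name often,
# a recent document that mentions it once, and an unrelated memo.
documents = {
    "2009_brochure": "Widget overview. Widget features. Widget pricing. Widget support.",
    "2024_release_notes": "Release notes for the current Widget firmware.",
    "sales_memo": "Quarterly sales figures for the northern region.",
}

index = build_index(documents)
results = keyword_search(index, "widget")
print(results)
```

In this sketch the older brochure ranks above the recent release notes simply because it repeats the product name more often. Nothing in the score reflects who is searching or how important either document is to that user, which is the limitation the illustration above describes.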
While certain search systems may utilize a variety of techniques to mitigate the effect of a content-based search index, these techniques have proven inadequate in addressing the base problems inherent in utilizing such a content-based index for search systems. Specifically, these techniques fail to ameliorate the problem that the contents of a document are a poor proxy for the importance of that document to a user. The problems inherent in such search systems are also exacerbated by the fact that the measure of importance of content is determined in the same manner (based on the content-based index) regardless of the user.
Accordingly, current search systems are difficult to implement and deploy, are capable of indexing and searching only a small fraction of the resources available to an enterprise and, even amongst that subset of resources, often provide highly irrelevant results to a user. What is desired, then, are improved search systems.