Embodiments of the present invention relate to systems and methods for generating query results, and to information management systems for use with heterogeneous enterprise and other environments that may include structured data held in relational databases and unstructured data stored in document images. Embodiments also relate to methods of searching secure data repositories that contain documents or other data items belonging to numerous heterogeneous enterprise environments, and to methods of changing a customer-specified setting used to rank query results in an enterprise search system capable of crawling heterogeneous enterprise content.
Typically, in an enterprise relational database, query generators are used to construct database queries, which are then sent to a database for execution. A user constructs a query by, for example, selecting items from a drop-down list displayed on an interface. The items may represent data or documents to be obtained from a database or via a URL, or they may represent operations to be performed on the data. Once the items have been selected, the query generator generates a query, usually in Structured Query Language (SQL), for execution by the database.
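As an illustration of the query-generation step described above, the following is a minimal sketch, assuming the user's drop-down selections arrive as a table name, a list of columns, and simple equality filters. The names `QuerySelection` and `generate_sql` are hypothetical and are not taken from any particular query generator; the sketch only shows how selected items can be turned into a parameterized SQL statement.

```python
import re
import sqlite3
from dataclasses import dataclass, field

# Identifiers offered in drop-down lists are expected to be plain names;
# rejecting anything else keeps selections from being spliced into SQL unsafely.
IDENTIFIER_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


@dataclass
class QuerySelection:
    """Hypothetical container for the items a user selected on the interface."""
    table: str
    columns: list[str]
    filters: dict[str, object] = field(default_factory=dict)


def _check_identifier(name: str) -> str:
    if not IDENTIFIER_RE.match(name):
        raise ValueError(f"invalid identifier: {name!r}")
    return name


def generate_sql(selection: QuerySelection) -> tuple[str, list[object]]:
    """Build a parameterized SELECT statement and its bound values."""
    table = _check_identifier(selection.table)
    cols = ", ".join(_check_identifier(c) for c in selection.columns) or "*"
    sql = f"SELECT {cols} FROM {table}"
    params: list[object] = []
    if selection.filters:
        clauses = []
        for column, value in selection.filters.items():
            # Values are bound as parameters rather than inlined into the text.
            clauses.append(f"{_check_identifier(column)} = ?")
            params.append(value)
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (title TEXT, path TEXT, owner TEXT)")
    conn.execute("INSERT INTO documents VALUES (?, ?, ?)",
                 ("Q3 report", "/share/q3.pdf", "alice"))
    sql, params = generate_sql(
        QuerySelection("documents", ["title", "path"], {"owner": "alice"}))
    print(sql)
    print(conn.execute(sql, params).fetchall())
```

Binding filter values as parameters, instead of formatting them into the SQL text, lets the database handle quoting; the table and column names cannot be bound this way, which is why the sketch validates them separately.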
An end user in an enterprise environment frequently searches very large databases. Information retrieval systems in such environments are traditionally judged by their precision and recall. Large collections of documents, such as the World Wide Web, contain many low-quality documents. As a result, searches across these collections typically return hundreds of irrelevant or unwanted documents, which obscure the few relevant ones that meet the particular needs of the end user. To improve the selectivity of the results, common techniques allow an end user to modify the search or to provide additional search terms. These techniques are most effective when the database is homogeneous and already classified into subsets, or when the user is searching for well-known, specific information. In other cases, however, they are often ineffective.
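For reference, precision and recall as commonly defined in information retrieval are given below, together with a worked example using hypothetical numbers (not drawn from any measured system) that shows how a flood of irrelevant hits depresses precision even when recall is moderate.

```latex
\text{precision} = \frac{|\text{relevant} \cap \text{retrieved}|}{|\text{retrieved}|},
\qquad
\text{recall} = \frac{|\text{relevant} \cap \text{retrieved}|}{|\text{relevant}|}

% Hypothetical example: a search retrieves 200 documents, 10 of which are relevant,
% and the collection contains 20 relevant documents in total.
\text{precision} = \frac{10}{200} = 0.05,
\qquad
\text{recall} = \frac{10}{20} = 0.5
```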
A typical enterprise has a large number of data sources and many different types of data. In addition, some data sources may be connected to proprietary data networks, while others may be connected to, and accessible from, public data networks such as the Internet. More particularly, information within a single enterprise can be spread across Web pages, databases, mail servers or other collaboration software, document repositories, file servers, and desktops. As the number of documents accessible via an enterprise intranet or the Internet grows, the number of documents that match a particular query becomes unmanageable. Previous approaches to prioritizing search results have relied on keyword priorities and keyword pairs. These approaches yield some improvement, but not every document matching the query is equally important from the user's perspective. A user may still be overwhelmed by the enormous number of documents returned by a search engine unless the documents are ordered by their relevance to the user's specific query, rather than merely by keywords or keyword pairs. A further problem is that different deployments in a heterogeneous enterprise environment may want to emphasize different document attributes, which makes it difficult for a user to retrieve relevant results across such an environment. As a result, the desired document often appears only after several pages of results.
One way to order documents is to use a page-ranking algorithm. Many search engines also provide a relevance ranking, which is a relative numerical estimate of the statistical likelihood that the material at a given URL will be of interest compared with other documents. Relevance rankings are often based on the number of times a keyword or search phrase appears in a document, its placement within the document, and the size of the document. However, when the same document has differing attributes across a heterogeneous enterprise environment, such relevance-ranking tools do not offer an end user the desired level of configurability and customization.
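The three factors named above (occurrence count, placement, and document size) can be combined in many ways. The following is a minimal sketch of one such combination, assuming a simple tokenizer, an arbitrary bonus for earlier occurrences, and logarithmic length normalization. The function names and weights are hypothetical and do not reproduce the formula of any particular search engine.

```python
import math
import re

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def relevance_score(query: str, document: str) -> float:
    """Score a document using term counts, term placement, and document size."""
    terms = set(tokenize(query))
    tokens = tokenize(document)
    if not terms or not tokens:
        return 0.0
    score = 0.0
    for position, token in enumerate(tokens):
        if token in terms:
            # Every occurrence adds 1; occurrences near the start add a bonus
            # that shrinks as the position grows (assumed weighting).
            score += 1.0 + 1.0 / (1 + position)
    # Divide by a slowly growing function of length so that long documents
    # are not favoured merely because they contain more words.
    return score / math.log(2 + len(tokens))


def rank(query: str, documents: dict[str, str]) -> list[str]:
    """Return document identifiers ordered from most to least relevant."""
    return sorted(documents,
                  key=lambda doc_id: relevance_score(query, documents[doc_id]),
                  reverse=True)


if __name__ == "__main__":
    docs = {
        "a": "enterprise search ranking for enterprise content",
        "b": "notes on the lunch menu",
        "c": "quarterly report that mentions search once near the end of a long text",
    }
    print(rank("enterprise search", docs))
```

Because the weights are fixed in code, this kind of function illustrates the limitation noted above: an end user has no way to tell it which document attributes matter more in a given deployment.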
Ranking functions that order documents according to their relevance to a given search query are known. Although useful in some settings, they do not allow a user in a heterogeneous enterprise environment to personalize ranking results according to that user's preferences, either globally or for a single instance. Efforts therefore continue in the art to develop ranking functions that provide better search results for a given query than search engines using known ranking functions. The problem of allowing an enterprise end user to change ranking functions, so as to customize the ranking of query results returned in a heterogeneous enterprise environment and obtain personalized rankings of content for a single instance within the enterprise, has remained unsolved.
Therefore, it is desirable to provide a simple, intuitive, and heuristic method that allows an end user to change ranking algorithms to meet global or single-instance requirements for queries in a heterogeneous enterprise environment, and that allows end users to rank search results in such environments. It is further desirable to provide a system that overcomes the above problems and others.