Relational databases provide substantial advantages when it comes to storing and managing structured data. Unfortunately, database design techniques that are aimed at reducing data redundancy and enforcing data normalization rules typically do not support full-text indexing and querying of text documents as modern search engines do. When it comes to searching within structured data, relational databases can impose significant constraints on a user's ability to query. Queries performed on a relational database can be exceedingly complex and are frequently beyond the skillset of the novice or untrained user. Relational databases also lack the simplicity of the one-line search interface to which users of web search engines have become accustomed.
For example, the World Wide Web can provide access to a vast amount of information, and specialized search tools known as “search engines” (e.g., Google, Yahoo, and MSN Search) have achieved great success in facilitating searching of static text documents. Conventional web-based search engines, however, are not designed for use in an enterprise environment, where data can be stored in many different forms, using various localized repositories and databases. While a data repository on the Internet or an intranet may contain record-based data relevant to a search query, the search engine may not be capable of indexing and/or accessing the data. A similar problem may be encountered with other forms of content such as word-processing documents, graphical or image files, MP3 clips, interactive blogs, and other data that may change in real time.
Conventional methods of executing a query referencing multiple tables in a search engine tend to fall into one of two categories: (i) denormalization, in which the joined tables must be combined at index time, or (ii) subdivision, in which the query is divided into two or more table queries that are processed independently, with the results combined in a post-processing phase. Denormalization has several drawbacks, primarily the increase in the size of the index, because tables with multiple foreign keys can expand by orders of magnitude after denormalization. The post-processing approach involves extracting a large volume of data from the index (typically the entire contents of one or more tables) and then winnowing the data down based on the join constraints. This approach is also an inefficient use of resources.
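The two conventional categories described above can be illustrated with a minimal sketch. It is not taken from the referenced patent or from any particular search engine; the table contents, field names, and function names (customers, orders, denormalize, search_denormalized, search_subdivided) are all hypothetical and chosen only for illustration, and plain in-memory lists stand in for a search engine index.

```python
# Hypothetical toy data: two tables linked by a foreign key (customer_id).
customers = [
    {"customer_id": 1, "name": "Acme", "region": "east"},
    {"customer_id": 2, "name": "Globex", "region": "west"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "item": "widget"},
    {"order_id": 11, "customer_id": 1, "item": "gasket"},
    {"order_id": 12, "customer_id": 2, "item": "widget"},
]


def denormalize(customers, orders):
    """Approach (i): combine the joined tables at index time.

    Every order row carries a full copy of its customer's fields.
    """
    by_id = {c["customer_id"]: c for c in customers}
    return [{**o, **by_id[o["customer_id"]]} for o in orders]


def search_denormalized(index, region, item):
    """Query the single flattened index; no join is needed at query time."""
    return sorted(
        r["order_id"] for r in index
        if r["region"] == region and r["item"] == item
    )


def search_subdivided(customers, orders, region, item):
    """Approach (ii): run one query per table, then join in post-processing."""
    # Each sub-query runs independently and may return many rows.
    customer_hits = {c["customer_id"] for c in customers if c["region"] == region}
    order_hits = [o for o in orders if o["item"] == item]
    # Post-processing: winnow the extracted rows using the join constraint.
    return sorted(
        o["order_id"] for o in order_hits if o["customer_id"] in customer_hits
    )


index = denormalize(customers, orders)
result_i = search_denormalized(index, "east", "widget")
result_ii = search_subdivided(customers, orders, "east", "widget")
```

In this sketch both approaches return the same order, but at different costs: the flattened index repeats the "Acme" customer fields once per order, and the subdivided query first extracts every "widget" order before discarding those that fail the join constraint.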
U.S. Pat. No. 8,073,840, titled “Querying joined data within a search engine index,” which is assigned to the assignee of the present application and incorporated herein by reference in its entirety, provides techniques and systems for using a search engine interface to index and retrieve data and documents stored in a relational database management system (RDBMS).