As is generally known, search performance can be improved by maintaining a full text index representing the contents of various types of content sources. However, maintaining an up-to-date full text search index can require significant disk space and processor resources. While some existing techniques have proven effective for improving the efficiency of full text indexing, they have significant limitations.
With regard to disk space efficiency, existing systems have reduced index size by using efficient data storage structures. This technique is limited by the fact that over-compression of data structures may negatively impact the performance of user queries.
With regard to processor utilization, some performance problems can be alleviated by asynchronously indexing content when the computer is inactive. While asynchronous content indexing during periods of inactivity can be effective in certain execution environments, such as an end user's desktop system in which the central processing unit (CPU) goes unused for many hours in the evening, it is not applicable in all cases. For example, portable devices, such as laptop computers and personal digital assistants (PDAs), typically conserve power by entering a hibernation or shutdown state during periods of user inactivity, thus limiting the processor resources available for asynchronous indexing. In the case of server systems, or clusters of server systems, on-demand service environments require that services be actively provided at all times, so there may be no periods of inactivity to exploit. While adding more server systems or resources can increase overall resource availability, the resulting additional costs are undesirable and may be prohibitive.
Several performance issues arise with regard to managing full text indexing capabilities in existing systems. These performance issues require careful system configuration and tuning to avoid situations in which costs, in terms of disk and/or processor resource consumption, outweigh the benefits provided to the user. For example, at least one existing electronic mail (“email”) system allows an administrator user to selectively enable full text indexing for individual public folders and mailbox stores. This can help reduce the amount of content processed by allowing the administrator to manually select which content is represented in the full text index. Full text indexing processing levels may also be set, with lower settings requiring full text indexing to use fewer processor resources. Such processor limitations may result in the index representing content that is not current, since the indexing service may have trouble keeping up with the generation of new content to be indexed. While existing techniques alleviate some performance problems of full text indexing, the improvements come at the expense of the user's search experience, since the index may end up representing content that is not current, and/or may fail to represent important content.
For the above reasons, it would be desirable to have a new system for providing a full text index. In view of the inherent costs of full text indexing, the new system should advantageously include a top-down design paradigm providing improved insight into what content is most desirable to represent in the full text index. The new system should effectively reduce the net costs of maintaining a useful full text index without negatively impacting the user's search experience.