Many types of computer software application systems have been developed that operate using sets of related documents or files. Such application-related documents may be stored, accessed, conveyed, and/or otherwise processed by an associated application, using either an application-specific document database or a database shared with other applications. As is generally known, such document databases may contain documents holding any specific form of data, including text, images, sound, video, and/or any other specific data type.
For any set of documents, in order to improve the performance of operations such as searches and sorts, it is often useful to create and maintain a “search index” data structure. For example, a search index enables efficient matching of tokens within a search query to documents containing those tokens. For the contents of a document to be represented in a search index, the document must go through an “indexing” step, in which information describing the document contents is added to the index.
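By way of illustration only, a search index of the kind described above may be sketched as an inverted index that maps each token to the identifiers of the documents containing it. The class name `SearchIndex`, its methods, and the whitespace tokenization are assumptions made for this example and are not taken from any particular system.

```python
from collections import defaultdict


class SearchIndex:
    """Toy inverted index: each token maps to the IDs of documents containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def index_document(self, doc_id, text):
        # The "indexing" step: record every token of the document in the index.
        for token in text.lower().split():
            self._postings[token].add(doc_id)

    def search(self, query):
        # Return the IDs of documents that contain every token in the query.
        tokens = query.lower().split()
        if not tokens:
            return set()
        result = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self._postings.get(token, set())
        return result


index = SearchIndex()
index.index_document("doc-1", "quarterly sales report")
index.index_document("doc-2", "annual sales summary")
print(sorted(index.search("sales")))         # ['doc-1', 'doc-2']
print(sorted(index.search("sales report")))  # ['doc-1']
```

A query is answered by intersecting the posting sets of its tokens, so no document needs to be scanned at query time. The cost is instead paid once, when each document is indexed.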
Unfortunately, indexing large numbers of documents is expensive in terms of both CPU utilization and search index size. For each document indexed, multiple processing steps may be required, such as conversion from a document markup format to a plain text format, language detection, tokenization, and insertion into the index. These actions may consume significant processor and storage resources.
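The per-document processing steps listed above may be sketched as follows, purely as an illustration. The regular-expression tag stripping, the stopword-based language detector, and all function names are simplifying assumptions for this example. A production indexer would typically use far more elaborate and more costly components at each step.

```python
import re
from collections import defaultdict

# Hypothetical stopword lists, used only by the toy language detector below.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to"},
    "de": {"der", "die", "und", "ist", "das"},
}


def to_plain_text(markup):
    # Step 1: convert markup to plain text (crude tag stripping as a stand-in).
    return re.sub(r"<[^>]+>", " ", markup)


def detect_language(text):
    # Step 2: pick the language whose stopwords occur most often in the text.
    words = text.lower().split()
    counts = {lang: sum(w in stop for w in words) for lang, stop in STOPWORDS.items()}
    return max(counts, key=counts.get)


def tokenize(text):
    # Step 3: split the text into lowercase word tokens.
    return re.findall(r"\w+", text.lower())


def index_document(index, doc_id, markup):
    # Run all steps, then insert (language, token) -> doc_id into the index.
    text = to_plain_text(markup)
    language = detect_language(text)
    tokens = tokenize(text)
    for token in tokens:  # Step 4: insertion into the index
        index[(language, token)].add(doc_id)
    return language, tokens


index = defaultdict(set)
language, tokens = index_document(index, "doc-1", "<p>The <b>report</b> is ready</p>")
print(language)  # en
print(tokens)    # ['the', 'report', 'is', 'ready']
```

Every step runs once per document indexed, which is why the total cost grows with the number of indexing passes rather than with the number of distinct documents.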
In multi-application execution environments, such as those referred to as “on-demand” application environments, individual applications may operate independently, while sharing underlying platform resources with other applications. Moreover, each application may communicate with one or more other applications. For example, inter-application communication may be provided between an electronic mail (“email”) application and a content management (“CM”) application, through which an email attachment document may be moved from the email application to a document repository under the control of the CM application. During such operations, in which a document is moved from one application to another, existing application platforms have typically re-indexed the document being moved. The document may accordingly be indexed once for use by the email application, and then again for the CM application. This is disadvantageous, because identical content is indexed twice for use in two different application contexts. It would be desirable to eliminate such unnecessary processing and resource consumption to improve the performance of a platform level indexing service.
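The redundant work described above may be illustrated with the following sketch, in which each application keeps its own index and a move between applications triggers a second indexing pass over identical content. The class `AppIndex`, the function `move_document`, and the pass counter are hypothetical names introduced only for this example.

```python
class AppIndex:
    """Hypothetical per-application search index that counts indexing passes."""

    indexing_passes = 0  # counter shared by all applications on the platform

    def __init__(self, name):
        self.name = name
        self.documents = {}
        self.postings = {}

    def add(self, doc_id, text):
        # Every add runs the full indexing step again, even for known content.
        AppIndex.indexing_passes += 1
        self.documents[doc_id] = text
        for token in text.lower().split():
            self.postings.setdefault(token, set()).add(doc_id)

    def remove(self, doc_id):
        text = self.documents.pop(doc_id)
        for token in text.lower().split():
            self.postings[token].discard(doc_id)


def move_document(doc_id, source, target):
    # Moving a document re-indexes identical content in the target application.
    text = source.documents[doc_id]
    source.remove(doc_id)
    target.add(doc_id, text)


email = AppIndex("email")
cm = AppIndex("cm")
email.add("attachment-1", "signed contract draft")
move_document("attachment-1", email, cm)
print(AppIndex.indexing_passes)  # 2: the same content was indexed twice
```

The content of the attachment never changes during the move, yet the platform performs two full indexing passes, one per application context.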
In some existing systems, multiple applications may each have their own data store and associated search index. Content sharing between such independent databases may not be possible. In other systems, multiple applications or content sources may employ a single search index. However, each application is still required to maintain a distinct set of documents within the shared search index, irrespective of whether identical documents are stored multiple times by multiple applications. In either case, significant improvements in performance would result from reducing or eliminating the indexing of documents multiple times for use by different applications or application instances.