The present invention is generally directed to providing a smart Exchange Database index that facilitates the searching and recovery of email data. In particular, the present invention can be employed to create a full-text index of an offline Exchange Database to provide the ability to perform complex queries to search for messages, folders, and attachments quickly.
IT administrators are oftentimes required to access archived email data. For example, a company involved in litigation or a regulatory proceeding may be required to disclose emails and/or attachments that are relevant to the litigation or proceeding. Similarly, a company may desire to access archived email data as part of an internal evaluation or investigation. In any case, it can be difficult to identify and retrieve relevant email data due to the manner in which current email solutions archive the data.
For example, Microsoft Exchange archives email data using an Exchange Database (EDB). The EDB generally comprises an .edb file and corresponding log files. The .edb file is the main repository for the email data and employs a B+ tree structure to store this data. Microsoft provides an Extensible Storage Engine (ESE) that is configured to maintain and update the EDB. Generally speaking, ESE is positioned between Exchange and the EDB and accepts requests from Exchange (via an API) to update the EDB (e.g., to update the EDB to include a new email).
Due to the format of an EDB (which is a type of indexed sequential access method (ISAM) file), it is not possible to access an EDB using complex SQL queries. Instead, the ESE provides an API through which clients (e.g., Exchange) can access the records of the EDB in a sequential manner. Although the details of employing the ESE API to access an EDB are beyond the scope of the present discussion, the following simplified overview is provided to give context for why it is difficult to search an EDB for relevant email data.
An EDB is stored as a single file and consists of one or more tables. Data is organized in records (or rows) in a table, with each record having one or more columns. One or more indexes are also defined, each of which identifies a different organization (or ordering) of the records in the table. Using the ESE API, a client (e.g., Exchange) can create a cursor that navigates the records in the database in accordance with the ordering defined by a particular index. In other words, the ESE API allows the client to position the cursor at a particular record in a table and to commence reading records sequentially beginning at that particular record.
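By way of illustration only, the cursor model described above can be sketched in Python as follows. The sketch does not use the actual ESE API; the names `Table`, `Cursor`, `create_index`, `seek`, and `read_forward` are hypothetical and are assumed solely for purposes of explanation. A table is modeled as a list of records, and an index is modeled as an ordering of those records by a single column.

```python
import bisect


class Table:
    """Toy stand-in for a database table: records plus named orderings (indexes)."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.indexes = {}

    def create_index(self, name, column):
        # An index is an ordering of record positions, sorted by one column.
        order = sorted(range(len(self.rows)), key=lambda i: self.rows[i][column])
        keys = [self.rows[i][column] for i in order]
        self.indexes[name] = (keys, order)


class Cursor:
    """Hypothetical cursor that walks a table in the order defined by one index."""

    def __init__(self, table, index_name):
        self.table = table
        self.keys, self.order = table.indexes[index_name]
        self.pos = 0

    def seek(self, key):
        # Position the cursor at the first record whose key is >= the given key.
        self.pos = bisect.bisect_left(self.keys, key)

    def read_forward(self):
        # Read records sequentially, beginning at the current position.
        while self.pos < len(self.order):
            yield self.table.rows[self.order[self.pos]]
            self.pos += 1


messages = Table([
    {"id": 3, "subject": "Quarterly report"},
    {"id": 1, "subject": "Welcome"},
    {"id": 2, "subject": "Lunch"},
])
messages.create_index("by_id", "id")

cursor = Cursor(messages, "by_id")
cursor.seek(2)
subjects_from_2 = [row["subject"] for row in cursor.read_forward()]
```

In this sketch, the only operations available to the client are positioning the cursor and reading forward. No operation evaluates an arbitrary predicate over the contents of the records.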
Because the ESE API is limited to this type of sequential access of records, it can be very time consuming to search an EDB for relevant email data. For example, if a company were required to search the mailboxes of all of its employees to identify any email with a particular phrase, the search would require sequentially reading every record of every table in every EDB that stores email data for any of the employees and then evaluating each retrieved email to determine whether it contains the particular phrase. Accordingly, a more efficient way to search email data that is stored in an EDB is needed.
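The cost of such a search can be illustrated with the following simplified Python sketch. It assumes, purely for illustration, that each EDB is represented as an in-memory dictionary of tables and that each record has hypothetical `id` and `body` fields; the file names and the function `scan_for_phrase` are likewise hypothetical and are not part of any actual EDB layout or API.

```python
def scan_for_phrase(databases, phrase):
    """Read every record of every table in every database and test it for the phrase.

    Returns the list of matching (database, table, record id) tuples together
    with the total number of records that had to be read.
    """
    needle = phrase.lower()
    matches = []
    records_read = 0
    for db_name, tables in databases.items():
        for table_name, records in tables.items():
            for record in records:
                # Every record is read, whether or not it is relevant.
                records_read += 1
                if needle in record["body"].lower():
                    matches.append((db_name, table_name, record["id"]))
    return matches, records_read


# Hypothetical data standing in for two EDBs holding three mailboxes.
databases = {
    "mailstore1.edb": {
        "alice": [
            {"id": 1, "body": "Please review the merger agreement."},
            {"id": 2, "body": "Lunch at noon?"},
        ],
        "bob": [
            {"id": 1, "body": "Status update attached."},
        ],
    },
    "mailstore2.edb": {
        "carol": [
            {"id": 1, "body": "Draft of the Merger Agreement is ready."},
        ],
    },
}

matches, records_read = scan_for_phrase(databases, "merger agreement")
```

In this sketch, the number of records read equals the total number of stored records regardless of how few of them match, which is why the time required for such a search grows with the total amount of archived email data.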