There is a desire in the art to provide a scalable email store for an email system that is searchable according to the IMAP4 specification (RFC 3501, RFC 5957, RFC 7162), all of which are incorporated herein by reference. A common email system design stores individual emails in individual files in a file system. Metadata regarding each email may be stored in a separate metadata database. To provide searching functionality, each email belonging to a particular user may be downloaded to a client, and a text search may be performed on the client side. Alternatively, a server-side search may use a database to identify every email that belongs to a particular user account, and then open each individual message and search its contents. Unless client usage of the server is limited, a client can potentially consume large amounts of server resources when storing and searching email.
With email in particular, certain sections of an email are likely to be copied many times. For example, an email may be sent from one user to three other users. In that case, the three emails will have the same sender, subject, body, and attachment information. These fields may contain a large amount of data. For example, the body and attachment may range in size from 10 MB to 50 MB. If a large data item must be stored each time an email is sent to a different user, then that data item takes up additional space each time it is copied.
A normalized data model places unique data items in separate tables and has other tables reference the tables that hold the unique data items. Under this model, a large data item need only be stored once, and other tables may reference the large data item. For example, the body of an email could be placed in a first table and referenced by a second table that tracks the different recipients of the email.
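By way of illustration only, the normalized arrangement described above may be sketched as follows. The sketch assumes an in-memory SQLite database accessed through Python's standard sqlite3 module; the table names (body, delivery), the column names, and the function name build_normalized_store are hypothetical and are chosen only for this example.

```python
import sqlite3


def build_normalized_store(recipients, body_text):
    """Store one email body once and reference it from one row per recipient.

    Table and column names are hypothetical, chosen for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    # First table: holds each unique (potentially large) body exactly once.
    conn.execute(
        "CREATE TABLE body (body_id INTEGER PRIMARY KEY, content TEXT)"
    )
    # Second table: tracks recipients and references the shared body row.
    conn.execute(
        "CREATE TABLE delivery ("
        " delivery_id INTEGER PRIMARY KEY,"
        " recipient TEXT,"
        " body_id INTEGER REFERENCES body(body_id))"
    )
    cur = conn.execute("INSERT INTO body (content) VALUES (?)", (body_text,))
    body_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO delivery (recipient, body_id) VALUES (?, ?)",
        [(recipient, body_id) for recipient in recipients],
    )
    conn.commit()
    return conn


def count_rows(conn, table):
    """Return the number of rows in one of the two example tables."""
    if table not in ("body", "delivery"):
        raise ValueError("unknown table")
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

In this sketch, an email sent to three recipients produces three rows in the delivery table but only one row in the body table, so the large data item is not duplicated per recipient.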
The IMAP4 specification requires that an email be searchable by many different fields. A search can be sped up by indexing these fields. However, a search that spans indexes on multiple tables is slower, because a search of an index may return only the primary keys of the one table that it indexes. Thus, if two different tables are indexed and a search is performed for a term in a first table AND a term in a second table, the index of the first table and the index of the second table return result records corresponding to different tables. Some form of join operation would then need to be performed to determine how the records corresponding to the different tables intersect. Such operations are computationally costly.
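The following sketch illustrates the difficulty described above under simplifying assumptions. It again uses an in-memory SQLite database through Python's standard sqlite3 module; the tables (header, body), the index names, the sample rows, and the function names are hypothetical, and exact-match comparison stands in for a full text search.

```python
import sqlite3


def make_store():
    """Build two indexed tables; all names and rows are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE body (body_id INTEGER PRIMARY KEY, content TEXT);
        CREATE TABLE header (
            header_id INTEGER PRIMARY KEY,
            subject TEXT,
            body_id INTEGER REFERENCES body(body_id)
        );
        CREATE INDEX idx_body_content ON body(content);
        CREATE INDEX idx_header_subject ON header(subject);
        INSERT INTO body VALUES (10, 'quarterly report'), (20, 'lunch menu');
        INSERT INTO header VALUES
            (1, 'status', 10), (2, 'status', 20), (3, 'hello', 10);
        """
    )
    return conn


def index_hits(conn, subject, content):
    """Search each table separately; each result holds only its own table's keys."""
    header_ids = {
        row[0]
        for row in conn.execute(
            "SELECT header_id FROM header WHERE subject = ?", (subject,)
        )
    }
    body_ids = {
        row[0]
        for row in conn.execute(
            "SELECT body_id FROM body WHERE content = ?", (content,)
        )
    }
    return header_ids, body_ids


def search_with_join(conn, subject, content):
    """Resolve 'subject AND content' by joining the two tables."""
    rows = conn.execute(
        "SELECT h.header_id FROM header AS h"
        " JOIN body AS b ON b.body_id = h.body_id"
        " WHERE h.subject = ? AND b.content = ?"
        " ORDER BY h.header_id",
        (subject, content),
    )
    return [row[0] for row in rows]
```

In this sketch, the two separate lookups return keys from different key spaces (header identifiers and body identifiers), so the two result sets cannot simply be intersected. The join in search_with_join is the additional step needed to determine which emails satisfy both terms.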
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.