This specification generally relates to indexing and searching of documents, access to which is regulated by respective access control lists (ACLs).
With collaborative documents and social networks, an increasing amount of content is stored with ACLs that specify the set of people who have access to each document. Searching over such a corpus of documents presents certain challenges. For example, the documents that one user is permitted to see may differ from the documents that another user is permitted to see. This problem can be addressed by adding ACL tokens to documents, each ACL token representing a user having permission to access the corresponding document. A problem with this approach, however, is that search systems must perform intersections of large hit lists, for example the hit list for a query term and the hit list for the searching user's ACL token, which is particularly problematic in disk-based indexing solutions. A solution for disk-based indexing systems is to write a separate copy of a document for each person who has permission to access it. This is referred to as write fan-out. Although this improves the efficiency of searches, the size of the index and the document write rate are greatly increased. An alternative solution is to write a single copy of each document, with ACL tokens, into a sub-index (partition) corresponding to the document's owner, and to merge results from each of a user's collaborators at search time. This is referred to as read fan-in. Although this improves the efficiency of document writes, searches can end up merging a large number of result sets when a user has many collaborators.
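The three approaches described above can be contrasted with a minimal in-memory sketch. It is illustrative only and not an implementation from this specification: the toy corpus `DOCS`, the `acl:<user>` token format, the function names, and the explicit `collaborators` argument are all assumptions made for the example, and Python sets stand in for on-disk hit lists.

```python
from collections import defaultdict

# Hypothetical toy corpus: doc id -> (owner, text, users permitted to read).
DOCS = {
    "d1": ("alice", "budget plan", {"alice", "bob"}),
    "d2": ("bob", "budget review", {"bob"}),
    "d3": ("carol", "travel plan", {"carol", "alice"}),
}


def build_acl_token_index(docs):
    """ACL tokens: one shared index; each document is also posted under an
    assumed 'acl:<user>' token for every permitted reader."""
    index = defaultdict(set)
    for doc_id, (_owner, text, readers) in docs.items():
        for term in text.split():
            index[term].add(doc_id)
        for user in readers:
            index["acl:" + user].add(doc_id)
    return index


def search_acl_tokens(index, user, term):
    # Intersect the term's hit list with the user's ACL hit list.
    return index.get(term, set()) & index.get("acl:" + user, set())


def build_write_fan_out(docs):
    """Write fan-out: one partition per reader; every document is copied
    into the partition of each user permitted to read it."""
    partitions = defaultdict(lambda: defaultdict(set))
    for doc_id, (_owner, text, readers) in docs.items():
        for user in readers:
            for term in text.split():
                partitions[user][term].add(doc_id)
    return partitions


def search_write_fan_out(partitions, user, term):
    # A single lookup in the user's own partition; no ACL intersection.
    return set(partitions.get(user, {}).get(term, set()))


def build_read_fan_in(docs):
    """Read fan-in: one partition per owner; each document is written once,
    together with its ACL tokens."""
    partitions = defaultdict(lambda: defaultdict(set))
    for doc_id, (owner, text, readers) in docs.items():
        for term in text.split():
            partitions[owner][term].add(doc_id)
        for user in readers:
            partitions[owner]["acl:" + user].add(doc_id)
    return partitions


def search_read_fan_in(partitions, user, term, collaborators):
    # Query the user's own partition and each collaborator's partition,
    # then merge the per-partition results.
    results = set()
    for owner in {user, *collaborators}:
        part = partitions.get(owner, {})
        results |= part.get(term, set()) & part.get("acl:" + user, set())
    return results
```

In this sketch all three searches return the same documents for a given user and term; they differ in where the cost falls. The write fan-out index holds one copy of a document per reader, whereas the read fan-in search visits one partition per collaborator.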