The disclosure herein relates generally to identifying related documents. In file systems and other types of electronic document storage systems, users often spend considerable time curating their documents. One way in which users curate documents is by organizing them into collections. For example, a user may place a group of related documents in a folder and give the folder a meaningful title that describes the documents.
Previous document storage systems have implemented many different types of collections. Examples of collections include folder-based collections, tag-based collections, and keyword-based collections. In some document storage systems, a document can be part of only one collection, while in other document storage systems, a document can be part of multiple collections. A common goal of many types of collections, however, is to group related documents.
Some previous efforts have attempted to automatically group related documents based on words or phrases that appear in the documents or in their titles. Although the presence of common words or phrases in a group of documents may provide some insight as to whether those documents are related, this approach has two weaknesses. False positive results may be generated in situations where a user's documents include unrelated documents that are directed to similar subject matter. Conversely, related documents may be missed, because related documents do not always include common keywords.
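Both failure modes of keyword-based grouping can be illustrated with a minimal sketch. The disclosure does not specify how previous efforts compared keywords, so the Jaccard similarity measure, the 0.5 threshold, the function names, and the sample document texts below are all assumptions chosen for illustration only.

```python
import re


def keywords(text):
    """Return the set of lowercased alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Fraction of distinct words shared by two keyword sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def looks_related(doc_a, doc_b, threshold=0.5):
    """Hypothetical keyword test: related if overlap meets the threshold."""
    return jaccard(keywords(doc_a), keywords(doc_b)) >= threshold


# Assumed sample data: two unrelated documents on similar subject matter
# (a user's own lease and a generic template saved for someone else).
lease_signed = "apartment lease agreement monthly rent deposit"
lease_template = "apartment lease agreement template monthly rent"

# Assumed sample data: two related documents (same trip), no shared words.
flight = "boarding pass seat gate"
hotel = "reservation confirmation room checkout"

false_positive = looks_related(lease_signed, lease_template)
missed_pair = looks_related(flight, hotel)
```

Under these assumptions, the two lease documents share five of seven distinct words and are grouped together even though the user may consider them unrelated, while the flight and hotel documents share no words and are never grouped, even though both belong to the same trip.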