The present invention relates generally to social networking systems and more particularly to optimizing the storage architecture of real-time content generated on a social networking system for efficient search and retrieval.
Social networking systems provide users with multiple mechanisms to post differing types of content, including text, links, photos, videos, and comments on other users' posts, to name a few. As a social networking system grows to hundreds of millions of users, the amount of content being stored grows exponentially. Storing the content in primary storage (e.g., memory) yields the fastest retrieval, but primary storage is expensive. Thus, content is eventually stored in secondary storage (e.g., hard disk), which is less expensive but results in longer access times. Determining which content should be stored in primary storage to enable real-time searching is difficult because some content may be accessed frequently while other content is accessed only sporadically.
Conventional document indices for large-scale (e.g., web) systems typically ignore the user as a structural indexing attribute. A typical inverted index stores a list of documents for a given term, where the list of documents is ordered by document identifier. The user, or more typically the “author” of the document, is simply one of many keys/attributes that are stored with the metadata for the document, but the structure of the index is not organized in memory with respect to the author. In addition, conventional indices typically capture the date on which a document was created as just another attribute of the document. For example, a document, or content, that a user authored, or posted, a week ago is conventionally stored and retrieved in the same manner as content posted in the last hour. Users may wish to search the most recently posted content of other users on a social networking system before the content posted a week ago. Moreover, terms may be repeated by users posting content, leading to an inefficient allocation of memory and future fragmentation of computer-readable storage media. Managing a pointer to a single object representing the commonly repeated terms leads to wasteful overhead processing. Additionally, management of old databases becomes complicated, leading to broken links. Thus, conventional search indices are not optimized for real-time searching.
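By way of illustration only, the conventional inverted index described above may be sketched as follows. This is a minimal sketch under stated assumptions and does not represent any particular prior system: the names `Document`, `ConventionalInvertedIndex`, `search`, and `search_by_author`, as well as the whitespace tokenization and the integer timestamp, are hypothetical and chosen solely for illustration.

```python
from bisect import insort
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Document:
    # All field names are hypothetical, chosen only for this sketch.
    doc_id: int
    author: str
    created_at: int  # assumed creation timestamp (e.g., seconds)
    text: str


class ConventionalInvertedIndex:
    """Term -> list of document identifiers, ordered by identifier.

    Author and creation date are kept only as per-document metadata;
    they play no role in how the posting lists are organized.
    """

    def __init__(self):
        self.postings = defaultdict(list)  # term -> sorted doc ids
        self.metadata = {}                 # doc id -> (author, created_at)

    def add(self, doc):
        self.metadata[doc.doc_id] = (doc.author, doc.created_at)
        # Naive whitespace tokenization (an assumption of this sketch).
        for term in set(doc.text.lower().split()):
            insort(self.postings[term], doc.doc_id)  # keep id order

    def search(self, term):
        # Results come back in identifier order, regardless of recency.
        return list(self.postings.get(term.lower(), []))

    def search_by_author(self, term, author):
        # The author is not a structural key, so every posting for the
        # term must be checked against the stored metadata.
        return [d for d in self.search(term) if self.metadata[d][0] == author]
```

In this sketch, restricting results to a given author, or to content posted in the last hour, requires examining the metadata of every document in the posting list for the term, because neither the author nor the creation date determines where an entry is placed in the index.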
Additionally, users of social networking systems may wish to search the content of other users to whom they are connected on the social networking system before searching content of random users of the social networking system. Social networking systems also gather information on the interactions between users to identify stronger connections between users. Conventional social networking systems do not optimize indices to enable ranking of search results according to the strength of connections between users.