Service providers (e.g., wireless, cellular, etc.) and device manufacturers are continually challenged to deliver value and convenience to consumers by, for example, providing compelling network services. These services generate vast amounts of data (structured and binary) that must be managed, stored, searched, and analyzed. Over the last decade, internet services have accumulated data in the range of exabytes (10^18 bytes). Although most of this data is unstructured, it must be stored, searched, and analyzed appropriately before any real-time information can be drawn from it to provide services to users.
Social networking services provide various interactions among communities of users (e.g., family, friends, colleagues, classmates, etc.). Such services drive a large volume of data into the network system. For example, a network system providing social networking services needs to capture every comment and post by a friend or by any other user connected to the user via the social network. This leads to petabytes (10^15 bytes) of data even for a social network with only a few million users. Most search engines, such as Lucene®, are geared toward searching relatively small amounts of data. When faced with massive amounts of data, however, search engines such as Lucene do not scale.
To provide a scalable infrastructure based on search indexing, data partitioning strategies are used. Common partitioning strategies used in industry include key based partitioning and location based partitioning. Location based partitioning relies on the fact that a user's location can be easily determined and then used to find related content that has been pre-partitioned by location. However, location based partitioning is not an efficient strategy in social networking systems. In such systems, social graph based partitioning is a highly efficient mechanism for scaling. In social graph based partitioning, all use cases involving a search of family and friends can be associated with data spaces that are closely aligned with the social graph of an individual. Furthermore, a predictive social graph provides data clustering methods that cluster data associated with the users of a social network according to their existing and possible future affiliations, interests, etc.
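The contrast between key based and social graph based partitioning can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the partitioning method of any particular system: the function names, the greedy placement rule, the per-partition capacity, and the sample friendship graph are all hypothetical. Key based placement hashes the user identifier, so friends may be scattered across partitions; the greedy social graph placement puts each user on the partition that already holds most of that user's friends, so a search over family and friends tends to touch fewer partitions.

```python
from __future__ import annotations

import zlib
from collections import Counter


def key_partition(user_id: str, num_partitions: int) -> int:
    """Key based partitioning: a stable hash of the key selects the partition."""
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions


def social_graph_partition(
    friends: dict[str, set[str]], num_partitions: int, capacity: int
) -> dict[str, int]:
    """Greedy sketch (an assumption, not a reference algorithm): place each
    user on the partition that already holds most of that user's friends,
    subject to a per-partition capacity."""
    assignment: dict[str, int] = {}
    load = [0] * num_partitions
    for user in sorted(friends):  # fixed order keeps the result deterministic
        # Count how many already-placed friends live on each partition.
        votes = Counter(assignment[f] for f in friends[user] if f in assignment)
        open_parts = [p for p in range(num_partitions) if load[p] < capacity]
        if not open_parts:
            raise ValueError("total capacity is smaller than the number of users")
        # Prefer most friends, then the lightest load, then the lowest index.
        best = max(open_parts, key=lambda p: (votes[p], -load[p], -p))
        assignment[user] = best
        load[best] += 1
    return assignment


def partitions_touched(
    user: str, friends: dict[str, set[str]], assignment: dict[str, int]
) -> set[int]:
    """Partitions a friends-and-family search for `user` would have to visit."""
    return {assignment[u] for u in friends[user] | {user}}


# Hypothetical sample graph: two groups of three mutual friends.
FRIENDS = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob"},
    "dave": {"erin", "frank"},
    "erin": {"dave", "frank"},
    "frank": {"dave", "erin"},
}

GRAPH_ASSIGNMENT = social_graph_partition(FRIENDS, num_partitions=2, capacity=3)
KEY_ASSIGNMENT = {u: key_partition(u, 2) for u in FRIENDS}
```

With this sample graph, each group of friends lands on a single partition under the greedy placement, so `partitions_touched("alice", FRIENDS, GRAPH_ASSIGNMENT)` contains one partition. The same query against `KEY_ASSIGNMENT` may span both partitions, since the hash ignores the friendship structure.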