A social networking system may allow members to declare information about themselves, such as their professional qualifications or skills. In addition to information the members declare about themselves, a social networking system may gather and track information pertaining to behaviors of members with respect to the social networking system and social networks of members of the social networking system. Analyzing a vast array of such information may help to identify solutions to various problems that may not otherwise have clear solutions. Traditionally, search engine systems allow a small number of keywords to be used for retrieval of relevant documents, such as member profiles that include the information declared by members of a social networking system.
Systems for formulating queries for information retrieval and search engine systems have been developed. These systems, implemented for search purposes, may access data stored in databases or other storage and retrieval systems. Some of these systems may return records that contain all or most of the query keywords. Some systems extend the keyword search approach by mapping query keywords to matching predicates or ordering clauses, and casting the query to SQL or similar queries to be processed by a database system. Attempts have been made to optimize queries prior to execution in order to speed up information extraction from databases. Some systems use a set of predefined rules to rewrite the query and seek a theoretical optimization of it prior to execution in the database environment. Genetic algorithms have also been used to find Boolean query alterations that increase execution speed in information retrieval systems. Some systems construct query modifications in a web search domain using corpus-based support vector machine (SVM) models. Non-linear SVM models have been used to classify documents and produce query modifications based on SVM output. Some systems incorporate Boolean models, using an approximate Markov blanket feature selection technique to obtain a minimal set of terms and a decision tree to build the corresponding Boolean query.
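As an illustration of mapping query keywords to matching predicates and casting the result to SQL, the following is a minimal sketch only. The table name `profiles`, the column `summary`, and the helper `keywords_to_sql` are hypothetical and are not drawn from any system described above; each keyword is assumed to become one substring-match predicate, combined conjunctively.

```python
import sqlite3


def keywords_to_sql(keywords, column="summary", table="profiles"):
    """Build a parameterized SELECT with one LIKE predicate per keyword.

    The table and column names are hypothetical defaults for this sketch.
    Keywords are bound as parameters rather than interpolated into the SQL.
    """
    if not keywords:
        raise ValueError("at least one keyword is required")
    # One matching predicate per keyword, joined conjunctively.
    predicates = " AND ".join(f"{column} LIKE ?" for _ in keywords)
    sql = f"SELECT id FROM {table} WHERE {predicates} ORDER BY id"
    params = [f"%{kw}%" for kw in keywords]
    return sql, params


def demo():
    """Run the generated query against a small in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE profiles (id INTEGER PRIMARY KEY, summary TEXT)")
    conn.executemany(
        "INSERT INTO profiles VALUES (?, ?)",
        [
            (1, "java engineer with search experience"),
            (2, "python data scientist"),
            (3, "search engineer, python and java"),
        ],
    )
    sql, params = keywords_to_sql(["python", "search"])
    return [row[0] for row in conn.execute(sql, params)]
```

In this sketch only the record containing every keyword is returned. The number of predicates grows with the number of keywords, which is one reason such approaches are typically limited to short queries.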
Query optimization systems for long text queries have also been incorporated into information retrieval systems. Some of these systems use query reduction and query term weighting techniques. The query term weighting techniques may assign importance weights to terms within a given query. Some systems use a query term weighting approach based on the weighted dependence model for web queries. Query reduction techniques select a subset of the query to be run against a search index. Some systems using query reduction techniques may rank subqueries of the query. Some other systems for query reduction consider subqueries that differ from an original query by a single term. Conditional random field (CRF) models have been used to select subqueries of a given original query. Query reduction and query substitution do not resolve the real-time performance limitations that arise where the number of keywords in a content-based recommendation remains unwieldy.
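The combination of query term weighting and single-term query reduction can be sketched as follows. This is an illustrative assumption rather than the method of any particular system: the weights here are a smoothed inverse document frequency, standing in for the learned weighting models mentioned above, and the function names and the `doc_freq` mapping are hypothetical.

```python
import math


def single_term_reductions(terms):
    """Return every subquery that drops exactly one term from the query."""
    return [terms[:i] + terms[i + 1:] for i in range(len(terms))]


def idf_weights(terms, doc_freq, num_docs):
    """Assign a toy importance weight to each term.

    Uses smoothed inverse document frequency as a stand-in (an assumption
    of this sketch) for a learned term weighting model.
    """
    return {
        t: math.log((num_docs + 1) / (doc_freq.get(t, 0) + 1)) for t in terms
    }


def best_reduction(terms, doc_freq, num_docs):
    """Rank the single-term reductions by total weight and keep the best."""
    weights = idf_weights(terms, doc_freq, num_docs)
    return max(
        single_term_reductions(terms),
        key=lambda sub: sum(weights[t] for t in sub),
    )
```

For a query of n terms this enumerates only n subqueries. Ranking all subsets would instead require on the order of 2^n candidates, which illustrates why reduction alone may not help when the keyword count is large.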
Some systems incorporate query expansion or substitution techniques. In these systems, parts of the query may be replaced or extended. Some systems use query substitutions where the new query is related to the original query and contains terms closely related to the original query. These systems have incorporated machine learning models for selecting between query candidates by using a number of features relating to a query/candidate pair.
Some systems attempt to apply blocking to candidate selection over large datasets. These systems construct blocking keys to group entities in an offline MapReduce system. Some systems construct blocking functions based on sets of blocking predicates. These systems may formulate the problem of learning a blocking function as a task of finding a combination of blocking predicates. These systems do not extend to subsets of candidate selection for recommendation systems because of the high computational cost of enumerating large numbers of candidate clauses in a query.
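The idea of building a blocking key from a combination of blocking predicates can be sketched as follows. This is a single-process illustration under stated assumptions, not an offline MapReduce implementation: the record fields `id`, `title`, and `city`, the two example predicates, and the function names are all hypothetical, and each record is assumed to have a non-empty title.

```python
from collections import defaultdict
from itertools import combinations


def first_token_of_title(record):
    """Hypothetical blocking predicate: first word of the title, lowercased."""
    return record["title"].split()[0].lower()


def city(record):
    """Hypothetical blocking predicate: the city, lowercased."""
    return record["city"].lower()


def blocking_key(record, predicates):
    """Combine several blocking predicates conjunctively into one key."""
    return tuple(p(record) for p in predicates)


def candidate_pairs(records, predicates):
    """Group records by blocking key and pair only records within a block."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[blocking_key(rec, predicates)].append(rec["id"])
    pairs = set()
    for ids in blocks.values():
        # Only records that share a key are ever compared with each other.
        pairs.update(combinations(sorted(ids), 2))
    return pairs
```

Adding a predicate to the combination narrows the blocks and reduces the number of candidate pairs, while removing one widens them. Choosing among such combinations is the search problem that learned blocking functions address.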