The rise of the Internet has occasioned two disparate phenomena: the growing presence of social networks, whose member profiles are visible to large numbers of people, and the growing use of these social networks to search for jobs that have been posted on, or linked to by, the social networks.
Various preprocessing steps commonly performed on job search queries in social networking services rely heavily on low-level natural language processing operations, such as tokenization and normalization, which are relatively mature for popular languages such as English, French, and Spanish. However, for retrieval and ranking in other languages, as well as for cross-language retrieval (i.e., retrieval across multiple languages with one search), erroneous low-level natural language processing operations prohibit query expansion and rewriting, so the retrieval and ranking processes fall back to basic keyword-based similarity measures. What is needed is a way to obviate the need for advanced low-level natural language processing operations in order to improve cross-language retrieval.
Additionally, the tokenization and tagging performed by a query tagger and rewriter can be error prone, and these errors propagate to the job retrieval and ranking phases. For example, consider the queries “software engineering manager” and “manager software engineering.” The former query is tagged as a title in its entirety, while the latter has “manager” tagged as a title and “software engineering” tagged as a skill. The impact of this difference in tagging is a difference in query construction: the former query retrieves jobs with an emphasis on title only, while the latter retrieves jobs that match both the skill and the title. Moreover, the difference in tagging propagates to the ranking phase, where tags for titles and skills contribute differently to the ranking algorithm. What is needed is a way to reduce or eliminate this retrieval degradation due to query preprocessing errors.
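The effect of the tagging difference on query construction can be illustrated with a minimal sketch. The tagger output below is hard-coded from the example above, and the function name build_structured_query and the field-to-clause grouping are hypothetical, chosen only for illustration; they do not describe any particular production system.

```python
from typing import Dict, List, Tuple

# Hypothetical tagger output, copied from the two example queries in the text.
# Each entry is a list of (text span, tag) pairs.
TAGGED_QUERIES: Dict[str, List[Tuple[str, str]]] = {
    "software engineering manager": [
        ("software engineering manager", "title"),
    ],
    "manager software engineering": [
        ("manager", "title"),
        ("software engineering", "skill"),
    ],
}


def build_structured_query(tagged_spans: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group tagged spans by tag so that each tag becomes its own retrieval clause."""
    clauses: Dict[str, List[str]] = {}
    for text, tag in tagged_spans:
        clauses.setdefault(tag, []).append(text)
    return clauses


if __name__ == "__main__":
    for raw_query, spans in TAGGED_QUERIES.items():
        print(raw_query, "->", build_structured_query(spans))
```

Under these assumptions, two queries containing the same words yield structurally different retrieval requests: one with a single title clause, the other with a title clause and a skill clause. Any downstream ranking that weights title and skill matches differently would inherit that divergence.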
Furthermore, current job search mechanisms expand “important” tokens in queries with their synonyms using a pre-defined list of similar keywords. This step is manual, however, and therefore does not scale to different domains and locales. What is needed is a query representation that obviates the need for this step.
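List-based synonym expansion of the kind described above can be sketched as follows. The SYNONYMS table, its entries, and the function name expand_query are assumptions made for illustration only; whitespace splitting stands in for whatever tokenizer a real system would use.

```python
from typing import Dict, List

# Hypothetical hand-maintained synonym list; the entries are illustrative only.
SYNONYMS: Dict[str, List[str]] = {
    "engineer": ["developer", "programmer"],
    "manager": ["lead", "supervisor"],
}


def expand_query(query: str, synonyms: Dict[str, List[str]] = SYNONYMS) -> List[List[str]]:
    """Return, for each token, the token followed by any listed synonyms."""
    expanded: List[List[str]] = []
    for token in query.lower().split():
        # Tokens missing from the list are passed through unexpanded.
        expanded.append([token] + synonyms.get(token, []))
    return expanded


if __name__ == "__main__":
    print(expand_query("Software Engineer"))
    print(expand_query("ingeniero de software"))
```

In this sketch the second query receives no expansion at all, because the list contains no entries for its tokens. Every new domain or locale would require additional entries to be added by hand, which is the scalability limitation noted above.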