A search engine typically matches a user's input query against a collection of content items (e.g., web pages, documents, ads, etc.) by comparing the tokens of the query with the tokens associated with respective candidate content items. Often, however, the user's query and/or the content items correspond to “noisy” linguistic items having arbitrary lengths. For instance, a linguistic item can be considered noisy when it contains one or more tokens that do not contribute, to any significant extent, to the expression of the main underlying meaning of the linguistic item. Long queries (sometimes referred to as tail queries) and long document summaries may be particularly prone to this problem. Due to the presence of such noise, a search engine may sometimes have difficulty interpreting the user's input query and/or the content items, and may therefore have difficulty identifying content items that are truly relevant to the user's input query.
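
The effect of noise on token-based matching can be illustrated with a minimal sketch. It assumes a naive whitespace tokenizer and a simple overlap score (the fraction of distinct query tokens found in the content item); the function names, the scoring rule, and the sample strings are hypothetical and chosen only for illustration, not taken from any particular search engine.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def overlap_score(query, document):
    """Fraction of distinct query tokens that also appear in the document.

    This scoring rule is an illustrative assumption, not a real engine's ranker.
    """
    query_tokens = set(tokenize(query))
    doc_tokens = set(tokenize(document))
    if not query_tokens:
        return 0.0
    return len(query_tokens & doc_tokens) / len(query_tokens)


# Hypothetical content item and two queries expressing the same intent.
document = "cheap flights to seattle"
short_query = "seattle flights"
noisy_query = "hi i was wondering where i could find flights to seattle please"

short_score = overlap_score(short_query, document)
noisy_score = overlap_score(noisy_query, document)

print(f"short query score: {short_score:.2f}")
print(f"noisy query score: {noisy_score:.2f}")
```

Under these assumptions, both queries express the same intent, yet the long query scores lower because most of its tokens ("hi", "wondering", "please", and so on) contribute nothing to the match.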