Conventionally, a system that computes the similarity of documents to a query receives the query, identifies candidate documents, retrieves the candidate documents from storage (e.g., disk), and computes a similarity score for every candidate document. This approach is inefficient because it computes similarities for too many documents and/or because of the time required to retrieve too many documents from a relatively slow storage medium. Another conventional technique involves receiving a query, generating grams from the query terms, and simply counting the number of grams in the query that match grams in a data store. While more efficient than the first approach, the gram-counting approach provides low precision and thus may fail to identify documents having a high relevance to the query and may return documents having a low relevance to the query.
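The gram-counting technique described above can be sketched as follows. This is a minimal illustration only, not a description of any particular conventional system: it assumes character trigrams, lowercasing, and an in-memory dictionary standing in for the data store, and the names `make_grams` and `gram_count_score` are hypothetical. The score is a bare count of shared grams, so every matching gram contributes equally regardless of how common it is, which is one plausible source of the low precision noted above.

```python
def make_grams(text, n=3):
    """Return the set of character n-grams in text (assumption: lowercased trigrams)."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def gram_count_score(query, document, n=3):
    """Count the distinct query grams that also appear in the document."""
    return len(make_grams(query, n) & make_grams(document, n))


# Hypothetical in-memory stand-in for the data store.
docs = {
    "d1": "computing document similarity",
    "d2": "commuting to the city",
}

query = "document similarity"
scores = {doc_id: gram_count_score(query, text) for doc_id, text in docs.items()}

# Rank documents by the raw number of matching grams, highest first.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

In this sketch the unrelated document `d2` still receives a nonzero score because it happens to share the gram `ity` with the query, which illustrates how a raw match count can surface documents of low relevance.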