The present disclosure relates to phrase generation.
A phrase is a group of one or more consecutive words (e.g., an n-gram) that carries a concrete and complete meaning and can function as a single syntactic unit in a sentence. The order of an n-gram is the number of words it contains (e.g., a unigram, bi-gram, tri-gram, etc.). For example, a unigram phrase is a one-word phrase, e.g., “Chicago” or “book”. A bi-gram phrase is a two-word phrase, e.g., “New York” or “computer science”. Some phrases can be long, e.g., “President of the United States of America”. Phrases can be extracted from text strings having one or more words. For example, a sentence or other text string can include one or more phrases. A non-phrase (or a bad phrase) is a group of one or more consecutive words that is not a phrase.
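By way of illustration only, the relationship between a text string and its n-grams can be sketched in Python as follows. The sketch assumes simple whitespace tokenization, and the function names `ngrams` and `candidates` are hypothetical names chosen for this example; it enumerates groups of consecutive words and does not itself decide which groups are phrases.

```python
from typing import List, Tuple


def ngrams(text: str, n: int) -> List[Tuple[str, ...]]:
    """Return every group of n consecutive words in text, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Assumption: words are separated by whitespace. Real text may need
    # language-specific tokenization.
    words = text.split()
    # When the text has fewer than n words, the range is empty.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def candidates(text: str, max_order: int) -> List[Tuple[str, ...]]:
    """Collect all n-grams of order 1 through max_order."""
    result: List[Tuple[str, ...]] = []
    for n in range(1, max_order + 1):
        result.extend(ngrams(text, n))
    return result


if __name__ == "__main__":
    for gram in ngrams("New York is large", 2):
        print(" ".join(gram))
```

For the string "New York is large", the bi-grams produced include both the phrase "New York" and groups such as "York is", which would be non-phrases under the definition above.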
Phrase extraction is typically used in natural language processing applications. For example, in a web search application, a list of commonly used phrases can be used to improve the precision of returned results, reduce latency in presenting results, and provide phrases for query expansion. However, identifying quality phrases can be difficult. Conventional phrase generation techniques are language dependent; for instance, they rely on a grammatical relationship between a given phrase candidate and other words to identify particular phrases.