Field of the Invention
The present invention generally relates to natural language processing, and more particularly to a method of monitoring and analyzing the communications of influencers in a field to identify potential product trends.
Description of the Related Art
Predictive modeling is a well-established methodology across a diverse problem space. For example, impending mechanical failure of a complex system such as a diesel generator can be predicted by applying failure models to performance data streaming in real time. Collaborative and cooperative filters enable recommendation of consumer products to users based on retailer knowledge of prior consumer spending.
Other events are less conducive to prediction. For example, there is a cottage industry devoted to predicting the commercial success of a new song, movie or book. While there is no doubt that previous commercial success of an artist suggests a ready audience for subsequent work (a new John Grisham novel often debuts as #1 on the New York Times bestseller list), much of this prediction is left to the intuition of people with enormous depth in the industry. This approach is hardly scientific.
Other trends or fads can be detected on social media. A trending Twitter subject can be an indicator of enormous commercial success in the near term (“OMG, you simply MUST try x”). This source, however, is not useful for prediction or preparation; once the subject is trending on Twitter or Instagram, the popularity wave is already cresting.
Although there is ample discussion of social media analytics, there is a very limited body of work on the use of media communications to predict medium-term trend setting in the marketplace. The article “Fashion Supply Chains and Social Media: Examining the Potential of Data Analysis of Social-Media Texts for Decision Making-Processes in Fashion Supply Chains” by Beheshti-Kashi et al. considers the impact of blog posts on color choices by retail store buyers and customers. Their work suggests a strong relationship between blog posts and choices made by buyers for retail stores. Interestingly, their work also finds that blog information corresponds with real-world customer demand. H. J. Fisher, in a master's thesis entitled “Food stylists' food image creation for print media and consumer interpretation: an exploratory investigation”, investigated the psychometric connections between choices made by professional food stylists and consumer food choice. Fisher finds that food stylists, through non-verbal communication, can create images in print media that alter consumer behavioral intent and eventual purchasing decisions. The paper “Social Media Competitive Analysis and text mining: A case study in the pizza industry” by He et al. (focused mainly on retail-level social media like Facebook) notes that more than half the people responding to a consumer survey by Market Force Information chose food options by reviewing online comments and reviews.
As analysis of social media becomes more complex, it is increasingly important to have a set of tools that provide a more intuitive understanding of user communications. As part of this effort, many systems employ some form of natural language processing. Natural language processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages. Many challenges in NLP involve natural language understanding, that is, enabling computers to derive meaning from human or natural language input, and others involve natural language generation, allowing computers to respond in a manner familiar to a user. For example, a non-technical person may input a natural language question to a computer system, and the system intelligence can provide a natural language answer which the user can hopefully understand. Examples of advanced computer systems that use natural language processing include virtual assistants, Internet search engines, and deep question answering systems such as the Watson™ cognitive technology marketed by International Business Machines Corp.
Deep question answering systems can identify passages from text documents (corpora) and analyze them in various ways in order to extract answers relevant to a query; answers can be scored on a number of factors, and the highest score indicates the “best” answer. Models for scoring and ranking the answer are trained on the basis of large sets of question and answer pairs.
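By way of illustration only, the scoring and ranking step described above can be sketched as a weighted sum over per-answer factors. The factor names, weights, and candidate values below are hypothetical and are not drawn from any particular question answering system; in practice the weights would be learned from question and answer pairs rather than fixed by hand.

```python
from typing import Dict, List, Tuple

# Hypothetical factor weights (assumed for illustration, not learned).
WEIGHTS: Dict[str, float] = {
    "passage_match": 0.5,
    "source_reliability": 0.3,
    "type_agreement": 0.2,
}


def score(factors: Dict[str, float]) -> float:
    """Combine factor scores into one number; missing factors count as 0."""
    return sum(WEIGHTS[name] * factors.get(name, 0.0) for name in WEIGHTS)


def rank(candidates: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Return (answer, score) pairs sorted from highest to lowest score."""
    scored = ((answer, score(factors)) for answer, factors in candidates.items())
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up candidate answers with made-up factor values.
candidates = {
    "oregano": {"passage_match": 0.6, "source_reliability": 0.9, "type_agreement": 1.0},
    "basil": {"passage_match": 0.9, "source_reliability": 0.8, "type_agreement": 1.0},
    "tomato": {"passage_match": 0.7, "source_reliability": 0.5, "type_agreement": 0.0},
}

ranking = rank(candidates)
best_answer = ranking[0][0]  # the highest-scoring candidate
```

The first entry of `ranking` is the candidate treated as the “best” answer under these assumed weights.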
One method of analyzing a natural language sentence is to construct a parse tree for the sentence. As the name suggests, a parse tree is a tree-like construct having branches and nodes (including a root node, interior or non-terminal nodes, and leaf or terminal nodes) whose arrangement and elements reflect the syntax of the input language. Syntax generally pertains to rules that govern the structure of sentences, particularly word order. Syntax is one set of rules that make up the grammar of a language. Grammar includes additional rules such as morphology and phonology. Syntax can help define relations between words in a statement, such as a noun being associated with an adjective or a prepositional phrase.
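As a minimal sketch of the structure just described, a parse tree can be represented as nested nodes, with the root at the top, phrase and part-of-speech labels as interior nodes, and the words of the sentence as leaves. The `Node` class, the example sentence, and the hand-built tree below are illustrative assumptions; an actual system would obtain the tree from a parser.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str                                   # phrase label, POS tag, or word
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def leaves(node: Node) -> List[str]:
    """Collect terminal (leaf) labels left to right, recovering the sentence."""
    if node.is_leaf():
        return [node.label]
    words: List[str] = []
    for child in node.children:
        words.extend(leaves(child))
    return words


def to_brackets(node: Node) -> str:
    """Render the tree in bracketed notation."""
    if node.is_leaf():
        return node.label
    inner = " ".join(to_brackets(child) for child in node.children)
    return f"({node.label} {inner})"


# Hand-built tree for the assumed sentence "chefs praise a spicy sauce".
tree = Node("S", [
    Node("NP", [Node("N", [Node("chefs")])]),
    Node("VP", [
        Node("V", [Node("praise")]),
        Node("NP", [
            Node("Det", [Node("a")]),
            Node("Adj", [Node("spicy")]),
            Node("N", [Node("sauce")]),
        ]),
    ]),
])

sentence = leaves(tree)
bracketed = to_brackets(tree)
```

Here the adjective “spicy” and the noun “sauce” share the same NP parent, which is how the tree records that the adjective is associated with that noun.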
One aid in NLP involves the use of syntactic n-grams. An n-gram is a sequence of n items from text or speech (two items is a bi-gram, three items a tri-gram, etc.). Syntactic n-grams are n-grams defined by paths in syntactic dependency or constituent trees rather than the linear structure of the text. The paper “Syntactic N-grams as Machine Learning Features for Natural Language Processing” by Sidorov et al. promotes the use of syntactic n-grams over regular n-grams. Words in the n-gram are determined by syntactic relations in a parse tree rather than physical word order lifted directly from the text. This preserves “real” relations between words in a sentence, and lifts arbitrary constraints imposed by surface sentence structure.
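The difference between regular and syntactic n-grams can be sketched as follows. The example sentence and its dependency tree are written by hand for illustration (an assumption, not the output of a parser), and the path-collection routine is a simplified reading of the idea in which only unbranched head-to-dependent paths are considered.

```python
from typing import Dict, List, Tuple


def linear_ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Regular n-grams: n consecutive words in surface order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def syntactic_ngrams(children: Dict[str, List[str]], root: str,
                     n: int) -> List[Tuple[str, ...]]:
    """Syntactic n-grams: n words along a downward path in a dependency tree."""
    results: List[Tuple[str, ...]] = []

    def walk(node: str, path: Tuple[str, ...]) -> None:
        path = path + (node,)
        if len(path) >= n:
            results.append(path[-n:])        # last n words on the current path
        for child in children.get(node, []):
            walk(child, path)

    walk(root, ())
    return results


tokens = ["chefs", "praise", "a", "spicy", "new", "sauce"]

# Hand-written dependency tree (head -> dependents); words are unique here,
# so each word can serve as its own key.
dependencies = {
    "praise": ["chefs", "sauce"],
    "sauce": ["a", "spicy", "new"],
}

surface_bigrams = linear_ngrams(tokens, 2)
syntactic_bigrams = syntactic_ngrams(dependencies, "praise", 2)
```

In this sketch the pair ("praise", "sauce") appears among the syntactic bi-grams although four words separate the verb from its object in the text, while the surface bi-gram ("praise", "a") has no counterpart in the tree.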