Recent years have seen a significant increase in the use of computing devices to create, collect, and analyze free-form opinion text responses (e.g., survey responses, user feedback, and online reviews). Indeed, it is common for an individual to use a computing device to complete a survey about a product or service offered by a business, person, school, or other entity. In many cases, a respondent provides a text response when expressing their opinion. For example, a respondent uses a mobile device to write a few sentences as part of a survey, social media post, or electronic message.
While conventional text analysis systems have attempted to extract feedback information from text responses, these systems struggle to accurately extract opinions from such responses. Indeed, conventional systems have employed the same feedback extraction techniques and methods for over a decade. For example, most conventional systems employ application-level natural language processing to extract opinions from text responses. In particular, many conventional systems employ machine-learning algorithms in connection with natural language processing for opinion mining. However, these conventional methods present several disadvantages, such as missing rich opinion insights that may be valuable to businesses and other entities.
One disadvantage, for instance, is that conventional systems require large datasets to sufficiently train a machine-learning algorithm. For example, with English text responses, conventional systems need to learn the proper context and meaning of about 200,000 unique words. To achieve this, a conventional system must be trained on tens of thousands of sentences. Moreover, if the training dataset is not adequately large, the machine-learning algorithm will produce poor and inaccurate results.
As the size of a training dataset increases, additional problems arise. For example, a larger training dataset requires additional processing resources to obtain a learned parameter space, as well as additional memory and storage to hold both the training dataset and the learned space. Indeed, the learned space is often too large to store on many client devices. Moreover, a larger training dataset requires learning additional rules, which in turn increases the complexity and processing time of subsequently executing the machine-learning algorithm. Further, because conventional systems commonly learn from supervised training data, a human must manually tag, label, and/or annotate each sentence in the training dataset. The cost of labeling potentially hundreds of thousands of items in a large training dataset can be exorbitant and prohibitive.
As another problem, training datasets used by conventional systems are often domain-specific. For instance, a conventional system that analyzes text responses about customer service must employ a training dataset that includes words, terms, and phrases related to customer service. If the conventional system is mining opinions from specialized or uncommon domains, a sufficiently large training dataset may not exist or may be unavailable, which prevents the conventional system from accurately training a machine-learning algorithm.
In addition, conventional opinion mining systems often experience out-of-vocabulary issues, where the trained machine-learning algorithm encounters words not included in the training dataset. As a result, despite the large training dataset, the increased processing resources, and the additional memory, conventional systems still produce inaccurate results. Accordingly, these and other problems exist with regard to conventional systems and methods for mining opinions from free-form text responses.