The analysis of survey results requires that the relevant data be extracted from the survey responses and summarized in a way that makes apparent to the analyst which issues or topics are important to the respondents, as well as the importance of the various topics relative to one another. This analysis may be done programmatically or manually by survey analysts, depending on the type of data collected and the number of responses received. A typical survey may collect both structured and free-form data in the responses. For example, an online employee satisfaction survey targeted at employees of a company may measure the employees' satisfaction with their job and work environment by having them select a numeric rating from 1 to 5 for a number of employment satisfaction factors, such as salary, benefits, and training. The survey may also provide a comment area where each employee can describe any other issues or factors, both positive and negative, that affect the employee's satisfaction, or provide overall comments regarding the job or work environment.
In this example survey, the structured response data, consisting of the numeric ratings selected for the various factors, is easily extracted from the responses and summarized using a variety of traditional data mining technologies. The free-form text comments, however, are much more difficult to analyze and summarize because of the exceedingly broad scope of possible responses. An employee may provide negative responses, positive responses, or both, and the comments may relate to a wide variety of internal and external employment issues, many of which may not have been anticipated by the designer of the survey. In addition, different employees may use different vocabulary to describe the same issues. These factors make it difficult to quantify the responses in a meaningful way.
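By way of illustration only, the summarization of the structured ratings in the example survey might be sketched as follows. The sample responses, the dictionary layout, and the function name `average_ratings` are assumptions made for this sketch and are not drawn from any particular data mining product.

```python
from statistics import mean

# Hypothetical structured responses: each maps a satisfaction factor
# to the 1-5 rating selected by one employee.
responses = [
    {"salary": 3, "benefits": 4, "training": 2},
    {"salary": 2, "benefits": 5, "training": 3},
    {"salary": 4, "benefits": 4, "training": 1},
]

def average_ratings(responses):
    """Return the mean rating per factor; a factor absent from a response is skipped."""
    by_factor = {}
    for response in responses:
        for factor, rating in response.items():
            by_factor.setdefault(factor, []).append(rating)
    return {factor: mean(ratings) for factor, ratings in by_factor.items()}

summary = average_ratings(responses)
```

Because every rating is already a number tied to a known factor, a few lines suffice. No comparably simple aggregation is available for the free-form comments.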
Because of the complexity involved in analyzing and summarizing free-form comments in survey response data, the comments must often be reviewed manually by trained analysts. This can be a costly and time-consuming process, and an analyst's judgment of the importance of individual comments can be influenced by qualitative factors, such as how well or how poorly a comment is written. Often only a small sample of the comments is actually reviewed, which may cause important topics raised in the responses to be missed, or may produce an incomplete or inaccurate analysis because the sample size is not sufficient to support the results.
Few programmatic methods exist for automating the task of analyzing such free-form or semi-structured response data. Moreover, these methods often require that a lexicon or knowledge base corresponding to the context of the question that prompted the response be created before the analysis of the response data can be performed. For example, in a survey regarding consumers' satisfaction with the purchase of a camera, a lexicon for analyzing the survey response data can be created which identifies the features of the camera, such as “price,” “lens,” “battery life,” “picture quality,” “speed,” and “ease of use,” as well as words and other grammatical constructs which are used to represent a purchaser's satisfaction with a particular feature, such as “better,” “like,” “hate,” “poor,” etc. This lexicon can then be used to analyze the camera satisfaction survey responses and generally summarize the features that are liked and disliked by purchasers of the camera.
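A lexicon-based analysis of the kind described for the camera survey might be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any existing method: the feature and satisfaction terms are taken from the example above, while the polarity values, the sentence-level matching rule, and the function name `summarize` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical lexicon for the camera example. The terms come from the
# text; the +1/-1 polarity assigned to each satisfaction word is assumed.
FEATURES = ["price", "lens", "battery life", "picture quality", "speed", "ease of use"]
SENTIMENT = {"better": 1, "like": 1, "hate": -1, "poor": -1}

def summarize(responses):
    """Tally liked and disliked mentions per feature, one sentence at a time."""
    liked, disliked = Counter(), Counter()
    for response in responses:
        for sentence in re.split(r"[.!?;]+", response.lower()):
            words = re.findall(r"[a-z']+", sentence)
            score = sum(SENTIMENT.get(w, 0) for w in words)
            if score == 0:
                continue  # no satisfaction word, or the words cancel out
            for feature in FEATURES:
                if re.search(r"\b" + re.escape(feature) + r"\b", sentence):
                    (liked if score > 0 else disliked)[feature] += 1
    return liked, disliked

liked, disliked = summarize([
    "I like the lens. The battery life is poor.",
    "Picture quality is better than my old camera; I hate the price.",
])
```

In this sketch a feature is counted only when a lexicon term for it appears in the same sentence as a satisfaction word, so the result depends entirely on the lexicon having been prepared in advance for the context of the question.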
However, these methods are inadequate for analyzing and summarizing completely free-form comment responses, such as the employment satisfaction comments in the example above. In this case, developing a lexicon for the context may be practically impossible, since the scope of possible responses is far less bounded than that of comments regarding the features of a camera.
It is with respect to these considerations and others that the disclosure made herein is presented.