Questions asked of a population can be characterized as structured or unstructured. Structured questions have a fixed number of predefined answers; common examples include multiple-choice questions and Likert statements. Structured questions are valuable because they allow for easy quantification and comparison. For example, the responses from one population can readily be compared to a benchmark (e.g., a broader population, a different but similar population, or the same population at a different time). Comparing to a benchmark allows interesting questions to be answered, such as whether the population is “better” or “worse” than expected, how the population has improved or declined, and, more generally, how the population differs from a comparable population.
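Such a benchmark comparison can be illustrated with a minimal sketch. The data below are hypothetical Likert-style ratings on a 1-to-10 scale; the population and benchmark values, and the simple difference-of-means comparison, are assumptions for illustration rather than a prescribed method.

```python
from statistics import mean

# Hypothetical 1-10 ratings from the population of interest
# (e.g., one team's answers to "How appreciated do you feel?").
team_responses = [7, 8, 6, 9, 7, 8]

# Hypothetical benchmark ratings (e.g., a broader, company-wide population).
benchmark_responses = [6, 7, 5, 6, 7, 6, 8, 5]

team_avg = mean(team_responses)
benchmark_avg = mean(benchmark_responses)

# A positive difference suggests the population scores "better" than
# the benchmark; a negative difference suggests "worse".
difference = team_avg - benchmark_avg
print(f"team={team_avg:.2f} benchmark={benchmark_avg:.2f} diff={difference:+.2f}")
```

The same comparison could be repeated over time (the same population at different dates) to track improvement or decline.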
Unstructured questions do not limit people to a fixed number of predefined answers; instead, respondents type their own text responses. Unstructured questions are valuable because they can provide more detailed information and unexpected but important answers, including answers not directly related to the associated question.
Because structured and unstructured questions have different strengths, they are sometimes combined in a hybrid approach. There are two ways of doing this. The first hybrid approach is to ask both a structured question and an unstructured question together. For example, an employee might be asked how appreciated they feel on a scale of 1 to 10 and then be asked why they chose the number that they chose. The second hybrid approach is to use answers to an unstructured question as input to a structured question. For example, a group of people can be asked where they want to go to lunch and also to rate the options provided by others in the group.
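The first hybrid approach can be sketched as a simple data model. The field names, example responses, and the low-score filter below are hypothetical illustrations, not part of any specific system described here.

```python
from dataclasses import dataclass

@dataclass
class HybridResponse:
    """One answer pair from the first hybrid approach: a structured
    rating together with an unstructured explanation of that rating."""
    score: int   # structured: e.g., 1-10 "How appreciated do you feel?"
    reason: str  # unstructured: free text, "Why did you choose that number?"

responses = [
    HybridResponse(8, "My manager thanked me publicly last week."),
    HybridResponse(3, "My suggestions are routinely ignored."),
]

# The structured score can direct attention to a subset of the
# unstructured text, e.g., the explanations behind low ratings.
low_score_reasons = [r.reason for r in responses if r.score <= 5]
```

This pairing is what lets later analysis focus on a smaller number of unstructured responses, at the cost of ignoring the rest.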
A persistent problem with responses to unstructured questions is that they are very time-consuming to read and very hard to interpret. The existing hybrid approaches help focus attention on a smaller number of unstructured responses, but much information is lost by ignoring the remaining responses. This problem is especially pronounced when the number of unstructured responses is very large.
A number of text mining techniques have attempted to address this problem, but none is particularly satisfactory, especially where there is value in comparing mutually exclusive populations to each other. Similarly, there is a general need for assessing discrete units of text (such as articles and books), and especially for comparing different sets of discrete units of text in an efficient manner.