Surveys can provide several benefits, including enabling a company to better understand its workforce. Survey items can address issues such as engagement, organizational health, and satisfaction. Employee feedback can help an employer diagnose problems and find new opportunities for improvement.
When responses to a survey correspond to an ordinal scale, there can be challenges in accurately summarizing those responses. Ordinal-scale responses are responses that can be ranked in a meaningful order and thereby sorted. Likert scale responses (e.g., Strongly-Disagree/Disagree/Neutral/Agree/Strongly-Agree, or Bad/Needs-Improvement/Good/Excellent) are an example of ordinal scale responses.
There are two common methods for summarizing groups of responses to an ordinal scale so that the groups of responses can be compared. In the percent positive approach, scale options are divided into “positive” and “not positive” groups. The summary of a group of responses is the percentage of the responses that are positive.
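As a minimal sketch of the percent positive approach, the following assumes a five-option Likert scale in which "Agree" and "Strongly agree" count as positive; the scale labels and the choice of positive options are illustrative, not prescribed by any particular survey system:

```python
# Sketch of the percent positive approach: collapse a Likert scale into
# "positive" vs. "not positive" and report the positive percentage.
# The labels and the positive set below are illustrative assumptions.
POSITIVE = {"Agree", "Strongly agree"}

def percent_positive(responses):
    """Return the percentage of responses that fall in the positive group."""
    if not responses:
        return 0.0
    positive = sum(1 for r in responses if r in POSITIVE)
    return 100.0 * positive / len(responses)

responses = ["Strongly agree", "Agree", "Neutral", "Disagree", "Agree"]
print(percent_positive(responses))  # 3 of 5 responses are positive -> 60.0
```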
Another method is the integer assignment approach. In this approach, increasing successive integers are assigned to each of the scale options. The summary of a group of responses is then the arithmetic mean, geometric mean, or median of the assigned integers.
Further, there are many methods for comparing a group score (e.g., a score for a company) to a benchmark score. In the subtraction method, the benchmark score is subtracted from the group score, and the result, which can be positive or negative, is the distance from the benchmark. For example, a group might be 5 percentage points more positive than the benchmark. In the ranks method, if the benchmark data can be divided into groups that are similar to the group being compared, then those groups can be ordered and a rank for the group calculated. For example, a group might be 3rd out of 30. The percentile method is the same as the ranks method, except that instead of reporting 3rd out of 30, the result is reported as the 90th percentile. The z-score method likewise begins by dividing the benchmark data into groups similar to the group being compared. In this method, the standard deviation of the set of benchmark scores is calculated, and the subtraction method described above is applied. But instead of reporting the difference directly, the difference is divided by the benchmark's standard deviation, yielding a number known as a z-score, which is essentially the number of standard deviations the group score lies from the benchmark average.
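The subtraction, ranks, and percentile methods can be sketched as follows; the group score and the benchmark values are illustrative, and the rank convention (1 = best) is an assumption:

```python
# Sketch of three of the benchmark-comparison methods described above:
# subtraction, ranks, and percentile. All scores here are illustrative.
def subtraction(group_score, benchmark_score):
    """Signed distance from the benchmark (e.g., +5 percentage points)."""
    return group_score - benchmark_score

def rank(group_score, benchmark_scores):
    """1-based rank of the group among the benchmark groups (1 = best)."""
    better = sum(1 for s in benchmark_scores if s > group_score)
    return better + 1

def percentile(group_score, benchmark_scores):
    """Percentage of benchmark groups scoring at or below the group."""
    at_or_below = sum(1 for s in benchmark_scores if s <= group_score)
    return 100.0 * at_or_below / len(benchmark_scores)

benchmark = [70, 72, 75, 78, 80, 82, 84, 86, 88, 90]  # ten comparable groups
print(subtraction(85, 80))        # 5
print(rank(85, benchmark))        # 4 (i.e., 4th out of 10)
print(percentile(85, benchmark))  # 70.0
```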
Percent Positive Approach
There are shortcomings with the above approaches. For example, with the percent positive approach, the positive response options (agree, strongly agree) can be grouped together and the score reported can be the percentage of all responders who responded positively. These percentages can then be compared to percentages from similar companies. For example, a company may receive a score of 85% positive on an item about appreciation. The fact that a group of similar companies averaged 73% positive on the appreciation item would lead the company to conclude that it was doing well at appreciating its employees.
The fundamental drawback to this approach is information loss. By collapsing multiple response options (typically five or seven) into two (positive or not positive), information is lost. Specifically, enthusiasm level is lost. No distinction is made between a positive person and an enthusiastic person. Similarly, no distinction is made between a neutral person and an angry person. Information loss can be acceptable when no benchmark data, or only poor benchmark data, is available. With good benchmark data available, however, the loss of information can interfere with making better decisions about how to take cost-effective action.
Integer Assignment Approach
As another example, the integer assignment approach discussed above also has shortcomings. Table 1 below shows an example mapping of a seven-category Likert-type scale to the integers zero through six. Scores are then reported by converting each category on the response scale to an integer using the mapping and then calculating the arithmetic mean of the integers.
TABLE 1

Category            Integer
Strongly disagree   0
Disagree            1
Slightly disagree   2
Neutral             3
Slightly agree      4
Agree               5
Strongly agree      6
These averages can then be compared to averages from similar companies. For example, a company might get a score of 4.38 on an item about appreciation. The fact that a group of similar companies averaged 3.97 on the appreciation item would lead the company to conclude that it was doing well at appreciating its employees.
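The mapping in Table 1 and the arithmetic-mean summary can be sketched as follows; the set of responses is illustrative:

```python
# Sketch of the integer assignment approach using the Table 1 mapping:
# each response category is converted to an integer, then averaged.
SCALE = {
    "Strongly disagree": 0, "Disagree": 1, "Slightly disagree": 2,
    "Neutral": 3, "Slightly agree": 4, "Agree": 5, "Strongly agree": 6,
}

def average_score(responses):
    """Arithmetic mean of the integers assigned to the responses."""
    values = [SCALE[r] for r in responses]
    return sum(values) / len(values)

responses = ["Agree", "Strongly agree", "Neutral", "Slightly agree"]
print(average_score(responses))  # (5 + 6 + 3 + 4) / 4 = 4.5
```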
The integer assignment approach (sometimes referred to as the average score approach) improves on the percent positive approach in that each response option is treated differently, and thus information is not lost. But this approach suffers from the same lack of calibration drawback as the percent positive approach, and it introduces the new drawback of assuming that the response options are all equidistant from each other.
Every survey item has its own unique response characteristics. The most obvious characteristic that varies by item is how easy the item is to agree with. For example, across a broad population, an item about pay will always receive a lower average than an item about ethics. This is because employees generally have a more negative view of their pay than they do of their companies' ethics. These differences can be adjusted for by comparing the average score to a benchmark value.
In addition to different expected values, however, each item also has a different distribution or spread of responses. Put another way, different items have different degrees of expected variation in the responses. Because of this, it can be difficult to compare an average score to a benchmark and know whether the difference is large or inconsequential. The same difference can be large given the distribution of one survey item, and inconsequential given the distribution of another survey item.
For example, consider two survey items: an item about whether an employee thinks the company operates by strong ethics (ethics) and an item about how much negativity there is at their company (negativity). The negativity item will typically elicit more enthusiasm in both directions with far more people responding both strongly negative and strongly positive. Thus, the same size difference from the benchmark is far more important for the ethics item than it is for the negativity item. Put another way, a strongly negative response to the ethics item is rarer and more worrying than a strongly negative response to the negativity item.
Lack of calibration has varying levels of impact based on what kinds of scores are being looked at. Most obviously, this drawback is a big concern when looking at a single population and comparing how they scored on different items. Less obviously, this drawback is also a concern when comparing how different populations, like departments or locations, scored on a set of items. The issue here is that a problem on a narrowly distributed item can be masked by superficially good scores on items with wider distributions.
To understand this drawback, it's important to understand the different variable types. There are four kinds of variables: categorical (or nominal) variables, ordinal variables, interval variables, and ratio variables. A categorical variable is a variable in which the potential values do not have any intrinsic order. An example categorical variable is eye color. It is possible to say that one eye color is equal or not equal to another eye color, but it is not possible to say whether one eye color is greater than or less than another eye color.
An ordinal variable is a variable in which the potential values for the variable do have a clear agreed upon ordering, but no clear consensus on the relative distances between the values. An example of an ordinal variable is the job grades of team member, manager, and senior manager. It can be agreed that in terms of the hierarchy of the organization, a team member is at the lowest level, a senior manager is at the highest level, and a manager is in between. It's impossible to say, however, whether the gap between team member and manager is larger or the gap between manager and senior manager is larger, and even if one could, it would be impossible to say by how much. One characteristic of ordinal variables is that adding them or taking the arithmetic mean does not make sense. A team member cannot be “added to” a senior manager to get the result of two managers.
An interval variable is a variable where the potential values have both a clear agreed upon ordering and clear agreement about the distance between the values. An example of an interval variable is number of hours worked per week. In this case, adding or taking the arithmetic mean of two variable values makes sense. If one person worked 20 hours in a week and another person worked 40 hours in a week, then we could reasonably say that the two people worked a total of 60 hours. Or, if together they made up a small department, we could say that on average, people in that department work 30 hours per week.
A ratio variable is a variable that includes all the characteristics of an interval variable, with the additional condition that it includes a meaningful zero, in which zero indicates that there is none of that variable. For example, number of hours worked per week has a clear order and distance between values, as with an interval variable. It is a ratio variable because a value of zero indicates that the individual did not work any hours that week.
A response on a Likert-type scale to a survey item is an ordinal variable. For example, it is clear that being strongly negative is worse than being negative, which is worse than being slightly negative. But a response on a Likert-type scale to a survey item is not an interval variable. For example, there is no definitive way to say that the distance between strongly disagree and disagree is the same as the distance from disagree to slightly disagree. Or, applying the arithmetic mean test, if you have a two-person department with one person who strongly agrees with a statement and one person who strongly disagrees with the statement, it's nonsensical to say that on average you have neutral people. On some level, having two people who passionately disagree with one another is exactly the opposite of having two neutral people.
The integer assignment approach, however, forces inherently ordinal responses on a Likert-type scale into being an interval variable. While this approach is simple, it is mathematically unfounded and introduces error into the scores. The equidistant point assumption drawback can be acceptable when looking at changes in scores from one time period to the next across larger populations in situations where no external benchmark data or only poor external benchmark data is available. In these cases, the noise will often balance out, and avoiding the information loss inherent in the percent positive approach is valuable. But in other cases, this approach can lead to unreliable results.
Z-Score Approach
The z-score approach is the least common and most sophisticated of the three common approaches for calculating survey scores. It can start with the same calculation used in the integer assignment approach. Next, the distance between the company score and the benchmark score is calculated by subtraction, which is also the same as the integer assignment approach. But in a third step, instead of reporting the calculated difference, the standard deviation of the scores that went into the benchmark average is calculated and the difference between the company score and benchmark score is divided by the standard deviation.
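The three steps just described can be sketched as follows; the company score and the list of benchmark averages are illustrative, and the standard library's `statistics` functions stand in for whatever calculation a given system uses:

```python
import statistics

# Sketch of the z-score approach: (1) compute the company average via integer
# assignment, (2) subtract the benchmark average, (3) divide by the standard
# deviation of the benchmark scores. All numbers below are illustrative.
def z_score(company_score, benchmark_scores):
    benchmark_mean = statistics.mean(benchmark_scores)
    benchmark_sd = statistics.stdev(benchmark_scores)  # sample std deviation
    return (company_score - benchmark_mean) / benchmark_sd

benchmark = [3.8, 3.9, 4.0, 4.0, 4.1, 4.2]  # averages of comparable companies
print(round(z_score(4.38, benchmark), 2))  # about 2.69 standard deviations up
```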
This approach improves on the average score approach by providing some calibration for the different response distributions of different survey items. It's easier to tell whether a particular score is an outlier or not because this approach reports smaller differences on items with tighter distributions as bigger, and it reports bigger differences on items with wide distributions as smaller.
At the same time, however, the z-score approach builds on the integer assignment approach, incorporating and amplifying its drawbacks. Because calculating integer-based averages is the first step of the approach, the same equidistant point assumption drawback described under the subsection about the average score approach also applies to the z-score approach. The z-score approach then makes this drawback worse because, just like the average calculation, the standard deviation calculation assumes interval data rather than the merely ordinal data generated by surveys using a Likert-type scale. In addition, interpreting a z-score as a measure of how unusual a score is implicitly assumes a roughly normal distribution. However, the responses to most survey items that use a Likert-type scale are skewed towards one end of the set of response options, and in rare cases responses are polarized with a dip in the middle. These issues can make the results of the standard deviation calculation misleading.
As discussed above, there are several shortcomings to the above methods for summarizing survey responses. These shortcomings interfere with the ability to systematically identify noteworthy insights hiding in survey response data. Previously, seasoned survey experts have had to subjectively and often inaccurately adjust for these shortcomings when making recommendations to organizations on what to focus on or celebrate. What is needed is an improved system and method for receiving survey responses, more accurately summarizing these responses and segments of these responses, and more accurately ranking the noteworthy aspects of these responses.