1. Technical Field
The present invention relates generally to a method of gathering and correlating user input regarding relationships between topics, this information being useful in designing web sites, program interfaces, and many other information design applications. Specifically, the present invention provides a method and algorithm for handling a lack of user input in portions of the data gathering process where the user is unfamiliar with some of the items presented.
2. Description of Related Art
Card sorting is a technique used by the builders of web sites to organize the information on the site and to decide how to label the categories for ease of use. The technique works by gathering data from a number of users regarding their perception of relationships between topics. The strength of the perceived relationships can then drive the design of the site.
In a manual version of card sorting, a user is given a set of index cards containing likely topics for the site, one topic per card. The user then sorts the cards into groups according to his perception of which topics belong together. Note that in this exercise, there is no right or wrong way to sort items. This is a subjective exercise that seeks to discover perceptions. Therefore, different users will tend to group items differently, especially as the ideas they represent become more complex.
The input from a number of users can then be correlated in a matrix according to how closely users group each pair of cards together, a methodology known as cluster analysis. Manual correlation and analysis, however, can be tedious.
EZSort is a software package created by IBM, Inc., which handles the card sorting process and analysis. EZSort has two parts—USort and EZCalc. USort handles the card sorting exercise for all participants; EZCalc performs cluster analyses on the accumulated data and generates tree diagrams that represent the hierarchical relationships.
FIG. 1A shows a computer screen containing a typical card-sorting exercise handled by USort. In this figure, the “cards” to be sorted are presented on the left side of the screen (the source); the right side of the screen (the target) is where a user sorts the cards into groups separated by horizontal lines, using drag and drop operations. Notice that the cards to be sorted include a wide variety of topics including hardware, software, languages, operating systems, interfaces between users, interfaces between computers, etc. A user's background and experience will tend to affect the way that he would perceive items as belonging together.
When the user is satisfied with the groupings, clicking on the right arrow (110) causes the program to move to the next step, seen in FIG. 1B. On this second screen, the user is allowed to designate further, higher-level groupings, if these are deemed desirable. The previously formed groups are presented. The groups can be rearranged, and larger groupings formed by making the lines between high-level groups into double lines. In a third step, which is not shown, the user is allowed to name the categories into which he has grouped items. Once the exercise is complete, the user's information is saved to a file for later processing.
When all card-sorting exercises have been done, the data goes to EZCalc for analysis. A raw score matrix is created for each participant, according to the following. If two items are not grouped together by the participant, a value of 0 is assigned. If the two items are grouped together in a high-level grouping, but not in the low-level grouping, a value of 1 is assigned. If the two items are grouped together in both the high-level and low-level groupings, a value of 2 is assigned. Thus, each possible pairing of items receives a score of 0, 1, or 2. Next, the raw scores for each pair of items are summed together for all of the participants, forming a total raw score matrix. The values in this matrix are normalized into a similarity matrix by dividing each score by 2·n, where n is the number of participants. Each element in the similarity matrix now has a score of 0 to 1. Items in the similarity matrix are converted into a distance matrix, using the formula
D(x,y) = 1 − S(x,y)
where D(x,y) is an element in the distance matrix for card pair x and y, and S(x,y) is a corresponding element in the similarity matrix.
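The scoring and normalization steps described above can be illustrated with a short Python sketch. This is not EZCalc's actual code. The function names, the data layout (each participant given as a pair of low-level and high-level groupings, each a list of sets), and the sample cards are assumptions made only for illustration. The sketch also assumes that every item appears in exactly one low-level group and one high-level group.

```python
from itertools import combinations


def raw_scores(items, low_groups, high_groups):
    """Score every pair of items 0, 1, or 2 for one participant.

    low_groups and high_groups are lists of sets (hypothetical layout).
    """
    low_of = {item: i for i, group in enumerate(low_groups) for item in group}
    high_of = {item: i for i, group in enumerate(high_groups) for item in group}
    scores = {}
    for x, y in combinations(sorted(items), 2):
        same_low = low_of[x] == low_of[y]
        same_high = high_of[x] == high_of[y]
        if same_low and same_high:
            scores[(x, y)] = 2  # together at both levels
        elif same_high:
            scores[(x, y)] = 1  # together only at the high level
        else:
            scores[(x, y)] = 0  # not grouped together
    return scores


def distance_matrix(items, participants):
    """Sum raw scores, normalize by 2*n, and return D(x,y) = 1 - S(x,y)."""
    n = len(participants)
    pairs = list(combinations(sorted(items), 2))
    total = {pair: 0 for pair in pairs}
    for low_groups, high_groups in participants:
        for pair, score in raw_scores(items, low_groups, high_groups).items():
            total[pair] += score
    similarity = {pair: total[pair] / (2 * n) for pair in pairs}
    return {pair: 1 - similarity[pair] for pair in pairs}


# Two hypothetical participants sorting four hypothetical cards.
items = ["Java", "Linux", "Perl", "Unix"]
participants = [
    # (low-level groups, high-level groups)
    ([{"Java", "Perl"}, {"Linux"}, {"Unix"}],
     [{"Java", "Perl"}, {"Linux", "Unix"}]),
    ([{"Java"}, {"Perl"}, {"Linux", "Unix"}],
     [{"Java", "Perl"}, {"Linux", "Unix"}]),
]
distances = distance_matrix(items, participants)
print(distances[("Java", "Perl")])   # 0.25
print(distances[("Java", "Linux")])  # 1.0
```

In this example the pair Java and Perl receives raw scores of 2 and 1, for a total of 3. Dividing by 2·n = 4 gives a similarity of 0.75 and therefore a distance of 0.25.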
Finally, cluster analysis converts the distance matrix into tree diagrams for analysis.
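As an illustration of how a distance matrix can be turned into a tree, the following Python sketch performs a simple agglomerative clustering. The text does not state which clustering method EZCalc applies, so the average-linkage rule used here is an assumption, as are the function names, the nested-tuple representation of the tree, and the sample distances.

```python
def average_distance(members_a, members_b, dist):
    """Mean pairwise distance between two clusters (average linkage)."""
    total = sum(dist[frozenset((a, b))] for a in members_a for b in members_b)
    return total / (len(members_a) * len(members_b))


def build_tree(items, dist):
    """Repeatedly merge the two closest clusters until one tree remains.

    dist maps frozenset({x, y}) to the distance D(x,y).
    Returns the tree as nested tuples plus the list of merges performed.
    """
    # Each cluster is (subtree, tuple of member items).
    clusters = [(item, (item,)) for item in sorted(items)]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_distance(clusters[i][1], clusters[j][1], dist)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        (tree_i, members_i), (tree_j, members_j) = clusters[i], clusters[j]
        merges.append((d, members_i, members_j))
        merged = ((tree_i, tree_j), members_i + members_j)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0], merges


# Hypothetical distances for four hypothetical cards.
items = ["Java", "Linux", "Perl", "Unix"]
dist = {
    frozenset(("Java", "Perl")): 0.25,
    frozenset(("Linux", "Unix")): 0.25,
    frozenset(("Java", "Linux")): 1.0,
    frozenset(("Java", "Unix")): 1.0,
    frozenset(("Linux", "Perl")): 1.0,
    frozenset(("Perl", "Unix")): 1.0,
}
tree, merges = build_tree(items, dist)
print(tree)  # (('Java', 'Perl'), ('Linux', 'Unix'))
```

The merge distances recorded in merges (0.25, 0.25, and 1.0 in this example) correspond to the heights at which branches would join in a tree diagram.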
While this type of program has been very helpful in speeding up the analysis of card-sorting applications, a problem exists when participants are not familiar with the content of every card. This can happen, for example, when a company provides a variety of specialized, technical products, such as those shown in FIGS. 1A and 1B. A person who regularly utilizes some of the products may have little or no knowledge of other products. This type of program has previously required each participant to group every card that was presented to them, regardless of their knowledge of the content of the card. Forcing the sorting of “unknown” cards skews the relationships involving them. It would be desirable to have a program that did not force such a choice, but that could deal with this lack of input in some areas.