(1) Field of Invention
The present invention relates to a system for automated classification of user behavior and interactions in collaborative media and, more particularly, to a system for automated classification of user behavior and interactions in collaborative media using temporal motifs.
(2) Description of Related Art
The open access policy of collaborative media, an example of which is the online encyclopedia Wikipedia, has made it possible for users to modify the collaborative media by creating new content, editing or deleting existing content, or adding and removing entire pages. While early speculation suggested that this open access model was unsustainable, Wikipedia has continued to grow and improve in quality. Underlying this growth are the collaborative, and sometimes combative, interactions between editors working on the same content. Understanding the effects of different kinds of interactions enables one to predict future growth patterns in Wikipedia, as well as to assess the different characteristics of editors, page collaboration levels, and the interactions themselves.
Much of the current analysis of interactions among Wikipedia's authors has focused on high-level characteristics such as the number of inter-editor reverts (see the List of Cited Literature References, Literature Reference No. 3) or interactions on talk pages (see Literature Reference No. 6). Recent studies have begun assessing the impact of specific editing behavior by looking at how editors revise each other's work (see Literature Reference Nos. 2 and 5). However, the focus has been on identifying contentious behavior between editors; accordingly, almost all existing models are limited in their ability to represent and analyze productive, collaborative behavior.
For instance, Wu, Harrigan, and Cunningham (see Literature Reference No. 9) analyzed a static graph consisting of author-page and page-page edges for a small subset of Wikipedia's pages in order to discover motifs that correspond to author editing behavior. Their analysis was limited to discovering patterns of individual authors interacting with different pages.
Laniado et al. (see Literature Reference No. 6) analyzed a graph of editor interactions on author talk pages to assess collaboration. They built the graph from discussion chains in which authors reply to each other's comments on the talk page, and then analyzed the structure of the graph to discover patterns in editor interactions. A major disadvantage of this approach is that the majority of Wikipedia articles do not have a substantial talk page to analyze, which prevents a full analysis of the collaboration. Furthermore, many authors do not participate on the talk pages, so much of the interaction that takes place on the article pages themselves is left out.
Laniado and Tasso (see Literature Reference No. 5) built a co-authorship network from authors' interactions on the same page in order to identify high-quality editors on the basis of network properties. They applied a series of network measures (centrality, clustering coefficients, and assortativity) to analyze how editors work together. However, for some experiments they removed bots, administrators, and highly productive users, and the simple statistics they used capture macroscopic properties of collaboration rather than specific editing behavior.
Additionally, Roth, Tamborelli, and Gilbert (see Literature Reference No. 7) assessed the impact of anonymous, regular, and administrative users on Wikipedia's growth by tracking the density of users relative to the page's quality. They performed an edit-frequency analysis to determine the impact of editor density relative to editor type (i.e., anonymous, editor, and administrator), showing that different distributions have different quality implications. However, this work assesses only the frequency and density of editors on a page, not their interactions. Therefore, the approach is limited in its ability to capture collaborative and combative interactions.
Furthermore, Brandes and Lerner (see Literature Reference No. 1) built a co-author network for all authors editing the same page, using reverts and inter-editor revisions to identify collaborative and contentious behavior. Brandes et al. (see Literature Reference No. 2) extended this work by identifying structural network properties that are correlated with the quality of Wikipedia articles. The edges in the network were labeled according to how the authors had interacted. A partitioning algorithm was then used to break a page's editor-interaction network into groups based on mutually conflicting edges (i.e., two editors deleting each other's text). This work builds a static network that considers only the aggregate of each editor's relations with another, rather than how the editors have interacted over time. Thus, the static network is unable to capture changes in the editors' relationships or the dynamics of editing behavior (e.g., whether certain interactions give rise to more interactions of the same type).
Sumi et al. (see Literature Reference No. 8) performed a temporal analysis of burstiness in an article's revision and talk page histories to assess whether the article is undergoing a period of high editor conflict or has been vandalized. They considered only one type of edit, the revert, in determining the controversial status of the page. They also analyzed controversial status relative to the length of the page and of its talk page, showing that longer talk pages are correlated with controversial status. However, their approach does not take into account any way in which editors interact other than by reverting each other's changes.
Kittur et al. (see Literature Reference No. 3) analyzed conflict and coordination in pages by training a support vector machine (SVM) classifier on page-related features. Their follow-up feature analysis showed that certain types of editors and edits were strongly correlated with controversial pages. However, their model does not incorporate the interactions between editors. In a second experiment, they built a graph of users who had reverted each other's changes, which they partitioned to discover mutually conflicting groups. As with Brandes and Lerner (see Literature Reference No. 1), this approach cannot discover changes in behavior over time or the dynamics of the interactions.
Kovanen et al. (see Literature Reference No. 4) demonstrated how to construct and detect temporal motifs using a set of randomized null models. However, they do not apply the results of motif detection to any particular application.
Each of the prior methods above exhibits limitations that make it incomplete. Thus, a continuing need exists for a system and method that enables analysis of both productive and destructive editing behavior in collaborative media, such as Wikipedia.