The most popular method for evaluating the impact and quality of an article is the citation count, which is the number of citations received by an article within a pre-specified time horizon. One limitation of the citation count is that it is unavailable until this horizon expires (typically several years after publication). This delay makes citation counts useful primarily for historical assessment of the scientific contribution and impact of papers. Automatic prediction of citation counts could provide a powerful new method for evaluating articles. Faster identification of promising articles could accelerate research and the dissemination of new knowledge.
Accurate models for citation count prediction could also improve our understanding of the factors that influence citations. Predicting and understanding article citation counts is, however, a challenging problem, both on theoretical grounds and in light of several decades of related empirical work. In fact, the bulk of the literature on citation counts addresses the factors that motivate article citations rather than predicting the counts themselves.
Accurate prediction is difficult because citation networks are sparse and because citation rates may have a degree of randomness. For example, a high-impact paper in a journal may increase the citation rate of other papers in the same issue. Previous empirical research has predicted long-term citation counts from citations accumulated shortly after publication. For example, linear regression on the citation count at 6 months after publication has been used to predict the citation count at 30 months. In that regression analysis, author-related information (i.e., the number of previous citations, publications, and co-authors of an author) was incorporated to improve predictions. Other work used a regression model to predict citation counts two years after publication from information available within three weeks of publication; that model used seventeen article-specific features and three journal-specific features.
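As an illustration of the prior-art approach of regressing a later citation count on an early one, the following is a minimal Python sketch of a one-variable ordinary least-squares fit. The function names `fit_simple_ols` and `predict` and the toy data are hypothetical and chosen only for illustration; they do not reproduce the data, features, or fitting procedure of the cited work. Incorporating author-related features would require extending this to a multiple regression, which is not shown.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Hypothetical helper: return (intercept, slope) minimizing the squared
    error of y ~ intercept + slope * x (ordinary least squares, one predictor)."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squared deviations of the predictor and cross-deviations with y.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, early_count):
    """Predict a later citation count from an early citation count."""
    return intercept + slope * early_count


# Hypothetical toy data (not real measurements): citation counts of five
# articles at 6 months (early) and at 30 months (late) after publication.
early = [0, 2, 3, 5, 10]
late = [1, 9, 13, 21, 41]

a, b = fit_simple_ols(early, late)
print(a, b)                 # fitted intercept and slope
print(predict(a, b, 7))     # predicted 30-month count for 7 early citations
```

Because the toy data were constructed to lie exactly on a line, the fit recovers an intercept of 1 and a slope of 4; real citation data would be noisy, which is one reason such early-citation models have limited reliability.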
What is needed is a more reliable method and computer system for predicting citation counts over long time horizons using only information available at the time the article is published, and that changes the article and publication technologies based on the results computed by the system.