Researchers often seek to identify important and interesting new discoveries in their fields and to use those discoveries in their own work as quickly as possible. Identifying new discoveries is increasingly difficult as the volume of research literature increases exponentially. Traditional measures of research impact, such as citation counts, operate at low velocities, taking years to accrue, and may not give researchers enough information about newly published works. Instead, researchers are turning to usage and other alternative metrics (“altmetrics”) as higher-velocity indicators of interest, particularly in the pre-citation period, when a newly published work has not had time to accrue traditional measures of research impact. Providing these measures from aggregated or metadata databases is particularly challenging because usage is increasingly driven by automated machines, that is, by robotic traffic, which does not accurately reflect the interest or importance of individual research artifacts.
Within the academic and scientific literature space, one conventional approach to distinguishing between human and robotic traffic is to rely on pre-established identifications of non-human users, typically by way of the IP addresses of servers that deploy web crawlers and other computer-implemented scripts. Research sessions linked to such IP addresses are flagged as non-human sessions and are removed from further analysis. Requiring pre-identification of non-human users is not a practical or sustainable solution to robotic traffic because of the complexity of human and non-human interactions in these research sessions, where humans may, for example, use automated scripts in an ad hoc manner to complete repetitive tasks.
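For illustration only, the conventional IP-based filtering described above might be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular system: the `Session` record, the function names, and the blocklist entries (taken from address ranges reserved for documentation) are all hypothetical.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

# Hypothetical pre-established list of networks known to host crawlers.
# The ranges below are reserved for documentation and are placeholders only.
KNOWN_ROBOT_NETWORKS = [
    ip_network("192.0.2.0/24"),
    ip_network("198.51.100.17/32"),
]


@dataclass(frozen=True)
class Session:
    """Hypothetical research-session record: an identifier and a client IP."""
    session_id: str
    client_ip: str


def is_flagged_non_human(session, networks=KNOWN_ROBOT_NETWORKS):
    """Return True if the session's IP falls inside any pre-identified network."""
    addr = ip_address(session.client_ip)
    return any(addr in net for net in networks)


def remove_flagged_sessions(sessions, networks=KNOWN_ROBOT_NETWORKS):
    """Drop every session flagged as non-human before further analysis."""
    return [s for s in sessions if not is_flagged_non_human(s, networks)]


sessions = [
    Session("s1", "192.0.2.44"),     # inside a listed network: removed
    Session("s2", "203.0.113.9"),    # not listed: kept
    Session("s3", "198.51.100.17"),  # exact listed address: removed
]
kept = remove_flagged_sessions(sessions)
print([s.session_id for s in kept])
```

The sketch also makes the limitation concrete: a session is judged solely by whether its address was listed in advance, so a human who runs an ad hoc script from an unlisted address passes through unflagged, while all activity from a listed address is discarded.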