With web servers receiving millions of requests for content each day, a need has arisen to discern whether a given request comes from a human or from a non-human visitor, such as an automated program (e.g., a web crawler, pinger, spider, or robot). Previous approaches to non-human visitor detection store data-centric characteristics of a visitor (e.g., IP addresses, user agents, and other data components). These approaches are limited because the data characteristics they rely upon are unreliable. Examples of the limiting factors include:
1. IP addresses often change.
2. User agent strings change.
3. Not all non-human visitors properly identify themselves. Often this is intentional: the program visiting the site conceals its identity in order to bypass filtering systems.
4. Pingers access the web site only occasionally (to ensure that it is still up), so their traffic tends to be low in volume but persistent. This makes it difficult to spot them immediately for filtering.
The result is inaccurate and skewed web analysis, with either numbers inflated by the erroneous inclusion of non-human visitors or, worse, the accidental deletion of real visitor data from the analysis.
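By way of illustration only, the following minimal sketch shows how a filter based on stored data characteristics might behave and why it can fail. It is not taken from any particular prior system. The names `KNOWN_BOT_IPS`, `KNOWN_BOT_AGENT_TOKENS`, `Request`, and `is_known_non_human` are hypothetical, and the addresses and user agent strings are placeholder values.

```python
from dataclasses import dataclass

# Hypothetical stored characteristics; all values are illustrative only.
KNOWN_BOT_IPS = {"192.0.2.10"}
KNOWN_BOT_AGENT_TOKENS = ("crawler", "spider", "bot")


@dataclass(frozen=True)
class Request:
    ip: str
    user_agent: str


def is_known_non_human(req: Request) -> bool:
    """Flag a request only if it matches a stored IP or user-agent token."""
    agent = req.user_agent.lower()
    return req.ip in KNOWN_BOT_IPS or any(
        token in agent for token in KNOWN_BOT_AGENT_TOKENS
    )


# The same automated program, seen three ways.
requests = [
    Request("192.0.2.10", "ExampleCrawler/1.0"),    # listed IP, honest agent
    Request("198.51.100.7", "ExampleCrawler/1.0"),  # changed IP, honest agent
    Request("198.51.100.7", "Mozilla/5.0"),         # changed IP, browser-like agent
]
flags = [is_known_non_human(r) for r in requests]
print(flags)
```

In this sketch the third request is not flagged, because neither its IP address nor its user agent string matches a stored value. Its traffic would therefore be counted as human, which is the kind of inflation described above.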