Malicious actors are present in the Global Internet, ranging from the hackers themselves to infected zombie machines. Finding and blacklisting these actors, often alongside other measures such as taking malicious sites down, is crucial for keeping both companies and individual users safe. By the end of 2016, the Global Internet was estimated to have over 3.5 billion users, 1.1 billion hosts and over 1 billion websites, and its annual traffic had reached 1.1 zettabytes. The number of hosts and webpages, the volume of traffic and the number of possible packet transit routes all keep growing. Meanwhile, the number of security experts able to analyze this data remains very limited. Unfortunately, many Internet security tasks still rely on human cognition and expert judgment, which does not scale and cannot keep up with the constant growth of the Global Internet.
Many services, in addition to providing blacklists, compute reputation scores from sources such as blacklists, user reports, contextual relations between URLs, Passive DNS data, the IP addresses that malware connects to, honeypots and crawlers. Other systems use different ways of deciding which IP addresses to examine first, but most do not disclose how they make this choice unless the method is trivial (e.g., observing the IP addresses that attacked their honeypots).
Known methods and systems for identifying malicious actors do not scale well enough to analyze an entire netflow, so they must restrict themselves to selected focus areas, which strictly limits their capabilities. Most known methods prioritize findings that carry a reasonable level of confidence in order to avoid false positives, although false positives still appear from time to time. Moreover, innovations in Internet crime (such as new types of malicious activity, new attack tools and new hardware types used to form botnets) make confirming that addresses are malicious a slow and error-prone process. Finally, in the past, a lack of processing power made it impossible to gather netflows and analyze them successfully with machine learning techniques.