A substantial portion of the theft of personal data, such as names, addresses, credit card information, social security numbers, bank account information, and the like, now occurs on the Internet. For example, stolen credit card numbers are frequently bought and sold on forums, in chat rooms, and through other collaborative tools on the dark web. Detecting the theft of personal data and tracking its exchange across the Internet present a significant technological challenge. In particular, it is resource intensive to crawl and scrape the vast amounts of data on the Internet, and it is also resource intensive to cull through the vast amounts of crawled and scraped data.
Another drawback of existing techniques lies in managing the privacy of the personal data whose theft is being detected. For example, many users must transmit their personal data in plain-text format, without any encryption or hashing, to the detection company or software, and must then trust that the detection company or software will cryptographically hash such data upon receipt and destroy the unencrypted plain-text copy. On the other hand, if users were to transmit cryptographically hashed personal data directly to the detection company or software, the resource requirement for culling often begins to scale exponentially, because the scraped data must itself be cryptographically hashed in order to perform the comparison.
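By way of illustration only, the comparison cost described above can be sketched as follows. The sketch assumes SHA-256 as the cryptographic hash and whitespace tokenization of the scraped text, neither of which is specified here, and the names `hash_value` and `find_matches` are hypothetical. It shows that when only hashes of the personal data are available, every candidate token in the scraped data must be hashed before it can be compared.

```python
import hashlib


def hash_value(value: str) -> str:
    """Return the SHA-256 hex digest of a value after simple normalization."""
    normalized = value.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_matches(watched_hashes: set, scraped_text: str):
    """Hash each whitespace-delimited token and collect digests that are watched.

    Returns the set of matching digests and the number of hashes computed,
    which grows with the number of candidate tokens in the scraped text.
    """
    matches = set()
    hashes_computed = 0
    for token in scraped_text.split():
        digest = hash_value(token)  # one hash per candidate token
        hashes_computed += 1
        if digest in watched_hashes:
            matches.add(digest)
    return matches, hashes_computed


# Hypothetical example: the user supplies only a hash, never the plain text.
watched = {hash_value("4111111111111111")}
scraped = "selling card 4111111111111111 cheap"
found, cost = find_matches(watched, scraped)
```

In this simplified form the cost is one hash per token. A value that may span several tokens or appear in several formats multiplies the number of candidates to hash, which is one way the culling cost can grow quickly.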
Accordingly, there is a need for improved techniques for detecting the theft of personal data across the Internet.