As malware protection becomes more effective, attackers increasingly turn to phishing and copycat websites, which use social engineering to convince users of their legitimacy in order to capture user credentials and create fraudulent transactions. Current solutions rely on fuzzy domain name matching, certificate and security checks, domain/IP reputation, and user-reported lists to identify potentially harmful websites. Some mechanisms for identifying potentially harmful websites may also use a content classification engine that processes a webpage and categorizes it based on the content it displays.
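
As an illustration of the fuzzy domain name matching mentioned above, the following is a minimal sketch and not a description of any particular product. It assumes that "fuzzy" means a small Levenshtein edit distance to a known legitimate domain; the `KNOWN_DOMAINS` list, the `lookalike_of` helper, and the `max_distance` threshold of 2 are hypothetical choices made only for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


# Hypothetical list of legitimate domains to protect (assumption for this sketch).
KNOWN_DOMAINS = ["example.com", "examplebank.com"]


def lookalike_of(domain, known=KNOWN_DOMAINS, max_distance=2):
    """Return the known domain that `domain` closely imitates, or None.

    An exact match is treated as legitimate and returns None.
    """
    domain = domain.lower().rstrip(".")
    best = None
    for legit in known:
        d = edit_distance(domain, legit)
        if d == 0:
            return None  # the domain itself is on the known list
        if d <= max_distance and (best is None or d < best[1]):
            best = (legit, d)
    return best[0] if best else None
```

Under these assumptions, `lookalike_of("examp1e.com")` would flag the domain as an imitation of `example.com`, while `lookalike_of("example.com")` returns `None`. Edit distance alone would likely be insufficient in practice, which is consistent with the text above listing it alongside certificate checks, reputation data, and user-reported lists.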