Data leak protection techniques may be used to protect confidential information in an attempt to prevent such information from leaving the boundaries of an organization. However, a major shortcoming of conventional data leak protection techniques is their reliance on human-defined policies. For example, a policy may be set to prohibit an email server from transmitting emails that contain a human-defined keyword. Such policies may trap emails that do not contain confidential information, and may fail to trap other emails that do contain confidential information. Even if human operators update the keywords over time, approaches that rely on human-defined keywords are subject to a high number of false positives. Moreover, people increasingly communicate via multiple types of communications programs. Thus, keyword-based policies set on an email server have a further drawback: they cannot prevent transmission of confidential information on a different platform, such as an SMS messaging platform, a chat platform, etc. These false positives and leaks are exacerbated by the sheer volume of electronic communications in modern organizations; globally, the number of electronic messages sent each day is in the billions.