Protecting computing infrastructure in large organizations is a growing challenge. Both generic malware and targeted attacks, such as advanced persistent threats (APTs), continue to grow in sophistication and outpace the capabilities of traditional defenses such as antivirus software. Additionally, administrators' visibility into the posture and behavior of hosts is eroding as employees increasingly use personal devices for business applications. Further, the traditional perimeters of organizations are breaking down due to new complexities in intelligence and file sharing, contractor relationships, and geographical distribution.
Many organizations attempt to address these challenges by deploying security products that enforce policies and generate situational intelligence in the form of security logs. Such logs contain high volumes of information about activities in the network. For example, authentication logs might suggest that an employee's account has been compromised because a login has occurred from a region that the employee has never visited. As another example, web proxy logs might record which site a user visited before being compromised by a drive-by download attack.
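By way of illustration only, the authentication-log example above can be sketched as a simple check that flags a login from a region not previously observed for the same account. The function name `flag_new_region_logins`, the `(user, region)` record layout, and the sample data are hypothetical and are not drawn from any particular security product; an actual log would require parsing and normalization first.

```python
from collections import defaultdict


def flag_new_region_logins(events):
    """Return logins whose region was not previously seen for that user.

    `events` is an iterable of (user, region) tuples in chronological
    order (an assumed, simplified record layout). The first login for
    each user only seeds that user's history and is never flagged.
    """
    seen = defaultdict(set)  # user -> set of regions observed so far
    flagged = []
    for user, region in events:
        if seen[user] and region not in seen[user]:
            flagged.append((user, region))
        seen[user].add(region)
    return flagged


# Hypothetical sample data, for illustration only.
sample = [
    ("alice", "US"),
    ("alice", "US"),
    ("bob", "DE"),
    ("alice", "BR"),
    ("bob", "DE"),
]
suspicious = flag_new_region_logins(sample)
```

Such a check merely suggests a possible compromise; a flagged login may equally reflect legitimate travel, which is one reason the log evidence typically calls for further analysis.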
While existing detection approaches include examining logs to conduct forensic analysis of suspicious activity in a network, such approaches remain largely manual processes and often rely on signatures of known threats. Additionally, existing security products often come from a patchwork of vendors and are inconsistently installed and administered. Consequently, such products generate logs with formats that differ widely and that are often incomplete, mutually contradictory, and large in volume. Accordingly, a need exists for techniques to automatically extract and leverage knowledge from log data produced by a wide variety of security products in a large enterprise.