Firewalls, servers, client systems, and other systems, nodes, and agents collect extensive log data reflecting the activities of a given computer, user, and/or other network entity, including the interaction of such entities with other internal and external resources, systems, etc. For example, network interactions between a client system and internal and external network destinations, such as web page views, downloads of files or other objects, and messages exchanged with other nodes via various communication protocols, may be logged and reflected in “traffic” or “access” log data.
A wide variety of tools and services exist to identify known or potentially malicious web sites, computers, domains, etc., and an enterprise or other user or group of users may use more than one such service to ensure that potential or actual security breaches are detected. Such services may identify known or potentially malicious entities by IP (or other) address and/or by domain or sub-domain name, URL, email address, file hash, etc. At any given time, the set of such known or potentially malicious IP addresses, domains, etc. may number in the many millions.
Network owners and/or security administrators use tools to detect when users of computers on their network access known or potentially malicious sites and/or computers. However, computers associated with a given network may generate logs in a wide variety of formats. To date, such tools have required connectors or other software specific to each different type of computer to parse log data and populate a corresponding structured database, which can then be searched, for example, for data associated with known threats, such as known or potentially malicious domains and IP addresses. Typically, a regular expression or other code to extract information must be provided for each log line type. Tools that depend on log-format-specific connectors may not be able to keep up with changes to log formats, e.g., those resulting from client or other source system updates, and/or with new log formats associated with newly deployed systems.
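By way of illustration only, the connector-based approach described above may be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular tool: the two log line formats, the `PATTERNS` table, the `KNOWN_BAD` set, and the function names `parse_line` and `matches_threat` are all hypothetical and are introduced solely for this example.

```python
import re
from urllib.parse import urlparse

# Hypothetical log line formats; each source type needs its own expression.
PATTERNS = {
    "proxy": re.compile(r"^(?P<ts>\S+) (?P<src_ip>\S+) GET (?P<url>\S+)$"),
    "firewall": re.compile(
        r"^(?P<ts>\S+) ALLOW src=(?P<src_ip>\S+) dst=(?P<dst_ip>\S+)$"
    ),
}

# Hypothetical threat indicators (documentation-reserved address and domain).
KNOWN_BAD = {"203.0.113.7", "bad.example"}


def parse_line(log_type, line):
    """Return the extracted fields as a dict, or None if the line cannot be parsed."""
    pattern = PATTERNS.get(log_type)
    if pattern is None:
        return None  # no connector exists for this log type
    match = pattern.match(line.strip())
    return match.groupdict() if match else None


def matches_threat(record):
    """Return True if any address or host name in the record is a known indicator."""
    candidates = {record.get("src_ip"), record.get("dst_ip")}
    url = record.get("url")
    if url:
        candidates.add(urlparse(url).hostname)
    return bool(candidates & KNOWN_BAD)
```

In this sketch, a line is checked against the indicators only after it has been parsed successfully. If a source system update were to rename `src=` to `source=` in the hypothetical firewall format, `parse_line` would return `None` for every such line until the expression was rewritten, and a log type with no entry in `PATTERNS` would not be parsed at all. This mirrors the maintenance burden noted above.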