Technical Field
The disclosure relates to malware detection systems and, more specifically, to a modularized database architecture using vertical partitioning for a state machine of a malware detection system.
Background Information
A prior approach to analyzing potentially malicious software (malware) involves use of a malware detection system configured to examine content of an object, such as a web page, email, file or uniform resource locator, and to render a malware/non-malware classification based on previous analysis of that object. The malware detection system may include an analysis engine having one or more stages of analysis, e.g., static analysis and/or behavioral analysis, of the object. The static analysis stage may be configured to detect anomalous characteristics of the object to identify whether the object is “suspect” and deserving of further analysis or whether the object is non-suspect (i.e., benign) and not requiring further analysis. The behavioral analysis stage may be configured to process (i.e., analyze) the suspect object to arrive at the malware/non-malware classification based on observed anomalous behaviors.
The observed behaviors (i.e., analysis results) for the suspect object may be recorded in an object cache that may be accessible via an object identifier (ID) that is generated for the object. The object cache may be organized as a single data structure (e.g., a large table) having a plurality of entries or rows, each of which represents metadata of an object, and a plurality of columns, each of which represents an attribute of the object metadata. The rows of the cache may be configured to store updates, such as insertions and deletions, of the object metadata, which may include constant metadata (such as an object ID and size of object) as well as behavioral metadata (such as states associated with the object).
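By way of illustration only, the single-table organization described above may be sketched as follows. The sketch assumes an in-memory table keyed by object ID; the names ObjectRow and SingleTableCache, the attribute names and the state labels are hypothetical and are not taken from any particular malware detection system.

```python
from dataclasses import dataclass


@dataclass
class ObjectRow:
    """One row of the table: all metadata attributes of one object."""
    object_id: str  # constant metadata
    size: int       # constant metadata (size of object in bytes)
    state: str      # behavioral metadata (hypothetical state label)


class SingleTableCache:
    """Hypothetical object cache organized as a single table keyed by object ID."""

    def __init__(self):
        self._rows = {}  # object_id -> ObjectRow

    def insert(self, object_id, size, state):
        self._rows[object_id] = ObjectRow(object_id, size, state)

    def update_state(self, object_id, new_state):
        # The column value is modified in place within the existing row.
        self._rows[object_id].state = new_state

    def lookup(self, object_id):
        return self._rows[object_id]

    def delete(self, object_id):
        del self._rows[object_id]


cache = SingleTableCache()
cache.insert("obj-1", 2048, "suspect")
cache.update_state("obj-1", "behavioral-analysis")
```

In this sketch the constant metadata and the behavioral metadata share one row, so every state change touches the same row that holds the object ID and size.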
Use of the single table to accommodate such updates may adversely impact performance of the object cache, particularly when a large number of rows (i.e., object metadata) are regularly modified (i.e., updated), thereby triggering frequent garbage collection. That is, a number (e.g., M) of rows transitioning through another number (e.g., N) of updates (i.e., states) yields a much larger number (e.g., M×N) of dirty rows requiring garbage collection. As a result, the overall performance of the object cache degrades. In addition, use of the single table may suffer from a loss of object metadata (i.e., information in the rows) as updates overwrite existing metadata (i.e., as the dirty rows are reclaimed).
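The M×N relationship noted above may be illustrated by a minimal simulation. The sketch assumes, solely for illustration, that each update writes a new version of the row and marks the superseded version as dirty; the function name count_dirty_rows and the row layout are hypothetical.

```python
def count_dirty_rows(m, n):
    """Simulate m rows each passing through n updates; return the dirty-row count.

    Assumption: an update appends a new row version and marks the prior
    version dirty (i.e., awaiting garbage collection).
    """
    table = []        # each entry: [object_id, state, is_live]
    live_index = {}   # object_id -> index of the live version in table

    # Initial insertion of m rows, all in state 0.
    for obj in range(m):
        table.append([obj, 0, True])
        live_index[obj] = len(table) - 1

    # Each of the n updates supersedes the live version of every row.
    for state in range(1, n + 1):
        for obj in range(m):
            table[live_index[obj]][2] = False   # prior version becomes dirty
            table.append([obj, state, True])    # new live version
            live_index[obj] = len(table) - 1

    return sum(1 for row in table if not row[2])


dirty = count_dirty_rows(3, 4)  # 3 rows, 4 updates each -> 12 dirty rows
```

Under this assumption the garbage collector must reclaim one dirty row per update, so the reclamation workload grows with the product of the row count and the update count rather than with the row count alone.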
Further, performance is also impacted where two or more processes attempt to access, e.g., read, write and/or overwrite, the object metadata of the rows concurrently. To improve performance, the rows of the table may be copied (i.e., shadow copied) to additional (unused) rows of the table to accommodate the concurrent accesses. As a result, subsequent read accesses of the object metadata may be directed to the shadow copies pending synchronization with the original row (and garbage collection of the shadow copy). In addition, a number of states associated with the object may increase as the object metadata is analyzed (e.g., behavioral analysis), thereby requiring the insertion of yet more rows into the object cache to capture information associated with each state. However, multiple updates to the object metadata (i.e., row insertion, column updates, and garbage collection) and concomitant contention may adversely impact performance of the system. Moreover, as the object metadata of each row transitions through various states during the analysis, one or more attributes of the object metadata may be overwritten. Therefore, in addition to the adverse performance impact (from insertion, copying and garbage collection), the use of the single table may suffer from a loss of information (i.e., object metadata) as the states transition.
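The shadow-copy technique described above may be sketched, in simplified single-threaded form, as follows. The class ShadowedTable, its method names and the state labels are hypothetical; the sketch omits locking and is intended only to show where the extra rows and the extra garbage collection arise.

```python
class ShadowedTable:
    """Hypothetical single table in which a row being written is shadow
    copied to an unused row so that readers need not wait for the writer."""

    def __init__(self):
        self.rows = {}      # slot number -> metadata dict
        self.primary = {}   # object_id -> slot of the original row
        self.shadow = {}    # object_id -> slot of the shadow copy, if any
        self._next_slot = 0

    def _alloc(self, metadata):
        slot = self._next_slot
        self._next_slot += 1
        self.rows[slot] = metadata
        return slot

    def insert(self, object_id, **metadata):
        self.primary[object_id] = self._alloc(dict(metadata))

    def begin_write(self, object_id):
        # Copy the original row to an unused row; return the original for writing.
        original = self.rows[self.primary[object_id]]
        self.shadow[object_id] = self._alloc(dict(original))
        return original

    def read(self, object_id):
        # Reads are directed to the shadow copy while one exists.
        slot = self.shadow.get(object_id, self.primary[object_id])
        return dict(self.rows[slot])

    def synchronize(self, object_id):
        # Redirect readers to the original and garbage-collect the shadow copy.
        slot = self.shadow.pop(object_id)
        del self.rows[slot]


table = ShadowedTable()
table.insert("obj-1", size=2048, state="static-analysis")
row = table.begin_write("obj-1")
row["state"] = "behavioral-analysis"          # writer overwrites the attribute
state_seen_by_reader = table.read("obj-1")["state"]
rows_during_write = len(table.rows)
table.synchronize("obj-1")
state_after_sync = table.read("obj-1")["state"]
```

In this sketch each write in flight occupies a second row of the same table, and after synchronization the earlier attribute value is no longer stored anywhere in the table.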