1. Technical Field
Aspects of the example implementations relate to a method of maintaining a large set of taint labels for dynamically tracking a flow of data through a program or system.
2. Related Art
In the related art, data flows may be traced through a program or a system. For example, when data is input into a program or a system, operations may be performed on that data. The data may then be output, either in its initial form or subject to operations of the program. The output data may be used to assess risk of unauthorized exposure.
Related art tools developed to track the flow of data through a program or a system associate a label or a value with a data point, such that the flow of data from one or more input points (also referred to as “sources”) to one or more egress points (also referred to as “sinks”) can be traced. As a result, the flow of data through a program can be traced during the design, testing, or production phases.
For example, a related art dynamic data flow analysis associates a shadow value or shadow memory address with each value or memory address used by the program. The associated shadow value or memory address is referred to as a taint label. This technique has been implemented as a compile-time instrumentation pass.
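As a non-limiting illustration of the shadow value concept described above, the following Python sketch pairs each program value with a set of labels and propagates the labels through an addition. The names Shadowed, source, and sink are hypothetical and are introduced here only for illustration; the sketch does not represent the compile-time instrumentation pass itself, which operates on compiled code rather than on wrapper objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shadowed:
    """A program value paired with its shadow value (hypothetical name)."""
    value: int
    label: frozenset = frozenset()

    def __add__(self, other):
        # Untracked operands are treated as carrying an empty label set.
        if not isinstance(other, Shadowed):
            other = Shadowed(other)
        # The result carries the union of the labels of both operands.
        return Shadowed(self.value + other.value, self.label | other.label)


def source(value, name):
    """Input point: attach a fresh label to data entering the program."""
    return Shadowed(value, frozenset({name}))


def sink(item):
    """Egress point: report which sources influenced the outgoing data."""
    return sorted(item.label)


user_input = source(3, "user_input")
config = source(4, "config")
result = user_input + config + 1
reached = sink(result)  # labels of every source that flowed into result
```

In this sketch, the value reaching the sink carries the labels of both sources, because both contributed to the sum, while the constant operand contributes no label.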
Related art taint label sets are represented as bit sets with O(N) storage requirements, where N represents the number of labels. For example, tracking 32 separate labels requires a 32-bit taint marking, because one bit is generated for each piece of data that needs to be tracked. In the related art data flow tracking tool, shadow memory is allocated for the application memory. For example, each byte of application memory may correspond to two bytes of shadow memory, which are used to store its taint label. The related art thus has a minimum space requirement on the order of O(N).
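As a non-limiting illustration of the bit set representation and its O(N) storage requirement, the following Python sketch stores a label set as an integer having one bit per label. The function names are hypothetical, and the shadow memory estimate assumes, for illustration only, that one full bit set is stored for every byte of application memory.

```python
def singleton(index):
    """Bit set containing only the label with the given index."""
    return 1 << index


def union(a, b):
    """Combine two label sets; this is a bitwise OR over all N bits."""
    return a | b


def members(bits):
    """List the indices of the labels present in a bit set."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


def bytes_per_set(num_labels):
    """Storage for one label set: one bit per label, rounded up to bytes."""
    return (num_labels + 7) // 8


def shadow_bytes(app_bytes, num_labels):
    """Assumed shadow memory cost: one bit set per application byte."""
    return app_bytes * bytes_per_set(num_labels)


combined = union(singleton(0), singleton(5))
small = bytes_per_set(32)            # 32 labels fit in 4 bytes per set
large = bytes_per_set(1000)          # 1000 labels need 125 bytes per set
overhead = shadow_bytes(1024, 1000)  # shadow cost for 1 KiB of application data
```

Under these assumptions, the size of every label set grows linearly with the number of labels, regardless of how few labels a given set actually contains.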
Programs or systems may involve a large number (e.g., hundreds or thousands) of data items which need to be tracked separately. In such programs or systems, the related art O(N) representation quickly dominates the program's memory usage and/or execution time. Accordingly, it becomes impractical to apply the related art data flow tracking scheme to programs or systems having a large number of data items in order to perform accurate data tracking.