The disclosure relates to the field of confirming sensitive objects in managed software heap systems, and more particularly to derived sensitivity based on references by known sensitive objects in the heap.
Memory management in runtime environments is often devised so as to provide convenience for the software engineer. For this reason, runtime environments such as Java, C# and most scripting language runtimes include heap memory that can be said to be managed, such as by the inclusion of a garbage collector (Java is a trademark of Sun Microsystems, Inc.). A garbage collector is a runtime facility for automatically identifying and discarding unused data from memory, such as objects, so as to free up storage. Garbage collection is a luxury afforded by the efficiencies of modern computer systems that serves to liberate software engineers from the task of programmatically discarding each and every unused object.
A managed heap provided by a runtime environment is used by application programs for the storage and retrieval of data objects, such as instances of classes in object oriented environments or the storage of other data structures. The heap is therefore accessible to applications. Furthermore, problem determination and diagnosis for software applications and runtime environments will typically involve accessing the contents of a heap in order to understand the state of an application and the runtime environment at a particular point in time, or over a period of time. For example, diagnosis activities for the resolution of memory leaks, software operational problems and data organization issues can involve access to the heap. The heap can be accessed at runtime, during execution of an application, or via a record of the contents of the heap in a dump file.
Some data stored in the heap can be sensitive. For example, certain applications may involve secret or confidential information that should not be shared outside an organization. Such sensitive data in the heap is secure as long as the heap is present only on a secure, trusted machine. However, problem determination and diagnosis often require access to the heap by machines and personnel not party to the sensitive data. To respect the sensitivity of such data, access must be restricted. This can hinder problem determination and diagnosis activities. A particular example is where application software or the runtime environment is serviced by an organization not capable of being, or not willing to be, entrusted with the sensitive information. In such a scenario, a dump file of data objects in a managed heap containing sensitive data cannot be shared with a servicing organization for the purpose of problem determination. It is therefore necessary to identify sensitive data in the heap in order that access to it can be controlled.
One approach to address this issue is to manually identify and remove all sensitive data from the heap before sharing it with a servicing entity. Such an approach is very time consuming, especially for large and complex heap dumps, and prone to error or omission. Also, as part of problem determination, many heap dumps are often generated and shared while exercising the software or the runtime environment to reproduce and understand a problem. The work involved in identifying and removing sensitive data in such a scenario is prohibitively expensive.
Another approach is to employ a tool to automatically filter data likely to be sensitive from a heap dump based on rules or patterns characterizing known sensitive data. For example, a heap dump could be searched for all numbers that may constitute credit card numbers, or all strings could be replaced with random characters in order to obfuscate or remove possible sensitive references. This approach is neither reliable nor effective. It is not possible to characterize sensitive data in a way that all such sensitive data is easily identified and all non-sensitive data is retained. Either sensitive data slips through the net and is retained in the dump, or non-sensitive data is removed or obfuscated in an over-cautious manner that hinders problem determination.
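By way of illustration only, the limitation of pattern-based filtering can be sketched as follows. The pattern, the function name `redact_card_like` and the sample strings are hypothetical and are not taken from any particular tool; a heap dump is modeled simply as a list of strings.

```python
import re

# Hypothetical rule: any run of 13 to 16 digits is treated as a possible card number.
CARD_PATTERN = re.compile(r"\b\d{13,16}\b")


def redact_card_like(values):
    """Replace every card-like digit run in each string with a fixed mask."""
    return [CARD_PATTERN.sub("[REDACTED]", v) for v in values]


# Toy stand-in for strings found in a heap dump (illustrative data only).
heap_strings = [
    "card=4111111111111111",      # sensitive and matched by the rule
    "name=Alice Example",         # sensitive but not matched: slips through
    "trace_id=2024010112345678",  # not sensitive but matched: over-redacted
]

filtered = redact_card_like(heap_strings)
```

In this sketch the customer name is retained in the filtered output while the diagnostic trace identifier is lost, which corresponds to the two failure modes described above.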
A further approach is to have sensitive classes of data or data structures identified such that objects that are instances of such classes or data structures can be explicitly removed from or obfuscated in a heap dump. While this approach is effective for those explicitly identified classes and data structures, it cannot fully address the problem due to the nature of data objects in the heap that are encapsulated within or referenced by other data objects. For example, a data object corresponding to a customer and being an instance of a customer class may be identified as a sensitive data object because the customer class is identified as a sensitive class. Such a class can include encapsulated or referenced further objects such as string objects with name and address information, numeric objects including credit card details, and references to aggregation data structures including lists of customer orders, communications, etc. These encapsulated or referenced further objects are not indicated as sensitive by virtue of the sensitivity of the customer object, not least because they are instances of classes that can include non-sensitive data. Further, such objects can include extensible data structures or collection objects into which any number of data objects of any kind could be stored. Thus, using this approach to identify sensitive objects requires a comprehensive definition of all classes of object that could contain sensitive data. In practice, this will include many classes that often never include sensitive data, or that sometimes include sensitive data and sometimes do not, such as strings, numerics and collections, resulting in an overcautious approach with many false positive determinations of sensitivity.
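The customer example above can be sketched as follows, purely as an illustration. The `Customer` class, the `flag_by_class` helper and the toy heap (a flat list of live objects) are hypothetical constructs introduced for this sketch and do not represent any particular runtime environment.

```python
class Customer:
    """Hypothetical class that has been explicitly identified as sensitive."""

    def __init__(self, name, card_number, orders):
        self.name = name                # instance of str
        self.card_number = card_number  # instance of str
        self.orders = orders            # instance of list


def flag_by_class(heap_objects, sensitive_classes):
    """Return the objects whose exact class is listed as sensitive."""
    return [obj for obj in heap_objects if type(obj) in sensitive_classes]


name = "Alice Example"
card = "4111111111111111"
orders = ["order-1001", "order-1002"]
customer = Customer(name, card, orders)
log_line = "cache warmed"  # non-sensitive string that is also on the heap

# Toy heap: a flat list of live objects (an assumption made for this sketch).
heap = [customer, name, card, orders, log_line]

# Only the Customer class is listed: the referenced name, card and orders are missed.
narrow = flag_by_class(heap, {Customer})

# General-purpose classes are listed too: the harmless log line is now flagged.
broad = flag_by_class(heap, {Customer, str, list})
```

With only the customer class listed, the referenced name, card number and order list are not flagged. Listing the general-purpose classes as well flags every object in the toy heap, including the non-sensitive log line.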
Thus, despite these various approaches, there remains a need to confirm the sensitivity of a data object in a managed object heap.