The Java Virtual Machine™ (JVM) uses an interpreter to simulate the execution of Java™ bytecode. Because interpreting is extremely slow, most modern JVMs include a just-in-time (JIT) compiler. The just-in-time compiler converts the Java bytecode into native machine instructions. To better optimize the generated native code, it may be advantageous to profile the Java bytecode using a technique known as interpreter profiling. This process involves “hooking” specific bytecodes and instrumenting the bytecodes to add events to a buffer. The contents of the buffer may then be processed. Processing may involve examining events in the buffer and summarizing them to provide useful information to the just-in-time compiler. The just-in-time compiler may use this information to make better optimization decisions. It may be beneficial to profile things such as call targets for virtual calls, branch taken/not-taken frequencies, array sizes, and call path information.
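By way of illustration only, the event buffer described above might be sketched in Java as follows. The class name `ProfileBufferSketch`, the event kinds, and the string summary key are all hypothetical and are not taken from any particular JVM. The sketch assumes that a hooked bytecode calls `record`, and that the buffer is summarized whenever it fills or whenever a count is requested.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical fixed-size buffer of profiling events (illustrative only). */
final class ProfileBufferSketch {
    /** Assumed event kinds; a real interpreter may profile other things. */
    enum Kind { VIRTUAL_CALL_TARGET, BRANCH_TAKEN, BRANCH_NOT_TAKEN }

    private final Kind[] kinds;
    private final int[] bytecodeIndexes;
    private final String[] details;
    private int used;

    // Summary handed to the compiler: event description -> occurrence count.
    private final Map<String, Integer> summary = new HashMap<>();

    ProfileBufferSketch(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        kinds = new Kind[capacity];
        bytecodeIndexes = new int[capacity];
        details = new String[capacity];
    }

    /** Called by an instrumented ("hooked") bytecode to append one event. */
    void record(Kind kind, int bytecodeIndex, String detail) {
        if (used == kinds.length) {
            summarize(); // buffer is full: process it before reusing the slots
        }
        kinds[used] = kind;
        bytecodeIndexes[used] = bytecodeIndex;
        details[used] = detail;
        used++;
    }

    /** Folds the buffered events into the summary and empties the buffer. */
    void summarize() {
        for (int i = 0; i < used; i++) {
            summary.merge(key(kinds[i], bytecodeIndexes[i], details[i]), 1, Integer::sum);
        }
        used = 0;
    }

    /** Number of times the given event has been observed so far. */
    int count(Kind kind, int bytecodeIndex, String detail) {
        summarize();
        return summary.getOrDefault(key(kind, bytecodeIndex, detail), 0);
    }

    private static String key(Kind kind, int bytecodeIndex, String detail) {
        return kind + "@" + bytecodeIndex + (detail == null ? "" : ":" + detail);
    }
}
```

In this sketch the recording path only writes to arrays, while the comparatively expensive map updates are deferred to `summarize`. This mirrors the separation between adding events to a buffer and processing its contents.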
Currently, the most straightforward way to obtain accurate call path information is to instrument each invoke bytecode, each return bytecode, each exception throw bytecode, and each catch bytecode in the interpreter. The same may be performed for compiled code, as well as for calls through native interfaces. Special consideration may be required for events that transfer control between methods at places other than natural entry and exit points, such as on-stack replacement. On-stack replacement transfers control from compiled code to an interpreter in the middle of method execution. Unless special care is taken, this can lead to asymmetric event reporting, in which an exit event is reported for a method with no matching entry event, or vice versa.
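As a non-limiting sketch, the events described above might be replayed against a shadow call stack as shown below. The class `CallPathReplaySketch` and its event kinds are hypothetical. For brevity, the sketch identifies methods by name only and ignores recursion subtleties. It also assumes that a return with no matching invoke (as might be seen after an on-stack replacement transition) is simply counted rather than repaired.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

/** Hypothetical replay of call path events onto a shadow stack. */
final class CallPathReplaySketch {
    /** Assumed event kinds, one per instrumented bytecode category. */
    enum Kind { INVOKE, RETURN, THROW, CATCH }

    private final Deque<String> stack = new ArrayDeque<>();
    private int unmatchedEvents;

    void apply(Kind kind, String method) {
        switch (kind) {
            case INVOKE:
                stack.push(method);
                break;
            case RETURN:
                if (!stack.isEmpty() && stack.peek().equals(method)) {
                    stack.pop();
                } else {
                    // Asymmetric report: an exit without a recorded entry.
                    unmatchedEvents++;
                }
                break;
            case THROW:
                // Nothing to unwind yet; the matching CATCH says how far to go.
                break;
            case CATCH:
                if (stack.contains(method)) {
                    // Discard frames above the method that caught the exception.
                    while (!stack.peek().equals(method)) {
                        stack.pop();
                    }
                } else {
                    unmatchedEvents++;
                }
                break;
        }
    }

    /** Current call path, outermost method first. */
    List<String> currentPath() {
        List<String> path = new ArrayList<>(stack); // iteration order is top-first
        Collections.reverse(path);
        return path;
    }

    int unmatchedEvents() {
        return unmatchedEvents;
    }
}
```

A production implementation would need a policy for repairing the stack after an unmatched event. The counter here only makes the asymmetry visible.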
Instrumenting as described above may be used to fill buffers with enough information to obtain, if processed correctly, accurate call edges and call edge frequencies (a call edge is a call-site/called-method pair, while the call edge frequency indicates how often that edge was traversed). It is important that call edges are correct, as optimization decisions based on incorrect call edges can have severe negative performance impacts on the generated code. An exact call edge frequency is less important; it is generally sufficient for the frequency to be “order of magnitude” correct. For example, optimizers generally do not make decisions based on whether an edge was called 100 times versus 101 times.
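For illustration, call edges and their frequencies might be tallied as in the following sketch. The class `CallEdgeProfileSketch`, the string form of a call site, and the use of a decimal order of magnitude are assumptions made for this example and are not prescribed by the discussion above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical call edge profile: (call site, callee) -> frequency. */
final class CallEdgeProfileSketch {
    private final Map<String, Long> counts = new HashMap<>();

    /** Records one traversal of the edge from callSite to callee. */
    void recordCall(String callSite, String callee) {
        counts.merge(edgeKey(callSite, callee), 1L, Long::sum);
    }

    /** How often the given edge has been traversed so far. */
    long frequency(String callSite, String callee) {
        return counts.getOrDefault(edgeKey(callSite, callee), 0L);
    }

    /** Decimal order of magnitude: 0 for 0..9, 1 for 10..99, 2 for 100..999. */
    static int magnitude(long count) {
        int m = 0;
        while (count >= 10) {
            count /= 10;
            m++;
        }
        return m;
    }

    /** True when two frequencies would likely lead to the same decision. */
    static boolean sameMagnitude(long a, long b) {
        return magnitude(a) == magnitude(b);
    }

    private static String edgeKey(String callSite, String callee) {
        return callSite + " -> " + callee;
    }
}
```

Under this sketch, frequencies of 100 and 101 fall into the same bucket, whereas a wrong callee in the edge key would yield a different edge altogether. This reflects why edge correctness matters more than exact counts.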
Although the techniques described above are generally effective, they unfortunately introduce undesirable performance overhead. Specifically, instrumenting all of the events discussed above imposes punitive overhead on interpreter performance. Instrumenting events in compiled code likewise degrades the performance of the compiled code. Furthermore, instrumenting all of the events discussed above produces large amounts of buffer data that require processing. The performance overhead is significant enough that call path information is not widely obtained in the manner described above, since the drawbacks typically outweigh the benefits.
In view of the foregoing, what are needed are apparatus and methods to more efficiently profile call path information. Ideally, such apparatus and methods will provide useful information to a just-in-time compiler while not unduly hindering performance.