A Sampling Profiler for a JIT Compiler

Andreas Wälchli. A Sampling Profiler for a JIT Compiler. Master's thesis, University of Bern, September 2020.

Abstract

For efficient execution of dynamically typed languages, many implementations use a two-tier architecture. The first tier provides low-latency startup and collects dynamic profiles, e.g., the types of all program variables. The second tier provides high throughput through an optimizing compiler, which specializes the code for the type information recorded in the first tier. If a program suddenly changes its behavior and presents the compiled code with types that have not been seen before and that are incompatible with the compiled version, that specialization becomes invalid. The code is deoptimized and control is transferred back to the first tier, where new profiles are gathered and specialization can start anew. If, however, the program behavior becomes more specific, for instance if a variable suddenly becomes monomorphic (i.e., takes on only a single type), no deoptimization is triggered, because the behavior is still compatible with the compiled version. If the program were recompiled with that monomorphic variable in mind, performance could be improved. Once the program is running in optimized form, however, there is no means of noticing such optimization opportunities. We propose the use of a sampling profiler to monitor native code without instrumentation. Because there is no instrumentation, we incur no overhead when the profiler is inactive, and we can control the overhead of the active profiler by limiting the sampling rate. This approach also allows sampling at random points in the program and not just at predefined locations. Our implementation is built in the context of Ř (R-hacek), an optimizing JIT compiler for the R language. Based on the collected profiles, we are able to detect when the native code produced by Ř is specialized for stale type information and to trigger recompilation for more specific type information. We show that sampling with our profiler, when active, adds an overhead of less than 3% in most cases and up to 9% in some cases. We also show that it reliably detects stale type information within milliseconds.
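
The staleness check described in the abstract can be illustrated with a small sketch. It is not the thesis implementation: the function name `find_stale_slots`, the `min_samples` threshold, and the representation of samples as `(slot, type_name)` pairs are all assumptions made for illustration, and a real profiler would obtain its samples by inspecting running native code rather than from a list. The sketch flags a variable as stale when the sampled types form a proper subset of the types the compiled code was specialized for.

```python
from collections import Counter


def find_stale_slots(compiled_types, samples, min_samples=20):
    """Return slots whose sampled types are strictly narrower than assumed.

    compiled_types: dict mapping a slot name to the set of type names the
                    native code was specialized for (hypothetical layout).
    samples:        iterable of (slot, type_name) pairs from a sampler.
    min_samples:    assumed threshold below which a slot is not judged.
    """
    # Count how often each type was observed per slot.
    observed = {}
    for slot, type_name in samples:
        observed.setdefault(slot, Counter())[type_name] += 1

    stale = {}
    for slot, counts in observed.items():
        if sum(counts.values()) < min_samples:
            continue  # too few samples to draw a conclusion
        seen = set(counts)
        assumed = compiled_types.get(slot, set())
        # A proper subset means the code is still valid but over-general,
        # so recompiling for the narrower set could pay off.
        if seen < assumed:
            stale[slot] = seen
    return stale


# Usage: "x" was compiled for two types but is now only ever an integer.
compiled = {"x": {"integer", "double"}, "y": {"integer"}}
samples = [("x", "integer")] * 30 + [("y", "integer")] * 30
stale_slots = find_stale_slots(compiled, samples)
print(stale_slots)  # {'x': {'integer'}}
```

Types outside the assumed set are deliberately ignored here, since the abstract notes that incompatible types already trigger deoptimization through the existing mechanism.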

Posted by scg at 4 September 2020, 8:15 pm
Last changed by admin on 21 April 2009