Chapter 6
Experimental Guidelines

This chapter provides tips on collecting meaningful performance numbers with Jikes RVM.

6.1 Which boot image should I use?

To make a long story short, the best-performing configuration of Jikes RVM will almost always be production. Unless you really know what you are doing, don’t use any other configuration for a performance evaluation of Jikes RVM.
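For example, assuming a checked-out source tree, a production image can typically be built with the buildit script (the host name argument below is a placeholder; substitute the appropriate host for your platform):

bin/buildit localhost production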

Any boot image you use for performance evaluation must have the following characteristics for the results to be meaningful (all three hold for the production configuration):

  1. All assertions must be disabled.
  2. The boot image must be compiled with the optimizing compiler.
  3. The adaptive compilation system must be enabled.

6.2 Compiler Replay

The compiler-replay methodology is deterministic: it eliminates variation in memory allocation and mutator behavior caused by the non-deterministic application of the adaptive compiler. This methodology is needed because the non-determinism of the adaptive compilation system makes it a difficult platform for detailed performance studies; for example, we cannot determine whether a variation is due to the system change being studied or simply to a different application of the adaptive compiler. The information recorded and used comprises hot-method and hot-block profiles, together with a dynamic call graph annotated with the calling frequency of each edge, which drives inlining decisions.

Note that compiler replay was significantly improved in December 2011. The notes below apply to the post-December-2011 version of replay.

Here is how to use it:

6.2.1 Generate Advice

There are three kinds of advice used by the replay system: a compiler advice file (.ca), an edge counter file (.ec), and a dynamic call graph file (.dc). Each is workload-specific (i.e., you should generate a set of advice files for each benchmark).

One way to gather advice is to execute the benchmark multiple times under controlled settings, producing profiles at each execution, then select the fastest execution among the set of runs and use the profiles associated with it as the advice files. A common methodology is to invoke each benchmark 20 times (i.e., take the best invocation from a set of 20 trials), and in each invocation run 10 iterations of the benchmark (so that the advice captures the warmed-up, steady state of the benchmark). For more advanced methodologies, please refer to current research papers on this topic.

When generating the advice, you will need to use the following command line arguments (typically use all six arguments, so that all three advice files are generated at each invocation):


For the adaptive compilation profile:
-X:aos:enable_advice_generation=true -X:aos:cafo=my_compiler_advice_file.ca

For the edge count profile:
-X:base:profile_edge_counters=true -X:base:profile_edge_counter_file=my_edge_counter_file.ec

For the dynamic call graph profile:
-X:aos:dcfo=my_dynamic_call_graph_file.dc -X:aos:final_report_level=2
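Putting the six arguments together, a single advice-generating invocation might look like the following (the rvm launcher name and benchmark jar are placeholders; substitute your own launcher and benchmark harness):

rvm -X:aos:enable_advice_generation=true -X:aos:cafo=benchmark.ca -X:base:profile_edge_counters=true -X:base:profile_edge_counter_file=benchmark.ec -X:aos:dcfo=benchmark.dc -X:aos:final_report_level=2 -jar benchmark.jar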

6.2.2 Executing with advice

The basic model is simple. At a nominated time in the execution of a program, all methods specified in the .ca advice file will be (re)compiled with the compiler and optimization level nominated in the advice file. Broadly, there are two ways of initiating bulk compilation: (a) by calling the method org.jikesrvm.adaptive.recompilation.BulkCompile.compileAllMethods() during execution, or (b) by using the -X:aos:enable_precompile=true flag on the command line to trigger bulk compilation at boot time. A standard methodology is to use a benchmark harness callback mechanism to call compileAllMethods() at the end of the first iteration of the benchmark; at the time of writing, this gave performance roughly 2% faster than the 10th iteration of regular adaptive compilation. Because precompilation occurs early, the compiler has less information about the classes, and in consequence the performance of precompilation is about 9% slower than the 10th iteration of adaptive compilation.
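As a sketch of approach (a), the bulk-compile call can be wired into whatever per-iteration hook your benchmark harness provides (the endIteration hook below is hypothetical; only BulkCompile.compileAllMethods() is part of Jikes RVM):

import org.jikesrvm.adaptive.recompilation.BulkCompile;

public class ReplayCallback {
  private static boolean compiled = false;

  // Hypothetical harness hook: arrange for your harness to call this
  // at the end of every benchmark iteration.
  public static void endIteration() {
    if (!compiled) {
      compiled = true;
      // (Re)compile all methods named in the .ca advice file, at the
      // compiler and optimization level the advice nominates.
      BulkCompile.compileAllMethods();
    }
  }
}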

For 'warm-up' replay (where org.jikesrvm.adaptive.recompilation.BulkCompile.compileAllMethods() is called at the end of the first iteration):

-X:aos:initial_compiler=base -X:aos:enable_bulk_compile=true -X:aos:enable_recompilation=false -X:aos:cafi=benchmark.ca -X:vm:edgeCounterFile=benchmark.ec -X:aos:dcfi=benchmark.dc

For precompile replay (where bulk compilation occurs at boot time):

-X:aos:initial_compiler=base -X:aos:enable_precompile=true -X:aos:enable_recompilation=false -X:aos:cafi=benchmark.ca -X:vm:edgeCounterFile=benchmark.ec -X:aos:dcfi=benchmark.dc
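Putting it together, a complete warm-up replay invocation might look like this (again, the rvm launcher and benchmark names are placeholders):

rvm -X:aos:initial_compiler=base -X:aos:enable_bulk_compile=true -X:aos:enable_recompilation=false -X:aos:cafi=benchmark.ca -X:vm:edgeCounterFile=benchmark.ec -X:aos:dcfi=benchmark.dc -jar benchmark.jar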

6.2.3 Verbosity

You can alter the verbosity of the replay behavior with the flag -X:aos:bulk_compilation_verbosity, which is silent by default (0) and produces progressively more information about the recompilation at values 1 and 2.
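For example, appending the following to either replay command line above prints a summary of the bulk recompilation as it happens:

-X:aos:bulk_compilation_verbosity=1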

6.3 Measuring GC performance

MMTk includes a statistics subsystem and a harness mechanism for measuring its performance. If you are using the DaCapo benchmarks, the MMTk harness can be invoked with the '-c MMTkCallback' command line option; for other benchmarks you will need to invoke the harness yourself by calling the static methods

org.mmtk.plan.Plan.harnessBegin()
org.mmtk.plan.Plan.harnessEnd()

at the appropriate places (a sketch appears after the table below). Other command line switches that affect the collection of statistics are:


Option                          Description
-X:gc:printPhaseStats=true      Print statistics for each mutator/GC phase during the run
-X:gc:xmlStats=true             Print statistics in an XML format (as opposed to the human-readable format)
-X:gc:verbose                   Verbose GC output; this is incompatible with MMTk’s statistics system
-X:gc:variableSizeHeap=false    Disable dynamic resizing of the heap
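For benchmarks without built-in harness support, the following sketch shows one way to bracket a measured iteration (runIteration is a placeholder for your benchmark's work; only the two Plan methods are part of MMTk):

import org.mmtk.plan.Plan;

public class HarnessedBenchmark {
  public static void main(String[] args) {
    runIteration();        // warm-up iteration, not measured
    Plan.harnessBegin();   // start MMTk statistics collection
    runIteration();        // the measured iteration
    Plan.harnessEnd();     // stop collection and report statistics
  }

  private static void runIteration() {
    // placeholder: the benchmark's actual work goes here
  }
}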


Unless you are specifically researching flexible heap sizes, it is best to run benchmarks in a fixed size heap, using a range of heap sizes to produce a curve that reflects the space-time tradeoff. Using replay compilation and measuring the second iteration of a benchmark is a good way to produce results with low noise.
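For example, a fixed 100 MB heap for a measured run might be requested as follows (the heap size is illustrative; vary it across runs to trace out the space-time curve):

-Xms100M -Xmx100M -X:gc:variableSizeHeap=false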

There is an active debate among memory management and VM researchers about how best to measure performance, and this section is not meant to dictate or advocate any particular position, simply to describe one particular methodology.

6.4 Jikes RVM is really slow! What am I doing wrong?

Perhaps you are not seeing stellar Jikes RVM performance. If Jikes RVM as described above is not competitive with product JVMs, we recommend you test your installation with the DaCapo benchmarks. We expect Jikes RVM performance to be very close to that of Sun’s HotSpot 1.5 server VM on the DaCapo benchmarks. Of course, running DaCapo well does not guarantee that Jikes RVM runs all codes well.

Some kinds of code will not run fast on Jikes RVM. Known issues include:

  1. Jikes RVM start-up may be slow compared to some product JVMs.
  2. Remember that the non-adaptive configurations (-X:aos:enable_recompilation=false -X:aos:initial_compiler=opt) opt-compile every method the first time it executes. With aggressive optimization levels, opt-compiling will severely slow down the first execution of each method. For many benchmarks, it is possible to test the quality of generated code by either running for several iterations and ignoring the first, or by building a warm-up period into the code. The SPEC benchmarks already use these strategies. The adaptive configuration does not have this problem; however, we make no claim that the adaptive system will compete with product JVMs on short-running codes of a few seconds.
  3. Performance on tight loops may suffer. The Jikes RVM mechanism for safe points (thread preemption for garbage collection, on-stack replacement, profiling, etc.) relies on the insertion of a yield test on every back edge. This will hurt tight loops, including many simple microbenchmarks (one such loop is sketched after this list). We should someday alleviate this problem by strip-mining and hoisting the yield point out of hot loops, or by implementing a safe-point mechanism that does not require an explicit check.
  4. The load balancing in the system is naive and unfair. This can hurt some styles of code, including bulk-synchronous parallel programs.
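The following microbenchmark illustrates the style of tight loop mentioned in point 3 (the code is purely illustrative and not part of Jikes RVM); each iteration ends in a back edge where Jikes RVM inserts a yield test:

public class TightLoop {
  public static void main(String[] args) {
    long sum = 0;
    // On a loop body this small, the per-back-edge yield test is a
    // measurable fraction of the total work.
    for (int i = 0; i < 1000000000; i++) {
      sum += i;
    }
    System.out.println(sum);
  }
}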

The Jikes RVM developers wish to ensure that Jikes RVM delivers competitive performance. If you can isolate reproducible performance problems, please let us know.

6.5 Stability of Jikes RVM

Jikes RVM is not as stable as commercial JVMs such as HotSpot or J9. Design your evaluation systems (e.g. scripts) so that they can deal with crashes and deadlocks/livelocks. The latter can be handled by running Jikes RVM under a time limit; for example, if you are using Linux and shell scripts, you can use the timelimit program to terminate Jikes RVM after a set time.
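For example, one plausible invocation (the option letters follow the common timelimit utility; consult your installation's man page, and treat the rvm launcher and benchmark names as placeholders):

timelimit -t 300 -T 30 rvm -jar benchmark.jar

This sends a warning signal after 300 seconds and a kill signal 30 seconds later.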