13 Compilers

13.1 Baseline Compiler

13.1.1 General Architecture

The goal of the baseline compiler is to efficiently generate code that is "obviously correct." It also needs to be easy to port to a new platform and self-contained (the entire baseline compiler must be included in all Jikes RVM boot images to support dynamically loading other compilers).

Roughly two-thirds of the baseline compiler is machine-independent. The main file is BaselineCompiler, together with its parent class TemplateCompilerFramework. The main platform-dependent file is BaselineCompilerImpl.

Baseline compilation consists of two main steps: GC map computation (discussed below) and code generation. The code generation in the baseline compilers is mostly straightforward, consisting of a single pass through the bytecodes of the method being compiled.

Differences in the hardware architectures lead to slightly different implementation strategies for the baseline compilers. For example, the IA32 baseline compiler does not try to optimize register usage; instead, the bytecode operand stack is held in memory. This means that a bytecode that pushes a constant onto the stack creates a memory write in the generated machine code, so the number of memory accesses in IA32 baseline-compiled code corresponds directly to the number of bytecodes. In contrast, the PPC baseline compiler does some register allocation of local variables (and should probably do even more register allocation to properly exploit the register set).

TemplateCompilerFramework contains the main code generation switch statement, which invokes the appropriate emit_<bytecode> method of BaselineCompilerImpl.

13.1.2 GC Maps

The baseline compiler computes GC maps by abstractly interpreting the bytecodes to determine which expression stack slots and local variables contain references at the start of each bytecode. There are additional complications in handling JSRs; see the source code for details. This strategy of computing a single GC map that applies to all the internal GC points for each bytecode slightly constrains code generation. The code generator must ensure that the GC map remains valid at all GC points (including implicit GC points introduced by null pointer exceptions). It also forces the baseline compiler to report reference parameters for the various invoke bytecodes as live in the GC map for the call, because the GC map must also cover the various internal GC points that happen before the call is actually performed. Note that this is not an issue for the optimizing compiler, which computes a GC map for each machine code instruction that is a GC point.
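
To make the idea concrete, here is a minimal, self-contained sketch of abstract interpretation for reference tracking. It is not Jikes RVM code: the handful of opcodes handled and the stack-only model (ignoring local variables and control flow) are simplifications for illustration.

import java.util.ArrayDeque;
import java.util.Deque;

class GCMapSketch {
  enum Kind { REF, PRIM }  // does a stack slot hold a reference or a primitive?
  static final Deque<Kind> stack = new ArrayDeque<>();

  static void interpret(int opcode) {
    switch (opcode) {
      case 0x2a: stack.push(Kind.REF);  break;  // aload_0 pushes a reference
      case 0x04: stack.push(Kind.PRIM); break;  // iconst_1 pushes an int
      case 0x60:                                // iadd pops two ints, pushes one
        stack.pop(); stack.pop(); stack.push(Kind.PRIM); break;
      default: throw new IllegalArgumentException("unhandled opcode " + opcode);
    }
    // The GC map recorded for this bytecode is the set of slots currently REF.
    System.out.println(String.format("after 0x%02x: %s", opcode, stack));
  }

  public static void main(String[] args) {
    interpret(0x2a);  // aload_0
    interpret(0x04);  // iconst_1
    interpret(0x04);  // iconst_1
    interpret(0x60);  // iadd
  }
}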

13.2 Optimizing Compiler

The documentation for the optimizing compiler is organized into the following sections.

13.2.1 Method Compilation

The fundamental unit for optimization in Jikes RVM is a single method. The optimization of a method consists of a series of compiler phases performed on the method. These phases transform the IR (intermediate representation) from bytecodes through HIR (high-level intermediate representation), LIR (low-level intermediate representation), and MIR (machine intermediate representation) and finally into machine code. Various optimizing transformations are performed at each level of IR.

An object of the class CompilationPlan contains all the information necessary to generate machine code for a method. An instance of this class includes, among other fields, the RVMMethod to be compiled and the array of OptimizationPlanElements which define the compilation steps. The execute method of a CompilationPlan invokes the optimizing compiler to generate machine code for the method, executing the compiler phases as listed in the plan's OptimizationPlanElements.
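
As a hedged sketch, a compiler driver might assemble and run a plan roughly as follows; the constructor arguments and helper names paraphrase the description above and are recalled from the sources, so verify them before relying on them.

import org.jikesrvm.classloader.NormalMethod;
import org.jikesrvm.compilers.opt.OptOptions;
import org.jikesrvm.compilers.opt.driver.*;

class Driver {
  // Sketch only: signatures may differ across Jikes RVM versions. The third
  // constructor argument is an optional instrumentation plan, passed as null here.
  void compileWithOpt(NormalMethod method) {
    OptOptions options = new OptOptions();
    OptimizationPlanElement[] elements =
        OptimizationPlanner.createOptimizationPlan(options);  // select phases
    CompilationPlan plan = new CompilationPlan(method, elements, null, options);
    plan.execute();  // run each phase listed in the plan's OptimizationPlanElements
  }
}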

The OptimizationPlanner class defines the standard phases used in a compilation. This class contains a static field, called masterPlan, which contains all possible OptimizationPlanElements. The structure of the master plan is a tree. Any element may either be an atomic element (a leaf of the tree), or an aggregate element (an internal node of the tree). The master plan has the following general structure:

- convert bytecodes to HIR
- transform HIR
- convert HIR to LIR
- transform LIR
- convert LIR to MIR
- transform MIR
- convert MIR to machine code

A client (compiler driver) constructs a specific optimization plan by including all the OptimizationPlanElements contained in the master plan which are appropriate for this compilation instance. Whether or not an element should be part of a compilation plan is determined by its shouldPerform method. For each atomic element, the values in the OptOptions object are generally used to determine whether the element should be included in the compilation plan. Each aggregate element must be included when any of its component elements must be included.

Each element must have a perform method defined which takes the IR as a parameter. It is expected, but not required, that the perform method will modify the IR. The perform method of an aggregate element will invoke the perform methods of its elements.

Each atomic element is an object of the final class OptimizationPlanAtomicElement. The main work of this class is performed by its phase, an object of type CompilerPhase. The CompilerPhase class is not final; each phase extends it, in particular overriding the perform method, which is invoked by the enclosing element's perform method. All the state associated with the element is contained in the CompilerPhase; no state is kept in the element itself.

Every optimization plan consists of a selection of elements from the master plan; thus two optimization plans associated with different methods will share the same component element objects. Clearly, it is undesirable to share state associated with a particular compilation phase between two different method compilations. To prevent this, the perform method of an atomic element creates a new instance of its phase immediately before calling the phase's perform method. In the case where the phase contains no state, the newExecution method of CompilerPhase can be overridden to return the phase itself rather than a clone of it.
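
Putting these pieces together, a hypothetical stateless phase might look like the sketch below; the overridden method names follow the description above, but the exact abstract methods of CompilerPhase should be checked against the sources.

import org.jikesrvm.compilers.opt.OptOptions;
import org.jikesrvm.compilers.opt.driver.CompilerPhase;
import org.jikesrvm.compilers.opt.ir.IR;

class ExamplePhase extends CompilerPhase {
  public String getName() { return "Example Phase"; }

  // Illustrative inclusion test: only run at optimization level 1 or higher
  // (the real criteria vary per phase).
  public boolean shouldPerform(OptOptions options) {
    return options.getOptLevel() >= 1;
  }

  public void perform(IR ir) {
    // transform the IR here
  }

  // This phase keeps no per-compilation state, so the same instance
  // can be reused instead of being cloned for every method.
  public CompilerPhase newExecution(IR ir) { return this; }
}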

13.2.2 IR

The optimizing compiler intermediate representation (IR) is held in an object of type IR and includes a list of instructions. Every instruction is classified into one of the pre-defined instruction formats. Each instruction includes an operator and zero or more operands. Instructions are grouped into basic blocks, which may contain control-flow instructions only at their end. A basic block either falls through to another basic block or ends with branch instructions that carry a destination basic block label. The graph of basic blocks is held in the cfg (control-flow graph) field of IR.
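
For example, a pass might walk this structure as sketched below; the enumerator method names are recalled from the IR API and should be verified against the Javadoc.

import java.util.Enumeration;
import org.jikesrvm.compilers.opt.ir.*;

class Walker {
  void walk(IR ir) {
    // Visit every basic block in the control-flow graph...
    for (Enumeration<BasicBlock> blocks = ir.getBasicBlocks(); blocks.hasMoreElements();) {
      BasicBlock bb = blocks.nextElement();
      // ...and every instruction within each block, in order.
      for (Enumeration<Instruction> insts = bb.forwardInstrEnumerator(); insts.hasMoreElements();) {
        Instruction s = insts.nextElement();
        // inspect s.operator() and its operands here
      }
    }
  }
}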

This section documents basic information about the intermediate representation. For a tutorial-based introduction to the material, it is highly recommended that you read the presentation Jikes RVM Optimizing Compiler Intermediate Code Representation.

IR Operators

The IR operators are defined by the class Operators, which in turn is automatically generated from a template by a driver. The driver takes as input two files, both called OperatorList.dat. One resides in $RVM_ROOT/rvm/src-generated/opt-ir and defines machine-independent operators. The other resides in $RVM_ROOT/rvm/src-generated/opt-ir/${arch} and defines machine-dependent operators, where ${arch} is the specific instruction architecture of interest.

Each operator in OperatorList.dat is defined by a five-line record, consisting of:

- SYMBOL: a static symbol that identifies the operator
- INSTRUCTION_FORMAT: the instruction format class of instructions using this operator
- TRAITS: the characteristics of the operator, composed with a bitwise or (|); none if the operator has no special traits
- implicit defs: registers implicitly defined by the operator (often blank)
- implicit uses: registers implicitly used by the operator (often blank)

For example, the entry in OperatorList.dat that defines the integer addition operator is

INT_ADD 
Binary 
none 
<blank line> 
<blank line>

The operator for a conditional branch based on values of two references is defined by

REF_IFCMP 
IntIfCmp 
branch | conditional 
<blank line> 
<blank line>

Additionally, the machine-specific OperatorList.dat file contains another line of information for use by the assembler. See the file for details.

Instruction Format

Every IR instruction fits one of the pre-defined Instruction Formats. The Java package org.jikesrvm.compilers.opt.ir defines roughly 75 architecture-independent instruction formats. For each instruction format, the package includes a class that defines a set of static methods by which optimizing compiler code can access an instruction of that format.

For example, INT_MOVE instructions conform to the Move instruction format. The following code fragment shows code that uses the Operators interface and the Move instruction format:

import org.jikesrvm.compilers.opt.ir.*; 
class X { 
  void foo(Instruction s) { 
   if (Move.conforms(s)) {    // if this instruction fits the Move format 
     RegisterOperand r1 = Move.getResult(s); 
     Operand r2 = Move.getVal(s); 
     System.out.println("Foundamoveinstruction:" + r1 + ":=" + r2); 
   } else { 
      System.out.println(s + " is not a MOVE"); 
   } 
  } 
}

This example shows just a subset of the access functions defined for the Move format. Other static access functions can set each operand (in this case, Result and Val), query each operand for nullness, clear operands, create Move instructions, mutate other instructions into Move instructions, and check the index of a particular operand field in the instruction. See the Javadoc reference for a complete description of the API.
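
For instance, building a new move or rewriting an existing instruction into one might look like this sketch (in the style of the example above; the factory signatures are recalled from the generated format classes and should be double-checked against the Javadoc):

import org.jikesrvm.compilers.opt.ir.*;

class Y {
  // Hedged sketch of the Move format's create and mutate access functions.
  Instruction examples(Instruction old, RegisterOperand dest, Operand src) {
    // Build a brand-new INT_MOVE instruction...
    Instruction fresh = Move.create(Operators.INT_MOVE, dest, src);
    // ...or rewrite an existing instruction in place into an INT_MOVE.
    Move.mutate(old, Operators.INT_MOVE, dest.copyRO(), src.copy());
    return fresh;
  }
}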

Each fixed-length instruction format is defined in the text file $RVM_ROOT/rvm/src-generated/opt-ir/InstructionFormatList.dat. Each record in this file has four lines:

- NAME: the name of the instruction format
- SIZES: the number of operands defined (D), defined and used (DU), and used (U)
- SIG: a description of each operand, giving whether it is a def (D), def-and-use (DU), or use (U), its unique name, and its type
- VARSIG: a description of the repeated operands for variable-length instruction formats (blank for fixed-length formats)

So for example, the record that defines the Move instruction format is

Move 
1 0 1 
"D Result RegisterOperand" "U Val Operand" 
<blank line>

This specifies that the Move format has two operands, one def and one use. The def is called Result and must be of type RegisterOperand. The use is called Val and must be of type Operand.

A few instruction formats have a variable number of operands. The format for these records is given at the top of InstructionFormatList.dat. For example, the record for the variable-length Call instruction format is:

Call 
1 0 3 1 U 4 
"D Result RegisterOperand" \ 
"U Address Operand" "U Method MethodOperand" "U Guard Operand opt" 
"Param Operand"

This record defines the Call instruction format. The second line indicates that this format always has at least 4 operands (1 def and 3 uses), plus a variable number of uses of one other type. The trailing 4 on line 2 tells the template generator to generate special constructors for cases of having 1, 2, 3, or 4 of the extra operands. Finally, the record names the Call instruction operands and constrains the types. The final line specifies the name and types of the variable-numbered operands. In this case, a Call instruction has a variable number of (use) operands called Param. Client code can access the ith parameter operand of a Call instruction s by calling Call.getParam(s,i).
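
For example, a pass that inspects every argument of a call could be sketched as follows (getNumberOfParams is the count accessor the generated format class is expected to provide; verify the name in the Javadoc):

import org.jikesrvm.compilers.opt.ir.*;

class Z {
  // Hedged sketch: enumerate the variable-length Param operands of a call.
  void printParams(Instruction s) {
    if (Call.conforms(s)) {
      for (int i = 0; i < Call.getNumberOfParams(s); i++) {
        System.out.println("param " + i + ": " + Call.getParam(s, i));
      }
    }
  }
}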

A number of instruction formats share operands of the same semantic meaning and name. For convenience in accessing like instruction formats, the template generator supports four common operand access types:

- ResultCarrier: provides access to an operand of type RegisterOperand named Result
- GuardResultCarrier: provides access to an operand of type RegisterOperand named GuardResult
- LocationCarrier: provides access to an operand of type LocationOperand named Location
- GuardCarrier: provides access to an operand of type Operand named Guard

For example, for any instruction s that carries a Result operand (e.g., the Move, Binary, and Unary formats), client code can call ResultCarrier.conforms(s) and ResultCarrier.getResult(s) to access the Result operand.
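
In code, this format-independent access might look like the following sketch:

import org.jikesrvm.compilers.opt.ir.*;

class W {
  // Hedged sketch: update the Result operand of any instruction that has one,
  // regardless of its concrete instruction format.
  void renameResult(Instruction s, RegisterOperand fresh) {
    if (ResultCarrier.conforms(s)) {
      ResultCarrier.setResult(s, fresh);  // works for Move, Binary, Unary, ...
    }
  }
}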

Finally, a note on rationale. Religious object-oriented philosophers will cringe at the InstructionFormats. Instead, all this functionality could be implemented more cleanly with a hierarchy of instruction types exploiting (multiple) inheritance. We rejected the class hierarchy approach due to efficiency concerns of frequent virtual/interface method dispatch and type checks. Recent improvements in our interface invocation sequence and dynamic type checking algorithms may alleviate some of this concern.

13.2.3 BURS

The optimizing compiler uses the Bottom-Up Rewrite System (BURS) for instruction selection. BURS is essentially a tree pattern matching system derived from iburg by David R. Hanson. (See "Engineering a Simple, Efficient Code-Generator Generator" by Fraser, Hanson, and Proebsting, LOPLAS 1(3), Sept. 1992, doi: 10.1145/151640.151642.) The instruction selection rules for each architecture are specified in an architecture-specific file located in $RVM_ROOT/rvm/src-generated/opt-burs/${arch}, where ${arch} is the specific instruction architecture of interest. The rules are used in generating a parser, which transforms the IR.

Each rule is defined by a four-line record, consisting of:

- PRODUCTION: the tree pattern to be matched
- COST: a Java expression computing the cost of matching the pattern (as opposed to covering the tree some other way)
- FLAGS: flags for the matched pattern, e.g. NOFLAGS, EMIT_INSTRUCTION, LEFT_CHILD_FIRST, RIGHT_CHILD_FIRST
- TEMPLATE: Java code to be emitted when the rule is selected

Each production has a non-terminal, which denotes a value, followed by a colon (":"), followed by a dependence tree that produces that value. For example, the rule resulting in a memory add on the Intel IA32 architecture is expressed in the following way:

stm:    INT_STORE(INT_ADD_ACC(INT_LOAD(r,riv),riv),OTHER_OPERAND(r, riv)) 
ADDRESS_EQUAL(P(p), PLL(p), 17) 
EMIT_INSTRUCTION 
EMIT(MIR_BinaryAcc.mutate(P(p), IA32_ADD, MO_S(P(p), DW), BinaryAcc.getValue(PL(p))));

The production in this rule represents the following tree:

        r    riv 
         \    / 
        INT_LOAD  riv 
            \    / 
          INT_ADD_ACC  r  riv 
                   \   |  / 
                  INT_STORE

where r is a non-terminal that represents a register or a tree producing a register, riv is a non-terminal that represents a register (or a tree producing one) or an immediate value, and INT_LOAD, INT_ADD_ACC and INT_STORE are operators (terminals). OTHER_OPERAND is just an abstraction to make the tree binary.

There are multiple helper functions that can be used in Java code (both in cost expressions and in generation templates). In all code sequences the name p is reserved for the current tree node. Some of the helper methods are shortcuts for accessing properties of tree nodes; for example, P(p) accesses the instruction at the current (root) node, PL(p) the instruction at its left child, PLL(p) the instruction at the left child of the left child, and so on.

The above rule reads as follows: if the tree shown above is seen, evaluate the cost expression, which in this case calls a helper function to test whether the addresses in the STORE (P(p)) and the LOAD (PLL(p)) instructions are equal; the function returns 17 if they are and the special value INFINITE if not. If the cost is acceptable, emit the STORE instruction (P(p)), mutated in place into a machine-dependent add-accumulate instruction (IA32_ADD) that adds a given value to the contents of a given memory location.

The rules file is used to generate a file called ir.brg, which, in turn, is used to produce a file called BURS_STATE.java. Note that if the BURS rules are changed, it is necessary to run ant real-clean in order to recreate the auto-generated Java source code for the BURS rules.

For more information on helper functions look at BURS_Helpers.java. For more information on the BURS algorithm see BURS.java.

Future directions

Whilst jburg allows us to do good instruction selection, there are a number of areas where it is lacking, e.g. vector operations.

We can’t write productions for vector operations unless we match an entire tree of operations. For example, it would be nice to write a rule of the form:

(r, r): ADD(r,r), ADD(r,r)

if, say, the architecture supported a vector add operation (i.e. SIMD). Unfortunately, we can't have tuples on the LHS of productions, and the comma here represents that two coverings must be matched. Leupers has shown how to achieve this result with a modified BURS system. The syntax is:

r: ADD(r,r) 
r: ADD(r,r)

13.2.4 OptTestHarness

For optimizing compiler development, it is sometimes useful to exercise careful control over which classes are compiled, and with which optimization level. In many cases, a prototype-opt image will suit this process, using the command line option -X:aos:initial_compiler=opt combined with -X:aos:enable_recompilation=false. This configuration invokes the optimizing compiler on each method that is run. The OptTestHarness provides even more control over the optimizing compiler: this driver program allows you to invoke the optimizing compiler as an "application" running on top of the VM. Table 13.1 lists its command line options.


-useBootOptions                  Use the same OptOptions as the boot image compiler
-longcommandline <filename>      Read commands (one per line) from a file
+baseline                        Switch default compiler to baseline
-baseline                        Switch default compiler to optimizing
-load <class>                    Load a class
-class <class>                   Load a class and compile all its methods
-method <class> <method> [- or <descrip>]
                                 Compile method with default compiler
-methodOpt <class> <method> [- or <descrip>]
                                 Compile method with opt compiler
-methodBase <class> <method> [- or <descrip>]
                                 Compile method with base compiler
-er <class> <method> [- or <descrip>] {args}
                                 Compile with default compiler and execute a method
-performance                     Show performance results
-oc                              Pass an option to the optimizing compiler

Table 13.1: OptTestHarness command line options

Examples

To use the OptTestHarness program:

rvm org.jikesrvm.tools.oth.OptTestHarness -class Foo

will invoke the optimizing compiler on all methods of class Foo.

rvm org.jikesrvm.tools.oth.OptTestHarness -method Foo bar -

will invoke the optimizing compiler on the first method named bar of class Foo that it loads.

rvm org.jikesrvm.tools.oth.OptTestHarness -method Foo bar '(I)V;'

will invoke the optimizing compiler on method Foo.bar(I)V;. You can specify any number of -method and -class options on the command line. Any arguments passed to OptTestHarness via -oc will be passed on directly to the optimizing compiler. So:

rvm org.jikesrvm.tools.oth.OptTestHarness -oc:O1 -oc:print_final_hir=true -method Foo bar -

will compile Foo.bar at optimization level O1 and print the final HIR.