IR
The optimizing compiler intermediate representation (IR) is
held in an object of type
IR and includes a list of instructions. Every
instruction is classified into one of the pre-defined instruction
formats. Each instruction includes an operator and zero or more
operands. Instructions are grouped into basic blocks; basic
blocks are constrained to having control-flow instructions at
their end. Basic blocks fall-through to other basic blocks or
contain branch instructions that have a destination basic block
label. The graph of basic blocks is held in the
cfg (control-flow graph) field of IR.
This section documents basic information about the intermediate instruction. For a tutorial based introduction to the material it is highly recommended that you read " Jikes RVM Optimizing Compiler Intermediate Code Representation".
IR Operators
The IR operators are defined by the class
Operators, which in turn is automatically generated
from a template by a driver. The input to the driver are two
files, both called
OperatorList.dat. One input file resides in
$RVM_ROOT/rvm/src-generated/opt-ir and defines
machine-independent operators. The other resides in
$RVM_ROOT/rvm/src-generated/opt-ir/${arch} and
defines machine-dependent operators, where
${arch} is the specific instruction architecture of
interest.
Each operator in
OperatorList.dat is defined by a five-line record,
consisting of:
-
SYMBOL: a static symbol to identify the operator -
INSTRUCTION_FORMAT: the instruction format class that accepts this operator. -
TRAITS: a set of characteristics of the operator, composed with a bit-wise or ( |) operator. SeeOperator.javafor a list of valid traits. -
IMPLDEFS: set of registers implicitly defined by this operator; usually applies only to machine-dependent operators -
IMPLUSES: set of registers implicitly used by this operator; usually applies only to machine-dependent operators
For example, the entry in
OperatorList.dat that defines the integer addition
operator is
INT_ADD
Binary
none
<blank line>
<blank line>
The operator for a conditional branch based on values of two references is defined by
REF_IFCOMP
IntIfCmp
branch | conditional
<blank line>
<blank line>
Additionally, the machine-specific
OperatorList.dat file contains another line of
information for use by the assembler. See the file for
details.
Instruction Formats
Every IR instruction fits one of the pre-defined
Instruction Formats. The Java package
org.jikesrvm.compilers.opt.ir defines roughly 75
architecture-independent instruction formats. For each
instruction format, the package includes a class that defines a
set of static methods by which optimizing compiler code can
access an instruction of that format.
For example,
INT_MOVE instructions conform to the
Move instruction format. The following code fragment
shows code that uses the
Operators interface and the
Move instruction format:
import org.jikesrvm.compilers.opt.ir.*;
class X {
void foo(Instruction s) {
if (Move.conforms(s)) { // if this instruction fits the
Move format
RegisterOperand r1 = Move.getResult(s);
Operand r2 = Move.getVal(s);
System.out.println("Found a move instruction: " + r1 + " := "
+ r2);
} else {
System.out.println(s + " is not a MOVE");
}
}
}
This example shows just a subset of the access functions
defined for the Move format. Other static access functions can
set each operand (in this case,
Result and
Val), query each operand for nullness, clear
operands, create Move instructions, mutate other instructions
into Move instructions, and check the index of a particular
operand field in the instruction. See the Javadoc
™ reference for a
complete description of the API.
Each fixed-length instruction format is defined in the text
file
$RVM_ROOT/rvm/src-generated/opt-ir/InstructionFormatList.dat.
Each record in this file has four lines:
-
NAME: the name of the instruction format -
SIZES: the number of operands defined, defined and used, and used -
SIG: a description of each operand, each description given by-
D/DU/U: Is this operand a def, use, or both? -
NAME: the unique name to identify the operand -
TYPE: the type of the operand (a subclass ofOperand) -
[opt]: is this operand optional?
-
-
VARSIG: a description of repeating operands, used for variable-length instructions.
So for example, the record that defines the
Move instruction format is
Move
1 0 1
"D Result RegisterOperand" "U Val Operand"
<blank line>
This specifies that the
Move format has two operands, one def and one use.
The def is called
Result and must be of type
RegisterOperand. The use is called
Val and must be of type
Operand.
A few instruction formats have variable number of operands.
The format for these records is given at the top of
InstructionFormatList.dat. For example, the record
for the variable-length
Call instruction format is:
Call
1 0 3 1 U 4
"D Result RegisterOperand" \
"U Address Operand" "U Method MethodOperand" "U Guard Operand opt"
"Param Operand"
This record defines the
Call instruction format. The second line indicates
that this format always has at least 4 operands (1 def and 3
uses), plus a variable number of uses of one other type. The
trailing 4 on line 2 tells the template generator to generate
special constructors for cases of having 1, 2, 3, or 4 of the
extra operands. Finally, the record names the
Call instruction operands and constrains the types.
The final line specifies the name and types of the
variable-numbered operands. In this case, a
Call instruction has a variable number of (use)
operands called
Param. Client code can access the ith parameter
operand of a
Call instruction
s by calling
Call.getParam(s,i).
A number of instruction formats share operands of the same semantic meaning and name. For convenience in accessing like instruction formats, the template generator supports four common operand access types:
-
ResultCarrier: provides access to an operand of typeRegisterOperandnamedResult. -
GuardResultCarrier: provides access to an operand of typeRegisterOperandnamedGuardResult. -
LocationCarrier: provides access to an operand of typeLocationOperandnamedLocation. -
GuardCarrier: provides access to an operand of typeOperandnamedGuard.
For example, for any instruction
s that carries a
Result operand (eg.
Move,
Binary, and
Unary formats), client code can call
ResultCarrier.conforms(s) and
ResultCarrier.getResult(s) to access the
Result operand.
Finally, a note on rationale. Religious object-oriented philosophers will cringe at the InstructionFormats. Instead, all this functionality could be implemented more cleanly with a hierarchy of instruction types exploiting (multiple) inheritance. We rejected the class hierarchy approach due to efficiency concerns of frequent virtual/interface method dispatch and type checks. Recent improvements in our interface invocation sequence and dynamic type checking algorithms may alleviate some of this concern.