
Chapter 16
MMTk

The garbage collectors for Jikes RVM are provided by MMTk. The document MMTk: The Memory Manager Toolkit describes MMTk, gives a tutorial on how to use and edit it, and is the best place to start; an updated version of the tutorial is available in this guide. A detailed description of the call chain from the compilers through to MMTk is another good place to start understanding how MMTk integrates with Jikes RVM. Anatomy of a Garbage Collector describes the major building blocks of an MMTk collector, and Scanning Objects in Jikes RVM describes how objects are scanned for their pointer fields during GC. MMTk also has a pure Java test harness that allows development of garbage collectors in an IDE such as Eclipse.

Jikes RVM can be configured to employ various allocation managers taken from the MMTk memory management toolkit. Managers divide the available space up as they see fit, but normally subdivide the available address range into a number of distinct spaces, each managed according to its own policy.

Virtual memory pages are lazily mapped into Jikes RVM’s memory image as they are needed.

The main class which is used to interface to the memory manager is called Plan. Each flavor of the manager is implemented by substituting a different implementation of this class. Most plans inherit from class StopTheWorldGC which ensures that all active mutator threads (i.e. ones which do not perform the job of reclaiming storage) are suspended before reclamation is commenced. The argument passed to -X:gc:threads determines the number of parallel collector threads that will be used for collection.

Generational collectors employ a plan which inherits from class Generational. Inter alia, this class ensures that a write barrier is employed so that updates from old to new spaces are detected.

Jikes RVM may also use the GCSpy visualization framework. GCSpy allows developers to observe the behavior of the heap and related data structures.

16.1 Anatomy of a Garbage Collector

** Work in progress, contributions appreciated **

This page gives a brief outline of the major control flows in the execution of a garbage collector in MMTk. For simplicity, we focus on the MarkSweep collector, although much of the discussion will be relevant to other collectors.

This page assumes you have a basic knowledge of garbage collection. For those that don’t, please see one of the standard texts such as The Garbage Collection Handbook.

16.1.1 Structure of a Plan

An MMTk Plan is required to provide five classes. They must have consistent names, each starting with the plan's name and ending with a suffix that indicates which framework class it inherits from. In the case of the MarkSweep plan, the name is "MS".

The basic architecture of MMTk is that virtual address space is divided into chunks (of 4MB in a 32-bit memory model) that are managed according to a specific policy. A policy is implemented by an instance of the Space class, and it is in the policy class that the mechanics of a particular mechanism (like mark-sweep) is implemented. The task of a Plan is to create the policy (Space) objects that manage the heap, and to integrate them into the MMTk framework. MMTk exposes some of this memory management policy to the host VM, by allowing the VM to specify an allocator (represented by a small integer) when allocating space. The interface exposed to the VM allows it to choose whether an object will move during collection or not, whether the object is large enough to require special handling etc. The MMTk plan is free (within the semantic guarantees exposed to the VM) to direct each of these allocators to a particular policy.

16.1.2 Policies

A policy describes how a range of virtual address space is managed. The base class of all policies is org.mmtk.policy.Space, and a particular instance of a policy is known generically as a space. The static initializer of a Plan and its subclasses define the spaces that make up an MMTk plan.


MS.java
public static final MarkSweepSpace msSpace = new MarkSweepSpace("ms", VMRequest.discontiguous());
public static final int MARK_SWEEP = msSpace.getDescriptor();

In this code fragment, we see the MS plan defined. Note that we generally also define a static final space descriptor. This is an optimization that allows some rapid operations on spaces.
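
The point of the descriptor can be sketched with a toy model. The encoding below is illustrative, not MMTk's actual layout: packing a contiguous space's bounds into a single int lets a membership test run with a shift and two compares, without ever dereferencing the Space object.

```java
// Illustrative sketch (hypothetical encoding): pack a contiguous space's
// start chunk and chunk count into one int so membership tests are cheap.
final class Descriptor {
  static final int CHUNK_SHIFT = 22;                // 4MB chunks, as in the 32-bit model

  static int create(long start, long extent) {
    int startChunk = (int) (start >>> CHUNK_SHIFT);
    int chunks = (int) (extent >>> CHUNK_SHIFT);
    return (startChunk << 10) | chunks;             // assumes fewer than 1024 chunks
  }

  static boolean isInSpace(int descriptor, long addr) {
    int chunk = (int) (addr >>> CHUNK_SHIFT);       // which chunk does addr fall in?
    int startChunk = descriptor >>> 10;
    int chunks = descriptor & 0x3ff;
    return chunk >= startChunk && chunk < startChunk + chunks;
  }
}
```

A test against a space at 1GB of size 8MB shows the fast path: no object load, just integer arithmetic on the descriptor.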

A Space is a global object, shared among multiple mutator threads. Each policy will also have one or more thread-local classes which provide unsynchronized allocation. These classes are subclasses of org.mmtk.utility.alloc.Allocator, and in the case of MarkSweep, it is called MarkSweepLocal. Instances of MarkSweepLocal are created as part of a mutator context, like this


MSMutator.java
protected MarkSweepLocal ms = new MarkSweepLocal(MS.msSpace);

The design pattern is that the local Allocator will allocate space from a thread-local buffer, and when that is exhausted it will allocate a new buffer from the global Space, performing appropriate locking. The constructor of the MarkSweepLocal specifies the space from which the allocator will allocate global memory.
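
This local/global split can be sketched in miniature. The classes below are hypothetical stand-ins, not MMTk's API: the local allocator bump-allocates from a private buffer and takes the global lock only when the buffer is exhausted.

```java
// Sketch of the thread-local fast path / synchronized slow path pattern.
final class GlobalSpace {
  private long cursor = 0;
  synchronized long acquireBuffer(int bytes) {      // slow path: synchronized
    long start = cursor;
    cursor += bytes;
    return start;
  }
}

final class LocalAllocator {
  private static final int BUFFER_BYTES = 1 << 16;  // hypothetical buffer size
  private final GlobalSpace space;
  private long cursor = 0, limit = 0;

  LocalAllocator(GlobalSpace space) { this.space = space; }

  long alloc(int bytes) {
    if (cursor + bytes > limit) {                   // fast path exhausted:
      cursor = space.acquireBuffer(BUFFER_BYTES);   // refill from the global space
      limit = cursor + BUFFER_BYTES;
    }
    long result = cursor;                           // unsynchronized bump allocation
    cursor += bytes;
    return result;
  }
}
```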

16.1.3 Allocation

MMTk provides two methods for allocating an object. These are provided by the MSMutator class, to give each plan the opportunity to use fast, unsynchronized thread-local allocation before falling back to a slower synchronized slow-path.

The version implemented in MarkSweep looks like this:


MSMutator.java
public Address alloc(int bytes, int align, int offset, int allocator, int site) { 
  if (allocator == MS.ALLOC_DEFAULT) { 
    return ms.alloc(bytes, align, offset); 
  } 
  return super.alloc(bytes, align, offset, allocator, site); 
}

The basic structure of this method is common to all MMTk plans. First they decide whether the operation applies to this level of abstraction (if (allocator == MS.ALLOC_DEFAULT)), and if so, delegate to the appropriate place, otherwise pass it up the chain to the super-class. In the case of MarkSweep, MSMutator delegates the allocation to its thread-local MarkSweepLocal object ms.

The alloc method of MarkSweepLocal is inherited from SegregatedFreeListLocal (mark-sweep is not the only way of managing free-list allocation), and looks like this


SegregatedFreeListLocal.java (simplified)
public final Address alloc(int bytes, int align, int offset) {
  int sizeClass = getSizeClass(bytes);
  Address cell = freeList.get(sizeClass);
  if (!cell.isZero()) {
    freeList.set(sizeClass, cell.loadAddress());
    // Clear the free list link
    cell.store(Address.zero());
    return cell;
  }
  return allocSlow(bytes, align, offset);
}

This is a standard pattern for thread-local allocation: first we look up the free list for the appropriate size class and, if a free cell is available, unlink and return it. If unsuccessful, we request space from the global policy via the method Allocator.allocSlow. This is the common interface that all Allocators use to request space from the global policy. This will eventually call the allocator-specific allocSlowOnce method. The workings of allocSlowOnce are very policy-specific, so not appropriate to examine at this stage, but eventually all policies will attempt to acquire fresh virtual memory via the Space.acquire method.

Space.acquire is the only correct way for a policy to allocate new virtual memory for its own use.


Space.java (simplified)
public final Address acquire(int pages) {
  pr.reservePages(pages);
  // Poll, either fixing budget or requiring GC
  if (VM.activePlan.global().poll(false, this)) {
    VM.collection.blockForGC();
    return Address.zero(); // GC required, return failure
  }
  // Page budget is ok, try to acquire virtual memory
  Address rtn = pr.getNewPages(pagesReserved, pages, zeroed);
  if (rtn.isZero()) {  // Failed, so force a GC
    boolean gcPerformed = VM.activePlan.global().poll(true, this);
    VM.collection.blockForGC();
    return Address.zero();
  }
  return rtn;
}

The logic of Space.acquire is: reserve the pages; poll the plan, blocking for a GC (and returning failure) if one is required; otherwise try to acquire fresh virtual memory, forcing a GC and returning failure if none is available. The retry on failure happens in the caller, Allocator.allocSlowInline.


Allocator.java (simplified)
public final Address allocSlowInline(int bytes, int alignment, int offset) {
  boolean emergencyCollection = false;
  while (true) {
    Address result = allocSlowOnce(bytes, alignment, offset);
    if (!result.isZero()) {
      return result;
    }
    if (emergencyCollection) {
      VM.collection.outOfMemory();
    }
    emergencyCollection = Plan.isEmergencyCollection();
  }
}

This code fragment shows the retry logic in the allocator. We try allocating using allocSlowOnce, which may recycle partially-used blocks and eventually call Space.acquire. If a GC occurred, we try again. Eventually the plan will request an emergency collection which will (for example) cause soft references to be dropped. If this fails we throw an OutOfMemoryError.

16.1.4 Collection

Scheduling

In a stop-the-world garbage collector like MarkSweep, the mutator threads run until memory is exhausted, then all mutator threads are suspended, the collector threads are activated, and they perform a garbage collection. After the GC is complete, the collector threads are suspended and the mutator threads resume. MMTk also has some support for concurrent collectors, in which one or more collector threads can be scheduled to run alongside the mutator, either exclusively or in addition to (hopefully briefer) stop-the-world phases.

Thread scheduling in MMTk is handled by a GC controller thread, implemented in the singleton class org.mmtk.plan.ControllerCollectorContext held in the static field Plan.controlCollectorContext. Whenever a collection is initiated, it is done by calling methods on this object.
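
The request/park handshake can be modeled with a toy controller. All names here are illustrative, not MMTk's internals: a collector thread parks on a semaphore, and the controller's request() wakes it and blocks until the collection completes, just as mutators block for GC.

```java
import java.util.concurrent.Semaphore;

// Toy model of the controller handshake between mutators and a collector thread.
final class ToyController {
  private final Semaphore wake = new Semaphore(0);
  private final Semaphore done = new Semaphore(0);
  volatile int collections = 0;

  ToyController() {
    Thread collector = new Thread(() -> {
      while (true) {
        wake.acquireUninterruptibly();   // park() until a collection is requested
        collections++;                   // collect()
        done.release();                  // let the blocked mutator resume
      }
    });
    collector.setDaemon(true);
    collector.start();
  }

  void request() {                       // called when memory is exhausted
    wake.release();
    done.acquireUninterruptibly();       // mutator blocks for GC
  }
}
```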

Initiating

As mentioned above, every attempt to allocate fresh virtual memory calls the current plan’s poll(...) method. This initiates a GC by calling controlCollectorContext.request(), which in a stop-the-world collector like MarkSweep pauses the mutator threads and then wakes the collector threads. The main loop of the garbage collector is simply the run() method of ParallelCollector, shown below.


ParallelCollector
public void run() {
  while (true) {
    park();
    collect();
  }
}

The collect() method is specific to the type of collector, and in StopTheWorldCollector it looks like this


StopTheWorldCollector
public void collect() {
  Phase.beginNewPhaseStack(Phase.scheduleComplex(global().collection));
}

Collector Phases

Every garbage collection consists of a series of steps. Each step is either executed once (e.g. updating the mark state before marking the heap), or in parallel on all available collector threads (e.g. the parallel mark phase). The actual work of a step is done by the collectionPhase method of the global, collector or mutator class of a plan.

In early versions of MMTk, the main collection method was a template method, calling individual methods for each phase of the collection. As the number of collectors in MMTk grew, this became unwieldy and has been replaced with a configurable mechanism of phases.

The class org.mmtk.plan.Simple defines the basic structure of most of MMTk’s garbage collectors. First it defines the phases themselves,


Simple.java
public static final short SET_COLLECTION_KIND = Phase.createSimple("set-collection-kind", null);
public static final short INITIATE            = Phase.createSimple("initiate", null);
public static final short PREPARE             = Phase.createSimple("prepare");
...

Each phase of the collection is represented by a 16-bit integer, an index into a table of Phase objects. Simple phases are scheduled, and combined into sequences, or complex phases.


Simple.java
// Ensure stacks are ready to be scanned
protected static final short prepareStacks = Phase.createComplex("prepare-stacks", null,
    Phase.scheduleMutator    (PREPARE_STACKS),
    Phase.scheduleGlobal     (PREPARE_STACKS));
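
The mechanism can be sketched in miniature. The code below is a deliberately simplified model, not MMTk's Phase class: short ids index a global table, a complex phase is a sequence of ids, and processing expands complex phases depth-first into their simple steps.

```java
import java.util.ArrayList;
import java.util.List;

// Toy phase table: simple phases are names, complex phases are id sequences.
final class Phases {
  static final List<Object> table = new ArrayList<>();

  static short createSimple(String name) {
    table.add(name);
    return (short) (table.size() - 1);            // the id is just a table index
  }

  static short createComplex(short... ids) {
    table.add(ids.clone());
    return (short) (table.size() - 1);
  }

  static void process(short id, List<String> log) {
    Object p = table.get(id);
    if (p instanceof String) {
      log.add((String) p);                        // run a simple phase
    } else {
      for (short sub : (short[]) p) process(sub, log);  // expand a complex phase
    }
  }
}
```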

A simple phase can be scheduled in one of four ways, according to which component of the plan runs it: for example, Phase.scheduleMutator runs the phase on every mutator context, while Phase.scheduleGlobal runs it once on the global plan object.

Between every phase of a collection, the collector threads rendezvous at a synchronization barrier. The actual execution of a collector’s phases is done in the method Phase.processPhaseStack. This method handles resuming a concurrent collection as well as running a full stop-the-world collection.
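
The per-phase rendezvous can be illustrated with a standard barrier. This is a sketch using java.util.concurrent, not MMTk's own rendezvous implementation: each collector thread does its share of a phase, then waits at the barrier so that no thread starts the next phase early.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

// Each of `threads` workers runs `phases` phases, rendezvousing between phases.
final class PhaseBarrierDemo {
  static int run(int threads, int phases) {
    CyclicBarrier barrier = new CyclicBarrier(threads);
    AtomicInteger work = new AtomicInteger();
    Thread[] ts = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      ts[i] = new Thread(() -> {
        try {
          for (int p = 0; p < phases; p++) {
            work.incrementAndGet();       // this thread's share of the phase
            barrier.await();              // rendezvous before the next phase
          }
        } catch (Exception e) { throw new RuntimeException(e); }
      });
      ts[i].start();
    }
    for (Thread t : ts) {
      try { t.join(); } catch (InterruptedException e) { throw new RuntimeException(e); }
    }
    return work.get();                    // total units of phase work completed
  }
}
```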

The actual work of a collection phase is done (as mentioned above) in the collectionPhase method of the major Plan classes.


MS.java
@Inline
@Override
public void collectionPhase(short phaseId) {
  if (phaseId == PREPARE) {
    super.collectionPhase(phaseId);
    msTrace.prepare();
    msSpace.prepare(true);
    return;
  }
  if (phaseId == CLOSURE) {
    msTrace.prepare();
    return;
  }
  if (phaseId == RELEASE) {
    msTrace.release();
    msSpace.release();
    super.collectionPhase(phaseId);
    return;
  }
  super.collectionPhase(phaseId);
}

This excerpt shows how the global MS plan implements collectionPhase, illustrating the key phases of a simple stop-the-world collector. The prepare phase performs tasks such as changing the mark state, the closure phase performs a transitive closure over the heap (the mark phase of a mark-sweep algorithm) and the release phase performs any post-collection steps. Where possible, a plan is structured so that each layer of inheritance deals only with the objects it creates, i.e. the MS class operates on the msSpace and delegates work on all other spaces to the super-class where they are defined. By convention the PREPARE phase is performed outside-in (super-class preparation first) and RELEASE is done inside-out (local first, super-class second).
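
The outside-in/inside-out convention can be demonstrated with a two-level toy hierarchy. The class names below are illustrative: PREPARE delegates to the super-class before doing local work, while RELEASE does local work first.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the phase-ordering convention between a plan and its super-class.
class BasePlan {
  final List<String> log = new ArrayList<>();
  void collectionPhase(String phase) { log.add("base-" + phase); }
}

class MSPlan extends BasePlan {
  @Override void collectionPhase(String phase) {
    if (phase.equals("prepare")) {
      super.collectionPhase(phase);     // outside-in: outer spaces prepare first
      log.add("ms-prepare");
      return;
    }
    if (phase.equals("release")) {
      log.add("ms-release");            // inside-out: local space releases first
      super.collectionPhase(phase);
      return;
    }
    super.collectionPhase(phase);       // everything else: delegate upward
  }
}
```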

Tracing the heap

The main operation of a tracing collector is the transitive closure operation where all (or a subset) of the object graph is visited. Some collectors such as generational collectors perform these operations in more than one way, e.g. a nursery collection in a generational collector does not trace through pointers into the mature space, while a full-heap collection does. All MMTk collectors are designed to run using several parallel threads, using data structures that have unsynchronized thread-local and synchronized global components in the same way as MMTk’s policy classes.

MMTk’s trace operation uses the following terminology:

Each distinct transitive closure operation is defined as a subclass of TraceLocal. The closure is performed in the collectionPhase method of the plan-specific CollectorContext class


MSCollector.java
public void collectionPhase(short phaseId, boolean primary) {
  ...
  if (phaseId == MS.CLOSURE) {
    fullTrace.completeTrace();
    return;
  }
  ...
}

The initial starting point for the closure is computed by the STACK_ROOTS and ROOTS phases, which add root locations to a buffer by calling TraceLocal.reportDelayedRootEdge. The closure operation proceeds by invoking traceObject on each root location (in method processRootEdge), and then invoking scanObject on each heap object encountered. Note that the CLOSURE operation is performed multiple times in each GC, due to processing of reference types.
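
The mark-and-scan loop can be modeled over a toy object graph. This is purely illustrative, with objects as integer ids rather than heap addresses: traceObject marks each object at most once and queues it, and draining the worklist scans each object's fields until a fixed point is reached.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy transitive closure: heap maps each object id to its pointer fields.
final class ToyTrace {
  final Map<Integer, int[]> heap = new HashMap<>();
  final Set<Integer> marked = new HashSet<>();
  private final Deque<Integer> worklist = new ArrayDeque<>();

  void traceObject(int obj) {
    if (marked.add(obj)) worklist.add(obj);   // mark once, remember for scanning
  }

  void completeTrace() {                      // drain until a fixed point
    while (!worklist.isEmpty())
      for (int field : heap.getOrDefault(worklist.poll(), new int[0]))
        traceObject(field);                   // "scanObject" visits each field
  }
}
```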

16.2 Memory Allocation in Jikes RVM

The way that objects are allocated in Jikes RVM can be difficult to grasp for someone new to the code base. This document provides a detailed look at some of the paths through the JikesRVM - MMTk interface code to help bootstrap understanding of the process. The process and code illustrated below is current as of March 2011, svn revision 16052 (between JikesRVM 3.1.1 and 3.1.2).

16.2.1 Memory Manager Interface

The best starting place to understand the allocation sequence is in the class org.jikesrvm.mm.mminterface.MemoryManager, which is a facade class for the MMTk allocators. MMTk provides a variety of memory management plans which are designed to be independent of the actual language being implemented. The MemoryManager class orchestrates the services of MMTk to allocate memory, and adds the structure necessary to make the allocated memory into Java objects.

The method allocateScalar is where all scalar (i.e. non-array) objects are allocated. The parameters of this method specify the object to be allocated in sufficient detail that when this method is compiled by the opt compiler, all of the parameters are compile-time constants, allowing maximum optimization. Working through the body of the method,

Selected.Mutator mutator = Selected.Mutator.get();

As mentioned above, MMTk provides many different memory management plans, one of which is selected at build time. This call acquires a pointer to the thread-local per-mutator component of MMTk. Much of MMTk’s performance comes from providing unsynchronized thread-local data structures for the frequently used operations, so rather than provide a single interface object, it provides a per-thread interface object for both mutator and collector threads.

allocator = mutator.checkAllocator(org.jikesrvm.runtime.Memory.alignUp(size, MIN_ALIGNMENT), align, allocator);

An MMTk plan in general provides several spaces where objects can be allocated, each with their own characteristics. Jikes RVM is free to request allocation in any of these spaces, but sometimes there are constraints, known only at allocation time, that force MMTk to override Jikes RVM's request. For example, Jikes RVM may specify that objects allocated by a particular class are allocated in MMTk's non-moving space. At execution time, one such object may turn out to be too large for allocation in the general non-moving space provided by that particular plan, and so MMTk needs to promote the object to the Large Object Space (LOS), which is also non-moving, but has high space overheads. This call will generally compile down to 0 or a small handful of instructions.
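
A sketch of this per-allocation override might look as follows. The allocator ids and the size threshold here are made up for illustration: a request for the default space is redirected to the large object space when the size exceeds what the free-list space can handle.

```java
// Hypothetical sketch of checkAllocator-style promotion to the LOS.
final class AllocatorCheck {
  static final int ALLOC_DEFAULT = 0, ALLOC_LOS = 1;
  static final int MAX_FREELIST_BYTES = 8 * 1024;   // illustrative limit

  static int checkAllocator(int bytes, int allocator) {
    if (allocator == ALLOC_DEFAULT && bytes > MAX_FREELIST_BYTES)
      return ALLOC_LOS;                             // promote oversized request
    return allocator;                               // otherwise honor the request
  }
}
```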

Address region = allocateSpace(mutator, size, align, offset, allocator, site);

This calls a method of MemoryManager, common to all allocation methods (for Arrays and other special objects), that calls

Address region = mutator.alloc(bytes, align, offset, allocator, site);

to actually allocate memory from the current MMTk plan.

Object result = ObjectModel.initializeScalar(region, tib, size);

Now we call the Jikes RVM object model to initialize the allocated region as a scalar object, and then

mutator.postAlloc(ObjectReference.fromObject(result), ObjectReference.fromObject(tib), size, allocator);

we call MMTk’s postAlloc method to perform initialization that can only be performed after an object has been initialized by the virtual machine.
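
The steps above can be condensed into a toy pipeline. All names here are hypothetical stand-ins for the real MemoryManager, ObjectModel and Mutator methods; the point is only the fixed order: allocate raw space, let the VM's object model lay the object out, then give the memory manager its post-initialization hook.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the allocateScalar sequence: alloc -> initialize -> postAlloc.
final class AllocSequence {
  static final List<String> trace = new ArrayList<>();

  static long alloc(int bytes)           { trace.add("alloc");      return 0x1000; }
  static Object initializeScalar(long r) { trace.add("initialize"); return new Object(); }
  static void postAlloc(Object o)        { trace.add("postAlloc"); }

  static Object allocateScalar(int bytes) {
    long region = alloc(bytes);               // raw memory from the plan
    Object result = initializeScalar(region); // VM object model lays it out
    postAlloc(result);                        // memory manager's final hook
    return result;
  }
}
```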

16.2.2 Compiler integration

The allocateScalar method discussed above is only actually called from one place, the method resolvedNewScalar(int ...) in the class org.jikesrvm.runtime.RuntimeEntrypoints. This class provides methods that are accessed directly by the compilers, via fields in the org.jikesrvm.runtime.Entrypoints class. The ’resolved’ part of the method name indicates that the class of object being allocated is resolved at compile time (recall that the Java Language Spec requires that classes are only loaded, resolved etc when they are needed - sometimes it’s necessary to compile code that performs classloading and then allocate the object).

RuntimeEntrypoints also contains an overload, resolvedNewScalar(RVMClass), that is used by the reflection API to allocate objects. It’s instructive to look at this method, as it performs essentially the same operations as the compiler when compiling the call to resolvedNewScalar(int...).

Working backwards from this point requires delving into the individual compilers.

Baseline Compiler

There is a different baseline compiler for each architecture. The relevant code in the baseline compiler for the ia32 architecture is in the class org.jikesrvm.compilers.baseline.ia32.BaselineCompilerImpl. The method emit_resolved_new(RVMClass) is responsible for generating code to execute the new bytecode when the target class is already resolved. Looking at this method, you can see it does essentially what the resolvedNewScalar(RVMClass) method in RuntimeEntrypoints does, then generates machine code to perform the call to the resolvedNewScalar entrypoint. Note how the work of calculating the size, alignment etc of the object is performed by the compiler, at compile time.

Similar code exists in the PPC baseline compiler.

Optimizing Compiler

The optimizing compiler is paradoxically somewhat simpler than the baseline compiler, in that injection of the call to the entrypoint is done in an architecture independent level of compiler IR. (An overview of the Jikes RVM optimizing compiler can be found in the paper The Jalapeño Dynamic Optimizing Compiler for Java).

In HIR (the high-level Intermediate Representation), allocation is expressed as a ’new’ opcode. During the translation from HIR to LIR (Low-level IR), this and other opcodes are translated into instructions by the class org.jikesrvm.compilers.opt.hir2lir.ExpandRuntimeServices. The method perform(IR) performs this translation, selecting particular operations via a large switch statement. The NEW_opcode case performs the task we’re interested in, doing essentially the same job as the baseline compiler, but generating IR rather than machine instructions. The compiler generates a ’call’ operation, and then (if the compilation policy decides it’s required) inlines it.

At this point in code generation, all the methods called by RuntimeEntrypoints.resolvedNewScalar(int...) which are annotated @Inline are also inlined into the current method. This inlining extends through to the MMTk code so that the allocation sequence can be optimized down to a handful of instructions.

It can be instructive to look at the various levels of IR generated for object allocation using a simple test program and the OptTestHarness utility described elsewhere in the user guide.

16.3 Scanning Objects in Jikes RVM

One of the services that MMTk expects a virtual machine to perform on its behalf is the scanning of objects, i.e. identifying and processing the pointer fields of the live objects it encounters during collection. In principle the implementation of this interface is simple, but there are two moderately complex optimizations layered on top of this.

From MMTk’s point of view, each time an object requires scanning it passes it to the VM, along with a TransitiveClosure object. The VM is expected to identify the pointers and invoke the processEdge method on each of the pointer fields in the object. The rationale for the current object scanning scheme is presented in this paper.

16.3.1 JikesRVM to MMTk Interface

MMTk requires its host virtual machine to provide an implementation of the class org.mmtk.vm.Scanning as its interface to scanning objects. Jikes RVM’s implementation of this class is found under the source tree MMTk/ext/vm/jikesrvm, in the class org.jikesrvm.mm.mmtk.Scanning. The methods we are interested in are scanObject(TransitiveClosure, ObjectReference) and specializedScanObject(int, TransitiveClosure, ObjectReference).

In MMTk, each plan defines one or more TransitiveClosure operations. Simple full-heap collectors like MarkSweep only define one TransitiveClosure, but complex plans like GenImmix or the RefCount plans define several. MMTk allows the plan to request specialized scanning on a closure-by-closure basis: closures that are specialized call specializedScanObject, while unspecialized ones call scanObject. Specialization is covered in more detail below.

In the absence of hand-inlined scanning, or if specialization is globally disabled, scanning reverts to the fallback method in org.jikesrvm.mm.mminterface.SpecializedScanMethod. This method can be regarded as the basic underlying mechanism, and is worth understanding in detail.

RVMType type = ObjectModel.getObjectType(objectRef.toObject());
int[] offsets = type.getReferenceOffsets();

This code fetches the array of offsets that Jikes RVM uses to identify the pointer fields in the object. This array is constructed by the classloader when a class is resolved.

if (offsets != REFARRAY_OFFSET_ARRAY) { 
  for(int i=0; i < offsets.length; i++) { 
    trace.processEdge(objectRef, objectRef.toAddress().plus(offsets[i])); 
  }

One distinguished value (actually null) is used to identify arrays of reference objects, and this block of code scans scalar objects by tracing each of the fields at the offsets given by the offset array.

} else { 
   for(int i=0; i < ObjectModel.getArrayLength(objectRef.toObject()); i++) { 
    trace.processEdge(objectRef, objectRef.toAddress().plus(i << LOG_BYTES_IN_ADDRESS)); 
  } 
}

The other case is reference arrays, for which we fetch the array length and scan each of the elements.

The internals of trace.processEdge vary by collector and by collection type (e.g. nursery/full-heap in a generational collector), and the details need not concern us here.
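
The two cases above can be modeled together in a self-contained sketch. This is illustrative only, using a null offset array to mark a reference array (as the distinguished REFARRAY_OFFSET_ARRAY value does) and plain longs in place of Address:

```java
import java.util.ArrayList;
import java.util.List;

// Toy scanning dispatch: null offsets => reference array, else scalar fields.
final class ToyScanner {
  static List<Long> scan(long objAddr, int[] offsets, int arrayLength, int bytesInAddress) {
    List<Long> edges = new ArrayList<>();
    if (offsets != null) {
      for (int off : offsets)
        edges.add(objAddr + off);                      // one edge per scalar field
    } else {
      for (int i = 0; i < arrayLength; i++)
        edges.add(objAddr + (long) i * bytesInAddress); // one edge per array element
    }
    return edges;                                       // where processEdge would be called
  }
}
```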

16.3.2 Hand Inlining

Hand inlining was introduced in February 2011, and uses a cute technique to encode 3 bits of metadata into the TIB pointer in an object’s header. The 7 most frequent object patterns are encoded into these bits, and then special-case code is written for each of them.
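
The underlying bit trick is easy to demonstrate. The sketch below is not Jikes RVM's actual encoding, just the general idea: with at least 8-byte alignment, the low 3 bits of the TIB pointer are always zero and are therefore free to carry a scanning-pattern code.

```java
// Sketch: stash 3 bits of metadata in the low bits of an 8-byte-aligned pointer.
final class TibBits {
  static long encode(long alignedTib, int pattern) {
    return alignedTib | (pattern & 0x7);   // pack the 3-bit pattern code
  }
  static long tib(long word)     { return word & ~0x7L; }      // strip metadata
  static int  pattern(long word) { return (int) (word & 0x7); } // recover metadata
}
```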

Hand inlining produces an average-case speedup slightly better than specialization, but performs poorly on some benchmarks. This is why we use it in combination with specialization.

16.3.3 Specialized Scanning

Specialized Scanning was introduced in September 2007. It speeds up GC by removing the process of fetching and interpreting the offset array that describes each object, by jumping directly to a hard-coded method for scanning objects with a particular pattern.

The departure point from "standard" Java into the specialized scanning method is SpecializedScanMethod.invoke(...), which looks like this

@SpecializedMethodInvoke
@NoInline
public static void invoke(int id, Object object, TransitiveClosure trace) {
  // By default we call a non-specialized fallback
  fallback(object, trace);
}

The @SpecializedMethodInvoke annotation signals to the compiler that it should dispatch to one of the specialized method slots in the TIB.

Creation of specialized methods is handled by the class org.jikesrvm.classloader.SpecializedMethodManager.

16.4 Using GCSpy

16.4.1 The GCspy Heap Visualisation Framework

GCspy is a visualisation framework that allows developers to observe the behaviour of the heap and related data structures. For details of the GCspy model, see GCspy: An adaptable heap visualisation framework by Tony Printezis and Richard Jones, OOPSLA'02. The framework comprises two components that communicate across a socket: a client and a server incorporated into the virtual machine of the system being visualised. The client is usually a visualiser (written in Java) but the framework also provides other tools (for example, to store traces in a compressed file). The GCspy server implementation for Jikes RVM was contributed by Richard Jones of the University of Kent.

GCspy is designed to be independent of the target system. Instead, it requires the GC developer to describe their system in terms of four GCspy abstractions: spaces, streams, tiles and events. This description is transmitted to the visualiser when it connects to the server.

A space is an abstraction of a component of the system; it may represent a memory region, a free-list, a remembered-set or whatever. Each space is divided into a number of blocks which are represented by the visualiser as tiles. Each space will have a number of attributes – streams – such as the amount of space used, the number of objects it contains, the length of a free-list and so on.

In order to instrument a Jikes RVM collector with GCspy:

  1. Provide a startGCspyServer method in that collector’s plan. That method initialises the GCspy server with the port on which to communicate and a list of event names, instantiates drivers for each space, and then starts the server.
  2. Gather data from each space for the tiles of each stream (e.g. before, during and after each collection).
  3. Provide a driver for each space.

Space drivers handle communication between collectors and the GCspy infrastructure by mapping information collected by the memory manager to the space’s streams. A typical space driver will:

  1. Create a GCspy space.
  2. Create a stream for each attribute of the space.
  3. Update the tile statistics as the memory manager passes it information.
  4. Send the tile data along with any summary or control information to the visualiser.
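
The driver steps above can be sketched with a minimal model. The API below is hypothetical: the driver maps addresses to tiles, accumulates a single "used space" stream as the memory manager passes it objects, and then "transmits" one value per tile.

```java
import java.util.Arrays;

// Toy space driver: one stream (used bytes) over fixed-size tiles.
final class ToyDriver {
  final int tileSize;
  final int[] usedBytes;                 // the stream: one datum per tile

  ToyDriver(int spaceBytes, int tileSize) {
    this.tileSize = tileSize;
    this.usedBytes = new int[(spaceBytes + tileSize - 1) / tileSize];
  }

  void object(int addr, int bytes) {     // called as the manager scans the space
    usedBytes[addr / tileSize] += bytes; // attribute the object to its tile
  }

  int[] transmit() {                     // would be sent to the visualiser
    return usedBytes.clone();
  }
}
```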

The Jikes RVM SSGCspy plan gives an example of how to instrument a collector. It provides GCspy spaces, streams and drivers for the semi-spaces, the immortal space and the large object space, and also illustrates how performance may be traded for the gathering of more detailed information.

16.4.2 Installation of GCspy with Jikes RVM

Building GCSpy

The GCspy client code makes use of the Java Advanced Imaging (JAI) API. The build system will attempt to download and install the JAI component when required but this is only supported on the ia32-linux platform. The build system will also attempt to download and install the GCSpy server when required.

Building Jikes RVM to use GCspy

To build Jikes RVM with GCspy support, the configuration parameter config.include.gcspy must be set to true, as in the BaseBaseSemiSpaceGCspy configuration. You can also have the Jikes RVM build process create a script to start the GCspy client tool if GCspy was built with support for the client component; to achieve this, the configuration parameter config.include.gcspy-client must also be set to true.

The following steps build the Jikes RVM with support for GCspy on the ia32-linux platform.

$ cd $RVM_ROOT 
$ ant -Dhost.name=ia32-linux -Dconfig.name=BaseBaseSemiSpaceGCspy -Dconfig.include.gcspy-client=1

It is also possible to build the Jikes RVM with GCSpy support but link it against a fake stub implementation rather than the real GCSpy implementation. This is achieved by setting the configuration parameter config.include.gcspy-stub to true. This is used in the nightly testing process.

Running Jikes RVM with GCspy

To start Jikes RVM with GCSpy enabled you need to specify the port the GCSpy server will listen on.

$ cd $RVM_ROOT/dist/BaseBaseSemiSpaceGCspy_ia32-linux 
$ ./rvm -Xms20m -X:gc:gcspyPort=3000 -X:gc:gcspyWait=true &

Then you need to start the GCspy visualiser client.

$ cd $RVM_ROOT/dist/BaseBaseSemiSpaceGCspy_ia32-linux 
$ ./tools/gcspy/gcspy

After this you can specify the port and host to connect to (e.g. localhost:3000) and click the "Connect" button in the bottom right-hand corner of the visualiser.

16.4.3 Command line arguments

Additional GCspy-related arguments to the rvm command include -X:gc:gcspyPort=<port>, the port on which the GCspy server listens (0, the default, disables GCspy), and -X:gc:gcspyWait=<true|false>, which determines whether the VM waits for a visualiser to connect before proceeding.

16.4.4 Writing GCspy drivers

To instrument a new collector with GCspy, you will probably want to subclass your collector and to write new drivers for it. The following sections explain the modifications you need to make and how to write a driver. You may use org.mmtk.plan.semispace.gcspy and its drivers as an example.

The recommended way to instrument a Jikes RVM collector with GCspy is to create a gcspy subdirectory in the directory of the collector being instrumented, e.g. MMTk/src/org/mmtk/plan/semispace/gcspy. In that directory, we need five classes:

SSGCspy is the plan for the instrumented collector. It is a subclass of SS.

SSGCspyConstraints extends SSConstraints to provide methods boolean needsLinearScan() and boolean withGCspy(), both of which return true.

SSGCspyTraceLocal extends SSTraceLocal to override methods traceObject and willNotMove to ensure that tracing deals properly with GCspy objects: the GCspyTraceLocal file will be similar for any instrumented collector.

The instrumented collector, SSGCspyCollector, extends SSCollector. It needs to override collectionPhase.

Similarly, SSGCspyMutator extends SSMutator and must also override its parent’s methods collectionPhase, to allow the allocators to collect data; and its alloc and postAlloc methods to allocate GCspy objects in GCspy’s heap space.

The Plan

SSGCspy.startGCspyServer is called immediately before the "main" method is loaded and run. It initialises the GCspy server with the port on which to communicate, adds event names, instantiates a driver for each space, and then starts the server, forcing the VM to wait for a GCspy client to connect if necessary.
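This start-up sequence might be sketched as follows; ServerStub and DriverStub are stand-in stubs for illustration, not the real GCspy ServerInterpreter and driver types, and the event and space names are hypothetical:

```java
// Stand-in for the GCspy server interpreter
class ServerStub {
  final int port;
  final java.util.List<String> events = new java.util.ArrayList<>();
  boolean started, waitedForClient;
  ServerStub(int port) { this.port = port; }
  void addEvent(String name) { events.add(name); }
  void start(boolean waitForClient) { started = true; waitedForClient = waitForClient; }
}

// Stand-in for a GCspy space driver, registered against the server
class DriverStub {
  final String spaceName;
  DriverStub(ServerStub server, String spaceName) { this.spaceName = spaceName; }
}

class GCspyPlanSketch {
  static ServerStub startGCspyServer(int port, boolean wait) {
    ServerStub server = new ServerStub(port);   // communicate on the given port
    server.addEvent("Before GC");               // add event names
    server.addEvent("Semispace copied");
    server.addEvent("After GC");
    new DriverStub(server, "Semispace");        // one driver per space
    new DriverStub(server, "Immortal");
    server.start(wait);                         // start, optionally blocking
    return server;
  }
}
```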

Drivers extend AbstractDriver and register their space with the ServerInterpreter. In addition to the server, drivers will take as arguments the name of the space, the MMTk space, the tilesize, and whether this space is to be the main space in the visualiser.

The Collector and Mutator

Instrumenters will typically want to add data collection points before, during and after a collection by overriding collectionPhase in SSGCspyCollector and SSGCspyMutator.

SSGCspyCollector deals with data in the semi-spaces that the collector itself has allocated there (i.e. objects it has copied). It only does real work at the end of the collector's last tracing phase, FORWARD_FINALIZABLE.

SSGCspyMutator is more complex: as well as gathering data for objects that it allocated in From-space at the start of the PREPARE_MUTATOR phase, it also deals with the immortal and large object spaces.

At a collection point, the collector or mutator will typically

  1. Return if the GCspy port number is 0 (as no client can be connected).
  2. Check whether the server is connected at this event. If so, the compensation timer (which discounts the time taken by GCspy to gather the data) should be started before gathering data and stopped after it.
  3. After gathering the data, have each driver call its transmit method.
  4. SSGCspyCollector does not call the GCspy server’s serverSafepoint method, as the collector phase is usually followed by a mutator phase. Instead, serverSafepoint can be called by SSGCspyMutator to indicate that this is a point at which the server can pause, play one event, etc.
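The four steps above might be sketched as follows; GcspyServer, Driver and CollectionPoint are simplified stand-ins for illustration, not the real MMTk/GCspy classes:

```java
// Stand-in for the GCspy server
class GcspyServer {
  final int port;
  boolean connected;
  GcspyServer(int port) { this.port = port; }
  boolean isConnected(int event) { return connected; }
  void startCompensationTimer() { /* discount GCspy's own overhead */ }
  void stopCompensationTimer()  { }
}

// Stand-in for a space driver; scan() accumulates per-tile statistics
class Driver {
  int gathered;
  void scan(long objectAddress) { gathered++; }
  void transmit(int event) { /* send summary, control and stream data */ }
}

class CollectionPoint {
  // returns true if data was gathered and transmitted at this event
  static boolean gather(GcspyServer server, Driver driver, int event, long[] objects) {
    if (server.port == 0) return false;            // step 1: no client can connect
    if (!server.isConnected(event)) return false;  // step 2: nobody listening
    server.startCompensationTimer();
    for (long obj : objects) driver.scan(obj);     // gather the data
    server.stopCompensationTimer();
    driver.transmit(event);                        // step 3: send it
    // step 4: a mutator (not a collector) would call serverSafepoint here
    return true;
  }
}
```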

Gathering data will vary from MMTk space to space. It will typically be necessary to resize a space before gathering data. For a space,

The Driver

GCspy space drivers extend AbstractDriver. This class creates a new GCspy ServerSpace and initializes the control values for each tile in the space. Control values indicate whether a tile is used, unused, a background, a separator or a link. The constructor for a typical space driver will:

  1. Create a GCspy Stream for each attribute of a space.
  2. Initialise the tile statistics in each stream.

Some drivers may also create a LinearScan object to handle call-backs from the VM as it sweeps the heap (see above).

The chief roles of a driver are to accumulate tile statistics, and to transmit the summary and control data and the data for all of their streams. Their data gathering interface is the scan method (to which an object reference or address is passed).

When the collector or mutator has finished gathering data, it calls the transmit method of the driver for each space that needs to send its data. Streams may send values of type byte, short or int, implemented through classes ByteStream, ShortStream or IntStream. A driver’s transmit method will typically:

  1. Determine whether a GCspy client is connected and interested in this event, e.g. server.isConnected(event)
  2. Set up the summaries for each stream, e.g. stream.setSummary(values...);
  3. Set up the control information for each tile, e.g.
    controlValues(CONTROL_USED, start, numBlocks); 
    controlValues(CONTROL_UNUSED, end, remainingBlocks);
  4. Set up the space information, e.g. setSpace(info);
  5. Send the data for all streams, e.g. send(event, numTiles);

Note that AbstractDriver.send takes care of sending the information for all streams (including control data).
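As a rough model of this flow, the sketch below replays the constructor and transmit steps over plain arrays; TileStream and SchematicDriver are illustrative stand-ins, not the real GCspy stream and driver classes:

```java
// Stand-in for a per-attribute stream: one value per tile, plus a summary
class TileStream {
  final int[] tiles;
  int summary;
  TileStream(int numTiles) { tiles = new int[numTiles]; }
  void setSummary(int total) { summary = total; }
}

class SchematicDriver {
  static final byte CONTROL_USED = 1, CONTROL_UNUSED = 2;
  final byte[] control;     // one control value per tile
  final TileStream used;    // one stream per attribute of the space

  SchematicDriver(int numTiles) {
    control = new byte[numTiles];   // constructor step 2: initialise tile stats
    used = new TileStream(numTiles);
  }

  // mark a run of tiles with the given control value
  void controlValues(byte tag, int start, int count) {
    for (int i = start; i < start + count; i++) control[i] = tag;
  }

  // transmit: summarise, set control info, then the data would be sent
  int transmit(int numBlocks) {
    int total = 0;
    for (int i = 0; i < numBlocks; i++) total += used.tiles[i];
    used.setSummary(total);                                      // summaries
    controlValues(CONTROL_USED, 0, numBlocks);                   // control info
    controlValues(CONTROL_UNUSED, numBlocks, control.length - numBlocks);
    return total;   // the real driver would now call send(event, numTiles)
  }
}
```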

Subspaces

Subspace provides a useful abstraction of a contiguous region of a heap, recording its start and end address, the index of its first block, the size of blocks in this space and the number of blocks in the region. In particular, Subspace provides methods to: