Note 10, Programming Language Concepts (sestoft@dina.kvl.dk) 2002-04-15
-----------------------------------------------------------------------

Abstract machines in the real world
-----------------------------------

Abstract machines, or virtual machines, are widely used for
implementation of programming languages. Prime examples are
Postscript (used in millions of printers and typesetters), P-code
(widely used 15-20 years ago to implement Pascal on microcomputers),
the Java Virtual Machine, and Microsoft Common Intermediate Language.
Many projects exist whose goal is to develop new abstract machines,
either to be more general, or for some specific purpose.

The Java Virtual Machine (JVM) is a platform-independent abstract
machine and a collection of standard (class) libraries developed by
Sun Microsystems (since ca. 1994). Java programs are compiled to JVM
bytecode to make Java programs portable across platforms.

More recently, Microsoft has designed an abstract machine and
infrastructure called Common Language Runtime (CLR), whose
intermediate language is called Common Intermediate Language (CIL) or
Microsoft Intermediate Language (MSIL), with very much the same goals
as Sun's JVM. CIL and CLR are part of .NET (although it is a bit
unclear what .NET really is). The first release of CLR and CIL
happened in January 2002, but beta-release documentation and software
have been available since October 2000.

Whereas JVM was planned as an intermediate target language only for
Java, CIL is intended as a target language for a variety of
high-level source languages: C, C++, COBOL, Haskell, SML, Visual
Basic, and last but not least, C#, Microsoft's new Java-like
language. In particular, programs written in any of these languages
are supposed to be able to interoperate using a modernized COM, the
Component Object Model. This has influenced the design of CIL, which
is somewhat more general than that of JVM bytecode.

While the JVM has been implemented on a large number of platforms
(Solaris, Linux, MS Windows, web browsers) from the beginning, CLR
and CIL are primarily intended for MS Windows NT/2000/XP and their
successors. However, Microsoft has released a so-called shared
source implementation of most of the CLR that works for FreeBSD Unix.
Also, the Mono project (www.go-mono.com) is creating an
implementation of CLR for many platforms, including Linux; as of
April 2002 it is complete enough that the Mono C# compiler can
compile itself when running under the Mono CLR implementation on
Linux.

The Parallel Virtual Machine (PVM) is a different kind of virtual
machine: it is a library for C, C++ and Fortran programs that makes a
network of computers look like a single (huge) computer. Program
tasks can easily communicate with each other, even between different
processor architectures (x86, Sun Sparc, PowerPC, ...) and different
operating systems (Linux, MS Windows, Solaris, HP-UX, AIX, ...).
This is used for distributed scientific computing.


The Java Virtual Machine (JVM) runtime state
--------------------------------------------

In general, a JVM runs one or more threads concurrently, but here we
shall consider only a single thread of execution. Then the state of
a JVM has the following components:

  * classes containing methods, methods containing bytecode
  * a heap
  * a frame stack
  * (class loaders, security managers that we don't care about here)

The heap is used for storing values that are created dynamically and
whose lifetimes are hard to predict. In particular, arrays and
objects (including strings) are stored on the heap. The heap is
managed by a garbage collector, which makes sure that unused values
are thrown away so that the memory they occupy can be reused for new
arrays and objects.

The frame stack is a stack of frames, one frame for each method that
has been invoked and which has not yet completed.
For instance, when method main has called method fac on the argument
3, which has called itself recursively on the argument 2, the frame
stack looks like this:

   +------------------+
   |                  |
   | frame for fac(2) |
   |                  |
   +------------------+
   +------------------+
   |                  |
   | frame for fac(3) |
   |                  |
   +------------------+
   +------------------+
   |                  |
   | frame for main() |
   |                  |
   +------------------+  <-- bottom of the frame stack

Every stack frame in the JVM has the following components:

  * local variables for this method
  * the local evaluation stack for this method
  * the program counter (pc) for this method

Thus a single JVM stack frame looks like this:

   +------------------------+
   | local variables        |
   | local evaluation stack |
   | program counter        |
   +------------------------+

The local variables include the method's parameters, and also the
current object reference (this) if the method is non-static, and
enclosing object references (C.this) if the method belongs to an
inner class in class C.

In the JVM bytecode, a local variable is named by its index; this is
essentially the local variable's number. For instance, in a
non-static method, the current object reference (this) has local
variable index 0, the first method parameter has index 1, and so on.

Unlike our previous simple abstract machine (file imp/Machine.java)
the JVM keeps the evaluation stack separate from the local variables,
and also keeps the frames of different method invocations separate
from each other.

All of the stack frames for a given method must have the same fixed
size: the number of local variables and the maximal depth of the
local evaluation stack must be determined in advance by the Java
compiler.
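
The frame structure described above can be modelled in a few lines of
Java. This is only a sketch to make the three components concrete:
the class names Frame and FrameStackDemo and all field names are
hypothetical, and the recursive call is made with Java's own call
mechanism rather than by an interpreter loop, so it is not how a real
JVM is implemented.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of one JVM stack frame; names are illustrative only.
final class Frame {
    final String method;    // which method invocation this frame belongs to
    final int[] locals;     // local variables, addressed by index
    final int[] evalStack;  // local evaluation stack with a fixed maximal depth
    int sp = 0;             // number of values currently on the evaluation stack
    int pc = 0;             // program counter (unused in this sketch)

    Frame(String method, int maxLocals, int maxStack) {
        this.method = method;
        this.locals = new int[maxLocals];
        this.evalStack = new int[maxStack];
    }

    void push(int v) { evalStack[sp++] = v; }
    int pop() { return evalStack[--sp]; }
}

final class FrameStackDemo {
    static final Deque<Frame> frames = new ArrayDeque<>();
    static int maxDepth = 0;  // deepest frame stack seen so far

    // fac is static, so its parameter n is local variable number 0.
    static int fac(int n) {
        Frame f = new Frame("fac(" + n + ")", 1, 2);  // 1 local, stack depth 2
        f.locals[0] = n;
        frames.push(f);
        maxDepth = Math.max(maxDepth, frames.size());
        int result;
        if (f.locals[0] <= 1) {
            result = 1;
        } else {
            f.push(f.locals[0]);            // load local 0 onto the stack
            f.push(fac(f.locals[0] - 1));   // callee's result lands on our stack
            result = f.pop() * f.pop();     // multiply the two topmost values
        }
        frames.pop();                       // the invocation has completed
        return result;
    }

    // Plays the role of main: one frame at the bottom, then the call to fac.
    static int run(int n) {
        frames.clear();
        maxDepth = 0;
        frames.push(new Frame("main()", 1, 2));
        int result = fac(n);
        frames.pop();
        return result;
    }
}
```

Calling FrameStackDemo.run(3) computes 6. While fac(2) is active the
frame stack holds the three frames shown in the figure; since this
sketch recurses down to fac(1), the deepest stack has four frames.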

In addition to the local variables and the local evaluation stack,
the instructions of a method can operate on

  * static fields of classes, given a class name and a field name
  * non-static fields of objects, given an object reference and a
    field name
  * the elements of arrays, given an array reference and an index

Classes (with their static fields), objects (with their non-static
fields), and arrays are stored in the heap.

In the JVM the size of a value is one word (for booleans, bytes,
characters, shorts, integers, floats, references to array or object),
or two words (longs and doubles). These different value sizes affect
the local variable indexes, and the choice of bytecode instructions
for some stack manipulations. We shall ignore this problem in the
following, and use only one-word values.


The Java Virtual Machine (JVM) bytecode
---------------------------------------

As can be seen, the JVM is a stack-based machine quite similar to the
simple one (imp/Machine.java) we have used so far.

There is a large number of JVM bytecode instructions. There are
different instructions for different types of arguments, and the
instruction name prefix indicates the type of value that the
instruction handles:

   prefix   type
   ---------------------------------------------
   i        integer, short, char, byte
   b        byte (in array instructions)
   c        char (in array instructions)
   s        short (in array instructions)
   f        float
   d        double
   a        reference to array or object

For instance, addition of integers is done by iadd, and addition of
floats is done by fadd.

The main categories of instructions are:

  * push constant onto stack: bipush, sipush, iconst, ldc,
    aconst_null, ...
  * arithmetic: iadd, isub, imul, idiv, irem, ineg, iinc, fadd,
    fsub, ...
  * bitwise manipulation: iand, ior, ixor, ishl, ishr, ...
  * conversion between types: i2b, i2c, i2s, i2f, f2i, ...
  * load local variable onto stack: iload, aload, fload, ...
  * store local variable from stack: istore, astore, fstore, ...
  * load array element onto stack: iaload, baload, aaload, faload, ...
  * store array element from stack: iastore, bastore, aastore,
    fastore, ...
  * stack manipulation: swap, pop, dup, dup_x1, dup_x2, ...
  * allocate new array in the heap: newarray, anewarray,
    multianewarray, ...
  * load field onto stack: getfield, getstatic
  * store field from stack: putfield, putstatic
  * method call: invokevirtual, invokestatic, invokespecial, ...
  * method return: return, ireturn, areturn, freturn, ...
  * jumps: goto
  * conditional jumps (compare to 0): ifeq, ifne, iflt, ifle, ifgt,
    ifge
  * conditional jumps (compare two values): if_icmpeq, if_icmpne, ...
  * switch: lookupswitch, tableswitch
  * object-related: new, instanceof, checkcast
  * exceptions: athrow
  * monitors (Java's synchronized keyword): monitorenter, monitorexit
  * local subroutines (an abomination): jsr, ret

The JVM bytecode instructions have symbolic names as indicated above,
and they have fixed numeric codes that are used in JVM class files.
A class file represents a Java class or interface, containing static
and non-static field declarations, and static and non-static method
declarations. A JVM reads one or more class files and executes the
main(String[]) method in a designated class.

The class files are generated by a Java compiler or by our
JVM-generating micro-C compiler; see below.

The authoritative but informal description of the JVM and the
bytecodes is Lindholm and Yellin: Java Virtual Machine Specification,
Addison-Wesley 1999; also at http://java.sun.com/docs/books/vmspec/

Several attempts have been made at describing the JVM more formally
and more precisely. One of them is Peter Bertelsen: Semantics of
Java Bytecode, at http://www.dina.kvl.dk/~pmb/publications.html


The contents of a Java Virtual Machine (JVM) class file
-------------------------------------------------------

When a Java program is compiled using javac or jikes, one or more
class files are produced.
A class file C.class describes a single class or interface C. Nested
classes within C are stored in separate class files named C$A, C$1,
etc.

The structure of a somewhat abstracted class file is described by the
type class_decl in file jvm/Classdecl.sml in Peter Bertelsen's
SML-JVM toolkit. The main components of a class file are:

  * the name and package of the class
  * the superclass, superinterfaces, and access flags (public etc)
    of the class
  * the field declarations of the class
  * the method declarations of the class
  * the constant pool containing field descriptions and method
    descriptions, string constants, large integer constants, etc;
    the SML-JVM toolkit builds the constant pool from the other
    components
  * the attributes (such as source file name)
  * possibly special methods named "<init>" corresponding to the
    constructors of the class, and a special method named "<clinit>"
    corresponding to the static initializer of the class

For each field declaration (type field_decl), one must describe:

  * the name of the field
  * the type of the field
  * the access modifiers (static, public, final, ...)
  * the attributes (such as source file line number)

For each method declaration (type method_decl), one must describe:

  * the name of the method
  * the signature of the method
  * the access modifiers (static, public, final, ...)
  * the attributes, including
      - the code for the method
      - the exceptions thrown by the method (the Java throws clause)

The code for a method (attribute CODE) includes:

  * the maximal depth of the local stack in the stack frame for the
    method
  * the number of local variables in the method
  * the bytecode itself, as a list of JVM instructions
  * the exception handlers, that is, try-catch blocks, of the method
    body; each handler (type exn_hdl) describes the bytecode range
    covered by the handler, that is, the try block, the entry of the
    handler, that is, the catch block, and the exception class
    handled by this handler
  * code attributes, such as source file line numbers (for runtime
    error reports)

Using the SML-JVM toolkit, SML programs can build classes and export
them to class files using the toolkit's function Classfile.emit. We
shall use this in a compiler from micro-C to JVM bytecode; see below.

A precompiled version of the SML-JVM toolkit (for Windows) can be
downloaded from the course lecture plan. You can also compile the
SML-JVM toolkit yourself from the sources at
ftp://ftp.dina.kvl.dk/pub/Staff/Peter.Bertelsen/sml-jvm-toolkit.zip.
If you have the make program installed, that is done simply by typing

   make all

Otherwise you need to compile the files manually with mosmlc -c, in
the right order.

To study the contents of a class file C.class, and disassemble the
JVM bytecode in it, execute

   javap -c C

To display also the stack and locals sizes, execute

   javap -c -verbose C


Java Virtual Machine (JVM) bytecode verification
------------------------------------------------

Before the JVM executes the bytecode, it will perform bytecode
verification, a kind of type check. The overall goal is security:
the bytecode program should not be allowed to crash the JVM or to
perform illegal operations. This is especially important when
executing `foreign' programs (e.g. applets) in a browser on the local
machine.

Bytecode verification checks the following things, and others, before
the code is executed:

  * that all bytecode instructions work on stack operands and local
    variables of the right type
  * that a method uses no more local variables than it claims to
  * that a method uses no more local stack positions than it claims
    to
  * that a method throws no more exceptions than it claims to do
  * that for every point in the bytecode, the local stack has a fixed
    depth at that point (and thus the local stack does not grow
    without bounds)
  * that the execution of a method ends with a return or throw
    instruction (and does not `fall off the end of the bytecode')
  * that execution does not try to use one half of a two-word value
    (a long or double) as a one-word value (integer or reference
    or ...)
  * etc ...

This verification procedure is patented. Plainly a silly thing to
do, especially since (1) the patented procedure is a standard closure
algorithm, and (2) the published patent does not describe the really
tricky point: verification of the so-called local subroutines.


A compiler from micro-C to Java Virtual Machine (JVM) bytecode
--------------------------------------------------------------

The file imp/jvmcomp.sml contains a compiler from a subset of micro-C
to JVM bytecode. It is very similar in style to the backwards
compiler in imp/contcomp.sml, but compiles only a subset of micro-C,
with the following limitations:

  * there can be only one function: void main(...)
  * function main can take only integer arguments
  * there are no global variables
  * there are no pointers and no pointer arithmetic
  * the address operator (&) and pointer dereferencing (*) are not
    allowed

The generated bytecode allocates local variables as JVM localvars,
and allocates arrays on the Java heap, and therefore does not
implement micro-C pointer manipulations. On the other hand, it would
be fairly easy to allow the size of an array to be determined at run
time instead of compile time.

To generate code for more functions, one would need to generate
several static methods in the class file, and call them using
Jinvokestatic. This would not be very difficult.

To permit global variables, one should allocate static fields in the
class. This would not be too difficult either, but any
initialization would have to happen in the pseudo-method (that is,
the constructor).

Provided you have compiled the SML-JVM toolkit, you can compile file
ex11.c with the JVM-generating compiler, obtaining a class file
Ex11.class, as follows:

   jvmcompile2file (parsef "ex11.c") "Ex11"

The bytecode in the class file can be executed just by invoking the
standard JVM on the file, passing (integer) arguments to the micro-C
main method on the command line:

   java Ex11 8

This finds the 92 solutions to the eight queens problem. Direct
execution of the JVM bytecode is approximately ten times faster than
running java to interpret the simple Machine code generated by the
continuation-based compiler contcompile2file in imp/contcomp.sml.

The bytecode generated by jvmcompile2file relies on some auxiliary
methods in class InOut, so InOut.java must be compiled to InOut.class
(once) before you can execute the compiled programs.


Example bytecode generated by the compiler
------------------------------------------

Below we investigate the bytecode generated from this micro-C program
(imp/ex13.c):

   void main(int n) {
     int y;
     y = 1889;
     while (y < n) {
       y = y + 1;
       if (y % 4 == 0 && y % 100 != 0 || y % 400 == 0)
         print y;
     }
   }

The bytecode, as output by javap -c -verbose, is

   {
       out();
           /* Stack=3, Locals=1, Args_size=1 */
       public static void main(java.lang.String[]);
           /* Stack=2, Locals=3, Args_size=1 */
   }

   Method out()
      0 aload_0
      1 invokespecial #8
      4 return

   Method void main(java.lang.String[])
      0 aload_0
      1 iconst_0
      2 aaload
      3 invokestatic #17
      6 istore_1
      7 sipush 1889
     10 istore_2                  y = 1889;
     11 goto_w 45
     16 iload_2
     17 iconst_1
     18 iadd
     19 istore_2                  y = y + 1;
     20 iload_2
     21 iconst_4
     22 irem
     23 ifne 33                   y % 4 == 0
     26 iload_2
     27 bipush 100
     29 irem
     30 ifne 41                   y % 100 != 0
     33 iload_2
     34 sipush 400
     37 irem
     38 ifne 45                   y % 400 == 0
     41 iload_2
     42 invokestatic #23          print y;
     45 iload_2
     46 iload_1
     47 if_icmplt 16              y < n
     50 bipush 10
     52 invokestatic #27
     55 return

The header shows that there are two methods: the constructor and the
main method. The constructor has one local variable and a maximal
local stack depth of 3 (this is the declared maximal depth; the
actual size is 1). The main method has 3 local variables and a
maximal local stack depth of 2.

Instructions 0-6 initialize n from args[0], and instructions 50-52
print an additional newline before terminating the program.

The above code is amazingly similar to that generated by the javac
compiler from the corresponding Java program. The only real
difference is that javac (and jikes) generate a goto instruction in
line 11, where we have generated a goto_w instruction, which requires
5 bytes instead of 3.
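
For comparison, a Java program corresponding to imp/ex13.c might look
as follows. This is a sketch written for this note, not necessarily
identical to the course file imp/ex13.java; in particular, printing
with System.out is an assumption (the compiled micro-C code calls
auxiliary methods in class InOut instead).

```java
// Hypothetical Java counterpart of the micro-C program imp/ex13.c.
class ex13 {
    public static void main(String[] args) {
        int n = Integer.parseInt(args[0]);   // n comes from the command line
        int y = 1889;
        while (y < n) {
            y = y + 1;
            // Same leap year test as in the micro-C program
            if (y % 4 == 0 && y % 100 != 0 || y % 400 == 0)
                System.out.print(y + " ");
        }
        System.out.println();                // final newline
    }
}
```

With the argument 1900 the loop tests the years 1890 to 1900 and
prints those that are leap years.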

To see for yourself, try:

   javac ex13.java
   javap -c ex13


Brief overview of the Common Intermediate Language (CIL)
--------------------------------------------------------

Documentation of Microsoft's Common Intermediate Language (CIL) and
the Common Language Runtime (CLR) can be found on the Microsoft
Developer Network at http://msdn.microsoft.com/net/, except that I
can never find the same thing twice there. The documentation is
included also with the .NET Framework SDK which can be downloaded
from the same place.

The CLR CIL is a stack-based abstract machine very similar to the
JVM, with a heap, a frame stack, the same concept of stack frame,
bytecode verification, etc. A single CIL stack frame contains the
same information as a JVM stack frame, and in addition can provide
space for local allocation of structs and arrays:

   +------------------------+
   | incoming arguments     |
   | local variables        |
   | local evaluation stack |
   | local allocation       |
   | program counter        |
   +------------------------+

CIL is intended as a target language for a range of different source
languages, not just Java/C#, and therefore differs from the JVM in
the following respects:

  * CIL has a more advanced type system than that of JVM, to better
    support source languages with more flexible type systems (it is
    unclear whether it could support parametric polymorphism well,
    though)
  * CIL's type system is also more complicated, as it includes
    several kinds of pointer, native-size integers (that are 32 or 64
    bit wide depending on the platform), etc.
  * CIL has support for tail calls (but may choose to implement them
    like other calls), to better support functional source languages
  * CIL permits the execution of unverified code (an escape from the
    `managed execution'), pointer arithmetic etc, to support anarchic
    source languages such as C and C++
  * CIL has a canonical textual representation (an assembly
    language), and there is an assembler ilasm and a disassembler
    ildasm for this representation; the JVM has no official assembler
    format
  * CIL instructions are overloaded: there is only one add
    instruction, and load-time type inference determines whether it
    is an integer add, float add, double add, etc.

When the argument type of a CIL instruction needs to be specified
explicitly, a suffix is used. For instance, ldc.i4 is an instruction
for loading 4 byte integer constants:

   suffix     type
   ----------------------------------------------------------------
   i1         signed byte
   u1         unsigned byte
   i2         signed short (2 bytes)
   u2         unsigned short or character (2 bytes)
   i4         signed integer (4 bytes)
   u4         unsigned integer (4 bytes)
   i8         signed long (8 bytes)
   u8         unsigned long (8 bytes)
   r4         float (32 bit IEEE754 floating-point number)
   r8         double (64 bit IEEE754 floating-point number)
   i          natural size signed integer
   u          natural size unsigned integer, or unmanaged pointer
   r4result   natural size result for 32-bit floating-point computation
   r8result   natural size result for 64-bit floating-point computation
   o          natural size object reference
   &          natural size managed pointer
   s          short form of instruction
   un         unsigned form of instruction
   ----------------------------------------------------------------

The main CIL instruction kinds are:

  * push constant onto stack: ldc.i4, ldc.i8, ldnull, ldstr, ldtoken
  * arithmetic: add, sub, mul, div, rem, neg
  * arithmetic with overflow check: add.ovf, add.ovf.un, sub.ovf, ...
  * bitwise manipulation: and, not, or, xor, shl, shr, shr.un
  * compare values: ceq, cgt, cgt.un, clt, clt.un
  * conversion between types: conv.i1, conv.i2, ..., conv.r4,
    conv.r8, ...
  * load local variable onto stack: ldloc
  * load argument onto stack: ldarg
  * load indirect, given address: ldind.i1, ldind.i2, ...,
    ldind.r4, ...
  * store indirect, given address: stind.i1, stind.i2, ...,
    stind.r4, ...
  * store local variable from stack: stloc
  * store argument from stack: starg
  * load array element onto stack: ldelem.i1, ldelem.i2, ...,
    ldelem.r4, ...
  * store array element from stack: stelem.i1, stelem.i2, ...,
    stelem.r4, ...
  * stack manipulation: pop, dup
  * allocate new array in the heap: newarr
  * load field onto stack: ldfld, ldsfld
  * store field from stack: stfld, stsfld
  * load address (for call-by-reference): ldloca, ldarga, ldelema,
    ldflda, ldsflda
  * method call: call, calli, callvirt
  * method return: ret
  * load method pointer: ldftn, ldvirtftn
  * unconditional jump: br
  * conditional jumps (compare to 0): brfalse, brtrue
  * conditional jumps (compare two values): beq, bge, bge.un, bgt,
    bgt.un, ble, ble.un, blt, blt.un, bne.un
  * switch: switch
  * object-related: newobj, isinst, castclass
  * exceptions: throw, rethrow
  * try-catch-finally: endfilter, endfinally, leave
  * manipulating value types: box, unbox, cpobj, initobj, ldobj,
    stobj, sizeof

Unverifiable (unmanaged) CIL instructions:

  * jump to method (a tail call): jmp, jmpi
  * block memory operations: cpblk, initblk, localloc

The CIL machine does not have the JVM's infamous local subroutines.
Instead so-called protected blocks (those covered by catch clauses or
finally clauses) are subject to certain restrictions. One cannot
jump out of or return from a protected block; instead a special
instruction `leave' must be executed, causing any finally blocks to
be executed.

A program in C#, Visual Basic .NET, SML.NET etc, such as the C#
version imp/ex13.cs of the imp/ex13.c example, is compiled to a CLR
file ex13.exe, which is not really a classic MS Windows .exe file.
Such an .exe file can be disassembled to symbolic CIL code using

   ildasm /text ex13.exe

This reveals the following CIL code, which is structurally identical
to the JVM code generated by javac for imp/ex13.java:

   .method private hidebysig static void Main(string[] args) cil managed
   {
     .entrypoint
     // Code size 72 (0x48)
     .maxstack 2
     .locals init (int32 V_0, int32 V_1)
     IL_0000: ldarg.0
     IL_0001: ldc.i4.0
     IL_0002: ldelem.ref
     IL_0003: call int32 [mscorlib]System.Int32::Parse(string)
     IL_0008: stloc.0
     IL_0009: ldc.i4 0x761          y = 1889;
     IL_000e: stloc.1
     IL_000f: br.s IL_003e
     IL_0011: ldloc.1
     IL_0012: ldc.i4.1
     IL_0013: add
     IL_0014: stloc.1               y = y + 1;
     IL_0015: ldloc.1
     IL_0016: ldc.i4.4
     IL_0017: rem
     IL_0018: brtrue.s IL_0020      y % 4 == 0
     IL_001a: ldloc.1
     IL_001b: ldc.i4.s 100
     IL_001d: rem
     IL_001e: brtrue.s IL_0029      y % 100 != 0
     IL_0020: ldloc.1
     IL_0021: ldc.i4 0x190
     IL_0026: rem
     IL_0027: brtrue.s IL_003e      y % 400 == 0
     IL_0029: ldloc.1               print y;
     IL_002a: box [mscorlib]System.Int32
     IL_002f: ldstr " "
     IL_0034: call string [mscorlib]System.String::Concat(object, object)
     IL_0039: call void [mscorlib]System.Console::Write(string)
     IL_003e: ldloc.1
     IL_003f: ldloc.0
     IL_0040: blt.s IL_0011         y < n
     IL_0042: call void [mscorlib]System.Console::WriteLine()
     IL_0047: ret
   } // end of method ex13::Main


The heap and garbage collection
-------------------------------

Heap allocation and garbage collection are not specific to abstract
machines, but have finally become accepted in the mainstream thanks
to the Java Virtual Machine.

In the machine models for micro-C studied so far, the main storage
data structure was the stack. The stack was used for storing
activation records (stack frames) containing variables' values, and
for storing intermediate results.
An important property of the stack is that if value f1 is pushed on
the stack before value f2, then f2 is popped off the stack before f1:
last in, first out. This makes allocation and deallocation
efficient. The stack property follows from the design of micro-C:

  * it has static (or lexical) scope rules: the binding of a variable
    occurrence x can be determined from the program text only,
    without taking into account the program execution;
  * it has nested scopes: blocks { ... } within blocks;
  * it does not allow functions to be returned from functions, so
    there is no need for closures;
  * it does not have dynamic data structures such as trees or lists.

Thanks to these restrictions, the lifetime of a value can be easily
determined when the value is created. In fact, the value can live no
longer than any values created before it; this permits the stack-like
allocation.

Many modern programming languages do permit the creation of values
whose lifetime cannot be determined at their point of creation. In
particular, they have functions as values, and hence need closures
(Scheme, ML), they have dynamic data structures such as lists and
trees (Scheme, ML, Haskell), they have thunks or suspensions
(representing lazily evaluated values, in Haskell), and they have
objects (Simula, Java, C#).

Values with unpredictable lifetime are stored in another storage data
structure, the so-called heap. (Here `heap' means something like
`disorderly collection of data'; it has nothing to do with priority
queue, as in algorithmics.) Data are explicitly allocated in the
heap by the program, but cannot be explicitly deallocated:
deallocation is done automatically by a so-called garbage collector.
A heap with automatic garbage collection is used in Lisp (1960),
Simula (1967), Scheme (1975), ML (1978), Smalltalk (1980?), Haskell
(1990), Java (1994), C# (1999), and most scripting languages.

A major advantage of Java over previous mainstream languages is the
use of garbage collection.

One of the great drawbacks of Pascal, C, and C++ is the absence of
automatic garbage collection. Data whose lifetime is unpredictable
can be allocated outside the stack using malloc (in C) or new (in
C++):

   char *strbuf = (char*)malloc(len+1);     /* C   */
   char *strbuf = new char[len+1];          /* C++ */

but such data must be explicitly deallocated by the program using
free (in C) or delete (in C++; delete[] for arrays):

   free(strbuf);                            /* C   */
   delete[] strbuf;                         /* C++ */

One would think that the programmer knows best when to deallocate
data, but in practice, this often goes horribly wrong. Either data
are deallocated too early and the program crashes, or too late, and
the program uses more and more space while running and must be
restarted every so often: it has a memory leak. To permit local
deallocation (and as a defence against unintended updates), C++
programmers often copy (clone) their objects before storing or
passing them to other functions, causing the program to run much
slower than strictly necessary. Also, because it is so cumbersome to
allocate data dynamically in C and C++, there is a tendency to use
statically allocated fixed-size buffers, which are prone to buffer
overflows (and many server vulnerabilities) and which prevent library
functions from being thread-safe.


Allocation in the heap
----------------------

In Java and C#, every new array or object (including strings) is
allocated in the heap:

   Cow c1 = new Cow();
   int[] ps = new int[100000];

The variables c1 and ps hold references to the object and array
stored in the heap.

In Standard ML, closures (fn x => y * x) and constructed data such as
pairs (3, true), lists [2, 3, 5, 7, 11], strings, arrays, etc will
most likely be allocated in the heap, although SML implementations
have more freedom to choose between the stack and the heap than Java
or C# implementations.
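
To see why a closure such as fn x => y * x generally cannot live on
the stack, consider the following Java sketch. The class and method
names are made up for illustration, and how a particular JVM
represents the lambda internally is not specified here; the point is
only that the captured value of y must outlive the call that created
it.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical example: a function value that outlives its creator.
final class ClosureLifetime {
    // Corresponds roughly to the SML expression  fn x => y * x.
    // The parameter y belongs to the frame of multiplyBy, but the
    // returned closure still needs its value after that frame is gone.
    static IntUnaryOperator multiplyBy(int y) {
        return x -> y * x;
    }
}
```

A caller can keep the result of ClosureLifetime.multiplyBy(7) for as
long as it likes and apply it later with applyAsInt, so the lifetime
of the closure cannot be determined at its point of creation.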


Automatic deallocation by garbage collection
--------------------------------------------

The purpose of the garbage collector is to make room for new data in
the heap by reclaiming space occupied by old data that is no longer
used. There are many different garbage collection algorithms to
choose from. It is customary to distinguish between the collector
(which reclaims unused space) and the mutator (which allocates new
values and possibly updates old values). The collector exists for
the sake of the mutator, which does the real useful work.

All garbage collection algorithms have a notion of root set. This is
typically the variables of all the active (not yet returned-from)
function calls or method calls of the program. Thus the root set is
those references to the heap found in the current stack frames and in
machine registers (if any).

Automatic memory management, garbage collection, reference counting
gc, mark-sweep gc, two-space gc, generational gc, incremental gc.


Mark-sweep collection, and the freelist
---------------------------------------

With mark-sweep garbage collection, the heap contains allocated
objects of different sizes, and unused blocks of different sizes.
Every allocated block contains a header with a size field and other
information about the block, and possibly a description of the rest
of the block's contents.

All the unused blocks are linked together in a so-called freelist:
each unused block has a header with a size field and a pointer to the
next unused block. A pointer to the first block on the freelist is
kept in a special freelist register by the garbage collector.

Allocation of a new value (object, closure, string, ...) is done by
traversing the freelist until a large enough block is found. If no
such block is found, garbage collection may be initiated. If there
is still no large enough block, the heap must be extended by
requesting more memory from the operating system.
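
The freelist traversal just described can be sketched in Java as
follows. The heap is modelled as an int array of words; the block
layout (word 0 holds the block size, word 1 of a free block holds the
index of the next free block), the splitting of oversized blocks, and
all names are assumptions made for this illustration. The sketch
neither starts a garbage collection nor extends the heap; it simply
returns NIL when no block is large enough.

```java
// Hypothetical first-fit freelist allocator over an int[] "heap".
final class FreeListHeap {
    static final int NIL = -1;
    final int[] mem;
    int freelist;               // index of the first free block, or NIL

    FreeListHeap(int words) {   // words must be at least 2
        mem = new int[words];
        mem[0] = words;         // one big free block: size includes header
        mem[1] = NIL;           // no next free block
        freelist = 0;
    }

    // Returns the index of the first payload word, or NIL if no free
    // block is large enough.
    int allocate(int payloadWords) {
        int need = Math.max(payloadWords, 1) + 1;   // payload plus size header
        int prev = NIL;
        for (int b = freelist; b != NIL; prev = b, b = mem[b + 1]) {
            int size = mem[b];
            if (size < need) continue;              // too small, try next block
            int next = mem[b + 1];
            int replacement;
            if (size - need >= 2) {                 // split off the unused tail
                int rest = b + need;
                mem[rest] = size - need;
                mem[rest + 1] = next;
                mem[b] = need;
                replacement = rest;
            } else {
                replacement = next;                 // hand out the whole block
            }
            if (prev == NIL) freelist = replacement;
            else mem[prev + 1] = replacement;
            return b + 1;
        }
        return NIL;
    }

    // Puts a block back at the front of the freelist, as a collector
    // would do for a block found to be unused (no joining of neighbours).
    void release(int payload) {
        int b = payload - 1;
        mem[b + 1] = freelist;
        freelist = b;
    }
}
```

The loop shows where the cost of freelist allocation comes from: in
the worst case every free block is inspected before a fit is found.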

Garbage collection is done in two phases:

(1) The mark phase: Mark all blocks that are reachable from the root
    set. This can be done by first marking all those blocks pointed
    to from the root set, and then recursively marking all unmarked
    blocks pointed to from marked blocks. This works even when there
    are pointer cycles in the heap. The recursive step can use a
    stack, but can also be done without it. After this phase all
    live blocks are marked.

(2) The sweep phase: Go through all blocks in the heap, unmark the
    marked blocks and put the unmarked blocks on the freelist,
    joining adjacent free blocks into a single larger block.

Advantages: Fairly simple to implement. Once allocated, a value is
never moved; this is important if a pointer to the value has been
given to an external procedure, such as an operating system
procedure.

Disadvantages: When allocating a block, searching the freelist for a
large enough free block may take a long time (if there are many small
blocks in the beginning of the list). Also, the heap may become
fragmented. For instance, we may be unable to allocate a block of 36
bytes although there are thousands of unused (but non-adjacent)
32-byte blocks. Finally, a complete cycle of marking and sweeping
may take a long time, causing a long break in the execution of the
program.

Variants: Mark-sweep garbage collection can be made incremental, so
that the mark phase consists of many slices, separated by execution
of the mutator, and similarly for the sweep phase. This requires
extra data in each heap block.

Mark-sweep collection was invented for Lisp by John McCarthy in 1960.


Two-space stop-and-copy garbage collection
------------------------------------------

With two-space stop-and-copy collection, the heap is divided into two
equally large semispaces. At any time, one semispace is called the
from-space and the other is called the to-space. At each garbage
collection, the two semispaces swap roles. There is no freelist.
Instead an allocation pointer points into the from-space; all memory
from the allocation pointer to the end of the from-space is unused.

Allocation is done in the from-space, at the point indicated by the
allocation pointer. The allocation pointer is simply incremented by
the size of the block to be allocated. If there is not enough space
available, a garbage collection must be made.

Garbage collection (1) moves all live values from the from-space to
the to-space (initially empty). Then (2) it sets the allocation
pointer to point to the first available memory cell of the to-space,
ignores whatever is in the from-space, and (3) swaps from-space and
to-space:

Notation: + = live, x = dead, - = available for allocation.

(1) At the beginning of a garbage collection:

             from-space                  to-space
    |+++xx+xxxx+++xxx++++xxxxx|-------------------------|

(2) After moving live values from from-space to to-space:

             from-space                  to-space
    |+++xx+xxxx+++xxx++++xxxxx|+++++++++++--------------|

(3) At the end of the garbage collection, after swapping:

              to-space                  from-space
    |                         |+++++++++++--------------|

At the end of a garbage collection, the (new) from-space contains all
live values and has room for new allocations, and the (new) to-space
is empty and remains empty until the next garbage collection.

During the garbage collection, values are copied from the from-space
to the to-space as follows: Initially every from-space value
reachable from the root set is moved into the to-space (allocating
from one end of the initially empty to-space); the root set pointer to
the value must be updated to point to the new location. Whenever a
value is moved, a forwarding pointer is stored in the old (from-space)
copy of the value. Now the values of the to-space are inspected for
pointers to values. If a pointer points to a value in from-space,
then that value is inspected. If the value contains a forwarding
pointer, then the pointer is updated to refer to the (new) to-space
address.
If the value does not contain a forwarding pointer, then it is moved
to the to-space, a forwarding pointer is stored in the old
(from-space) copy of the value, and the pointer is updated to refer
to the new to-space address.

Advantages: Good locality. No fragmentation. No stack is needed for
garbage collection, only a few pointers.

Disadvantages: At most half of the available memory space can contain
live data. If the heap is nearly full, then every garbage collection
will copy almost all live data, but may reclaim only very little
unused memory. Thus as the heap gets full, performance gets
considerably worse. A data value may be moved at any time after its
allocation, so a pointer to a value cannot be passed to external
procedures.

Two-space copying garbage collection was described by C.J. Cheney: A
Nonrecursive List Compacting Algorithm, Communications of the ACM 13,
11 (1970) 677-678.


Generational garbage collectors
-------------------------------

Generational garbage collection starts from the observation that most
allocated values die young. Therefore it is wasteful to copy all the
live, mostly old, values in every garbage collection cycle, only to
reclaim the space occupied by some young, now dead, values.

Instead, divide the heap into several generations, numbered 0, 1, ...,
N. Always allocate in generation 0. When generation 0 is full, do a
minor garbage collection: promote (move) all live values from
generation 0 to generation 1. Then generation 0 is empty and new
objects can be allocated into it. When generation 1 is full, promote
live values to generation 2, etc. Generation N, the last generation,
may be collected by a mark-sweep algorithm.

Advantages: Reclaims short-lived values very efficiently. A value
that has reached the last generation is no longer moved if that
generation is collected by mark-sweep (which is important if pointers
need to be passed to external procedures).

Disadvantages: More complex. Values are moved when they are promoted
from one generation to the next.

Generational garbage collection was proposed by Lieberman and Hewitt
at MIT; see Communications of the ACM, June 1983.
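The promotion scheme above can be illustrated with a small
simulation. This is a sketch under simplifying assumptions, not a
description of any real collector: there are only two generations,
sizes are counted in objects rather than bytes, and the names Obj,
GenHeap, allocate and minor_collect are invented for the example.
One point that the description above leaves implicit is made explicit
here: a minor collection must also follow pointers from old values
into generation 0. The sketch finds them by scanning all of
generation 1, which is simple but slow; that choice is an assumption
of the sketch, and real collectors typically keep a record of such
pointers instead.

```python
class Obj:
    """A heap value; refs are its pointers to other heap values."""
    def __init__(self, name, refs=()):
        self.name = name
        self.refs = list(refs)
        self.gen = 0              # every value starts in generation 0


class GenHeap:
    """Two-generation heap (hypothetical names, for illustration only)."""
    def __init__(self, gen0_capacity):
        self.gen0_capacity = gen0_capacity
        self.gens = [[], []]      # generation 0 and generation 1
        self.roots = []           # stands in for stack frames/registers
        self.minor_collections = 0

    def allocate(self, name, refs=()):
        if len(self.gens[0]) >= self.gen0_capacity:
            # The new value's fields act as temporary roots.
            self.minor_collect(extra_roots=refs)
        obj = Obj(name, refs)
        self.gens[0].append(obj)
        return obj

    def minor_collect(self, extra_roots=()):
        # Start from the roots, plus every pointer held by an old value
        # (simplification: scan the whole old generation).
        work = list(self.roots) + list(extra_roots)
        for old in self.gens[1]:
            work.extend(old.refs)
        while work:
            obj = work.pop()
            if obj.gen == 0:
                obj.gen = 1                   # promote the live value
                self.gens[1].append(obj)
                work.extend(obj.refs)         # and trace what it points to
        self.gens[0] = []     # whatever was left behind is dead
        self.minor_collections += 1


if __name__ == "__main__":
    heap = GenHeap(gen0_capacity=3)
    keep = heap.allocate("keep")
    heap.roots.append(keep)
    for i in range(10):
        heap.allocate("tmp%d" % i)            # short-lived values
    print([o.name for o in heap.gens[1]], heap.minor_collections)
```

Each minor collection touches only the live values of generation 0;
the dead temporaries cost nothing, since generation 0 is simply reset.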
Moscow ML uses a two-generation incremental garbage collector
implemented by Doligez at INRIA, France. It has a small generation 1
(the young generation) in which most allocation takes place.
Generation 1 is garbage-collected by moving live data to generation 2
(a minor collection). Generation 2 (the old generation) is
garbage-collected by incremental mark-sweep, one slice of a major
collection for each collection of generation 1. This gives fast
collection of short-lived values, and short garbage collection pauses.
A description can be found in
http://para.inria.fr/~doligez/caml-guts/Sestoft94.txt

The Sun JDK Hotspot 1.3.1 Java virtual machine (and presumably 1.4)
uses a three-generation collector. By default, a minor collection
copies from generation 1 to generation 2. Generation 2 uses
stop-and-copy garbage collection, and promotes values to generation 3
when they are old enough. Generation 3 uses non-incremental
mark-sweep with compaction. Moreover, the Sun JDK Hotspot 1.3
supports copying, mark-compact, and incremental mark-compact. For
instance, generation 3 collections can be made incremental by passing
the option -Xincgc to the Java virtual machine. See
http://java.sun.com/docs/hotspot/gc/ for the gory details.


Further topics
--------------

Threaded code, implementation of Forth. Just-in-time compilation.
Compilation to register machine code (the Perl abstract machine,
Parrot), register allocation (see Torben Mogensen's manuscript).


Literature
----------

 * Diehl, Hartel and Sestoft: Abstract machines for programming
   language implementation, Future Generation Computer Systems 16, 7
   (May 2000) 739-751.
 * Lindholm and Yellin's Java Virtual Machine specification,
   Addison-Wesley 1999, and at http://java.sun.com/docs/books/vmspec/

 * Peter Bertelsen: Semantics of Java Bytecode, at
   http://www.dina.kvl.dk/~pmb/publications.html

 * Peter Bertelsen's SML-JVM Toolkit, and the introduction
   smljvm-toolkit.pdf

 * Microsoft's CLR and CIL specifications and implementations are
   available at http://msdn.microsoft.com/net/

 * The Parallel Virtual Machine (PVM) project
   http://www.csm.ornl.gov/pvm/

 * A Virtual Virtual Machine project:
   http://www-sor.inria.fr/projects/vvm/

 * Richard Jones and Rafael Lins: Garbage Collection: Algorithms for
   Automatic Dynamic Memory Management, John Wiley & Sons, 1996.

 * Paul Wilson: Uniprocessor Garbage Collection Techniques, ACM
   Computing Surveys 1995?. Draft available as
   ftp://ftp.cs.utexas.edu/pub/garbage/bigsurv.ps