Note 9, Programming Language Concepts (sestoft@dina.kvl.dk) 2002-04-04
----------------------------------------------------------------------

Generating code backwards, and optimizing it on the fly
-------------------------------------------------------

In lecture 7 we compiled micro-C programs to abstract machine code for a
stack machine, but the code quality was poor, with many jumps to jumps,
addition of zero, tests of constants, etc.

Here we present a simple optimizing compiler which optimizes the code on
the fly, while generating it. The compiler does not rely on advanced
program analysis. Instead it combines local optimizations (so-called
peephole optimizations) with backwards code generation.

In backwards code generation, one uses a `compile-time continuation' to
represent the instructions following the code currently being generated.
The compile-time continuation is simply a list of the instructions
following the instruction currently being generated. At run-time, those
instructions are the continuation of the instructions currently being
generated: they will consume any result those instructions produce (on
the stack).

Using this approach, a one-pass compiler

  * can optimize the compilation of logical connectives (such as !, &&
    and ||) into efficient control flow code;

  * can generate code for a logical expression e1 && e2 that is adapted
    to its context of use:
      + will the logical expression's value be bound to a variable:
            b = e1 && e2
      + or will it be used as the condition in an if- or while-statement:
            if (e1 && e2) ...;

  * can avoid generating jumps to jumps in most cases;

  * can eliminate (some) dead code, that is, instructions that cannot be
    executed;

  * can recognize tail calls and compile them as jumps (instruction
    TCALL) instead of proper function calls (instruction CALL), so that a
    tail-recursive function will execute in constant space.

Such optimizations might be called backwards optimizations: they exploit
information about the `future' of an expression: the use of its value.
Forwards optimizations, on the other hand, would exploit information
about the `past' of an expression: its value. A forwards optimization
may for instance exploit that a variable has a particular constant
value, and use that value to simplify expressions in which the variable
is used (constant propagation). This is possible only to a very limited
extent in backwards code generation.

Generating code backwards: changes to the compilation functions
---------------------------------------------------------------

In the old forwards compiler, the compilation function cExpr for
expressions had the type

    cExpr : expr -> venv -> instr list

In the backwards compiler, the compilation function for expressions
instead has the type

    cExpr : expr -> venv -> instr list -> instr list

The only change is that an additional argument of type instr list (that
is, list of instructions) has been added; this is the compile-time
continuation C. All other compilation functions (cStmt, cAccess, cExprs,
etc.) are modified similarly.

To see how the code continuation is used, consider the compilation of
simple expressions such as constants (CstI i) and unary primitives
(Prim1("!", e1)). In the old forwards compiler, code fragments
(instruction lists) are generated and concatenated together:

    fun cExpr (e : expr) (env : venv) : instr list =
        case e of
            ...
          | Cst (CstI i) => [CSTI i]
          | Prim1(ope, e1) =>
            cExpr e1 env
            @ (case ope of
                   "!"      => [NOT]
                 | "printi" => [PRINTI]
                 | "printc" => [PRINTC]
                 | _        => raise Fail "unknown primitive 1")
            ...

For instance, the expression !false, which is Prim1("!", Cst (CstI 0))
in abstract syntax, is compiled to [CSTI 0, NOT].

In a backwards (continuation-based) compiler, the corresponding compiler
fragment would look like this:

    fun cExpr (e : expr) (env : venv) (C : instr list) : instr list =
        case e of
            ...
          | Cst (CstI i) => CSTI i :: C
          | Prim1(ope, e1) =>
            cExpr e1 env
                  (case ope of
                       "!"      => NOT :: C
                     | "printi" => PRINTI :: C
                     | "printc" => PRINTC :: C
                     | _        => raise Fail "unknown primitive 1")
            ...

So the new instructions generated are simply stuck onto the front of the
code C already generated. This in itself achieves nothing (except that
it avoids using the append function @ on the generated instruction
lists, which can be costly). So far the code generated for !false is
still [CSTI 0, NOT].

Optimizing the code while generating it
---------------------------------------

Now that the code continuation C is available, we can optimize the
generated code. For instance, when the first instruction in C (which is
the next instruction to be executed at run-time) is NOT, then there is
no point in generating the instruction CSTI 0. Instead we should
generate the constant CSTI 1, and throw away the NOT instruction. We can
easily modify the expression compiler cExpr to recognize such special
situations, and generate optimized code:

    fun cExpr (e : expr) (env : venv) (C : instr list) : instr list =
        case e of
            ...
          | Cst (CstI i) =>
            (case (i, C) of
                 (0, NOT :: C1) => CSTI 1 :: C1
               | (_, NOT :: C1) => CSTI 0 :: C1
               | _              => CSTI i :: C)
            ...

With this scheme, the code generated for !false will be [CSTI 1].

In practice, we introduce an auxiliary function addCST to take care of
these optimizations, both to avoid cluttering up the main functions, and
because constants (CSTI) are generated in several places in the
compiler:

    fun cExpr (e : expr) (env : venv) (C : instr list) : instr list =
        case e of
            ...
          | Cst (CstI i) => addCST i C
            ...
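To make the idea concrete, here is a small self-contained sketch of a backwards expression compiler with the inline NOT-folding shown above. It is a toy under stated assumptions, not the micro-C compiler: the instruction and expression types are cut down to constants and the three unary primitives, and since there are no variables the environment argument venv is left out.

```sml
(* Stripped-down types: an assumption, not the micro-C compiler's *)
datatype instr = CSTI of int | NOT | PRINTI | PRINTC

datatype cst = CstI of int
datatype expr = Cst of cst
              | Prim1 of string * expr

(* Backwards compiler: C is the compile-time continuation, the code
   that runs after the code for e.  A constant inspects the first
   instruction of C and folds a following NOT into the constant. *)
fun cExpr (e : expr) (C : instr list) : instr list =
    case e of
        Cst (CstI i) =>
          (case (i, C) of
               (0, NOT :: C1) => CSTI 1 :: C1
             | (_, NOT :: C1) => CSTI 0 :: C1
             | _              => CSTI i :: C)
      | Prim1 (ope, e1) =>
          cExpr e1 (case ope of
                        "!"      => NOT :: C
                      | "printi" => PRINTI :: C
                      | "printc" => PRINTC :: C
                      | _        => raise Fail "unknown primitive 1")

(* Whole expressions are compiled with the empty continuation *)
val notFalse  = cExpr (Prim1 ("!", Cst (CstI 0))) []
                (* [CSTI 1] *)
val printNot7 = cExpr (Prim1 ("printi", Prim1 ("!", Cst (CstI 7)))) []
                (* [CSTI 0, PRINTI] *)
```

Note that in this sketch !!false still compiles to [CSTI 1, NOT], because the folded constant is not checked again against the rest of C; making the constant case recursive, as addCST does, removes that limitation.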
The addCST function is defined by straightforward pattern matching:

    fun addCST i C =
        case (i, C) of
            (0, EQ :: C1)         => addNOT C1
          | (0, ADD :: C1)        => C1
          | (0, SUB :: C1)        => C1
          | (0, NOT :: C1)        => addCST 1 C1
          | (_, NOT :: C1)        => addCST 0 C1
          | (1, MUL :: C1)        => C1
          | (1, DIV :: C1)        => C1
          | (_, INCSP m :: C1)    => if m < 0 then addINCSP (m+1) C1
                                     else CSTI i :: C
          | (0, IFZERO lab :: C1) => addGOTO lab C1
          | (_, IFZERO lab :: C1) => C1
          | (0, IFNZRO lab :: C1) => C1
          | (_, IFNZRO lab :: C1) => addGOTO lab C1
          | _                     => CSTI i :: C

Note in particular that instead of generating [CSTI 0, IFZERO lab] this
will generate an unconditional jump [GOTO lab]. This optimization turns
out to be very useful in conjunction with others.

The auxiliary functions addNOT, addINCSP, and addGOTO generate NOT,
INCSP, and GOTO instructions, optimizing the code if possible.

An attractive property of these local optimizations is that one can
easily see that they are correct. Their correctness depends only on some
simple code equivalences for the abstract stack machine, which are quite
easily proven by considering the state transitions of the abstract
machine (see notes07.txt).
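The following self-contained sketch puts addCST together with possible definitions of its helpers addNOT, addINCSP and addGOTO, so that the rewrites can be tried out. The helper bodies are assumptions: addNOT and addINCSP are written directly from the NOT and INCSP equivalences this note lists for them (the RET rule is left out for brevity), and addGOTO additionally discards unreachable instructions after the jump up to the next label, through a hypothetical helper deadcode. The cut-down instruction type and the use of strings as labels are also assumptions; the real compiler's definitions may differ.

```sml
(* Cut-down instruction type; labels are strings (assumptions) *)
type label = string
datatype instr =
    Label of label
  | CSTI of int | ADD | SUB | MUL | DIV | EQ | NOT
  | INCSP of int
  | GOTO of label | IFZERO of label | IFNZRO of label

(* Hypothetical helper: skip unreachable code up to the next label *)
fun deadcode [] = []
  | deadcode (C as Label _ :: _) = C
  | deadcode (_ :: C1) = deadcode C1

(* NOT, NOT === (nothing);  NOT, IFZERO a === IFNZRO a;
   NOT, IFNZRO a === IFZERO a *)
fun addNOT C =
    case C of
        NOT :: C1        => C1
      | IFZERO lab :: C1 => IFNZRO lab :: C1
      | IFNZRO lab :: C1 => IFZERO lab :: C1
      | _                => NOT :: C

(* INCSP 0 === (nothing);  INCSP m1, INCSP m2 === INCSP (m1+m2) *)
fun addINCSP m1 C =
    case C of
        INCSP m2 :: C1 => addINCSP (m1 + m2) C1
      | _              => if m1 = 0 then C else INCSP m1 :: C

(* Unconditional jump: drop dead code after it, and drop the jump
   itself if it targets the label that immediately follows *)
fun addGOTO lab C =
    case deadcode C of
        C1 as (Label lab2 :: _) => if lab = lab2 then C1 else GOTO lab :: C1
      | C1                      => GOTO lab :: C1

(* addCST as given in the text *)
fun addCST i C =
    case (i, C) of
        (0, EQ :: C1)         => addNOT C1
      | (0, ADD :: C1)        => C1
      | (0, SUB :: C1)        => C1
      | (0, NOT :: C1)        => addCST 1 C1
      | (_, NOT :: C1)        => addCST 0 C1
      | (1, MUL :: C1)        => C1
      | (1, DIV :: C1)        => C1
      | (_, INCSP m :: C1)    => if m < 0 then addINCSP (m + 1) C1
                                 else CSTI i :: C
      | (0, IFZERO lab :: C1) => addGOTO lab C1
      | (_, IFZERO lab :: C1) => C1
      | (0, IFNZRO lab :: C1) => C1
      | (_, IFNZRO lab :: C1) => addGOTO lab C1
      | _                     => CSTI i :: C

val ex1 = addCST 0 [NOT, NOT]                (* [CSTI 0] *)
val ex2 = addCST 0 [EQ, IFZERO "L1"]         (* [IFNZRO "L1"] *)
val ex3 = addCST 0 [IFZERO "L1", CSTI 5, ADD, Label "L2"]
                                             (* [GOTO "L1", Label "L2"] *)
val ex4 = addCST 3 [INCSP ~2, INCSP ~1]      (* [INCSP ~2] *)
```

For instance, ex3 shows the [CSTI 0, IFZERO lab] to [GOTO lab] rewrite; the unreachable CSTI 5, ADD disappears only because of the assumed dead-code step in this version of addGOTO.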
Concretely, the function addCST above embodies these instruction
sequence equivalences (an empty right-hand side means that the
instruction sequence on the left can simply be deleted):

    0, EQ          ===  NOT
    0, ADD         ===
    0, SUB         ===
    0, NOT         ===  1
    n, NOT         ===  0                 when n<>0
    1, MUL         ===
    1, DIV         ===
    n, INCSP m     ===  INCSP (m+1)       when m<0
    0, IFZERO a    ===  GOTO a
    n, IFZERO a    ===                    when n<>0
    0, IFNZRO a    ===
    n, IFNZRO a    ===  GOTO a            when n<>0

Additional equivalences are used in other optimizing code-generating
functions (addNOT, makeINCSP, addINCSP, addGOTO):

    NOT, NOT             ===
    NOT, IFZERO a        ===  IFNZRO a
    NOT, IFNZRO a        ===  IFZERO a
    INCSP 0              ===
    INCSP m1, INCSP m2   ===  INCSP (m1+m2)
    INCSP m1, RET m2     ===  RET (m2-m1)

Using the code continuation to optimize the compilation of jumps
----------------------------------------------------------------

To see how the code continuation is used when optimizing jumps
(instructions GOTO, IFZERO, IFNZRO), consider the compilation of a
conditional statement:

    if (e) stmt1 else stmt2

The old forwards compiler (imp/comp.sml) used this compilation scheme:

    let val labelse = newLabel()
        val labend  = newLabel()
        val code1   = cStmt stmt1 env
        val code2   = cStmt stmt2 env
    in cExpr e env @ [IFZERO labelse]
       @ code1 @ [GOTO labend]
       @ [Label labelse] @ code2 @ [Label labend]
    end

The above compiler fragment generates various code pieces (instruction
lists) and concatenates them to form code such as this:

        [[e]]
        IFZERO L1
        [[stmt1]]
        GOTO L2
    L1: [[stmt2]]
    L2:

where [[e]] denotes the code generated for expression e, and similarly
for the statements.

A plain backwards compiler generates exactly the same code, but does it
backwards, by sticking new instructions in front of the instruction list
C (the compile-time continuation):

    let val labelse = newLabel()
        val labend  = newLabel()
    in cExpr e env (IFZERO labelse
                    :: cStmt stmt1 env (GOTO labend :: Label labelse
                                        :: cStmt stmt2 env (Label labend :: C)))
    end

Optimizing the code for jumps while generating it
-------------------------------------------------

The continuation-based compiler fragment above unconditionally generates
new labels and jumps.
But if the instruction after the if-statement is GOTO L3, then it would
wastefully generate

        [[e]]
        IFZERO L1
        [[stmt1]]
        GOTO L2
    L1: [[stmt2]]
    L2: GOTO L3

One should much rather generate GOTO L3 than a GOTO L2 which leads
directly to a new jump (*). Thus instead of mindlessly generating a new
label (labend) and a GOTO, we call an auxiliary function makeJump that
checks whether the first instruction of C is a GOTO (or a return RET or
a label) and generates a suitable jump instruction jumpend, adding a
label to C if necessary, giving C1:

    val (jumpend, C1) = makeJump C

The makeJump function is easily written using pattern matching. If C
begins with a return RET (possibly preceded by a label), then jumpend is
RET; if C begins with label lab or GOTO lab, then jumpend is GOTO lab;
otherwise, we invent a new label lab and then jumpend is GOTO lab:

    fun makeJump C : instr * instr list =
        case C of
            Label lab :: RET m :: _ => (RET m, C)
          | RET m :: _              => (RET m, C)
          | Label lab :: _          => (GOTO lab, C)
          | GOTO lab :: _           => (GOTO lab, C)
          | _                       => let val lab = newLabel()
                                       in (GOTO lab, Label lab :: C) end

Similarly, we need to stick a label in front of [[stmt2]] only if there
is no label (or GOTO) already, so we use a function addLabel to return a
label labelse, possibly sticking it in front of [[stmt2]]:

    val (labelse, C2) = addLabel (cStmt stmt2 env C1)

Note that C1 (that is, C possibly preceded by a label) is the code
continuation of stmt2. The function addLabel uses pattern matching on
its argument C to decide whether a label needs to be added.
If C begins with a GOTO lab or label lab, we can just reuse lab;
otherwise we must invent a new label:

    fun addLabel C : label * instr list =
        case C of
            Label lab :: _ => (lab, C)
          | GOTO lab :: _  => (lab, C)
          | _              => let val lab = newLabel()
                              in (lab, Label lab :: C) end

Finally, when compiling an if-statement with no else-branch:

    if (e) stmt1

we do not want to get code like this, with a jump to the next
instruction:

        [[e]]
        IFZERO L1
        [[stmt1]]
        GOTO L2
    L1:
    L2:

To avoid this, we introduce a function addJump which recognizes this
situation and avoids generating the GOTO.

Putting everything together, we have this optimizing compilation scheme
for an if-statement If(e, stmt1, stmt2):

    let val (jumpend, C1) = makeJump C
        val (labelse, C2) = addLabel (cStmt stmt2 env C1)
    in cExpr e env (IFZERO labelse :: cStmt stmt1 env (addJump jumpend C2))
    end

This gives a flavour of the optimizations performed for if-statements.
Below we show how additional optimizations for constants improve the
compilation of logical expressions.

(*) Jumps slow down pipelined processors considerably because they cause
instruction pipeline stalls. So-called branch prediction logic in modern
processors mitigates this effect to some degree, but still it is better
to avoid excess jumps.

Optimizing the compilation of composite logical expressions
-----------------------------------------------------------

As in the old forwards compiler (file imp/comp.sml), logical non-strict
connectives such as !, && and || are compiled to conditional jumps, not
to special instructions that manipulate boolean values.

Consider the example program in file imp/ex13.c.
It prints the leap years between 1890 and the year n entered on the
command line:

    void main(int n) {
      int y;
      y = 1889;
      while (y < n) {
        y = y + 1;
        if (y % 4 == 0 && y % 100 != 0 || y % 400 == 0)
          print y;
      }
    }

The non-optimizing forwards compiler generates this code for the while
loop:

        GOTO L2
    L1: GETBP, 1, ADD, GETBP, 1, ADD, LDI, 1, ADD, STI, INCSP ~1,   y=y+1
        GETBP, 1, ADD, LDI, 4, MOD, 0, EQ, IFZERO L8,               y%4==0
        GETBP, 1, ADD, LDI, 100, MOD, 0, EQ, NOT, GOTO L7,          y%100!=0
    L8: 0,
    L7: IFNZRO L6, GETBP, 1, ADD, LDI, 400, MOD, 0, EQ, GOTO L5,    y%400==0
    L6: 1,
    L5: IFZERO L3, GETBP, 1, ADD, LDI, PRINTI, INCSP ~1, GOTO L4,   print y
    L3: INCSP 0,
    L4: INCSP 0,
    L2: GETBP, 1, ADD, LDI, GETBP, 0, ADD, LDI, LT, IFNZRO L1       y<n

    [...]

        ==> v1:..:vm:r2:u1:..:un:b:r1:s      by CALL m f (at r2-1)
        ==> v:w1:..:wk:r2:u1:..:un:b:r1:s    by code at f
        ==> v:u1:..:un:b:r1:s                by RET k
        ==> v:s                              by RET n (at r2)

            v1:..:vm:u1:..:un:b:r1:s
        ==> v1:..:vm:b:r1:s                  by TCALL m n f (at r2-1)
        ==> v:w1:..:wk:b:r1:s                by code at f
        ==> v:s                              by RET k

The new continuation-based compiler uses an auxiliary function makeCall
to recognize tail calls:

    fun makeCall m lab C : instr list =
        case C of
            RET n :: C1            => TCALL(m, n, lab) :: C1
          | Label _ :: RET n :: _  => TCALL(m, n, lab) :: C
          | _                      => CALL(m, lab) :: C

It will compile the above example function main to the following
abstract machine code, in which the recursive call to main has been
recognized as a tail call and has been compiled as a TCALL:

    L0: GETBP, LDI, IFZERO L1,                  if (n)
        GETBP, LDI, 1, SUB, TCALL(1, 1, L0),    main(n-1)
    L1: 17, RET 1                               17

Note that the compiler will recognize a tail call only if it is
immediately followed by a RET. Thus a tail call inside an if-statement
(file imp/ex15.c):

    void main(int n) {
      if (n!=0) {
        print n;
        main(n-1);
      } else
        print 999999;
    }

is optimized to use the TCALL instruction only if the compiler never
generates a GOTO to a RET, but instead directly generates a RET.
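The interplay between makeJump and makeCall can be sketched as follows. The bodies of makeJump and makeCall follow the definitions given earlier in this note; the cut-down instruction type, string labels, the counter-based newLabel, the called function "f" and the example continuations are hypothetical, chosen only for illustration. The sketch compares how the end of an if-branch whose last statement is a call looks when the code after the if-statement is a RET.

```sml
(* Cut-down instruction type; labels are strings (assumptions) *)
type label = string
datatype instr =
    Label of label
  | GOTO of label
  | RET of int
  | CALL of int * label
  | TCALL of int * int * label

(* Hypothetical fresh-label generator based on a counter *)
val labelCounter = ref 0
fun newLabel () : label =
    (labelCounter := !labelCounter + 1;
     "L" ^ Int.toString (!labelCounter))

(* Choose the jump that ends a branch, given the continuation C *)
fun makeJump C : instr * instr list =
    case C of
        Label _ :: RET m :: _ => (RET m, C)
      | RET m :: _            => (RET m, C)
      | Label lab :: _        => (GOTO lab, C)
      | GOTO lab :: _         => (GOTO lab, C)
      | _                     => let val lab = newLabel ()
                                 in (GOTO lab, Label lab :: C) end

(* Emit a call to lab with m arguments; a call followed by RET n
   (possibly after a label) becomes a tail call *)
fun makeCall m lab C : instr list =
    case C of
        RET n :: C1           => TCALL (m, n, lab) :: C1
      | Label _ :: RET n :: _ => TCALL (m, n, lab) :: C
      | _                     => CALL (m, lab) :: C

(* Code following the if-statement: the function's return *)
val afterIf = [RET 1]

(* With makeJump the branch ends in RET 1 rather than a GOTO,
   so makeCall sees the RET and emits a TCALL *)
val (jumpend, C1) = makeJump afterIf
val thenCode = makeCall 1 "f" (jumpend :: C1)
(* jumpend = RET 1, thenCode = [TCALL (1, 1, "f"), RET 1] *)

(* A plain backwards compiler ends the branch with GOTO to an end
   label, so the call is not recognized as a tail call *)
val naive = makeCall 1 "f" (GOTO "Lend" :: Label "Lend" :: afterIf)
(* naive = [CALL (1, "f"), GOTO "Lend", Label "Lend", RET 1] *)
```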
Thus the makeJump optimizations made by our continuation-based compiler
are important also for efficient implementation of tail calls.

Remaining deficiencies of the generated code
--------------------------------------------

There are still some problems with the generated code. For instance,
compilation of this statement (file imp/ex16.c):

    if (n) { } else print 1111; print 2222;

generates this machine code:

        GETBP, LDI, IFZERO L2, GOTO L1
    L2: 1111, PRINTI, INCSP ~1,
    L1: CSTI 2222, PRINTI

which could be optimized by inverting IFZERO L2 to IFNZRO L1 and
deleting the GOTO L1.

Similarly, the code generated for certain trivial while loops is
unsatisfactory. We would like the code generated for

    void main(int n) {
      print 1111;
      while (false) {
        print 2222;
      }
      print 3333;
    }

to consist only of the print 1111 and print 3333 statements, leaving out
the unexecutable while loop completely. Currently, this is not ensured
by the compiler. This is not a serious problem: some unreachable code is
generated, but it does not slow down the program execution.

Other optimizations
-------------------

There are many other kinds of optimizations that an optimizing compiler
might perform, but that are not performed by our simple compiler:

  * Constant propagation: if a variable x is set to a constant value,
    such as 17, and never modified, then every use of x can be replaced
    by the use of the constant 17. This may enable further optimizations
    if the variable is used in expressions such as x * 3 + 1 etc.

  * Common subexpression elimination: if the same (complex) expression
    is evaluated twice with the same values of all variables, then one
    could instead evaluate it once, store the result (in a variable or
    on the stack top), and reuse it. Common subexpressions frequently
    occur behind the scenes.
    For instance, the assignment

        a[i] = a[i] + 1;

    is compiled to

        GETBP, aoffset, ADD, LDI, GETBP, ioffset, ADD, LDI, ADD,
        GETBP, aoffset, ADD, LDI, GETBP, ioffset, ADD, LDI, ADD, LDI,
        CSTI 1, ADD, STI

    where the address (lvalue) of the array indexing a[i] is evaluated
    twice. It might be better to compute it once, and store it in the
    stack. However, the current stack machine instruction set does not
    make it easy to reuse the computed address. Adding a
    `duplicate-below' instruction to the machine, similar to the JVM's
    dup_x1 instruction, might help.

  * Loop invariant computations: if an expression inside a loop (for,
    while) does not depend on any variables modified by execution of the
    loop body, then the expression may be computed outside the loop
    (unless evaluation of the expression has a side effect, in which
    case it must be evaluated inside the loop). For instance, in

        while (...) {
          a[i] = ...
        }

    part of the array indexing a[i] is loop invariant, namely the
    computation of the array base address:

        GETBP, aoffset, ADD, LDI

    so this could be computed once and for all before the loop.

  * Dead variable/expression elimination: if the value of a variable or
    expression is never used, then the variable or expression may be
    removed (unless evaluation of the expression has side effects, in
    which case the expression must be preserved).

  * ...

Useful materials
----------------

  * Xavier Leroy: The Zinc experiment: an economical implementation of
    the ML language, Report 117, INRIA Rocquencourt, 1990, describes
    optimizing backwards code generation for an abstract machine. This
    is essentially the machine and code generation technique used in
    Caml Light, OCaml, and Moscow ML. The idea is probably much older,
    though.

  * Mads Tofte's 1990 Nsukka notes on compiling PL/0 present the idea of
    backwards compilation, but the representation of the code
    continuation is more complicated and gives fewer opportunities for
    optimization.
  * Abelson and Sussman: Structure and Interpretation of Computer
    Programs, MIT Press 1986, hint at the possibility of optimization on
    the fly in a continuation-based compiler.