By Martin Elsman and Niels Hallenberg
In this note, we demonstrate the process of generating an executable MLKit that uses regions itself for compiling Standard ML programs. An overview of the bootstrapping process is pictured in Figure 1.
Figure 1. The bootstrapping process illustrated using T-diagrams. Each compiler in the diagram is characterized by three languages: the source language that it compiles, the target language that it generates code for, and the implementation language that it is written in.
The lowest block in the picture denotes the SML/NJ compiler, which is used to compile the MLKit sources to a version of the MLKit that, when running, uses SML/NJ's runtime system. This version of the MLKit is pictured to the right of the SML/NJ executable and is named kit1.
To build kit1, enter the directory kit (i.e., the directory containing the file copyright). The kit directory contains a Makefile that automates the bootstrapping process. Assuming that the MLKit is installed in the directory $HOME/mlkit, type the following commands from within this directory to generate the kit1 compiler:
$ cd kit
$ ./autobuild
$ ./configure
$ make clean
...
$ make nj
...
The make command produces a file bin/kit.x86-linux, which is an SML/NJ image that can be executed using the shell script bin/mlkit. The image is compiled using the Basis Library of SML/NJ and executes using the runtime system of SML/NJ.
Another possible point of departure is to use MLton (version 20051202 or later) to generate kit1; this, however, requires a machine with a minimum of 2-3GB of memory. For compiling kit1 using MLton, execute
$ make mlkit
instead of the step make nj above. Yet another possibility is to use an already working MLKit compiler, in which case you need to modify the file kit/Makefiledefault to suit your needs.
Once kit1 has been compiled using the steps presented above, type
$ make bootstrap
from within the kit directory. This command will generate both kit2 and kit3 within the directory kit/bootstrap and test whether the resulting executables are identical after stripping symbol table information.
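The comparison step described above can be sketched as follows. This is only an illustration of the idea (strip both executables, then compare them byte for byte), not the rule used by the MLKit Makefile. The function name same_after_strip and the STRIP override are hypothetical, and the sketch assumes the standard strip and cmp utilities are available.

```shell
#!/bin/sh
# Sketch only: report whether two executables are byte-identical once
# their symbol tables have been removed. The function name and the STRIP
# variable are hypothetical; the MLKit Makefile may do this differently.
same_after_strip() {
    tmp=$(mktemp -d) || return 2
    # Work on copies so the original executables are left untouched.
    cp "$1" "$tmp/a" && cp "$2" "$tmp/b" || { rm -rf "$tmp"; return 2; }
    # STRIP can name an alternative strip tool; it defaults to strip.
    ${STRIP:-strip} "$tmp/a" "$tmp/b" || { rm -rf "$tmp"; return 2; }
    # cmp -s is silent and exits 0 only if the files are identical.
    if cmp -s "$tmp/a" "$tmp/b"; then status=0; else status=1; fi
    rm -rf "$tmp"
    return $status
}
```

The function would be called with the paths of the two generated executables as its arguments. It returns 0 when the stripped copies are identical, 1 when they differ, and 2 when copying or stripping fails.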
Notice that the kit2 compiler uses the runtime system of the MLKit, which combines region inference and garbage collection. However, the kit2 compiler is still affected by SML/NJ. For instance, many constants, including bit-vectors for garbage collection, are calculated using the Basis Library of SML/NJ. Thus, the kit2 executable may work even if the MLKit implementation of the Basis Library is buggy.
Testing the Bootstrapped Compiler
It is possible to apply all the MLKit tests in the test directory to the bootstrapped versions of the MLKit. To do this, enter the directory bootstrap/mlkit-v3/test and perform the test as follows:
$ cd bootstrap/mlkit-v3/test
$ make test_mlkit
As a result, a test report is generated.
Profiling the MLKit
The MLKit can compile itself only when the compiler is capable of using garbage collection (in combination with region inference) when compiling programs.
To construct a region profile of the MLKit in action, we first generate a version of the MLKit with region profiling enabled; we call it kit3P and we build it using kit3:
$ cd $HOME/mlkit-v3
$ rm -rf $HOME/mlkit-v3P
$ make bootstrap COMP_FLAGS=-prof INSTDIR=$HOME/mlkit-v3P
To generate a region profile of the MLKit in action, we use kit3P to compile a smaller program, kitkbjul9.sml, found in the test directory. We enter the test directory and compile kitkbjul9.sml using the mlkit program (which has region profiling enabled), located in the bin directory:
$ cd $HOME/mlkit-v3P/test
$ ../bin/mlkit kitkbjul9.sml
...
[wrote executable file: run]
$ rp2ps -region -name "MLKit compiling kitkbjul9" -sampleMax 2000 -eps 137 mm
Region profiling to output file region.ps.
Using name MLKit compiling kitkbjul9.
Using 2000 samples.
Using encapsulated postscript with width 388 pt.
$ gv -seascape region.ps
The following figure shows the region profile obtained with the commands shown above.
Figure 2. A region profile of the MLKit compiling the program kitkbjul9.sml, which is located in the test directory.
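The rp2ps output reports a width of 388 pt for the requested -eps 137 mm. The small sketch below reproduces that figure from the standard definitions 1 inch = 25.4 mm = 72 PostScript points. The helper name mm_to_pt is hypothetical. The source does not state how rp2ps rounds, so rounding to the nearest point is an assumption here.

```shell
#!/bin/sh
# Sketch only: convert millimetres to PostScript points
# (1 in = 25.4 mm = 72 pt), rounded to the nearest point.
# mm_to_pt is a hypothetical helper, not part of rp2ps.
mm_to_pt() {
    awk -v mm="$1" 'BEGIN { printf "%.0f\n", mm * 72 / 25.4 }'
}

mm_to_pt 137   # prints 388
```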
Controlling Profiling Options
To control the profiling options at runtime, as described in the section on Region Profiling, we use the program mlkit.img in the bin directory and supply all necessary command-line parameters, including an install directory (i.e., the directory specified when building this particular version of the MLKit). The install directory can be found by looking in the file bin/mlkit.
The figure below shows a region profile similar to the one above except that the number of profile ticks is increased using the option -microsec 100000.
Figure 3. A region profile of the MLKit compiling the program kitkbjul9.sml. Memory is traversed once every 0.1 second.
Here is how the above figure was produced:
$ cd $HOME/mlkit-v3P/test
$ ../bin/mlkit.img -realtime -microsec 100000 $HOME/mlkit-v3P/ kitkbjul9.sml
...
[wrote executable file: run]
$ rp2ps -region -sampleMax 2000 -eps 137 mm -name "MLKit compiling kitkbjul9 (100000 microsec)"
Region profiling to output file region.ps.
Using name MLKit compiling kitkbjul9 (100000 microsec).
Using 2000 samples.
Using encapsulated postscript with width 388 pt.
$ gv -seascape region.ps
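The value passed to -microsec is in microseconds, so 100000 corresponds to the 0.1 second interval mentioned in the caption of Figure 3. The sketch below only spells out that unit conversion; the helper name usec_to_sec is hypothetical and is not part of the MLKit.

```shell
#!/bin/sh
# Sketch only: convert a -microsec argument to seconds
# (1 second = 1000000 microseconds).
# usec_to_sec is a hypothetical helper.
usec_to_sec() {
    awk -v us="$1" 'BEGIN { printf "%g\n", us / 1000000 }'
}

usec_to_sec 100000   # prints 0.1
```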