Note 4, Programming Language Concepts (sestoft@dina.kvl.dk) 2002-02-25
----------------------------------------------------------------------

General environment implementations
-----------------------------------

For brevity we shall use a general environment representation, given
in structure Env. Here is the signature of Env from file Env.sig:

    type ('key, 'data) env
    val empty    : (''key, 'data) env
    val lookup   : (''key, 'data) env -> ''key -> 'data
    val bind1    : (''key, 'data) env -> ''key * 'data -> (''key, 'data) env
    val bind     : (''key, 'data) env -> (''key * 'data) list -> (''key, 'data) env
    val bindZip  : (''key, 'data) env -> ''key list * 'data list -> (''key, 'data) env
    val fromList : (''key * 'data) list -> (''key, 'data) env
    val plus     : (''key, 'data) env * (''key, 'data) env -> (''key, 'data) env

The type Env.env is parametrized by two types: the type of keys, and
the type of data. For example, an environment mapping variable names
to integers will have type

    (string, int) env

The value Env.empty is the empty environment.

The call (lookup env x) looks up x in environment env.

The call (bind1 env (x, v)) returns a new environment which is env
extended with a binding of x to v.

The call (bind env [(x1, v1), ..., (xn, vn)]) returns a new
environment which is env extended with a binding of x1 to v1, x2 to
v2, and so on.

The call (bindZip env ([x1, ..., xn], [v1, ..., vn])) returns a new
environment which is env extended with a binding of x1 to v1, x2 to
v2, and so on.

The call (fromList [(x1, v1), ..., (xn, vn)]) returns an environment
in which x1 is bound to v1, x2 to v2, and so on.

The call (plus (env1, env2)) returns a new environment which is env1
updated with the bindings from env2. Thus if y is bound in both
environments, then the binding from env2 hides that from env1.

The Env signature and structure are in files Env.sig and Env.sml.
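To make the signature concrete, here is a minimal sketch of one possible implementation. It assumes that an environment is an association list with the newest binding first. It also adds a hypothetical exception Unbound for failed lookups. The actual Env.sml may represent environments differently, so treat the structure name EnvSketch and its details as illustrative only.

```sml
(* Hypothetical implementation of the Env operations.
   ASSUMPTION: an environment is an association list, newest binding
   first; the exception Unbound is invented for this sketch. *)
structure EnvSketch =
struct
  type ('key, 'data) env = ('key * 'data) list

  exception Unbound

  val empty = []

  (* Return the data bound to x; the newest binding is found first. *)
  fun lookup [] x = raise Unbound
    | lookup ((k, v) :: rest) x = if k = x then v else lookup rest x

  (* Add one binding in front; it hides older bindings of the same key. *)
  fun bind1 env (x, v) = (x, v) :: env

  (* Add the pairs left to right, so a later pair hides an earlier one. *)
  fun bind env pairs = List.foldl (fn (p, acc) => bind1 acc p) env pairs

  (* Keys and data given as two parallel lists (extra elements ignored). *)
  fun bindZip env (xs, vs) = bind env (ListPair.zip (xs, vs))

  fun fromList pairs = bind empty pairs

  (* Bindings from env2 come first, so they hide those from env1. *)
  fun plus (env1, env2) = env2 @ env1
end
```

With this representation bind1 takes constant time, while lookup takes time linear in the number of bindings. A balanced search tree would make lookup logarithmic, but it would require an ordering on keys rather than just equality.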
A first-order functional language with nested function declarations
-------------------------------------------------------------------

The following files are provided:

    fun/Absyn.sml      the abstract syntax (shown below)
    fun/grammar.txt    an informal grammar
    fun/Funlex.lex     lexer specification
    fun/Funpar.grm     parser specification
    fun/parse.sml      a parser
    fun/fun.sml        a first-order evaluator for expr
    fun/tyfun.sml      an explicitly typed version of expr

Our new first-order functional language extends the simple expression
language with if-then-else expressions, function bindings, function
calls, and sequential logical connectives (`and' and `or'):

    datatype expr =
        CstI of int
      | CstB of bool
      | Var of string
      | Let of string * expr * expr
      | Prim of string * expr list
      | If of expr * expr * expr
      | Letfun of string * string * expr * expr  (* (f, x, fbody, ebody) *)
      | Call of expr * expr

The components of a function binding Letfun (f, x, fbody, ebody) have
the following meaning:

    f       is the function name
    x       is the parameter name
    fbody   is the function body, or right-hand side
    ebody   is the let-body

For now, we restrict the language to be first-order, so that in a
function call f(e), f must be a function name, although the above
abstract syntax allows f to be an arbitrary expression. We enforce
the first-order restriction by requiring that in a function call
Call(e, earg), the function expression e must be a (variable) name.
So for now, all function calls have the form Call(Var f, earg) where f
is a function name. Thus the concrete syntax

    let f x = x + 7 in f 2 end

could be represented as

    Letfun("f", "x", Prim(...), Call(Var "f", CstI 2))

Note that the language admits nested function declarations, like this:

    let f x = let g y = x + y in g (2 * x) end
    in f 7 end

We can interpret this language (without an explicit evaluation stack)
using an environment env which maps variable names to integers and
function names to function closures.
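A minimal sketch of such an environment-based interpreter follows. It uses the recursive closures RClo(f, x, fbody, decenv) described under "Function closures". Several assumptions are not taken from fun/fun.sml: environments are plain association lists instead of Env values, booleans are represented by the integers 1 and 0, and only the binary primitives *, +, -, = and < are handled. The first-order restriction is enforced by accepting only calls of the form Call(Var f, earg).

```sml
(* Sketch of an environment-based evaluator for the first-order language.
   ASSUMPTIONS: association-list environments (newest binding first),
   booleans as 1 (true) and 0 (false), primitives *, +, -, =, < only. *)
datatype expr =
    CstI of int
  | CstB of bool
  | Var of string
  | Let of string * expr * expr
  | Prim of string * expr list
  | If of expr * expr * expr
  | Letfun of string * string * expr * expr   (* (f, x, fbody, ebody) *)
  | Call of expr * expr

(* A value is an integer or a recursive closure (f, x, fbody, decenv). *)
datatype value =
    Int of int
  | RClo of string * string * expr * vfenv
withtype vfenv = (string * value) list

fun lookup [] x = raise Fail ("unbound name " ^ x)
  | lookup ((y, v) :: rest) x = if x = y then v else lookup rest x

fun eval (e : expr) (env : vfenv) : int =
    case e of
      CstI i => i
    | CstB b => if b then 1 else 0
    | Var x =>
        (case lookup env x of
           Int i => i
         | RClo _ => raise Fail "eval: function used as a value")
    | Let (x, erhs, ebody) => eval ebody ((x, Int (eval erhs env)) :: env)
    | Prim (ope, [e1, e2]) =>
        let val i1 = eval e1 env
            val i2 = eval e2 env
        in case ope of
             "*" => i1 * i2
           | "+" => i1 + i2
           | "-" => i1 - i2
           | "=" => if i1 = i2 then 1 else 0
           | "<" => if i1 < i2 then 1 else 0
           | _ => raise Fail ("unknown primitive " ^ ope)
        end
    | Prim _ => raise Fail "eval: expected two arguments"
    | If (e1, e2, e3) =>
        (* Only the chosen branch is evaluated. *)
        if eval e1 env <> 0 then eval e2 env else eval e3 env
    | Letfun (f, x, fbody, ebody) =>
        (* The closure captures env, the declaration environment. *)
        eval ebody ((f, RClo (f, x, fbody, env)) :: env)
    | Call (Var f, earg) =>
        (case lookup env f of
           closure as RClo (fname, x, fbody, decenv) =>
             let val argv = Int (eval earg env)   (* argument in caller's env *)
                 (* The body sees decenv, plus the function itself
                    (for recursion) and the parameter x. *)
                 val bodyEnv = (x, argv) :: (fname, closure) :: decenv
             in eval fbody bodyEnv end
         | Int _ => raise Fail ("eval: " ^ f ^ " is not a function"))
    | Call _ => raise Fail "eval: first-order calls only"
```

In this sketch the closure built for g in the nested example captures both x and f, because f is bound (for recursion) in the environment where g is declared. It therefore contains slightly more than the simplified closure shown in the text.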
Function closures
-----------------

A (recursive) function closure is a tuple (f, x, fbody, decenv)
consisting of the name of the function, the name of the function's
parameter, the function's body expression, and the function's
declaration environment. The latter is needed because a function
declaration may have free variables. For instance, x is free in the
declaration of function g above (but y is not free, as it is bound as
a parameter to g). Thus the closures created for f and g above would
be

    ("f", "x", "let g y = x + y in g (2 * x) end", [])

and

    ("g", "y", "x + y", [("x", 7)])

The name of the function is included in the closure to allow the
function to call itself recursively.

In the eval interpreter, a recursive closure is a value

    RClo(f, x, fbody, decenv)

where f is the function name, x is the parameter name, fbody is the
function body, and decenv is the environment in which the function was
declared: this is the environment in which fbody should be evaluated.

Since we do not really distinguish variable names from function
names, the interpreter will use a single variable-and-function
environment vfenv. Such an environment maps a name (of type string)
to a value, where a value is an integer or a recursive closure:

    datatype value =
        Int of int
      | RClo of string * string * expr * vfenv
    withtype vfenv = (string, value) env

The definitions of the value types and the variable-and-function
environment type are mutually dependent: a value can include an
environment, and an environment maps names to values.

Type-checking of an explicitly typed functional language
--------------------------------------------------------

We extend our first-order functional language with explicit types on
function declarations, describing the types of function parameters
and function results (as we are used to in Java, ANSI C, C++, Ada,
Pascal and so on).

We need a (meta-language) type typ of object-language types.
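The paragraphs that follow present the type checker one case at a time. As a sketch of how the cases fit together, here is a self-contained version. It assumes association lists instead of Env for type environments and declares the exception Type itself. It also fills in an If case, which the notes elide, using the usual rule: the condition must have type bool and both branches must have the same type. That rule is an assumption of this sketch.

```sml
(* Sketch: the whole type checker in one piece.
   ASSUMPTIONS: association-list type environments (newest first);
   the If case is filled in here and is not taken from the notes. *)
datatype typ =
    TypI                    (* int *)
  | TypB                    (* bool *)
  | TypF of typ * typ       (* (argumenttype, resulttype) *)

datatype tyexpr =
    CstI of int
  | CstB of bool
  | Var of string
  | Let of string * tyexpr * tyexpr
  | Prim of string * tyexpr list
  | If of tyexpr * tyexpr * tyexpr
  | Letfun of string * string * typ * tyexpr * typ * tyexpr
  | Call of tyexpr * tyexpr

exception Type of string

(* Find the type of name x; the newest binding comes first. *)
fun lookup [] x = raise Type ("unbound name " ^ x)
  | lookup ((y, t) :: rest) x = if x = y then t else lookup rest x

fun typ (env : (string * typ) list) (e : tyexpr) : typ =
    case e of
      CstI _ => TypI
    | CstB _ => TypB
    | Var x => lookup env x
    | Prim (ope, [e1, e2]) =>
        let fun chk ta tb tr =
                if typ env e1 = ta andalso typ env e2 = tb then tr
                else raise Type ("Prim " ^ ope)
        in case ope of
             "*" => chk TypI TypI TypI
           | "+" => chk TypI TypI TypI
           | "-" => chk TypI TypI TypI
           | "=" => chk TypI TypI TypB
           | "<" => chk TypI TypI TypB
           | "&" => chk TypB TypB TypB
           | _ => raise Fail ("unknown primitive " ^ ope)
        end
    | Prim _ => raise Type "Prim: expected two arguments"
    | Let (x, erhs, ebody) => typ ((x, typ env erhs) :: env) ebody
    | If (e1, e2, e3) =>
        if typ env e1 <> TypB then raise Type "If: condition not bool"
        else let val t2 = typ env e2
             in if typ env e3 = t2 then t2
                else raise Type "If: branches differ"
             end
    | Letfun (f, x, xty, fbody, rty, ebody) =>
        let val env1 = (f, TypF (xty, rty)) :: env   (* f, for recursion *)
            val env2 = (x, xty) :: env1               (* plus parameter x *)
        in if typ env2 fbody = rty then typ env1 ebody
           else raise Type ("Letfun: wrong type given " ^ f)
        end
    | Call (efun, earg) =>
        (case typ env efun of
           TypF (xty, rty) =>
             if typ env earg = xty then rty
             else raise Type "Call: wrong argument type"
         | _ => raise Type "Call: attempt to apply non-function")
```

With this checker, the expression if 1=1 then 3 else false+4 is rejected by raising Type, even though evaluation would never reach the addition.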
The types are int and bool (and function types, for use when checking
a higher-order functional language):

    datatype typ =
        TypI                          (* int *)
      | TypB                          (* bool *)
      | TypF of typ * typ             (* (argumenttype, resulttype) *)

The abstract syntax of the object language is as before, except that
types have been added to Letfun bindings (file fun/tychk.sml):

    datatype tyexpr =
        CstI of int
      | CstB of bool
      | Var of string
      | Let of string * tyexpr * tyexpr
      | Prim of string * tyexpr list
      | If of tyexpr * tyexpr * tyexpr
      | Letfun of string * string * typ * tyexpr * typ * tyexpr
                                      (* (f, x, xty, fbody, rty, ebody) *)
      | Call of tyexpr * tyexpr

A type checker for this language maintains a type environment

    type tyenv = (string, typ) env

which maps bound variables and function names to their types. The
type checker analyses the given expression, and returns its type.

For constants it simply returns the type of the constant. For
variables, it uses the type environment:

    fun typ (env : tyenv) (e : tyexpr) : typ =
        case e of
          CstI i => TypI
        | CstB b => TypB
        | Var x  => lookup env x
        | ...

For a primitive operator such as addition (+), less than (<), logical
and (&), and so on, the type checker recursively finds the types of
the arguments, checks that they are as expected, and returns the type
of the expression:

    fun typ (env : tyenv) (e : tyexpr) : typ =
        ...
        | Prim(ope, [e1, e2]) =>
          let fun chk ta tb tr =
                  if typ env e1 = ta andalso typ env e2 = tb then tr
                  else raise Type "Prim"
          in case ope of
                 "*" => chk TypI TypI TypI
               | "+" => chk TypI TypI TypI
               | "-" => chk TypI TypI TypI
               | "=" => chk TypI TypI TypB
               | "<" => chk TypI TypI TypB
               | "&" => chk TypB TypB TypB
               | _   => raise Fail "unknown primitive"
          end
        | ...

For a let-binding

    let x = erhs in ebody end

the type checker recursively finds the type xty of the right-hand
side erhs, then binds x to xty in the type environment, and then finds
the type of ebody; this is the type of the entire let-expression:

    fun typ (env : tyenv) (e : tyexpr) : typ =
        ...
        | Let(x, erhs, ebody) =>
          let val xty = typ env erhs
              val env1 = bind1 env (x, xty)
          in typ env1 ebody end
        | ...

For a function declaration

    let f (x : xty) = fbody : rty in ebody end

the type checker recursively finds the type of the function body fbody
under the assumption that x has type xty and f has type xty -> rty,
and checks that the type it found is actually rty. Then it finds the
type of ebody under the assumption that f has type xty -> rty:

    fun typ (env : tyenv) (e : tyexpr) : typ =
        ...
        | Letfun(f, x, xty, fbody, rty, ebody) =>
          let val env1 = bind1 env (f, TypF(xty, rty))
              val env2 = bind1 env1 (x, xty)
          in if typ env2 fbody = rty then typ env1 ebody
             else raise Type ("Letfun: wrong type given " ^ f)
          end
        | ...

For a function call

    f e

the type checker recursively finds the type of the argument e, and
checks that it is actually xty, where the type of f is xty -> rty. If
so, the type of the application is rty:

    fun typ (env : tyenv) (e : tyexpr) : typ =
        ...
        | Call(efun, earg) =>
          (case typ env efun of
               TypF(xty, rty) =>
                 if typ env earg = xty then rty
                 else raise Type "Call: wrong argument type"
             | _ => raise Type "Call: attempt to apply non-function")

This approach suffices because function declarations are explicitly
typed: there is no need to guess the type of function parameters or
function results. We shall see later that one can in fact
systematically `guess' and then verify types, thus doing type
inference rather than type checking.

Static typing versus dynamic typing
-----------------------------------

Our original untyped functional language is not completely untyped.
More precisely, it is dynamically typed: it forbids certain
monstrosities, such as adding a function and an integer.
Hence this program is illegal, and its execution fails:

    let f x = x+1 in f + 4 end

whereas this program is perfectly valid in the original interpreter:

    let f x = x+1 in if 1=1 then 3 else f + 4 end

It evaluates to 3 without any problems, because no attempt is made to
evaluate the else-branch of the if-then-else.

By contrast, our typed functional language (tyexpr in fun/tychk.sml)
is statically typed: a program such as

    if 1=1 then 3 else false+4

or, in tyexpr abstract syntax,

    If(Prim("=", [CstI 1, CstI 1]), CstI 3, Prim("+", [CstB false, CstI 4]))

is ill-typed even though we never attempt to evaluate the addition
false+4. Thus the type checker in a statically typed language may be
overly pessimistic: it may reject programs that would never fail at
run-time.

Even so, most modern languages are statically typed, for several
reasons. First, type errors often reflect logic errors, so static
(compile-time) type checking helps find real bugs early. It is better
and cheaper to detect and fix bugs at compile-time than at run-time
(which may be after the program has been shipped to customers).
Secondly, types provide reliable machine-checked documentation, to the
human reader, about the intended and legal ways to use a variable or
function. Finally, the more the compiler knows about the program, the
better code it can generate: types provide such information to the
compiler, and advanced compilers use type information to generate
target programs that are faster or use less space.

Languages such as Lisp, Scheme, ECMAScript/Javascript/Flash
Actionscript, Postscript, and Visual Basic are dynamically typed.
Although most of Java and C# is statically typed, some parts are not:
in particular, array element assignment and operations on collection
classes are dynamically typed.

Dynamic typing in Java and C# reference array assignment
--------------------------------------------------------

In Java and C#, assignment to an array element is dynamically typed
when the array element type is a reference type.
Namely, recall that the `wrapper' classes Integer and Double are
subclasses of Number, where Integer, Double, and Number are built-in
classes in Java. If we create an array whose element type is
Integer, we can bind that to a variable arrn of type Number[]:

    Integer[] arr = new Integer[16];
    Number[] arrn = arr;

Note that arr and arrn refer to the same array, whose element type is
Integer. Now one might believe (mistakenly) that when arrn has type
Number[], one can store a value of any subtype of Number in arrn. But
that would be wrong: if we could store a Double in arrn, then an
access arr[0] to arr could return a Double object, which would be
rather surprising, given that arr has type Integer[]. However, in
general a variable arrn of type Number[] *might* refer to an array
whose element type is Double, in which case we *can* store a Double in
the array. So the Java compiler should not refuse to compile such an
assignment.

The end result is that the Java compiler will actually compile this
assignment

    arrn[0] = new Double(3.14);

without any complaints, but when it is executed at run-time, it is
checked that the element type of arrn is Double or a superclass of
Double, which it is not, and an ArrayStoreException is thrown. (In C#
the corresponding exception is ArrayTypeMismatchException.) Hence
Java array assignments are not statically typed, but dynamically
typed.

Dynamic typing in Java and C# collection classes
------------------------------------------------

When we use the (non-generic) collection classes, Java and C# provide
no compile-time type safety:

    LinkedList names = new LinkedList();
    names.add(new Person("Kristen"));
    names.add(new Person("Bjarne"));
    names.add(new Integer(1998));       // (1) Wrong, but no compile-time check
    names.add(new Person("Anders"));
    ...
    Person p = (Person)names.get(2);    // Cast needed; fails here, element 2 is an Integer

The elements of the LinkedList are supposed to have class Person, but
the Java compiler has no way of knowing that; it can only assume that
each element is of class Object.
This has two consequences: when storing something into the list, the
compiler cannot detect mistakes (as at (1)); and when retrieving an
element from the list, the cast must check *at run-time* that the
element has the desired type, throwing a ClassCastException if it does
not.

History and literature
----------------------

Functional, mostly expression-based, programming languages go back to
Lisp (McCarthy 1960). Lisp is dynamically typed and has dynamic
variable scope, but one of its successor languages, Scheme (Sussman
and Steele 1975), has static scope, which admits a much better
implementation.

Like Lisp, Scheme is dynamically typed, but there are many subsequent
statically typed functional languages, notably ML (Gordon et al 1978),
Standard ML (Milner, Tofte, Harper 1990), and OCaml (Leroy 1995).
Whereas these languages have so-called strict or eager evaluation
(function arguments are evaluated before the function is called),
another subfamily is made up of the so-called non-strict or lazy
functional languages, including SASL and its successor Miranda (Turner
1985), Lazy ML (Augustsson and Johnsson 1984), and Haskell (Peyton
Jones and Hughes 1998). All the statically typed languages are
statically scoped as well.

Probably the first published description of type checking in a
compiler is about the Algol 60 compilers developed at Regnecentralen
in Copenhagen:

    Peter Naur. Checking of Operand Types in ALGOL Compilers. BIT 5
    (1965) 151-163.

More general forms of static analysis or static checking have been
studied under the name of data flow analysis (Kam and Ullman 1977), or
control flow analysis, or abstract interpretation (Cousot and Cousot
1977), and in much subsequent work.