Summary of PHD thesis

In traditional compiler design the work of a compiler is divided into several phases: lexical, syntactical and semantical analysis, optimizations and code generation. For several of these phases generators exist -- most prominently LEX and YACC for generating lexical and syntactical analyzers. A common feature of all generators is that the phase in the compiler is described using a meta-language (e.g. regular expressions or context-free grammars) and that the generator produces the related compiler module. There exist several good textbooks on compiler design. However, all of these books present ready made mappings from source language constructs to target language constructs, the so called translation schemes, instead of deriving them. Hence, the reader is expected to learn how to design code generators by analysing translation schemes as opposed to from first principles. The same is true for abstract machines. Abstract machines are virtual target architectures which support the concepts of the source language. Typically abstract machines are presented together with translation schemes from the source language to the abstract machine language. There is only little work on how translation schemes and abstract machines are designed.
The aim of our work is to detect underlying principles that relate abstract machines to programming language semantics, and to automate part of the design process for abstract machines. Thus, we need to ensure that the behaviour of a source program will be maintained by translating it into the abstract machine language, and then applying the abstract machine.
The behaviour of a program will depend on it's semantics. Often this aspect of a programming language is only described in natural language which is both ambiguous and vague. We shall use formal techniques to describe the meaning of programs in a particular language and to prove that our transformations are correct. In this thesis we concentrate on natural semantics, but we also address action semantics.

Outline of Thesis

In this thesis we will give an answers to the question, how one can generate translation schemes and abstract machines. It is based on pass separating a natural semantics specification. A second answer was presented in previous work of this author. One requirement of a compiler is, that it is complete in the sense that every correct program can be compiled. The second approach is not fully automatic and does not guarantee completeness. The approach presented in this thesis both guarantees completeness and is fully automatic. As far as we know, our generator is the first running implementation of a system which fully automatically produces both compilers and abstract machines from a semantics specification.
This thesis contains

an overview of semantics formalisms:
We illustrate several semantics formalisms and discuss their appropriateness for semantics-directed compiler generation.
an overview of existing work:
Work in semantics-directed compiler generation and derivation of abstract machines is based on many formalisms and methods. We elaborate on some fundamental techniques and classify many existing systems and approaches by several criteria. Although such an overview justifies itself, it should also help the reader to classify and appreciate our work.
a generator for abstract machines based on natural semantics:
From a natural semantics specification the generator automatically produces a compiler and abstract machine. Whereas all existing semantics-directed compiler generators use partial evaluation or a direct translation into a fixed target language, we chose pass separation as the key transformation of our system.
an experimental evaluation of the generator:
We tested the generator with semantics specifications of two toy languages (SIMP and Mini-ML) and a specification of Action Notation. For Mini-ML we got an abstract machine which is very close to the CAM, an abstract machine used for efficient implementations of ML.
a correctness proof of the generator without optimizations:
First we present a semantics for our meta-language, then we use it to prove the correctness of several transformations.
an interesting application of the generator:
By generating a compiler based on a natural semantics specification of Action Notation we get an action semantics-directed compiler generator.

The main novelty of our generator is that it generates compilers and abstract machines.
The execution times of the abstract machine programs produced by our generated compiler compare to those of target programs produced by compilers generated by other semantics-directed generators. The generated specifications of compilers and abstract machines are suitable as a starting point for handwriting compilers and abstract machines. Our generator is fully automated and its core transformations are proved correct.
In our work, composing source-to-source transformations plays a central role. This divide-and-conquer approach to compiler generation has some advantages over a more direct approach:
Each transformation introduces a new property, it transforms one kind of representation into a more restricted kind. As a consequence the transformations can be developed, debugged, and replaced separately and to a certain extent each transformation can be understood and proved correct in isolation. Composing several transformations leads to a modular structure of our system. This modular structure facilitates to extend the system over time to include more powerful analysis and transformation methods.