Summary of PHD thesis
In traditional compiler design the work of a compiler is
divided into several phases: lexical, syntactical and
semantical analysis, optimizations and code generation.
For several of these phases generators exist -- most
prominently LEX and YACC for generating lexical
and syntactical analyzers. A common feature of all generators
is that the phase in the compiler is described using a meta-language
(e.g. regular expressions or context-free grammars) and that the
generator produces the related compiler module. There exist
several good textbooks on compiler design.
However, all of these books present ready made mappings from
source language constructs to target language constructs,
the so called translation schemes, instead of deriving them.
Hence, the reader is expected to learn how to design code
generators by analysing translation schemes as opposed
to from first principles.
The same is true for abstract machines.
Abstract machines are
virtual target architectures which support the concepts
of the source language.
Typically abstract machines are presented together with translation
schemes from the source language to the abstract machine
language. There is only little work on how
translation schemes and abstract machines are designed.
The aim of our work is to detect underlying principles that relate
abstract machines to programming language semantics,
and to automate part of the design process for
abstract machines.
Thus, we need to ensure that the behaviour of a source program
will be maintained by translating it into
the abstract machine language, and then applying the abstract
machine.
The behaviour of a program will depend on it's semantics. Often this
aspect of a programming language is only described in natural language
which is both ambiguous and vague.
We shall use formal techniques to describe the
meaning of programs in a particular language and to
prove that our transformations are correct.
In this thesis we concentrate on natural semantics, but we also address
action semantics.
Outline of Thesis
In this thesis we will give an answers
to the question, how one can generate translation
schemes and abstract machines.
It is based on pass separating a natural semantics
specification.
A second answer was presented in previous work of this
author.
One requirement of a compiler is, that it is complete in
the sense that every correct program can be compiled. The second
approach is not fully automatic and does not guarantee completeness.
The approach presented in this thesis
both guarantees completeness
and is fully automatic. As far as we know, our generator
is the first running implementation of a system
which fully automatically produces both compilers and abstract machines
from a semantics specification.
This thesis contains
- an overview of semantics formalisms:
We illustrate several semantics
formalisms and discuss their appropriateness for
semantics-directed compiler generation.
- an overview of existing work:
Work in semantics-directed compiler generation
and derivation of abstract machines is based
on many formalisms and methods.
We elaborate on some fundamental techniques
and classify many existing systems and approaches
by several criteria. Although such an overview justifies
itself, it should also help the reader to classify
and appreciate our work.
- a generator for abstract machines based on natural semantics:
From a natural semantics specification the generator
automatically produces a compiler and abstract machine.
Whereas all existing semantics-directed compiler generators
use partial evaluation or a direct translation into
a fixed target language, we chose pass separation as the
key transformation of our system.
- an experimental evaluation of the generator:
We tested the generator with semantics specifications
of two toy languages (SIMP and Mini-ML) and a specification of
Action Notation. For Mini-ML we got an abstract machine
which is very close to the CAM, an abstract machine
used for efficient implementations of ML.
- a correctness proof of the generator without optimizations:
First we present a semantics for our meta-language, then
we use it to prove the correctness of several transformations.
- an interesting application of the generator:
By generating a compiler based on a natural semantics
specification of Action Notation we
get an action semantics-directed compiler generator.
The main novelty of our generator is that it
generates compilers and abstract machines.
The execution times of the abstract machine programs
produced by our generated compiler
compare to those of target programs
produced by compilers generated by other
semantics-directed generators.
The generated specifications of compilers and
abstract machines are suitable as a starting point
for handwriting compilers and abstract machines.
Our generator is fully automated
and its core transformations are proved correct.
In our work,
composing
source-to-source transformations plays
a central role. This divide-and-conquer approach to compiler generation
has some advantages over a more direct approach:
Each transformation introduces
a new property, it transforms one kind of representation
into a more restricted kind. As a consequence the
transformations can be developed, debugged, and
replaced separately and to a certain extent each transformation
can be understood and proved correct in isolation.
Composing several transformations leads to a modular structure
of our system. This modular structure facilitates to extend
the system over time to include more powerful analysis
and transformation methods.