ELL(k)-Parsing and Efficient ELL(1)-Parser Generation

Diploma thesis of Reinhold Heckmann

Introduction

LL-parsers for extended context-free grammars (those are grammars where the non-terminal symbols produce regular expressions instead of strings) can be generated very efficiently if the length s of the look-ahead is 1. In Chapters 1 through 12, we present the theoretical foundations of the generation algorithm which is then given in Chapters 13 through 17.
Heilbrunner [1] used another approach: he transformed extended grammars into conventional ones and generated the parser for the resulting grammar. But this parser finds the complete structure of the input under the new grammar, even though much of this structure may be unnecessary for reconstructing the parse for the original grammar.
Purdom and Brown [2] show how to generate LR-parsers directly from extended context-free grammars. Another less efficient algorithm to generate LL-parsers directly from extended grammars can be found in [3].

The theory in Chapters 1 through 12 treats the general case of an arbitrary look-ahead length. Its main results are theorems on how the first- and follow-sets (sets which are needed to generate the parser) can be deduced from each reduced extended grammar, and criteria for the existence of parsers for a given grammar. In order to understand the algorithm in Chapters 13 through 17, one must at least read Chapter 1, the definitions in Chapters 2, 3, and 4, and Chapters 6, 7, and 10.
In Chapter 1, some fundamental operations on words and languages are introduced, and their main properties are listed and proved.
Since a good theory cannot be built without an exact definition of the terms on which it operates, we give such a definition of regular expressions in Chapter 2 and of extended grammars themselves and regular derivations in Chapter 3.
An important tool to investigate a grammar is a graph associated with it which represents the relations between terminal and non-terminal symbols and the structure of the produced regular expressions. The graph is described in Chapter 4 where we also introduce some terms and abbreviations denoting diverse subsets of the vertex set, and successors and predecessors of a given vertex.
After a short introduction to fixed point theory in Chapter 5, first-sets are defined in Chapter 6, and it is shown that they are least fixed points of certain set equations.
The regular derivations of Chapter 3 are not adequate to define follow-sets. Thus, we define another kind of derivation, called K-derivation, in Chapter 7. Chapter 8 treats the correlations between regular and K-derivations.
In contrast to first sets which can also be computed for non-reduced grammars, it is necessary that the grammar be reduced when follow-sets are computed. Therefore, we state in Chapter 9 how to transform a non-reduced grammar into an equivalent reduced one. In Chapter 10, two kinds of follow-sets are defined and compared. For one of them, we prove that they form the least fixed point of a system of set equations.
Chapter 11 treats the ELL(s) conditions guaranteeing that an LL-parser with look-ahead length s exists for a given grammar, and Chapter 12 handles ambiguity and the particular case s = 1.

Chapters 13 through 17 give the algorithm to generate a parser for an ELL(1) grammar in linear time relative to the product of the number of terminals and the size of the grammar.
Chapters 13 through 15 treat the computation of first- and follow-sets without iteration: Chapter 13 gives the fundamental idea, and Chapter 14 determines the constructs which contain the empty word in their first-set. This information is needed for the actual computation in Chapter 15.
The algorithm given in Chapter 14 finds the non-terminals which derive the empty word and those which derive nothing at all at the same time. Its complexity is proportional to the size of the grammar. It may also be used independently from the task of generating an LL-parser.
In Chapter 16, we give the structure of our ELL(1) - parser which is quite different from usual ones, describe how to generate it, and prove it correct. In Chapter 17, we present some upper bounds for the time and space complexity of the parser. These upper bounds are linear in the length of the input to be analyzed.

References:

[1] S. Heilbrunner: On the definition of ELR(k) and ELL(k) grammars, Acta Informatica 11, 169 - 176 (1979)

[2] P. W. Purdom, C. A. Brown: Parsing extended LR(k) grammars, Acta Informatica 15, 115 - 127 (1981)

[3] J. Lewi, K. de Vlaminck, J. Huens, E. Steegmans: A programming methodology in compiler construction, North-Holland (1982)

Reinhold Heckmann / heckmann@absint.com