Island Grammars
Program-Transformation.Org: The Program Transformation Wiki
Description
An
island grammar only precisely defines small portions of the syntax of a language. The rest of the syntax is defined imprecisely, for instance as a list of characters, or a list of tokens.
- In BuildingDocumentationGenerators, we have described an Island Grammar as consisting of
- detailed productions for the language constructs we are specifically interested in
- liberal productions catching all remaining constructs; and
- a minimal set of general definitions covering the overall structure of a program. -- ArieVanDeursen
Examples
From
AQuickIntroductionToSDF:
module Message
exports
syntax
Msg -> <START>
X* -> Msg
~[] -> X {avoid}
module Mailaddress
exports
syntax
Segment "@" Hostname -> X
[A-Za-z][A-Za-z0-9\-\_]* -> Segment
{Segment "."}+ -> Hostname
Thus, this grammar gives a precise syntax definition only for mail addresses (the
islands), and describes the rest of a message as a list of arbitrary characters (the
sea).
Purpose
The purposes of
IslandGrammars are the following:
- Faster parsing: the grammar is smaller, and contains less (local) ambiguities (hopefully).
- More robust parsing: an island grammar is less dependent of language changes or differences between dialects than a regular, precise one.
- Ignorant parsing: full knowledge of the syntax of a given language is not required to parse it.
Discussion
I have some quesions:
- Does anyone have more examples of IslandGrammars?
- Is there evidence that parsing with IslandGrammars can indeed be faster than with regular grammars?
- How about LakeGrammars? or SkeletonGrammars?: grammars where the "top" of the syntax is defined precisely, while some lower non-terminals are defined imprecisely. For instance, a precise Yacc grammar where the semantic actions (C code) are defined imprecisely?
--JoostVisser
I've occasionally done a skeleton grammar, usually for language conversion. It's just an ordinary grammar, byt the lexer recognises
an extra token, usually called
stuff which matches any token or other junk that does not occur in the grammar. Very useful for things
like converting between control-statements that have end-markers and ones that don't, for everting C declarations, etc. It gets most of the boring bulk editing done, leaving you to do the changes that need actual brainwork
--
HendrikBoom
ErnstJanVerhoeven?'s MSc thesis contains several examples of
IslandGrammars in
SDFII, which are mostly related to
COBOL analysis for
DocGen.
The current distribution of
DocGen also includes an SQL island grammar in
JavaCC. It only extracts the basic SQL commands (insert, delete, update, select, cursor declarations, ...) but essentially ignores expressions. The implementation in
JavaCC (done out of portability reasons) is quite a challenge, as it is an LL1 parser generator. --
ArieVanDeursen
LeonMoonen as a paper on Island Grammars at
WCRE 2001, see
http://www.cwi.nl/~leon/papers/wcre2001/
CategoryReverseEngineering | Contributions by
ArieVanDeursen,
JoostVisser,
HendrikBoom