Regular Tree Grammar

Stratego -- Strategies for Program Transformation
A regular tree grammar defines a regular tree language. Regular tree grammars are widely applied as tools in formal reasoning, but in practice the basic formalism is hidden in languages designed at human users. The RTG language is a little language that just contains the constructs of the clean formalism. The language is probably best introduced by a tiny example:

regular tree grammar
start Exp
  Var -> Var   (<string>)
  Exp -> Int   (<int>)
       | Times (Exp, Exp)
       | Plus  (Exp, Exp)
       | Call  (Var, ExpList)

  ExpList -> <cons> (Exp, ExpList)
       | <nil> ()

A regular tree grammar consists of a list of start non-terminals and a list of productions. The list of start non-terminals defines what non-terminals are allowed at the root of tree in the language defined by this RTG. The productions specify the terms that can be produced for a certain non-terminal. The left-hand-side of a production is non-terminal, the right hand side is a term that can contain non-terminals.

Formally, a regular tree grammar consists of:

  • An unranked alphabet of non-terminals: N
  • A ranked alphabet of terminals: T
  • A set of start symbols S, which is a subset of N
  • A set of productions of the form N -> t, where t is an element of the set of ground terms over T union N.

For more formal discussion on tree grammars and tree languages see Tree Automata Techniques and Applications? or Connecting XML Processing and Term Rewriting with Tree Grammars.

Notice that the regular tree grammar shown above contains a few constructs between angle brackets: <cons>, <nil>, <string>, and <int>. The first two are built-in terminals for representing lists. The last two are built-in non-terminals for strings and integers.

See also: