Character Syntax
Stratego -- Strategies for Program Transformation
A few months ago I proposed to introduce character literals in Stratego
as syntactic sugar for the integer ASCII value of the character. I would
like to raise this issue again ;-) .
My proposal was to introduce the widely used 'c' syntax to represent a
character in Stratego. The stratego compiler can desugar this to the
integer ASCII representation of the character, what we use to work with
characters right now. The implementation is trivial, it requires no
changes to the backend, it requires no special ATerm types.
Advantages:
- You don't need an ASCII table when your programming string manipulations in Stratego.
- String processing code like in string.r and char.r in the ssl will become much more clearer.
- It is tempting to explode string literals (even of length 1) at runtime so you don't have to look up the character codes. Character literals will improve performance in this case.
Disadvantages:
- The disadvantage Eelco Visser mentioned when I proposed it before:
what if the result of a Stratego transformation will contain characters.
The user will see integer values instead of the character literal in
this case. I think that this is not a huge problem because: (1)
characters are not likely to occur in the result of a program: they are
often used in String manipulation, where the string will be rebuild
(imploded) later. (2) In another Stratego applications the integer value
can still be matched against a pattern with a character literal because
a character literal is simply an int. (3) concrete syntax already makes
the resulting aterms more difficult to compare with the Stratego code. I
think the added distance between Stratego terms and ATerms is negligible.
- The single quote (apostrophe) is allowed in an identifier. This is a
problem if the charachter in the literal is also an allowed charachter
in an identifier. This introduces an ambiguity. However, an identifier
like 'c' is very unlikely to occur and could be rejected by the SDF grammar.
I'm currently writing strategies to rewrite XML entities and character
references. I'm using overlays right now. This is already an
improvement, but quite verbose.
------------------------
rules
unescape-amp :
[c_amp(), c_a(), c_m(), c_p(), c_semicolon() | cs] -> [c_amp() | cs]
unescape-lt :
[c_amp(), c_l(), c_t(), c_semicolon() | cs] -> [c_lt() | cs]
unescape-gt :
[c_amp(), c_g(), c_t(), c_semicolon() | cs] -> [c_gt() | cs]
overlays
c_space() = 32
c_quote() = 34
c_amp() = 38
c_apos() = 39
c_0() = 48
c_9() = 57
c_semicolon() = 59
c_numbersign() = 35
------------------------
This would be possible with characters in Stratego:
-----------------------------------
unescape-amp : ['&', 'a', 'm', 'p', ';' | cs] -> ['&' | cs]
unescape-lt : ['&', 'l', 't', ';' | cs] -> ['<' | cs]
unescape-gt : ['&', 'g', 't', ';' | cs] -> ['>' | cs]
-----------------------------------
Of course (un)escaping is an example where the usefulness of character
literals is huge. In general you won't use character literals a lot in
Stratego. Because of the simplicity of the implementation I think that
it is still worth the effort.
I would like to hear your opinion :-) .
-- Martin Bravenboer - 07 Dec 2002
Ok. I have added the following to the
SDF definition of Stratego:
----------------------------------------------------------------------
lexical syntax
"\'" CharChar "\'" -> Char
~[\'] -> CharChar
[\\] [\'ntr\ ] -> CharChar
Char -> Id {reject}
context-free syntax
Char -> Term {cons("Char")}
----------------------------------------------------------------------
and the following desugaring rules to stratego-desugar:
----------------------------------------------------------------------
Desugar :
Char(c) -> Int(i)
where <DesugarChar <+ explode-string; DesugarCharGeneric> c => i
DesugarCharGeneric :
[39, i, 39] -> i
DesugarChar :
"'\\''" -> 39
DesugarChar :
"'\\n'" -> 10
DesugarChar :
"'\\t'" -> 9
DesugarChar : // carriage return
"'\\r'" -> 13
DesugarChar : // space
"'\\ '" -> 32
----------------------------------------------------------------------
Note that the desugaring is done at the syntactic level as part of parsing.
This means that characters are pretty-printed as integers. This can be improved
later by shifting the desugaring until later in the process. This requires
deeper embedding of this notion in Stratego, though.
Are any other escapes needed? Note that this will break existing specifications
with identifiers of the form 'c' (which I have never seen).
These changes are available in
StrategoRelease09 (beta7).
--
EelcoVisser - 21 Dec 2002