Imagine a simple Irony grammar written in C# and capable of parsing the following string into a list of greetings:
mjallo dude howdy dudess hey "gal"
This could be written like so:
class GreetGrammar : Grammar
{
public GreetGrammar()
{
var stringlit = new StringLiteral("stringlit");
var id = new IdentifierTerminal("id");
var homergreeting = new NonTerminal("homergreeting", "mjallo" + id);
var cowboygreeting = new NonTerminal("cowboygreeting", "howdy" + id);
var greeting = new NonTerminal("greeting",
homergreeting |
"hey" + stringlit |
cowboygreeting);
var program = new NonTerminal("program",
greeting.Star());
Root = program;
}
}
There’s quite a few characters of noise in this, compared to a clean EBNF syntax. Let’s enumerate what’s distracting:
- 1. Every declaration has the name declared twice: As an identifier and again in a string.
- 2. "new NonTerminal" is repeated a lot and hinders readability.
- 3. "+" is used as a so called sequence operator, whereas BNF uses a space.
- 4. "var" is distracting.
- 5. The *, ? and + operators in EBNF are written as method calls: Star(), Q() and Plus() respectively.
If we switch to the Boo language, then with three little Boo macros, we can get rid of 1 and 2 to render this:
class BooGrammar(Grammar):
def constructor():
stringliteral stringlit
identifier id
rule cowboygreeting, "howdy" + id
rule homergreeting, "mjallo" + id
rule greeting, homergreeting | ("hey" + stringlit) | cowboygreeting
rule program, greeting.Star()
Root = program
That’s a little bit nicer, innit? Note that the var keyword is implicit in Boo, so that’s a free ride.
The only new noise added is the parenthesis around "hey" + stringLit. I believe it has something to do with a difference (between C# and Boo) in operator precedence and operator overloading. If you omit the parenthesis, the grammar compiles, but it won’t parse the input string properly.
So how about noise 3 and 5? Can we get rid of those as well?
The problem with the sequence operator (3) is that space is not an operator in either C# or Boo, so there’s nothing to overload.
Let’s look at the three operators, *, ? and +. Well in general purpose languages like C# and Boo, * and + are binary operators. In EBNF they are unary. Therefore it’s not possible to steal them. And ? is a ternary operator in C#, and it’s not an operator at all in Boo.
Still, this is a small step towards noise free executable grammars in .NET.
No comments:
Post a Comment