Monday, April 21, 2008

Grammar noise cancellation

Imagine a simple Irony grammar written in C# and capable of parsing the following string into a list of greetings:

mjallo dude howdy dudess hey "gal"

This could be written like so:

class GreetGrammar : Grammar
{
 public GreetGrammar()
 {
   var stringlit = new StringLiteral("stringlit");
   var id = new IdentifierTerminal("id");
   var homergreeting = new NonTerminal("homergreeting", "mjallo" + id);
   var cowboygreeting = new NonTerminal("cowboygreeting", "howdy" + id);
   var greeting = new NonTerminal("greeting",
                          homergreeting |
                          "hey" + stringlit |
                          cowboygreeting);
   var program = new NonTerminal("program",
                          greeting.Star());
   Root = program;
 }
}

There are quite a few characters of noise in this, compared to a clean EBNF syntax. Let’s enumerate what’s distracting:

  1. Every declaration has the name declared twice: As an identifier and again in a string.
  2. "new NonTerminal" is repeated a lot and hinders readability.
  3. "+" is used as a so called sequence operator, whereas BNF uses a space.
  4. "var" is distracting.
  5. The *, ? and + operators in EBNF are written as method calls: Star(), Q() and Plus() respectively.

If we switch to the Boo language, then with three little Boo macros, we can get rid of 1 and 2 to render this:

class BooGrammar(Grammar):
  def constructor():
     stringliteral stringlit
     identifier id
     rule cowboygreeting, "howdy" + id
     rule homergreeting, "mjallo" + id
     rule greeting, homergreeting | ("hey" + stringlit) | cowboygreeting
     rule program, greeting.Star()
     Root = program 

That’s a little bit nicer, innit? Note that the var keyword is implicit in Boo, so that’s a free ride.

The only new noise added is the parentheses around "hey" + stringlit. I believe it has something to do with a difference (between C# and Boo) in operator precedence and operator overloading. If you omit the parentheses, the grammar compiles, but it won’t parse the input string properly.

So how about noise 3 and 5? Can we get rid of those as well?

The problem with the sequence operator (3) is that space is not an operator in either C# or Boo, so there’s nothing to overload.

Let’s look at the three operators *, ? and +. In general-purpose languages like C# and Boo, * and + are binary infix operators (C# also has a unary +, but only in prefix position). In EBNF they are unary postfix operators, so it’s not possible to steal them. And ? is part of the ternary conditional operator in C#, and it’s not an operator at all in Boo.
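To make the constraint concrete, here is a minimal sketch of which operators can be stolen in C# and which cannot. The Expr class below is a hypothetical stand-in written for this post, not Irony's actual types; it only builds a string so the effect of each overload is visible.

```csharp
using System;

// Hypothetical stand-in for a grammar expression (not an Irony type).
public sealed class Expr
{
    public string Text { get; }
    public Expr(string text) { Text = text; }

    // Lets a string literal appear wherever an Expr is expected.
    public static implicit operator Expr(string s) => new Expr("\"" + s + "\"");

    // Binary + is overloadable, so it can play the sequence operator.
    public static Expr operator +(Expr a, Expr b) => new Expr(a.Text + " " + b.Text);

    // Binary | is overloadable, so it can play alternation.
    public static Expr operator |(Expr a, Expr b) => new Expr(a.Text + " | " + b.Text);

    // C# has no overloadable postfix * or ?, so a method call stands in.
    public Expr Star() => new Expr("(" + Text + ")*");
}

public static class Demo
{
    public static void Main()
    {
        var id = new Expr("id");
        // + binds tighter than | in C#, so no parentheses are needed here.
        Expr greeting = "mjallo" + id | "howdy" + id;
        Console.WriteLine(greeting.Star().Text); // ("mjallo" id | "howdy" id)*
    }
}
```

The binary operators come for free because the language already parses them in infix position. A postfix star has no such slot to hook into, which is why the method call remains.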

Still, this is a small step towards noise-free executable grammars in .NET.

Thursday, April 17, 2008

Looking for an open source text editor component with syntax highlighting

I’m trying to give a little back to an open source project, namely Irony that I blogged about. My idea was this: We’re getting used to intellisense features in mainstream .NET languages such as syntax coloring, error highlighting, code completion, signature tool tips and more. When writing a DSL to be used and understood by business experts and developers, those things are not readily available.

The benefit of writing an add-in for Visual Studio that allows intellisense for DSLs parsed with Irony would be big, but writing a managed language service for Visual Studio is no simple task. I want to avoid the complexity of the Visual Studio domain model and keep it simple by using a standalone text editor component. So I went searching…

There had to be lots of open source syntax highlighting text editors out there, I thought. And there are. I found a few well-written candidates like the SharpDevelop IDE TextEditor, the xacc IDE editor and SyntaxBox from the Puzzle Framework. There are also quite a few commercial offerings, but introducing a dependency on a commercial product in an open source project is not the road to popularity.

Both SyntaxBox and SharpDevelop let you extend the built-in syntax schemes with your own. This is done by writing an XML-based grammar. It’s possible to do and I have done it a couple of times. But having already written my grammar once using Irony, it just doesn’t feel right having to do it over again, this time using a different syntax.

Running an Irony-generated parser gives as output not only an AST, but also a list of tokens. If I could feed that list of tokens to the editor and bypass the built-in lexing mechanism, I’d be laughing now. But it’s not that easy.

The problems with the three components that I tried using fell into three categories:

  • The editor is entangled in references to other parts of the project, making it impossible to reuse it in other projects like Irony.
  • The built-in lexer cannot be turned off, so lexing is done twice, and two sets of formats for each token have to be merged, resulting in awfully bad performance.
  • The extension points are leaky abstractions that require you to know the inner workings of the text rendering.

Not unsolvable issues, but issues that result in poor maintainability and hacks en masse. Plus it takes a lot more (spare) time than I have.

Do you know of any open source text editor component that supports syntax highlighting AND lets me replace the tokenizer/lexer with something else?

2 Responses to 'Looking for an open source text editor component with syntax highlighting'


  1. Daniel Grunwald said,

    ON APRIL 18TH, 2008 AT 11:38 AM

    You can supply your own implementation of IHighlightingStrategy for SharpDevelop’s text editor and consume your own tokens in that. The interface isn’t as clean as it should be (at least in SharpDevelop 2.x), but it’s certainly possible. I would like to hear what problems you had exactly with SharpDevelop’s editor.

  2. Soren said,

    ON APRIL 19TH, 2008 AT 8:41 PM

    Daniel,

    the #dev editor is a great editor, and I really hope you will help me get this scenario working.

    After several failed attempts to make a custom IHighlightingStrategy work, I finally tried cutting to the bone by implementing the simplest possible scenario: not tampering with LineSegment.Words at all within MarkTokens(). I expected the text to be all black and otherwise work normally. But the editor starts behaving weirdly and it’s not possible to enter or edit text, because it gets cut off at a fixed column.

    Try downloading http://skarpt.dk/blog/SDTextEditorTest.zip. It references ICSharpCode.TextEditor.dll version 3.0.0.2970.

    Looking at the default implementation, I get the feeling that MarkTokens needs to be doing something else, something that is not obvious from the interface definition?

Wednesday, April 16, 2008

No reason to laugh at Irony

Being a DSL geek, I just want to give a shout-out to Roman Ivantsov’s Irony project. Irony is an LALR parser generator that lets you write your grammar in C#. If you’re looking to build your own little Domain Specific Language, Irony is a good alternative to ANTLR and Gold Parser Builder.

A grammar written in C# with Irony looks as much like EBNF notation as possible given the constraints that writing them in C# imposes. For example, this definition of three non terminals in EBNF:

Expr  ::=  n | v | Expr BinOp Expr | UnOp Expr | '(' Expr ')'
BinOp ::=  '+' | '-' | '*' | '/' | '**'
UnOp  ::=  '-'

…translates into this C# code for an Irony grammar:

Expr.Rule = n | v | Expr + BinOp + Expr |
           UnOp + Expr | "(" + Expr + ")";
BinOp.Rule = Symbol("+") | "-" | "*" | "/" | "**";
UnOp.Rule = Symbol("-"); 

(Add to this a line for declaring each of the three non terminal variables - I left them out to prove a point ;-)

The C# and EBNF versions are impressively similar, though there is still some noise left. It’s not as pretty as what Gilad Bracha can do in Newspeak, but then again, C# wins by being a language that is actually used by developers.

Gold Parser Builder and ANTLR may be more mature than Irony, but what I like the most about Irony is the fact that the grammar is compiled along with the application that uses the grammar to parse DSL code. That means less switching between Visual Studio and ANTLR or Gold. Less fuss. Shorter path from language design to language test to language use.

The abstract syntax tree (AST) that my Irony-generated parser makes is nice and clean. It can even filter out punctuation characters, so that, for instance, parentheses do not make up nodes themselves; only the expression within the parentheses is a node in the tree.
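The idea of filtering punctuation can be sketched in a few lines. This is not how Irony does it internally; Node and PunctuationFilter are hypothetical names made up for the illustration, and the tree is built by hand rather than by a parser.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical tree node (not an Irony type): a label plus child nodes.
public sealed class Node
{
    public string Label { get; }
    public List<Node> Children { get; }

    public Node(string label, params Node[] children)
    {
        Label = label;
        Children = children.ToList();
    }
}

public static class PunctuationFilter
{
    // Returns a copy of the tree without leaf nodes whose label is punctuation.
    public static Node Strip(Node node, ISet<string> punctuation)
    {
        var kept = node.Children
            .Where(c => !(c.Children.Count == 0 && punctuation.Contains(c.Label)))
            .Select(c => Strip(c, punctuation))
            .ToArray();
        return new Node(node.Label, kept);
    }
}

public static class FilterDemo
{
    public static void Main()
    {
        // Parse tree for "( n )": Expr -> "(" Expr ")"
        var tree = new Node("Expr",
            new Node("("),
            new Node("Expr", new Node("n")),
            new Node(")"));

        var clean = PunctuationFilter.Strip(tree, new HashSet<string> { "(", ")" });
        Console.WriteLine(clean.Children.Count); // 1
    }
}
```

After stripping, the outer Expr has a single child, the inner expression, which is the shape described above.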

That’s it for the extremely brief introduction to Irony. You can check out some more detailed samples by downloading. Irony is definitely a part of my toolbox so stay tuned for more on Irony.

Bluefield projects

 

The terms greenfield site and brownfield site have their origin in urban planning. Greenfield means using fresh farm land for building projects. Brownfield means reusing existing industrial sites for new purposes.

A greenfield software project is a fresh-start project that lets you make design decisions without regard to an existing codebase. Productivity is high, as you are not bound by existing code. A brownfield project is the opposite: maintaining legacy code. Productivity is low. Working on a brownfield project often feels like dragging through mud.

Here are two paradoxes:

  • Brownfield projects are more common than greenfield projects, yet most developers treasure greenfield more than brownfield.
  • The computer science schools spend more time teaching greenfield-related techniques than brownfield. Meanwhile, a greenfield project may start green but before long it becomes brown. Even the code you wrote yourself will seem unintelligible in 6 months.

Most projects that I participate in have a color somewhere between green and brown. Often there is a legacy app written in VB6 and it has to be rewritten in C#. The existing app is considered the blueprint and all of its functionality must be duplicated in the new app. Plus all the new features that the customer expects to be thrown in while we’re at it.

This kind of project requires you to reuse some legacy code and allows you to make some greenfield decisions.

How to color the projects? Black field project? Too sinister. Blue field? Well the symbolism is not as clear as the terms green and brown. What’s your favorite color?