Inside Razor – Part 1 – Recursive Ping-Pong

This is the first of my blog posts about the parser for the new ASP.NET Razor syntax.  We’ve been working on this parser for a while now, and I want to share some of how it works with my readers!

The Razor parser is very different from the existing ASPX parser.  In fact, the ASPX parser is implemented almost entirely with Regular Expressions, because ASPX is a very simple language to parse.  The Razor parser is actually separated into three components: 1) A Markup parser which has a basic understanding of HTML syntax, 2) A Code parser which has a basic understanding of either C# or VB and 3) A central orchestrator which understands how the two mix together.  Note that when I say “basic understanding” I mean basic; we’re not talking about full-fledged C# and HTML parsers here.  I’ve joked with people on the team that we should call them “Markup Understander” or “Code Comprehender” instead :).

So the Razor parser has three “actors”: The Core Parser (the central orchestrator), the Markup Parser and the Code Parser.  All three work together to parse a Razor document.  Now, let’s take a Razor file and walk through the whole parsing procedure using these actors.  We’ll use the sample that I used last time:

<ul>
    @foreach(var p in Model.Products) {
    <li>@p.Name ($@p.Price)</li>
    }
</ul>

OK, now we start at the top.  The Razor parser is essentially in one of three states at any time during parsing: Parsing a Markup Document, Parsing a Markup Block or Parsing a Code Block.  The first two are handled by the Markup Parser, and the last is handled by the Code Parser.  So, when the Core Parser is fired up for the first time, it calls into the Markup Parser and asks it to parse a Markup Document and return the result.  Now the parser is in the Markup Document state.  In this state, it simply scans forward to the next “@” character; it doesn’t care about tags or other HTML concepts, just “@”.  When it reaches an “@”, it makes a decision: “Is this a switch to code, or is it an email address?”  This decision is basically made by looking at the characters just before and just after the “@” to see if they are valid email characters.  This is the default convention, but there are escape sequences to force it to be treated as a switch to code.
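To make that “@” decision concrete, here is a minimal sketch in Python.  It is not the real Razor implementation, just an illustration of the look-before-and-after idea described above.  The function names (is_email_char, is_code_transition) are made up for this example, and I am assuming that only ASCII letters and digits count as “valid email characters”; the real rule may well be broader.  Escape sequences are ignored here.

```python
def is_email_char(ch: str) -> bool:
    # Assumption for this sketch: only ASCII letters and digits
    # count as "valid email characters".
    return len(ch) == 1 and ch.isascii() and ch.isalnum()


def is_code_transition(text: str, at_index: int) -> bool:
    """Decide whether the '@' at at_index switches to code.

    The '@' is treated as part of an email address only when the
    characters on BOTH sides are email characters; otherwise it is
    treated as a switch to code.
    """
    if text[at_index] != "@":
        raise ValueError("at_index must point at an '@' character")
    before = text[at_index - 1] if at_index > 0 else ""
    after = text[at_index + 1] if at_index + 1 < len(text) else ""
    looks_like_email = is_email_char(before) and is_email_char(after)
    return not looks_like_email


sample = "<ul>\n    @foreach(var p in Model.Products) {"
mail = "andrew@example.com"
print(is_code_transition(sample, sample.index("@")))  # preceded by whitespace
print(is_code_transition(mail, mail.index("@")))      # letters on both sides
```

Under these assumptions the “@” in front of “foreach” is a switch to code because it is preceded by whitespace, while the “@” in “andrew@example.com” is left alone as markup.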

In this case, when we see our first “@”, it is preceded by whitespace, which is not valid in an email address.  So, we now know we are switching to code.  The Markup Parser calls into the Code Parser and asks it to parse a Code Block.  A Block, in terms of the Razor Parser, is basically a single chunk of Code or Markup with a clear start and end sequence.  So, the ‘foreach’ statement here is an example of a Code Block.  It starts at the “f” character and ends at the “}” character.  The Code Parser knows enough about C# to know this, so it starts parsing the code.  The Code Parser does some very simple tracking of C# statements, so when it gets to the “<li>” it knows it’s at the start of a C# statement.  “<li>” is not something you can put at the start of a C# statement, so the Code Parser knows that this is the start of a nested Markup Block.  So, it calls back into the Markup Parser to have it parse a block of HTML.  This creates a sort of recursive ping-pong game between the Code and Markup parsers.  We start in Markup, then call into Code, then call into Markup and so on before finally returning back up this whole chain.  At the moment, the call stack in the parser looks something like this:

  • HtmlMarkupParser.ParseDocument()
    • CSharpCodeParser.ParseBlock()
      • HtmlMarkupParser.ParseBlock()

(Obviously, I am leaving out a lot of helper methods :)).

This highlights a fundamental difference between ASPX and Razor.  In an ASPX file, you can think of Code and Markup as two parallel streams.  You write some Markup, then you jump over and write some code, then you jump back and write some Markup, and so on.  A Razor file is like a tree.  You write some Markup, and then put some Code inside that Markup, then put some Markup inside that Code, and so on.

So, we’ve just called into the Markup Parser to parse a block of Markup.  This block starts at “<li>” and ends at the matching “</li>”.  Until that matching “</li>”, we won’t consider the Markup Block finished.  So even if you had a “}” somewhere inside the “<li>”, it wouldn’t terminate the “foreach”, because we haven’t come far enough back up the stack yet.

While parsing the “<li>”, the Markup Parser sees more “@” characters, which means even more calls into the Code Parser. And so the call stack grows:

  • HtmlMarkupParser.ParseDocument()
    • CSharpCodeParser.ParseBlock()
      • HtmlMarkupParser.ParseBlock()
        • CSharpCodeParser.ParseBlock()

I’ll go into detail on how these blocks are terminated later, because it is a little complicated, but eventually we finish these code blocks and we’re back in the “<li>” block.  Then, we see “</li>” so we finish that block and pop back up to the “foreach” block.  The “}” terminates that block, so we pop back up to the top of our stack again: the Markup Document.  Then we read until the end of the file, not finding any more “@” characters.  And we’re done!  We’ve parsed the entire file!
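The whole walkthrough can be sketched as a toy recursive parser.  This is a deliberately simplified Python illustration, not the actual Razor code: the function names are hypothetical, every “@” is treated as a switch to code (no email check), the only statement it recognises is “foreach”, any “<” inside the braces is assumed to start a Markup Block, implicit expressions are approximated as dotted identifiers, and there is no error handling for unterminated blocks.

```python
import re

SAMPLE = """<ul>
    @foreach(var p in Model.Products) {
    <li>@p.Name ($@p.Price)</li>
    }
</ul>
"""

IDENT = re.compile(r"[A-Za-z_]\w*(\.[A-Za-z_]\w*)*")  # e.g. p.Name
TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)>")          # e.g. <li>


def parse_markup_document(src, pos=0):
    """Markup Document state: scan to end of input, handing each '@' to code."""
    children, start = [], pos
    while pos < len(src):
        if src[pos] == "@":
            if pos > start:
                children.append(src[start:pos])
            node, pos = parse_code_block(src, pos + 1)
            children.append(node)
            start = pos
        else:
            pos += 1
    if pos > start:
        children.append(src[start:pos])
    return ("markup_document", children)


def parse_code_block(src, pos):
    """Code Block state: a braced 'foreach' or a dotted identifier."""
    if not src.startswith("foreach", pos):
        match = IDENT.match(src, pos)
        if match is None:
            raise SyntaxError(f"expected code at offset {pos}")
        return ("code_block", [match.group()]), match.end()
    brace = src.index("{", pos)
    children = [src[pos:brace + 1]]
    pos = start = brace + 1
    while src[pos] != "}":              # '}' ends the block at THIS level only
        if src[pos] == "<":             # toy rule: '<' starts nested markup
            if pos > start:
                children.append(src[start:pos])
            node, pos = parse_markup_block(src, pos)
            children.append(node)
            start = pos
        else:
            pos += 1
    children.append(src[start:pos + 1])
    return ("code_block", children), pos + 1


def parse_markup_block(src, pos):
    """Markup Block state: from '<tag>' to the matching '</tag>'."""
    match = TAG.match(src, pos)
    if match is None:
        raise SyntaxError(f"expected a tag at offset {pos}")
    close = "</" + match.group(1) + ">"
    children, start = [], pos
    pos = match.end()
    while not src.startswith(close, pos):
        if src[pos] == "@":
            if pos > start:
                children.append(src[start:pos])
            node, pos = parse_code_block(src, pos + 1)
            children.append(node)
            start = pos
        else:
            pos += 1                    # note: a '}' here is just markup text
    pos += len(close)
    children.append(src[start:pos])
    return ("markup_block", children), pos


tree = parse_markup_document(SAMPLE)
print(tree)
```

The result is a nested tree: a markup document containing a code block (the “foreach”), which contains a markup block (the “<li>”), which in turn contains two small code blocks.  Because the Markup Block loop only looks for its own “</li>”, a stray “}” inside the “<li>” never reaches the “foreach” level.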

I hope that’s made the general structure of the parsing algorithm somewhat clearer.  The key take-away here is to avoid thinking of Code and Markup as separate streams and think of them as constructs you nest inside each other.  Our next topic will be Implicit Expressions, which is the logic that allows us to detect which parts of “@p.Name ($@p.Price)” are code and which are markup.  I’ll give you a hint: we took some inspiration from PowerShell here ;).

Please post any questions or comments in the comments section or email me at “andrew AT andrewnurse DOT net”!

Monday, July 05, 2010 1:02:50 PM (Pacific Standard Time, UTC-08:00)
How does the parser handle the following situations?
- I'm @robertmclaws on Twitter.
- Meet me @ 7pm tonight.
Monday, July 05, 2010 1:18:12 PM (Pacific Standard Time, UTC-08:00)
Does all HTML have to be well formed in the XML sense?
RichB
Monday, July 05, 2010 9:58:17 PM (Pacific Standard Time, UTC-08:00)
So are there theoretical performance issues with deep nesting of markup/code/markup/code/markup/code/... ?
Either way I'm quite liking how your team is handling views and view engines, providing options and giving us another great one w/Razor!
Monday, July 05, 2010 10:08:27 PM (Pacific Standard Time, UTC-08:00)
@Robert: My guess would be that the code parser checks if the word behind the @-sign is a C#/VB keyword (foreach, if, ...) or a property of the model.

Just a guess though :-) I'm curious what Andrew has to say about this.
Tuesday, July 06, 2010 9:14:33 AM (Pacific Standard Time, UTC-08:00)
@Robert McLaws/Kristof - You would have to escape these "@" characters as "@@". It's a little annoying, yes, but a lot of this kind of content comes from a dynamic source anyway. We can't be quite as smart as Kristof suggests because "robertmclaws" could be a variable you are trying to render the value of. Our "parser" doesn't actually parse C#, it just scans it, so it doesn't know if that variable has been defined.

@RichB - Short answer is Yes. Long answer is that it can be malformed in some places, but the outermost tag in a Markup Block ("<li>" in my examples) must a) have a matching close tag and b) be well-formed XML. Details to follow later

@TJB - Our QA guys have done stress tests with pages that have ridiculously large degrees of nesting and seen no major performance problems. We'll be tuning for performance later, but I don't see it being an issue.

Glad to hear all this great feedback about Razor!
Tuesday, July 06, 2010 10:15:51 PM (Pacific Standard Time, UTC-08:00)
I vote to keep the parser simple for several reasons: first, it keeps things simple --> fewer bugs.

It doesn't matter how much we try to infer the programmer's intention, there will always be edge cases. For example, even if the parser can figure out that "robertmclaws" is not a variable, it could mean that, or it could mean that we have a misspelling somewhere.

I think it's better to define a clear rule so everyone can follow it. So far Razor is doing a great job of keeping things simple rather than trying to outsmart the programmer.
firefly
Wednesday, July 07, 2010 3:53:57 PM (Pacific Standard Time, UTC-08:00)
Hi Andrew, great post! It's funny you should post this today, because I was investigating this yesterday and when I searched in Google nothing showed up.

I have written a post in which I explain how I ran Razor from a Console App http://thegsharp.wordpress.com/2010/07/07/using-razor-from-a-console-application; your comments about it would be greatly appreciated.

Keep up the terrific job, I'll be waiting for the rest of your posts.
Gus
Thursday, July 08, 2010 12:12:53 AM (Pacific Standard Time, UTC-08:00)
looks to me like good old asp is back with a bang! and people have started understanding simplicity and speed of spaghetti code! long live razor! a wonderful thing to get on the .net bandwagon for the asp programmers confused by the over engineered asp.net.

Love,
Jack.
Jack
Thursday, July 08, 2010 4:29:58 AM (Pacific Standard Time, UTC-08:00)
<blockquote cite="Andrew">"the outermost tag in a Markup Block ... must ... have a matching close tag"</blockquote>

So you don't support, for example:

@if (condition) {
<div class="special">
}
...
@if (condition) {
</div>
}

You'd have to write this as:

@if (condition) {
<div class="special">
...
</div>
} else {
...
}

where everything in the "..." block was repeated?


Also, I know Microsoft prefer to have the opening brace on the same line as the statement, but a lot of people prefer to have it on a new line. Will Razor support this syntax?

@if (condition)
{
...
}
RichardD
Thursday, July 08, 2010 6:55:47 AM (Pacific Standard Time, UTC-08:00)
@Jack - That's basically our goal, bring back the classic ASP model but provide better on-ramps to more professional models like WebForms and MVC.

@RichardD - We are investigating how better to support that kind of scenario, most likely by changing the behavior of the "<text>" tag (which DOES NOT support the scenario you described right now, FYI). There are, however, two alternatives:

@if(condition) {
@:<div class="special">
}
blah blah blah
@if(condition) {
@:</div>
}

"@:" is a construct which indicates that the rest of the line is markup, regardless of tags. It is documented, but wasn't talked about much in the flurry of blogs :). Also, if you're ok with having a "div" if condition is false, it's just the class that changes:

<div class="@(condition ? "special" : "not-special")">
blah blah blah
</div>

To answer your second question, while Razor doesn't "parse" C#, it does understand it enough to follow its whitespace rules. I'll go over it in more detail later, but essentially: Razor is NOT line-based; it follows the statement rules of the code language. So in C#, you can put the braces on the same line, on the next line, or even five lines down, as long as it is legal in C#.
Thursday, July 08, 2010 8:25:52 AM (Pacific Standard Time, UTC-08:00)
OK, just to play devil's advocate for a minute!

Assume I don't want the <div> tag unless the condition is met, and that I want a unique ID for each tag. Will the "@:" prefix work?

@if (condition) {
@:<div class="special" id="special-item-@counter++">
}
...
@if (condition) {
@:</div>
}

Will this output "special-item-1", "special-item-2" etc., or will they all be "special-item-@counter++"?
RichardD
Friday, July 09, 2010 11:49:16 PM (Pacific Standard Time, UTC-08:00)
That will work with one minor adjustment: "@counter++" has to be written "@(counter++)". Our implicit expressions don't support "++" because it is difficult to be unambiguously sure that you meant to use the increment operator rather than to render a literal "++" after the expression.

"@:" means that the rest of the line is a markup block, but just like a markup block started with a tag, you can nest code within it.
Saturday, July 10, 2010 7:29:26 AM (Pacific Standard Time, UTC-08:00)
What about this js/xml example :
function GetName()
{
var name = xmlDoc.selectSingleNode("/Person/@Name@NameSufix");
}

* The @NameSufix is the ViewModel property.
* The @Name is the xml attribute
igor609
Monday, July 12, 2010 4:30:54 AM (Pacific Standard Time, UTC-08:00)
Just guessing - would that be:

var name = xmlDoc.selectSingleNode("/Person/@@Name@(NameSufix)");

All these special cases and escape characters are starting to make the old <%= ... %> look more appealing!
RichardD