Wild Ideas – Using lambdas to check arguments in C#

One of the most common checks I do is a simple null check on arguments being passed in to the methods I write.  I usually create a static class called “Arg” with the following methods (to help out):

public static class Arg {
    public static void NotNull(object value, string paramName) {
        if (value == null) {
            throw new ArgumentNullException(paramName);
        }
    }
    public static void NotNullOrEmpty(string value, string paramName) {
        if (String.IsNullOrEmpty(value)) {
            // ED: Error_StringArgumentNullOrEmpty is a key in my Visual Studio
            //  project's default string resources file (Properties/Resources.resx)
            throw new ArgumentException(
                String.Format(CultureInfo.CurrentCulture, 
                              Resources.Error_StringArgumentNullOrEmpty, 
                              paramName), 
                paramName);
        }
    }
}

Then, I can use it like this:

public void Foo(string arg, object reallyLongArgumentName) {
    Arg.NotNullOrEmpty(arg, "arg");
    Arg.NotNull(reallyLongArgumentName, "reallyLongArgumentName");
}

I was recently playing with lambda expressions in C# and thought of a way to make this a little cooler.  The result is, I can rewrite the lines above like this:

public void Foo(string arg, object reallyLongArgumentName) {
    Arg.NotNullOrEmpty(() => arg);
    Arg.NotNull(() => reallyLongArgumentName);
}

The first obvious advantage is that for longer argument names, it’s more concise.  The second is that I no longer have to worry about updating the strings representing the parameter names when I rename arguments; the Visual Studio Refactoring engine will take care of it for me.

Obviously, this is slower than the previous method since I’m diving into the lambda expression (and compiling it) at runtime, but it’s cool.  And if I need the performance, I have a little Regular Expression I can run in Visual Studio to convert the lambda calls back to the string versions (and vice-versa).
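The post doesn't include that Regular Expression, so here is a rough sketch of what such a conversion could look like using the .NET Regex class rather than the Visual Studio Find/Replace dialog. The ArgCallRewriter class and both patterns are assumptions made for illustration; they only handle the simple `Arg.Method(() => name)` and `Arg.Method(name, "name")` shapes shown above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the attached Arg.txt.
public static class ArgCallRewriter {
    // Matches calls such as: Arg.NotNull(() => someName)
    // Group 1 = method name, group 2 = argument identifier.
    private static readonly Regex LambdaCall = new Regex(
        @"Arg\.(\w+)\(\s*\(\)\s*=>\s*(\w+)\s*\)");

    // Matches calls such as: Arg.NotNull(someName, "someName")
    // The backreference \2 requires the string to repeat the identifier.
    private static readonly Regex StringCall = new Regex(
        @"Arg\.(\w+)\(\s*(\w+)\s*,\s*""\2""\s*\)");

    // Lambda form -> string form.
    public static string ToStringVersion(string source) {
        return LambdaCall.Replace(source, "Arg.$1($2, \"$2\")");
    }

    // String form -> lambda form.
    public static string ToLambdaVersion(string source) {
        return StringCall.Replace(source, "Arg.$1(() => $2)");
    }
}
```

For example, `ArgCallRewriter.ToStringVersion("Arg.NotNull(() => foo);")` yields `Arg.NotNull(foo, "foo");`, and ToLambdaVersion reverses it. Calls whose string doesn't match the identifier are left alone, which seems like the safer default.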

How does it work? Well, the core of the code is a static helper in the Arg class:

private static string GetParameterName<T>(Expression<Func<T>> paramExpr) {
    LambdaExpression lambda = paramExpr as LambdaExpression;
    Debug.Assert(lambda != null);
    MemberExpression paramRef = lambda.Body as MemberExpression;
    Debug.Assert(paramRef != null);

    // Get the parameter name
    string paramName = paramRef.Member.Name;
    return paramName;
}

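Lines 2-5 dive through the Expression tree to find the MemberExpression that represents the parameter (i.e. "foo" in () => foo). Then, we pull out the MemberInfo for the parameter and read its name. (The Debug.Assert calls only fire in debug builds, so in a release build an unexpected expression shape would surface as a NullReferenceException instead.) With that method, the actual checker is easy: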

public static void NotNull(Expression<Func<object>> paramExpr) {
    string paramName = GetParameterName(paramExpr);

    // Compile the lambda (to get the value)
    Func<object> compiledLambda = paramExpr.Compile();
    
    // Run the contract check
    NotNull(compiledLambda(), paramName);
}

Line 2 extracts the parameter name. Line 5 compiles the lambda into an actual function that will return the value of the parameter. Finally, Line 8 uses the value and the parameter name to call my original object/string version. The code for NotNullOrEmpty is nearly identical.
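Putting the pieces together, a minimal self-contained sketch of the whole class might look like the following. It is not the attached Arg.txt: the hard-coded error message stands in for the resource string, the NotNullOrEmpty lambda overload is a guess based on "nearly identical", and the Convert-unwrapping step is an addition that handles value types, which get boxed when passed as `Func<object>`. The Demo class is purely hypothetical.

```csharp
using System;
using System.Linq.Expressions;

public static class Arg {
    public static void NotNull(object value, string paramName) {
        if (value == null) throw new ArgumentNullException(paramName);
    }

    public static void NotNullOrEmpty(string value, string paramName) {
        if (String.IsNullOrEmpty(value)) {
            // Assumption: literal message instead of the post's resource lookup.
            throw new ArgumentException("Value cannot be null or empty.", paramName);
        }
    }

    public static void NotNull(Expression<Func<object>> paramExpr) {
        NotNull(paramExpr.Compile()(), GetParameterName(paramExpr));
    }

    public static void NotNullOrEmpty(Expression<Func<string>> paramExpr) {
        NotNullOrEmpty(paramExpr.Compile()(), GetParameterName(paramExpr));
    }

    private static string GetParameterName<T>(Expression<Func<T>> paramExpr) {
        Expression body = paramExpr.Body;

        // Boxed value types arrive wrapped in a Convert node; unwrap it.
        UnaryExpression convert = body as UnaryExpression;
        if (convert != null && convert.NodeType == ExpressionType.Convert) {
            body = convert.Operand;
        }

        MemberExpression member = body as MemberExpression;
        if (member == null) {
            throw new ArgumentException(
                "Expected a lambda of the form () => parameter.", "paramExpr");
        }
        return member.Member.Name;
    }
}

// Hypothetical caller: returns "ok" or the name of the offending parameter.
public static class Demo {
    public static string Foo(string arg, object reallyLongArgumentName) {
        try {
            Arg.NotNullOrEmpty(() => arg);
            Arg.NotNull(() => reallyLongArgumentName);
            return "ok";
        } catch (ArgumentException ex) {
            return ex.ParamName;
        }
    }
}
```

Throwing an explicit exception for an unexpected lambda shape, rather than relying on Debug.Assert, is a design choice made here so that release builds fail with a readable message.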

Anyway, if you're concerned about performance, stick to the overloads which take the parameter name directly. I’ve attached the code as a TXT file; just rename it to .cs, change the namespace as appropriate, and enjoy.

Arg.txt (3.93 KB)

Compiler from Scratch – Part 3: Tokenization Overview

Here comes part 3 of my infinite-part series on developing a compiler from scratch.  See all the posts here.

The first phase of compilation is called Tokenization.  Tokenization is the process of taking individual characters in the source code file and grouping them into more conceptual “Tokens.”  We’re not parsing the language, or checking syntax, we’re just grouping related characters together.  Why do we do this?  If we skipped this step, we’d be passing characters directly to our parser.  The parser doesn’t care that the text contains the characters ‘f’ ‘o’ ‘o’, it cares that the source file contains an Identifier called “foo”.  There’s also a separation of concerns issue.  In theory, if you wanted to allow accented Unicode characters like è, ê, ã, à, etc. (you may see blocks or ‘?’ characters instead of the real characters, if so just pretend they’re there :D) as valid characters in identifiers, you should only have to change the Tokenizer (to accept those characters and use them, along with the regular alphanumeric characters, to build identifier tokens).  The formal name for Tokenization is actually “Lexical Analysis,” and sometimes the Tokens are called “Atoms.”

So, how do we Tokenize text?  There are lots of tools out there, most of which let you define your tokens using Regular Expressions (so an identifier would be: “[_A-Za-z][_0-9A-Za-z]*”, one letter or underscore followed by zero or more letters, numbers or underscores).  These tools build state machines (aka “Finite Automata”) from the Regular Expressions and run them over the input, typically emitting a token for the longest match at the current position.  Tools like “lex” basically allow you to build a full-featured, efficient Tokenizer by simply writing Regular Expressions.
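To make the identifier pattern concrete, here is a small sketch that applies it with the .NET Regex class. This is only an illustration of matching one token type at a given position, not how lex works internally; the IdentifierMatcher class and MatchAt method are made-up names.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class IdentifierMatcher {
    // \G anchors the match at the start position passed to Match,
    // so we only accept an identifier that begins exactly there.
    private static readonly Regex Identifier =
        new Regex(@"\G[_A-Za-z][_0-9A-Za-z]*");

    // Returns the identifier starting at 'position', or null if none starts there.
    public static string MatchAt(string text, int position) {
        Match m = Identifier.Match(text, position);
        return m.Success ? m.Value : null;
    }
}
```

With the input `foo_1 = 42`, MatchAt at position 0 returns `foo_1`, while position 8 (the digit 4) returns null because an identifier can't start with a number.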

However, that would be too easy for this series :).  Instead, I’m using a very simple, hand-coded Tokenizer which walks through the characters, using “if” and “switch” statements to decide what tokens to output.  However, we can’t just simply iterate through the characters.  Consider the “->” operator in Duh (see Part 1).  Duh supports the simple arithmetic operators, specifically “-”, and I plan to support inequality operators (i.e. “<”, “>”, etc.), so in theory, the “->” operator could be outputted as a pair of tokens: MINUS and GREATERTHAN (I’ll use ALLCAPS to denote the names of tokens).  However, this would be putting a larger burden on the parser to detect the MINUS GREATERTHAN pattern.  Instead, we can take advantage of “lookahead”.

The Tokenizer uses the .Net TextReader class, which lets us walk, one character at a time, through the characters in a body of text.  However, once we’ve called the Read method, the TextReader moves on to the next character and the next time we call it, we’ll get the next character.  So, in order to properly parse the “->” operator, we need a way to look at the next character, without moving the reader.  Fortunately, the TextReader also has the Peek method, which does exactly that.

So now, when we encounter a “-” symbol, we peek at the next character.  If it is a “>”, we output an ARROW token and read past the “>”.  Otherwise, we output a MINUS token and leave the reader where it is, so the next iteration handles whatever character follows.  A “>” that is not preceded by a “-” is read on its own and output as a GREATERTHAN token.
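Until the real Tokenizer code shows up in the next part, here is a stripped-down sketch of just this lookahead step. It is an assumption about the shape of the loop, not the actual Duh Tokenizer: the ArrowTokenizer class is hypothetical, tokens are plain strings, and every character other than “-” and “>” is simply skipped.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical, minimal tokenizer showing only the "->" lookahead.
public static class ArrowTokenizer {
    public static List<string> Tokenize(TextReader reader) {
        List<string> tokens = new List<string>();
        int c;
        while ((c = reader.Read()) != -1) {
            switch ((char)c) {
                case '-':
                    // Peek does not advance the reader, so we can decide first.
                    if (reader.Peek() == '>') {
                        reader.Read(); // consume the '>' that belongs to "->"
                        tokens.Add("ARROW");
                    } else {
                        tokens.Add("MINUS");
                    }
                    break;
                case '>':
                    tokens.Add("GREATERTHAN");
                    break;
                // All other characters are ignored in this sketch.
            }
        }
        return tokens;
    }
}
```

Feeding it `new StringReader("a -> b - c > d")` produces ARROW, MINUS, GREATERTHAN, while `"- >"` (with a space in between) produces MINUS followed by GREATERTHAN.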

Phew, that was a bit long, and more theoretical than I like.  Next time, we’ll look at the actual code for the Tokenizer.

ASP.Net MVC 1.0 Release Candidate released

Phil Haack (program manager on the ASP.Net MVC team) just posted a blog post about the new Release Candidate of the MVC framework.  Since I got a chance to work on the MVC framework team as an intern last summer (and I’m going back full-time this summer), I’m pretty excited about the news.  When this goes RTM, it’ll be the first time code that I wrote will be part of a full shipping Microsoft product!

So download the binaries from ASP.Net or check out the source code on CodePlex and have some fun!

Also, check out Scott Guthrie’s post on the new features.  He shows off one of the features that, I think, makes ASP.Net MVC more compelling than Ruby on Rails: Powerful first-party tooling support.

Compiler from Scratch – Part 2: Normalizing Newlines

Well, this is a good sign, I’m posting part 2 :).  As I said in Part 1, there are no promises here.  This series may not go anywhere, I may have to drop it due to my other commitments (school, work, etc.).

In the next few posts, I’m going to talk about the Tokenization phase of the compiler.  Before I go into too much detail on that though, I want to talk about newlines.

In Windows, a new line in a text file is indicated by a pair of characters: A Carriage Return (commonly referred to as “\r”, as that is the C/C++ string escape sequence for it) followed by a Line Feed (“\n”).  However, in Unix-based operating systems, a new line is indicated by a Line Feed character alone.  To add even more confusion, the classic Mac OS (before OS X) uses a Carriage Return character alone.

A compiler needs to track line numbers accurately, in order to report errors, so we need to be extra careful around newlines.  We could simply use the current operating system’s default newline characters, but that makes multi-platform development difficult.  Instead, we’ll normalize the newlines so that all three different types are properly understood by our compiler.

To do this, I’ve written a “Decorator” class which inherits from the abstract TextReader class provided in the .Net framework.  This “Newline Normalizing” decorator wraps an existing TextReader and does all the work of normalizing new line characters.  TextReader provides two methods that need to be implemented, Read and Peek.  Read returns the current character from the text and moves the reader one character forwards (so that the next call to Read will return the next character).  Peek also returns the current character from the text but does not move the reader forward.  The “Newline Normalizing” reader implements these two methods using the following code:

public override int Peek() {
    // Get the next character
    int i = Adaptee.Peek();
    
    // If the character is a '\r' newline, just return '\n'.  
    // Unlike Read, we aren't going to read ahead to check for \r\n
    // because that will happen when the user calls Read()
    if (i == (int)'\r') {
        i =  (int)'\n';
    }

    return i;
}

public override int Read() {
    // Get the next character
    int i = Adaptee.Read();

    // If the character is a '\r' newline, we're going to normalize it to '\n'
    // However, if the newline is '\r\n', we need to return it as one character, so
    // we check ahead for that
    if (i == (int)'\r') {
        if (Adaptee.Peek() == (int)'\n') {
            Adaptee.Read(); // Skip the '\n'
        }
        i = (int)'\n';
    }

    return i;
}

Essentially, if the character we read from the source (the “Adaptee” as I call it) is a ‘\n’, we just pass it along.  If the character is a ‘\r’, we are going to return ‘\n’, but first we check to see if it is immediately followed by a ‘\n’.  If it is, we skip the extra character.  The result is that no matter which newline sequence is used, this TextReader will return it as a single ‘\n’ character.
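To try the two methods out, they need a class around them. The following is a minimal sketch of what that wrapper could look like; the class name, the constructor, the private field and the ReadAll helper are all assumptions, since the post only shows Peek and Read.

```csharp
using System;
using System.IO;
using System.Text;

// Assumed wrapper class; only Peek and Read come from the post.
public class NewlineNormalizingReader : TextReader {
    private readonly TextReader adaptee;

    public NewlineNormalizingReader(TextReader adaptee) {
        this.adaptee = adaptee;
    }

    public override int Peek() {
        int i = adaptee.Peek();
        // A lone '\r' or the start of "\r\n" both look like '\n' to the caller.
        return i == '\r' ? '\n' : i;
    }

    public override int Read() {
        int i = adaptee.Read();
        if (i == '\r') {
            if (adaptee.Peek() == '\n') {
                adaptee.Read(); // skip the '\n' of a "\r\n" pair
            }
            i = '\n';
        }
        return i;
    }

    // Hypothetical helper: drains a reader one character at a time.
    public static string ReadAll(TextReader reader) {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.Read()) != -1) {
            sb.Append((char)c);
        }
        return sb.ToString();
    }
}
```

Wrapping `new StringReader("a\r\nb\rc\nd")` and passing it to ReadAll gives back `a\nb\nc\nd`: three newlines, one per original line break, regardless of which convention each used.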

Next post, I’ll start talking about the Tokenization process.

Writing a Compiler from Scratch – Part 1

EDIT: Whoops, wrote the wrong course number (CMPT 376), corrected below.  I’m taking CMPT 376, Writing for Computer Scientists, this semester, must have gotten myself mixed up :)

Ok, I’m going to try something here, and I don’t know if I’ll manage to finish it.  My all-time favorite class in my school career (so far) has been CMPT 379 – Compiler Design.  The class basically consisted of me writing a compiler in Visual C++, with weekly lectures and code reviews from the professor (it was a small class).  Since then, I’ve been trying to think of a reason to write a compiler (i.e. a language that can be ported to .Net, or a useful domain-specific language, etc.).  Then, this morning, as I walked back to the Computing Science common room from the Gym (after my morning workout), I found a reason.  I think compilers are some of the most interesting pieces of software, and I thought that if I wrote a series of blog posts in which I developed a compiler, maybe I could share that passion with others. 

So, here’s the plan:  I’ve designed an extremely simple language that I’m going to walk through writing a compiler for.  The initial version will target .Net IL code (since it’s a stack machine system, which is much easier to generate code for).  Once I’ve implemented the simple language, I think it’d be awesome if my readers could jump in and propose some ideas for new language features, and I’ll try to blog about implementing them.

To be honest, this project may well fall flat on its face.  Writing a compiler is not a simple task, no matter how simple the language.  However, I do think it will be fun (while it lasts), so let’s give it a try!

So, step one is: Define your language!  The most essential part of compiler design is having a very clear idea of the purpose of your language.  The language I’m going to design is called “Duh”, because it’s probably about as simple as it gets (maybe a little more complex than Brainf**k (NOTE: link target does include a few four-letter words :D)).  Here’s a sample Duh program

print "Enter your age: ";
ageInput : string;
age : int;
ageInput = readline;
age = ageInput -> int;
print "In 10 years you will be " + (age + 10) -> string + ". Wow, that's old"

Pretty simple, eh?  Line 1 is a simple print statement, which accepts a string and writes it to the console.  Lines 2 and 3 declare variables of string and 32-bit integer types (decided to use a more Pascal-esque variable declaration format, just for fun).  Lines 4 and 5 assign values to those variables.  Line 4 uses another built-in function, readline, which reads a line of text from the console.  Note that we don’t even support initializing variables in the declaration!  That’s ok though, this is just a toy language, we can add initializations later.  Line 5 introduces the “->” conversion operator, which takes the value on the left and converts it to the type on the right.  It’s not quite the same as a cast, because it will try to convert the value, if it can.  Finally, on line 6, we add 10 to the value the user entered and convert it to a string, then we place that string into a larger message and print it to the console.

Well, here goes nothing!  Feel free to post your initial comments.  I hope you’ll follow along!  I’ll be posting full source code with every post.

My First Windows 7 Blue-Screen – But it was my fault :P

I just had my first Windows 7 Blue Screen of Death :(.  However, it was my own fault for trying to use an unsupported driver :P.  I had just listened to an episode of Security Now on a tool called Sandboxie.  Sandboxie is a tool which intercepts Windows API calls made by programs to read/write files and registry keys.  Once intercepted, Sandboxie redirects them to files and registry keys in a special “Sandbox”, so that changes made by programs are isolated from each other.  It’s a cool security solution, and very similar in spirit to Microsoft’s App-V platform (only cheaper, and aimed at consumers rather than enterprises :D).

In order to intercept the Windows API calls, however, Sandboxie has to install a kernel-mode driver and patch the kernel.  In 64-bit versions of Windows, a system called PatchGuard prevents this from happening, thus Sandboxie is not compatible with those operating systems.  However, my laptop is running a 32-bit version of Windows 7, so I decided to try it out.

At first, I got a compatibility message from the Sandboxie installer, telling me that my OS is not supported.  That should have been my first clue :).  I decided to take a gamble and try it out anyway, so I tweaked the compatibility settings for the installer so that it ran in “Windows Vista” compatibility mode.  The installer ran fine, and installed the software.  However, when I tried to run it, BAM, BSOD :(. 

Resigning myself to the fact that it just wasn’t ready for Windows 7, I booted up in Safe Mode.  However, I was unable to run the installer again to remove it.  I tried “Add/Remove Programs” and running the installer I downloaded again (in Vista compatibility mode).  Still nothing.  Fortunately, I was just about to restart for Windows Update when I installed Sandboxie, and Windows Update automatically creates a System Restore point before installing updates.  I fired up System Restore, picked the Windows Update restore point and let it do its thing.  The machine rebooted, and I was back in action, with Sandboxie (and my BSODs) gone.  I had lost the updates that WU installed, but that’s a minor inconvenience.

Anyway, all is well now, and I was able to boot up again (in order to write this post in fact :D).  So, two lessons here:

  1. Use System Restore!  Just remember to make restore points before installing software that you are concerned about.  (Though that will NOT protect you from malicious software, just incompatible software)
  2. Check out Sandboxie, just not on Windows 7 :(.  My theory is that the PatchGuard technology from the 64-bit OS may have been brought into the 32-bit OS.

Windows 7 First Impressions

So, as an MSDN subscriber (no, I’m not made of money, Microsoft Interns get a free year-long subscription to MSDN for personal use :P), I had access to the Windows 7 public beta a day early.  I decided to go crazy, since I’ve been hearing it’s really stable, and put the latest OS on both my laptop and my desktop.  (Well, I actually put Windows 7 Server, aka Windows Server 2008 R2, on my desktop).  So, I figured I’d post my first impressions.

Installation

There’s not much to say here, installation is exactly like Windows Vista, only a little faster.  The only new feature is that Windows 7 Setup prompts you to create a HomeGroup, if you want.  HomeGroups are the new networking construct introduced in Windows 7 designed to make it easier to share files and devices between networked computers.  I haven’t had a chance to check that out yet, so I’ll come back to it later.

Initial Impression

Besides a stylish new boot screen, in which four coloured dots dance around before combining to form the Windows logo, the boot process is also identical to Windows Vista.  I did find that it booted up much faster than Vista (though I can’t make an accurate comparison, since my laptop was getting a bit overloaded).  The new taskbar is very cool, and while it is a bit of a knock-off of the OSX Dock, I think Microsoft has (in typical Microsoft fashion) gone above and beyond the OSX experience.  For example, by hovering the mouse over an icon, a list of all the windows belonging to that application appears.  Even better, applications which directly support Windows 7 can add their own “windows” to this list.  For example, even though I only have one IE8 window open, each tab in that window appears as a separate item in the windows list.

Taskbar Windows List

By hovering over each thumbnail, that window is brought to focus on the screen, and the rest of the windows become “glass”.

Peeking at a Window

(And yes, I did blank out my Windows Messenger buddies list :P).

Jump lists are another cool feature, but I haven’t had a chance to explore them much.  Essentially, when you right click, or click and drag up on one of these taskbar icons, a jump list appears.

IE8 Jumplist

In this case (Internet Explorer 8), my history is displayed.  Applications designed for Windows 7 get a lot of control over this list, but applications which are not designed to support it (PowerShell 2.0 for example) just get a simple default list.

PowerShell 2.0 Jumplist

I haven’t had much of a chance to explore the rest of the new stuff, so I’ll post more later, but my initial impression is that Windows 7 is just plain awesome :).

Redirecting your Zune Library to another folder/drive

As I mentioned in an earlier post about backing up a Zune Library (Backup a Zune Library...), I wanted to find a way to be able to actually store the Zune library database on a different drive.  I played with the Microsoft Sysinternals “Junction” tool, which lets you create NTFS junctions (directory links, a relative of hardlinks: Wikipedia on Hardlinks), and managed to get it to work.  Here’s the process to do it yourself.  Replace UserName with your Windows user name and D:\Music\Zune\Library with the path to the folder you want to store the database in.

  1. Download Junction.exe from here: http://technet.microsoft.com/en-us/sysinternals/bb896768.aspx (replace D:\Utils\Sysinternals\junction.exe below with the path you downloaded junction.exe to)
  2. Shut down the Zune software if it is running.
  3. Rename the folder C:\Users\UserName\AppData\Local\Microsoft\Zune to Zune_backup (just to be safe).  You may encounter a file locking issue, but I found that if I waited a few minutes it worked.
  4. Copy the Zune_backup folder to D:\Music\Zune and rename it to Library
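  5. Open a command prompt in C:\Users\UserName\AppData\Local\Microsoft (the junction name “Zune” below is relative to the current folder) and run the following command:
    D:\Utils\Sysinternals\junction.exe Zune D:\Music\Zune\Library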

That should do it!  Start Zune up again and your library should be up and running.  Now you should be able to reinstall the OS without having to back up the Zune library.  I might try to whip up a PowerShell or Batch file to do this, but don’t get too hopeful :).

Mappr: Projecting Geographical Points on the Screen

One of the courses I took last semester (Fall 2008), was “Software Engineering II”.  In this class, we were required to work in groups to implement a project that the professor specified.  We had to go through the whole process, design, implementation, testing (though we could choose any software process model we wanted: Waterfall, XP, Agile, etc.).  Our group’s project was an application called “Mappr” that would allow users to browse a map.  Well, it was a little more than that, but that’s all the background required by this post, I’ll post more background in future posts.

One of the necessary components in mapping software is called a “Projection”.  The Earth is round, and Latitude and Longitude co-ordinates are spherical measurements representing points on the Earth.  In order to convert those co-ordinates to (x, y) co-ordinates for displaying on a (flat) computer screen, you must project the geographical co-ordinates into screen co-ordinates.  One well-known technique for doing this is called the Mercator Projection.

A quick aside: The Mercator Projection is widely known (in geography circles) for distorting sizes: the farther a region is from the equator, the larger it is drawn relative to its true area.  However, it is the projection used by most road maps, atlases, etc., both physical and digital.

In Mappr, projection is handled by a component called a “Projection Strategy”.  A Projection Strategy is a C# class (Mappr was written in C#) with two methods: GeoToScreen and ScreenToGeo.   Here are the signatures of those methods:

public interface IProjectionStrategy {
    Point GeoToScreen(Point geographicalPoint, int zoomLevel, int tileSize);
    Point ScreenToGeo(Point screenPoint, int zoomLevel, int tileSize);
}

The purpose of each method is straightforward: To take in either Geographical (Latitude, Longitude) co-ordinates or Screen (X, Y) co-ordinates, and convert them to the other.  In order to do this, we must know the Zoom Level, which is an integer N indicating that there are 2^N tiles along each edge of the map.  We also need the size, in pixels, of each map tile image.  This means that the width (and height) of the map, in pixels, is given by: 2^zoomLevel * tileSize.
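As a quick sanity check of that formula, here is a tiny sketch that computes the map size for a given zoom level. The MapSize class is hypothetical. It uses long arithmetic on purpose: with 256-pixel tiles, 2^23 * 256 is 2^31, which no longer fits in a 32-bit int, so code that does this calculation with int is only safe for zoom levels below that.

```csharp
using System;

// Hypothetical helper illustrating map size = 2^zoomLevel * tileSize.
public static class MapSize {
    public static long InPixels(int zoomLevel, int tileSize) {
        // 1L forces 64-bit arithmetic so large zoom levels do not overflow.
        return (1L << zoomLevel) * tileSize;
    }
}
```

For instance, zoom level 0 with 256-pixel tiles is a single 256-pixel tile, and zoom level 3 is 8 tiles wide, or 2048 pixels.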

The code to project a geographical point on to the screen is shown below:

public Point GeoToScreen(Point geographicalPoint, int zoomLevel, int tileSize) {
    // Convert to normalized mercator
    double lon = geographicalPoint.X;
    double lat = geographicalPoint.Y;

    if (lon > 180) {
        lon -= 360;
    }

    lon /= 360;
    lon += 0.5;

    lat = 0.5 - ((Math.Log(Math.Tan((Math.PI / 4) + 
                 ((0.5 * Math.PI * lat) / 180))) / Math.PI) / 2.0);

    double scale = (1 << zoomLevel) * tileSize;
    return new Point(lon * scale, lat * scale);
}

This code first normalizes the longitude (X direction) so that it is in the range 0.0 to 1.0 (where 0.0 is the left of the map and 1.0 is the right).  Then it does what I like to call “mathy stuff” (the calculations are taken from similar code written in Java) with the latitude to put it in the same range (0.0 is the top, 1.0 is the bottom).  Finally, we calculate the scale of the map (height/width in pixels, since the map is technically a square) and then we can use the normalized longitude and latitude as ratios of that scale.

The ScreenToGeo method is similar; it simply inverts the calculations above.  I won’t describe it in detail, but the code is below for reference.

public Point ScreenToGeo(Point screenPoint, int zoomLevel, int tileSize) {
    int pixelSpan = (1 << zoomLevel) * tileSize;
    double lngWidth = 360.0 / pixelSpan; // width in degrees longitude
    double lng = -180 + (screenPoint.X * lngWidth); // left edge in degrees longitude

    double latHeightMerc = 1.0 / pixelSpan; // height in "normalized" mercator 0,0 top left
    double latMerc = screenPoint.Y * latHeightMerc; // top edge in "normalized" mercator 0,0 top left
    
    // convert top and bottom lat in mercator to degrees
    // note that in fact the coordinates go from about -85 to +85 not -90 to 90!
    double lat = (180 / Math.PI) * ((2 * Math.Atan(Math.Exp(Math.PI * (1 - (2 * latMerc)))))
                       - (Math.PI / 2));

    return new Point(lng, lat);
}
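One way to convince yourself the two methods really are inverses is a round trip. The sketch below repeats the two calculations in condensed form so that it compiles on its own; the Point struct is a stand-in (Mappr's actual Point type isn't shown in this post), and the MercatorDemo class and the sample co-ordinates are made up for the test.

```csharp
using System;

// Stand-in for whatever Point type Mappr used: X = longitude, Y = latitude.
public struct Point {
    public readonly double X;
    public readonly double Y;
    public Point(double x, double y) { X = x; Y = y; }
}

public static class MercatorDemo {
    public static Point GeoToScreen(Point geo, int zoomLevel, int tileSize) {
        double lon = geo.X;
        double lat = geo.Y;
        if (lon > 180) lon -= 360;
        lon = lon / 360 + 0.5; // 0.0 = left edge, 1.0 = right edge
        lat = 0.5 - Math.Log(Math.Tan(Math.PI / 4 + 0.5 * Math.PI * lat / 180))
                    / Math.PI / 2.0; // 0.0 = top edge, 1.0 = bottom edge
        double scale = (1 << zoomLevel) * tileSize;
        return new Point(lon * scale, lat * scale);
    }

    public static Point ScreenToGeo(Point screen, int zoomLevel, int tileSize) {
        int pixelSpan = (1 << zoomLevel) * tileSize;
        double lng = -180 + screen.X * 360.0 / pixelSpan;
        double latMerc = screen.Y / pixelSpan;
        double lat = (180 / Math.PI)
            * (2 * Math.Atan(Math.Exp(Math.PI * (1 - 2 * latMerc))) - Math.PI / 2);
        return new Point(lng, lat);
    }

    // Projects a point to the screen and back; the result should match the input.
    public static Point RoundTrip(Point geo, int zoomLevel, int tileSize) {
        return ScreenToGeo(GeoToScreen(geo, zoomLevel, tileSize), zoomLevel, tileSize);
    }
}
```

At zoom level 2 with 256-pixel tiles the map is 1024 pixels across, so latitude 0, longitude 0 should land in the middle, at roughly (512, 512). A round trip of something like (-123.1, 49.25) should come back unchanged apart from floating-point noise.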

By the way, feel free to use any of the code in this post in your own application. Consider it "Public Domain". However, I would appreciate (but not require) if you would place a comment near it indicating that this blog is the source of the original code.

Hopefully this helps those of you writing mapping applications in C#!  Please post any questions or comments in the comments section!

Writing a blog for fun and profit… I mean grades

The Spring semester at SFU is off to a running start (after a day lost to the snow), and I’ve already got some homework to do!  One of the classes I’m taking is a course titled “Writing for Computer Scientists”. The goal is to develop writing skills through practice. Lots and lots of practice :).

Each class (3 times a week), we’re required to submit some “writing”.  I use quotes simply to emphasize the freedom of this assignment.  All I have to do is write a single word, and turn it in each week.  Of course, that’s not really the point of the assignment ;).  One option, provided by the professor, is simply to start a blog (if you don’t already have one) and post 2-3 entries per week.  Given that I already have a blog which I have had trouble maintaining, I figure this is a good kick in the behind to get myself blogging again. 

So, I’ll hopefully be blogging more often (at least while my grades depend on it :D).  I’ll probably stick to the current subject of the blog: observations from an open-source developer on the Microsoft ASP.Net platform.  Just in case I start to wander to different topics, I’ll make sure to tag appropriately, so my normal readers can subscribe just to the posts they’re interested in.

It’s time to start blogging for fun and profit… I mean grades!