Poor man's "lexer" for C#

Posted by Paul Hollingsworth on Stack Overflow See other posts from Stack Overflow or by Paul Hollingsworth
Published on 2009-03-23T11:57:23Z Indexed on 2010/05/21 22:00 UTC
Read the original article Hit count: 254

Filed under:
|
|

I'm trying to write a very simple parser in C#.

I need a lexer -- something that lets me associate regular expressions with tokens, so it reads in regexs and gives me back symbols.

It seems like I ought to be able to use Regex to do the actual heavy lifting, but I can't see an easy way to do it. For one thing, Regex only seems to work on strings, not streams (why is that!?!?).

Basically, I want an implementation of the following interface:

interface ILexer : IDisposable
{
    /// <summary>
    /// Return true if there are more tokens to read
    /// </summary>
    bool HasMoreTokens { get; }
    /// <summary>
    /// The actual contents that matched the token
    /// </summary>
    string TokenContents { get; }
    /// <summary>
    /// The particular token in "tokenDefinitions" that was matched (e.g. "STRING", "NUMBER", "OPEN PARENS", "CLOSE PARENS"
    /// </summary>
    object Token { get; }
    /// <summary>
    /// Move to the next token
    /// </summary>
    void Next();
}

interface ILexerFactory
{
    /// <summary>
    /// Create a Lexer for converting a stream of characters into tokens
    /// </summary>
    /// <param name="reader">TextReader that supplies the underlying stream</param>
    /// <param name="tokenDefinitions">A dictionary from regular expressions to their "token identifers"</param>
    /// <returns>The lexer</returns>
    ILexer CreateLexer(TextReader reader, IDictionary<string, object> tokenDefinitions);
}

So, pluz send the codz...
No, seriously, I am about to start writing an implementation of the above interface yet I find it hard to believe that there isn't some simple way of doing this in .NET (2.0) already.

So, any suggestions for a simple way to do the above? (Also, I don't want any "code generators". Performance is not important for this thing and I don't want to introduce any complexity into the build process.)

© Stack Overflow or respective owner

Related posts about c#

Related posts about regex