Hand coding a parser

Posted by John Leidegren on Stack Overflow
Published on 2010-04-13T19:40:12Z Indexed on 2010/04/13 19:43 UTC

For all you compiler gurus: I wanna write a recursive descent parser, and I wanna do it with just code. No generating lexers and parsers from some other grammar, and don't tell me to read the Dragon Book; I'll come around to that eventually.

I wanna get into the gritty details of implementing a lexer and parser for a reasonably simple language, say CSS. And I wanna do this right.

This will probably end up being a series of questions, but right now I'm starting with a lexer. Tokenization rules for CSS can be found here.

I find myself writing code like this (hopefully you can infer the rest from this snippet):

public CssToken ReadNext()
{
    int val;
    while ((val = _reader.Read()) != -1)
    {
        var c = (char)val;
        switch (_stack.Top)
        {
            case ParserState.Init:
                if (c == ' ')
                {
                    continue; // ignore
                }
                else if (c == '.')
                {
                    _stack.Transition(ParserState.SubIdent, ParserState.Init);
                }
                break;

            case ParserState.SubIdent:
                if (c == '-')
                {
                    _token.Append(c);
                }
                _stack.Transition(ParserState.SubNMBegin);
                break;

            // ...

What is this called, and how far off am I from something reasonably well understood? I'm trying to strike a balance between efficiency and ease of use. Using a stack to implement some kind of state machine is working quite well, but I'm unsure how to continue like this.

What I have is an input stream from which I can read one character at a time. I don't do any lookahead right now; I just read a character and then, depending on the current state, try to do something with it.
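For comparison, here is a minimal sketch of what one character of lookahead could look like. It assumes the input is a System.IO.TextReader (the `Read()` call returning -1 in the snippet suggests this, but it is not confirmed); the LookaheadDemo class and its method names are made up for illustration.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the original code.
public static class LookaheadDemo
{
    // Returns true when the next unread character equals `expected`,
    // without consuming it. Peek() returns -1 when nothing is available.
    public static bool NextIs(TextReader reader, char expected)
    {
        int next = reader.Peek();
        return next != -1 && (char)next == expected;
    }

    // Consumes a run of spaces and reports how many were skipped.
    // The first non-space character stays in the reader.
    public static int SkipSpaces(TextReader reader)
    {
        int count = 0;
        while (NextIs(reader, ' '))
        {
            reader.Read();
            count++;
        }
        return count;
    }
}
```

With Peek(), a state can look at a character and decide it belongs to the next token without having already consumed it.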

I'd really like to get into the mindset of writing reusable snippets of code. The Transition method is currently my means of doing that: it pops the current state off the stack and then pushes its arguments in reverse order. That way, when I write Transition(ParserState.SubIdent, ParserState.Init), it will "call" a subroutine SubIdent which will, when complete, return to the Init state.
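To make the pop-then-push-in-reverse behaviour concrete, here is a rough sketch of such a stack. The StateStack class is a guess at the shape of `_stack` (its real type isn't shown), and treating a call with no arguments as a "return" is an assumption of this sketch.

```csharp
using System;
using System.Collections.Generic;

// Mirrors the state names used in the snippet.
public enum ParserState { Init, SubIdent, SubNMBegin }

// Hypothetical wrapper; the actual _stack type is not shown.
public sealed class StateStack
{
    private readonly Stack<ParserState> _states = new Stack<ParserState>();

    public StateStack(ParserState initial)
    {
        _states.Push(initial);
    }

    // The state the lexer is currently in.
    public ParserState Top
    {
        get { return _states.Peek(); }
    }

    // Pop the current state, then push the arguments in reverse order,
    // so the first argument ends up on top and runs next.
    // Calling it with no arguments just pops, which acts as a "return".
    public void Transition(params ParserState[] next)
    {
        _states.Pop();
        for (int i = next.Length - 1; i >= 0; i--)
        {
            _states.Push(next[i]);
        }
    }
}
```

Starting in Init, Transition(ParserState.SubIdent, ParserState.Init) leaves SubIdent on top with Init underneath it, so popping SubIdent later resumes Init.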

The parser will be implemented in much the same way. Currently, having everything in a single big method like this allows me to easily return a token when I've found one, but it also forces me to keep everything in that one big method. Is there a nice way to split these tokenization rules into separate methods?
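One possible shape for such a split, sketched under a few assumptions: the reader is a TextReader, one character of lookahead via Peek() is acceptable, and the token rules are deliberately simplified (this is not the real CSS grammar). CssTokenType, the CssToken constructor, and the helper names are all invented for the example.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical token kinds; the real CSS token set is larger.
public enum CssTokenType { Whitespace, Ident, Delim, EndOfFile }

public sealed class CssToken
{
    public CssTokenType Type { get; private set; }
    public string Text { get; private set; }

    public CssToken(CssTokenType type, string text)
    {
        Type = type;
        Text = text;
    }
}

public sealed class CssLexer
{
    private readonly TextReader _reader;

    public CssLexer(TextReader reader)
    {
        _reader = reader;
    }

    // Dispatch on one character of lookahead, then let a dedicated
    // method consume the whole token and return it.
    public CssToken ReadNext()
    {
        int next = _reader.Peek();
        if (next == -1) return new CssToken(CssTokenType.EndOfFile, "");

        char c = (char)next;
        if (c == ' ') return ReadWhitespace();
        if (IsNameStart(c)) return ReadIdent();

        _reader.Read(); // any other character becomes a one-character token
        return new CssToken(CssTokenType.Delim, c.ToString());
    }

    private CssToken ReadWhitespace()
    {
        return new CssToken(CssTokenType.Whitespace, ReadWhile(ch => ch == ' '));
    }

    private CssToken ReadIdent()
    {
        return new CssToken(CssTokenType.Ident, ReadWhile(IsNameChar));
    }

    // Shared helper: consume characters for as long as the predicate accepts them.
    private string ReadWhile(Func<char, bool> accept)
    {
        var text = new StringBuilder();
        int next;
        while ((next = _reader.Peek()) != -1 && accept((char)next))
        {
            text.Append((char)_reader.Read());
        }
        return text.ToString();
    }

    // Simplified name rules, assumed for this sketch only.
    private static bool IsNameStart(char c)
    {
        return char.IsLetter(c) || c == '_';
    }

    private static bool IsNameChar(char c)
    {
        return char.IsLetterOrDigit(c) || c == '_' || c == '-';
    }
}
```

Because each rule method runs to completion and returns its token, the call stack takes over the role of the explicit state stack. The trade-off is that every rule needs lookahead to know where its token ends.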

Any input or advice on the matter would be greatly appreciated!

Filed under: c#, lexer, parser, compiler