Tokenizing numbers for a parser
Posted by René Nyffenegger on Stack Overflow
Published on 2010-06-11T12:06:35Z
Indexed on 2010/06/11 12:12 UTC
parser | tokenizing
I am writing my first parser and have a few questions concerning the tokenizer.
Basically, my tokenizer exposes a nextToken() function that is supposed to return the next token. These tokens are distinguished by a token-type. I think it would make sense to have the following token-types:
- SYMBOL (such as <, :=, ( and the like)
- REMARK (or a comment)
- NUMBER
- IDENT (such as the name of a function or a variable)
- STRING (Something enclosed between "....")
Now, do you think this makes sense?
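As a minimal sketch, the token types listed above could be modelled as an enumeration plus a small token record. Python is an assumption here (the post does not name a language), and the names TokenType and Token are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto


class TokenType(Enum):
    SYMBOL = auto()  # e.g. <, :=, (
    REMARK = auto()  # a comment
    NUMBER = auto()
    IDENT = auto()   # name of a function or a variable
    STRING = auto()  # text enclosed between double quotes


@dataclass(frozen=True)
class Token:
    type: TokenType
    text: str  # the raw characters the token was read from


# What a nextToken()-style function might hand back for the input 402.
tok = Token(TokenType.NUMBER, "402")
print(tok.type.name, tok.text)
```

Keeping the raw text on the token lets the parser decide later how to convert it, for example into an integer.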
Also, I am struggling with the NUMBER token-type. Do you think it makes more sense to split it further into a NUMBER and a FLOAT token-type? Without a FLOAT token-type, I'd receive a NUMBER (e.g. 402), then a SYMBOL (.), followed by another NUMBER (e.g. 203) whenever I parse a float.
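The difference between the two designs can be shown with a small regex-based sketch. The patterns and the tokenize helper are hypothetical and cover only digits and the dot; exponents, whitespace and error handling are left out, and unmatched characters are silently skipped.

```python
import re

# Without a FLOAT token-type: digits and the dot are separate tokens.
WITHOUT_FLOAT = re.compile(r"(?P<NUMBER>\d+)|(?P<SYMBOL>\.)")

# With a FLOAT token-type: the FLOAT alternative is tried first, so
# "digits . digits" is consumed as one token before NUMBER gets a chance.
WITH_FLOAT = re.compile(r"(?P<FLOAT>\d+\.\d+)|(?P<NUMBER>\d+)|(?P<SYMBOL>\.)")


def tokenize(pattern, text):
    """Return (token_type, text) pairs for every match of pattern in text."""
    return [(m.lastgroup, m.group()) for m in pattern.finditer(text)]


print(tokenize(WITHOUT_FLOAT, "402.203"))
print(tokenize(WITH_FLOAT, "402.203"))
```

With the first pattern the parser has to reassemble the float from three tokens; with the second the tokenizer delivers it as a single FLOAT token.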
Finally, what do you think makes more sense for the tokenizer to return when it encounters -909? Should it return the SYMBOL - first, followed by the NUMBER 909, or should it return a NUMBER -909 right away?
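To make the two options concrete, here is a hedged sketch that runs both on -909 and on x-909. The patterns and the tokenize helper are hypothetical, and the example only illustrates the trade-off under these simplified rules; it is not a definitive answer.

```python
import re

# Option A: the minus sign is always its own SYMBOL token.
SEPARATE_SIGN = re.compile(
    r"(?P<NUMBER>\d+)|(?P<IDENT>[A-Za-z_]\w*)|(?P<SYMBOL>-)"
)

# Option B: a minus directly in front of digits is absorbed into NUMBER.
SIGNED_NUMBER = re.compile(
    r"(?P<NUMBER>-?\d+)|(?P<IDENT>[A-Za-z_]\w*)|(?P<SYMBOL>-)"
)


def tokenize(pattern, text):
    """Return (token_type, text) pairs for every match of pattern in text."""
    return [(m.lastgroup, m.group()) for m in pattern.finditer(text)]


for source in ("-909", "x-909"):
    print(source, "A:", tokenize(SEPARATE_SIGN, source))
    print(source, "B:", tokenize(SIGNED_NUMBER, source))
```

Under option B the input x-909 becomes IDENT x followed by NUMBER -909, so the subtraction operator never reaches the parser. Under option A the parser always sees the SYMBOL - and can decide from context whether it is a unary or a binary minus.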
© Stack Overflow or respective owner