Tokenizing numbers for a parser

Posted by René Nyffenegger on Stack Overflow
Published on 2010-06-11T12:06:35Z

Filed under: parser, tokenizing

I am writing my first parser and have a few questions concerning the tokenizer.

My tokenizer exposes a nextToken() function that returns the next token, and each token is tagged with a token type. I think the following token types would make sense:

SYMBOL (such as "<", ":=", "(" and the like)
REMARK (a comment)
NUMBER
IDENT (such as the name of a function or a variable)
STRING (something enclosed in "...")

Now, do you think this makes sense?
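For reference, here is a minimal sketch of a tokenizer built around exactly those five types, written in Python. The lexical rules are assumptions made up for illustration: "#" starts a comment that runs to end of line, strings are double-quoted with no escapes, SYMBOL is a small fixed set, and an extra EOF type signals end of input. The Python method next_token() stands in for the post's nextToken().

```python
import re
from dataclasses import dataclass
from enum import Enum, auto
from typing import List


class TokenType(Enum):
    SYMBOL = auto()
    REMARK = auto()
    NUMBER = auto()
    IDENT = auto()
    STRING = auto()
    EOF = auto()  # extra type (an assumption) so next_token() can signal end of input


@dataclass
class Token:
    type: TokenType
    text: str


# Hypothetical lexical rules, tried in order at the current position.
# Multi-character symbols such as ":=" must be listed before single ones.
_RULES = [
    (TokenType.REMARK, re.compile(r"#[^\n]*")),
    (TokenType.STRING, re.compile(r'"[^"]*"')),
    (TokenType.NUMBER, re.compile(r"[0-9]+")),
    (TokenType.IDENT, re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    (TokenType.SYMBOL, re.compile(r":=|[<>()+\-*/.;,]")),
]


class Tokenizer:
    def __init__(self, source: str) -> None:
        self.source = source
        self.pos = 0

    def next_token(self) -> Token:
        """Return the next token, or an EOF token once the input is exhausted."""
        # Whitespace only separates tokens; it never becomes a token itself.
        while self.pos < len(self.source) and self.source[self.pos].isspace():
            self.pos += 1
        if self.pos >= len(self.source):
            return Token(TokenType.EOF, "")
        for token_type, pattern in _RULES:
            match = pattern.match(self.source, self.pos)
            if match:
                self.pos = match.end()
                return Token(token_type, match.group())
        raise SyntaxError(
            f"unexpected character {self.source[self.pos]!r} at offset {self.pos}"
        )


def tokenize_all(source: str) -> List[Token]:
    """Collect every token before EOF (a convenience for testing)."""
    tokenizer = Tokenizer(source)
    tokens = []
    while True:
        token = tokenizer.next_token()
        if token.type is TokenType.EOF:
            return tokens
        tokens.append(token)
```

Because NUMBER here matches only digits and "." is a SYMBOL, tokenize_all("402.203") yields NUMBER 402, SYMBOL ".", NUMBER 203, which is the split the question runs into with floats.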

I am also struggling with the NUMBER token type. Would it make more sense to split it further into NUMBER and FLOAT token types? Without a FLOAT token type, parsing a float such as 402.203 would give me a NUMBER (402), a SYMBOL (.) and then another NUMBER (203).
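One common way out, and the one C and Python themselves use, is to make the whole literal a single token: when the tokenizer reaches the end of the integer digits, it looks one character ahead to decide whether a "." belongs to the number. The sketch below is hypothetical (the function scan_number and the separate "FLOAT" kind are illustrative choices), and it deliberately ignores exponents such as 1e10 and leading-dot forms such as .5.

```python
from typing import Tuple


def is_ascii_digit(ch: str) -> bool:
    # str.isdigit() would also accept characters such as '²', so compare explicitly.
    return "0" <= ch <= "9"


def scan_number(source: str, pos: int) -> Tuple[str, str, int]:
    """Scan an unsigned numeric literal that starts at source[pos].

    Returns (kind, lexeme, end): kind is "NUMBER" for an integer such as 402
    or "FLOAT" for a literal with a fractional part such as 402.203, and end
    is the offset just past the literal.
    """
    if pos >= len(source) or not is_ascii_digit(source[pos]):
        raise ValueError(f"expected a digit at offset {pos}")
    start = pos
    while pos < len(source) and is_ascii_digit(source[pos]):
        pos += 1
    kind = "NUMBER"
    # Treat '.' as a decimal point only when a digit follows it, so "402.x"
    # still lexes as NUMBER 402 followed by a separate '.' symbol.
    if pos + 1 < len(source) and source[pos] == "." and is_ascii_digit(source[pos + 1]):
        pos += 1
        while pos < len(source) and is_ascii_digit(source[pos]):
            pos += 1
        kind = "FLOAT"
    return kind, source[start:pos], pos
```

Whether the tokenizer then exposes one NUMBER type carrying the lexeme or separate NUMBER and FLOAT types is mostly a question of what the parser needs; the part that matters is that 402.203 reaches the parser as one token rather than three.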

Finally, what should the tokenizer return when it encounters -909? Should it return the SYMBOL - followed by the NUMBER 909, or a single NUMBER -909 right away?
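Many language definitions take the first option. C and Python, for instance, define numeric literals without a sign, so -909 is the unary operator - applied to the literal 909. A tokenizer that produced NUMBER -909 on its own would have to guess from context, because in a-909 the - is a binary operator. The sketch below is a hypothetical Python illustration (tokenize, Parser and evaluate are made-up names, and the grammar covers only integers, +, -, * and parentheses): the tokenizer always emits - as a SYMBOL, and the parser's unary rule applies the sign.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, text)


def tokenize(source: str) -> List[Token]:
    """Minimal tokenizer: '-' is always a SYMBOL, never part of a NUMBER."""
    tokens = []
    pos = 0
    while pos < len(source):
        ch = source[pos]
        if ch.isspace():
            pos += 1
        elif "0" <= ch <= "9":
            end = pos
            while end < len(source) and "0" <= source[end] <= "9":
                end += 1
            tokens.append(("NUMBER", source[pos:end]))
            pos = end
        elif ch in "+-*()":
            tokens.append(("SYMBOL", ch))
            pos += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r} at offset {pos}")
    tokens.append(("EOF", ""))
    return tokens


class Parser:
    """Recursive-descent evaluator; the sign of a number is decided in unary()."""

    def __init__(self, tokens: List[Token]) -> None:
        self.tokens = tokens
        self.i = 0

    def peek(self) -> Token:
        return self.tokens[self.i]

    def take(self) -> Token:
        token = self.tokens[self.i]
        self.i += 1
        return token

    # expr := term (("+" | "-") term)*    here '-' is the binary operator
    def expr(self) -> int:
        value = self.term()
        while self.peek() in (("SYMBOL", "+"), ("SYMBOL", "-")):
            op = self.take()[1]
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    # term := unary ("*" unary)*
    def term(self) -> int:
        value = self.unary()
        while self.peek() == ("SYMBOL", "*"):
            self.take()
            value *= self.unary()
        return value

    # unary := "-" unary | primary        here '-' is the unary operator
    def unary(self) -> int:
        if self.peek() == ("SYMBOL", "-"):
            self.take()
            return -self.unary()
        return self.primary()

    # primary := NUMBER | "(" expr ")"
    def primary(self) -> int:
        kind, text = self.take()
        if kind == "NUMBER":
            return int(text)
        if (kind, text) == ("SYMBOL", "("):
            value = self.expr()
            if self.take() != ("SYMBOL", ")"):
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {text!r}")


def evaluate(source: str) -> int:
    parser = Parser(tokenize(source))
    value = parser.expr()
    if parser.peek()[0] != "EOF":
        raise SyntaxError("unexpected trailing input")
    return value
```

With this split, "1-909" tokenizes as NUMBER 1, SYMBOL -, NUMBER 909 and evaluates to -908, while "-909" and "1 - -909" also come out right, without the tokenizer ever needing to know what came before the minus sign.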

© Stack Overflow or respective owner
