Is it possible to create a single tokenizer to parse this?

Posted by Adrian on Programmers
Published on 2013-05-16T16:22:47Z

Filed under: parser, lexer

This extends this other Q&A thread, but goes into details that are out of scope for the original question.

I am generating a parser for a context-sensitive grammar that accepts the following subset of symbols: ",", "[", "]", "{", "}", m/[a-zA-Z_][a-zA-Z_0-9]*/, and m/[0-9]+/.

The grammar can take the string "{ abc[1] }, }" and parse it as ("{", "abc[1]", "},", "}"). Another example is "{ abc[1] [, }", which parses as ("{", "abc[1]", "[,", "}").
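For contrast, here is a minimal Python sketch (an illustration, not part of the original question) of an ordinary single-mode tokenizer built only from the symbol classes listed above. Because it applies the same rules everywhere and discards whitespace, it splits abc[1] into four tokens and "}," into two, so the parser can no longer tell the word "}," apart from a closing brace followed by a comma. The name plain_tokens is hypothetical.

```python
import re
from typing import List

# One alternative per symbol class from the question, plus whitespace.
TOKEN_RE = re.compile(r"\s+|[a-zA-Z_][a-zA-Z_0-9]*|[0-9]+|[,\[\]{}]")


def plain_tokens(text: str) -> List[str]:
    """Tokenize with a single, mode-free rule set, dropping whitespace."""
    out = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected {text[pos]!r} at offset {pos}")
        if not m.group().isspace():  # whitespace only separates tokens
            out.append(m.group())
        pos = m.end()
    return out
```

On "{ abc[1] }, }" this returns ['{', 'abc', '[', '1', ']', '}', ',', '}'], which is not the desired ("{", "abc[1]", "},", "}").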

This is similar to the grammar Perl uses for its qw() syntax. The braces indicate that their contents are to be tokenized on whitespace, and a closing brace must stand on its own to mark the end of the whitespace-tokenized group. Can this be done with a single lexer/tokenizer, or is a separate tokenizer necessary for parsing this group?
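One way to do this with a single lexer is to give it modes (also called lexer states; flex calls them start conditions and ANTLR 4 calls them lexer modes). Seeing "{" switches the lexer into a whitespace-splitting mode, and a standalone "}" switches it back. The Python sketch below is a hedged illustration under two assumptions not stated in the question: groups do not nest (a "{" inside a group is just another word), and any run of non-whitespace characters inside a group is one WORD token. The token kind names and the tokenize function are hypothetical.

```python
import re
from typing import Iterator, Tuple

# Normal mode: the symbol classes from the question; whitespace is skipped.
NORMAL_RE = re.compile(
    r"(?P<WS>\s+)"
    r"|(?P<IDENT>[a-zA-Z_][a-zA-Z_0-9]*)"
    r"|(?P<NUMBER>[0-9]+)"
    r"|(?P<PUNCT>[,\[\]{}])"
)
PUNCT_KINDS = {",": "COMMA", "[": "LBRACKET", "]": "RBRACKET",
               "{": "LBRACE", "}": "RBRACE"}

# Group mode (assumed): whitespace separates words; any non-whitespace run is one word.
SPACE_RE = re.compile(r"\s*")
WORD_RE = re.compile(r"\S+")


def tokenize(text: str) -> Iterator[Tuple[str, str]]:
    """One lexer, two modes: '{' enters group mode, a lone '}' leaves it."""
    pos, in_group = 0, False
    while pos < len(text):
        if in_group:
            pos = SPACE_RE.match(text, pos).end()  # skip inter-word whitespace
            if pos == len(text):
                break
            word = WORD_RE.match(text, pos).group()
            pos += len(word)
            if word == "}":  # only a standalone '}' closes the group
                in_group = False
                yield ("RBRACE", word)
            else:
                yield ("WORD", word)
        else:
            m = NORMAL_RE.match(text, pos)
            if m is None:
                raise SyntaxError(f"unexpected {text[pos]!r} at offset {pos}")
            pos = m.end()
            kind, value = m.lastgroup, m.group()
            if kind == "WS":
                continue
            if kind == "PUNCT":
                kind = PUNCT_KINDS[value]
                if value == "{":
                    in_group = True  # switch to whitespace-splitting mode
            yield (kind, value)
    if in_group:
        raise SyntaxError("unterminated { ... } group")
```

On the question's two examples this yields LBRACE "{", WORD "abc[1]", WORD "},", RBRACE "}" and LBRACE "{", WORD "abc[1]", WORD "[,", RBRACE "}", while text outside a group, such as x[2], is still split into IDENT, LBRACKET, NUMBER, RBRACKET. Keeping the mode flag inside the lexer means the parser never has to reason about whitespace; the cost is that the lexer now carries a small amount of context.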

© Programmers or respective owner
