Search Results

Search found 126 results on 6 pages for 'lexer'.

Page 2/6 | < Previous Page | 1 2 3 4 5 6  | Next Page >

  • hand coding a parser

    - by John Leidegren
    For all you compiler gurus, I wanna write a recursive descent parser and I wanna do it with just code. No generating lexers and parsers from some other grammar and don't tell me to read the dragon book, i'll come around to that eventually. I wanna get into the gritty details about implementing a lexer and parser for a reasonable simple langauge, say CSS. And I wanna do this right. This will probably end up being a series of questions but right now I'm starting with a lexer. Tokenization rules for CSS can be found here. I find my self writing code like this (hopefully you can infer the rest from this snippet): public CssToken ReadNext() { int val; while ((val = _reader.Read()) != -1) { var c = (char)val; switch (_stack.Top) { case ParserState.Init: if (c == ' ') { continue; // ignore } else if (c == '.') { _stack.Transition(ParserState.SubIdent, ParserState.Init); } break; case ParserState.SubIdent: if (c == '-') { _token.Append(c); } _stack.Transition(ParserState.SubNMBegin); break; What is this called? and how far off am I from something reasonable well understood? I'm trying to balence something which is fair in terms of efficiency and easy to work with, using a stack to implement some kind of state machine is working quite well, but I'm unsure how to continue like this. What I have is an input stream, from which I can read 1 character at a time. I don't do any look a head right now, I just read the character then depending on the current state try to do something with that. I'd really like to get into the mind set of writing reusable snippets of code. This Transition method is currently means to do that, it will pop the current state of the stack and then push the arguments in reverse order. That way, when I write Transition(ParserState.SubIdent, ParserState.Init) it will "call" a sub routine SubIdent which will, when complete, return to the Init state. The parser will be implemented in much the same way, currently, having everyhing in a single big method like this allows me to easily return a token when I found one, but it also forces me to keep everything in one single big method. Is there a nice way to split these tokenization rules into seperate methods? Any input/advice on the matter would be greatly appriciated!

    Read the article

  • Parser generator for JavaME

    - by Anders K.
    First: I have looked at this SO question but unfortunately there is no mention of JavaME I am looking for a parser/lexer generator that produces code that can run on the Blackberry and its (obnoxious) JavaME. E.g. at first I thought I could use ANTLR however it seems the run-time library is not compatible with JavaME TIA

    Read the article

  • GNU Flex, multiline rule

    - by Simone Margaritelli
    Hi there i have a flex rule inside my lexer definition : operators "[]"|"[]="|"[]<"|".."|"."|".="|"+"|"+="|"-"|"-="|"/"|"/="|"*"|"*="|"%"|"%="|"++"|"--"|"^"|"^="|"~"|"&"|"&="|"|"|"|="|"<<"|"<<="|">>"|"!"|"<"|">"|">="|"<="|"=="|"!="|"&&"|"||"|"~=" Is there any way to split this ruole on more lines to keep it clearer? I tried with \ just like macros but it does not seem to be accepted by flex :( PS: I don't want to split the rule in more sub-rules, but only split its regex in more lines to keep the code clearer.

    Read the article

  • Lexing newlines in scala StdLexical?

    - by Nick Fortescue
    I'm trying to lex (then parse) a C like language. In C there are preprocessor directives where line breaks are significant, then the actual code where they are just whitespace. One way of doing this would be do a two pass process like early C compilers - have a separate preprocessor for the # directives, then lex the output of that. However, I wondered if it was possible to do it in a single lexer. I'm pretty happy with writing the scala parser-combinator code, but I'm not so sure of how StdLexical handles whitespace. Could someone write some simple sample code which say could lex a #include line (using the newline) and some trivial code (ignoring the newline)? Or is this not possible, and it is better to go with the 2-pass appproach?

    Read the article

  • Call methods on native Javascript types without wrapping with ()

    - by Anurag
    In Javascript, we can call methods on string literals directly without enclosing it within round brackets. But not for other types such as numbers, or functions. It is a syntax error, but is there a reason as to why the Javascript lexer needs these other types to be enclosed in round brackets? For example, if we extend Number, String, and Function with an alert method and try calling this method on the literals, it's a SyntaxError for Number and Function, while it works for a String. function alertValue() { alert(this); } Number.prototype.alert = alertValue; String.prototype.alert = alertValue; Function.prototype.alert = alertValue; We can call alert directly on a string object: "someStringLiteral".alert() // alerts someStringLiteral but it's a SyntaxError on numbers, and functions. 7.alert(); function() {}.alert(); To work with these types, we have to enclose it within brackets: (7).alert(); // alerts "7" (function() {}).alert(); // alerts "function() {}"

    Read the article

  • lexers / parsers for (un) structured text documents

    - by wilson32
    There are lots of parsers and lexers for scripts (i.e. structured computer languages). But I'm looking for one which can break a (almost) non-structured text document into larger sections e.g. chapters, paragraphs, etc. It's relatively easy for a person to identify them: where the Table of Contents, acknowledgements, or where the main body starts and it is possible to build rule based systems to identify some of these (such as paragraphs). I don't expect it to be perfect, but does any one know of such a broad 'block based' lexer / parser? Or could you point me in the direction of literature which may help?

    Read the article

  • Optimising ruby regexp -- lots of match groups

    - by Farcaller
    I'm working on a ruby baser lexer. To improve performance, I joined up all tokens' regexps into one big regexp with match group names. The resulting regexp looks like: /\A(?<__anonymous_-1038694222803470993>(?-mix:\n+))|\A(?<__anonymous_-1394418499721420065>(?-mix:\/\/[\A\n]*))|\A(?<__anonymous_3077187815313752157>(?-mix:include\s+"[\A"]+"))|\A(?<LET>(?-mix:let\s))|\A(?<IN>(?-mix:in\s))|\A(?<CLASS>(?-mix:class\s))|\A(?<DEF>(?-mix:def\s))|\A(?<DEFM>(?-mix:defm\s))|\A(?<MULTICLASS>(?-mix:multiclass\s))|\A(?<FUNCNAME>(?-mix:![a-zA-Z_][a-zA-Z0-9_]*))|\A(?<ID>(?-mix:[a-zA-Z_][a-zA-Z0-9_]*))|\A(?<STRING>(?-mix:"[\A"]*"))|\A(?<NUMBER>(?-mix:[0-9]+))/ I'm matching it to my string producing a MatchData where exactly one token is parsed: bigregex =~ "\n ... garbage" puts $~.inspect Which outputs #<MatchData "\n" __anonymous_-1038694222803470993:"\n" __anonymous_-1394418499721420065:nil __anonymous_3077187815313752157:nil LET:nil IN:nil CLASS:nil DEF:nil DEFM:nil MULTICLASS:nil FUNCNAME:nil ID:nil STRING:nil NUMBER:nil> So, the regex actually matched the "\n" part. Now, I need to figure the match group where it belongs (it's clearly visible from #inspect output that it's _anonymous-1038694222803470993, but I need to get it programmatically). I could not find any option other than iterating over #names: m.names.each do |n| if m[n] type = n.to_sym resolved_type = (n.start_with?('__anonymous_') ? nil : type) val = m[n] break end end which verifies that the match group did have a match. The problem here is that it's slow (I spend about 10% of time in the loop; also 8% grabbing the @input[@pos..-1] to make sure that \A works as expected to match start of string (I do not discard input, just shift the @pos in it). You can check the full code at GH repo. Any ideas on how to make it at least a bit faster? Is there any option to figure the "successful" match group easier?

    Read the article

  • Performance of tokenizing CSS in PHP

    - by Boldewyn
    This is a noob question from someone who hasn't written a parser/lexer ever before. I'm writing a tokenizer/parser for CSS in PHP (please don't repeat with 'OMG, why in PHP?'). The syntax is written down by the W3C neatly here (CSS2.1) and here (CSS3, draft). It's a list of 21 possible tokens, that all (but two) cannot be represented as static strings. My current approach is to loop through an array containing the 21 patterns over and over again, do an if (preg_match()) and reduce the source string match by match. In principle this works really good. However, for a 1000 lines CSS string this takes something between 2 and 8 seconds, which is too much for my project. Now I'm banging my head how other parsers tokenize and parse CSS in fractions of seconds. OK, C is always faster than PHP, but nonetheless, are there any obvious D'Oh! s that I fell into? I made some optimizations, like checking for '@', '#' or '"' as the first char of the remaining string and applying only the relevant regexp then, but this hadn't brought any great performance boosts. My code (snippet) so far: $TOKENS = array( 'IDENT' => '...regexp...', 'ATKEYWORD' => '@...regexp...', 'String' => '"...regexp..."|\'...regexp...\'', //... ); $string = '...CSS source string...'; $stream = array(); // we reduce $string token by token while ($string != '') { $string = ltrim($string, " \t\r\n\f"); // unconsumed whitespace at the // start is insignificant but doing a trim reduces exec time by 25% $matches = array(); // loop through all possible tokens foreach ($TOKENS as $t => $p) { // The '&' is used as delimiter, because it isn't used anywhere in // the token regexps if (preg_match('&^'.$p.'&Su', $string, $matches)) { $stream[] = array($t, $matches[0]); $string = substr($string, strlen($matches[0])); // Yay! We found one that matches! continue 2; } } // if we come here, we have a syntax error and handle it somehow } // result: an array $stream consisting of arrays with // 0 => type of token // 1 => token content

    Read the article

  • Unable to compile output of lex

    - by dbarker
    When I attempt to compile the output of this trivial lex program: # lex.l integer printf("found keyword INT"); using: $ gcc lex.yy.c I get: Undefined symbols: "_yywrap", referenced from: _yylex in ccMsRtp7.o _input in ccMsRtp7.o "_main", referenced from: start in crt1.10.6.o ld: symbol(s) not found collect2: ld returned 1 exit status lex --version tells me I'm actually using 'flex 2.5.35' although ls -fla `which lex` isn't a symlink. Any ideas why the output won't compile?

    Read the article

  • lexers vs parsers

    - by Naveen
    Are lexers and parsers really that different in theory ? It seems fashionable to hate regular expressions: coding horror, another blog post. However, popular lexing based tools: pygments, geshi, or prettify, all use regular expressions. They seem to lex anything... When is lexing enough, when do you need EBNF ? Has anyone used the tokens produced by these lexers with bison or antlr parser generators?

    Read the article

  • Lexing partial SQL in C#

    - by Chris T
    I'd need to parse partial SQL queries (it's for a SQL injection auditing tool). For example '1' AND 1=1-- Should break down into tokens like [0] => [SQL_STRING, '1'] [1] => [SQL_AND] [2] => [SQL_INT, 1] [3] => [SQL_AND] [4] => [SQL_INT, 1] [5] => [SQL_COMMENT] [6] => [SQL_QUERY_END] Are their any at least lexers for SQL that I base mine off of or any good tools like bison for C# (though I'd rather not write my own grammar as I need to support most if not all the grammar of MySQL 5)

    Read the article

  • lexical analysis gives only one output?

    - by Caffè
    I tested this example(lexe.java), but it gave me only one output. I gave this text as a reader: public class LexeTest{ private int a = 14; } And the nextToken() function is : public Category nextToken () { if (inp.findWithinHorizon (tokenPat, 0) == null) return Category.EOF; else { lastLexeme = inp.match ().group (0); if (inp.match ().start (1) != -1) return nextToken (); else if (inp.match ().start (2) != -1) return Category.IDENT; else if (inp.match ().start (3) != -1) return Category.NUMERAL; Category result = tokenMap.get (lastLexeme); if (result == null) return Category.ERROR; else return result; } } Isdie the main method: System.out.println(lexeObject.nextToken()); output is : IDENT Why? but the textfile contains multiple keywords? Anyone know what's the problem?

    Read the article

  • Simple XML parser in bison/flex

    - by user360872
    I would like to create simple xml parser using bison/flex. I don't need validation, comments, arguments, only <tag>value</tag>, where value can be number, string or other <tag>value</tag>. So for example: <div> <mul> <num>20</num> <add> <num>1</num> <num>5</num> </add> </mul> <id>test</id> </div> If it helps, I know the names of all tags that may occur. I know how many sub-tag can be hold by given tag. Is it possible to create bison parser that would do something like that: - new Tag("num", 1) // tag1 - new Tag("num", 5) // tag2 - new Tag("add", tag1, tag2) // tag3 - new Tag("num", 20) // tag4 - new Tag("mul", tag4, tag3) ... - root = top_tag Tag & number of sub-tags: num: 1 (only value) str: 1 (only value) add | sub | mul | div: 2 (num | str | tag, num | str | tag) Could you help me with grammar to be able to create AST like given above?

    Read the article

  • Postfix and right-associative operators in LR(0) parsers

    - by Ian
    Is it possible to construct an LR(0) parser that could parse a language with both prefix and postfix operators? For example, if I had a grammar with the + (addition) and ! (factorial) operators with the usual precedence then 1+3! should be 1 + 3! = 1 + 6 = 7, but surely if the parser were LR(0) then when it had 1+3 on the stack it would reduce rather than shift? Also, do right associative operators pose a problem? For example, 2^3^4 should be 2^(3^4) but again, when the parser have 2^3 on the stack how would it know to reduce or shift? If this isn't possible is there still a way to use an LR(0) parser, possibly by converting the input into Polish or Reverse Polish notation or adding brackets in the appropriate places? Would this be done before, during or after the lexing stage?

    Read the article

  • Parsing C# code to evaluate expressions (basically, implementing Intellisense)

    - by halivingston
    I'm trying to evaluate C# code as it gets typed, think of it as if I'm trying to write an IDE. So a person types code, I want to find out what code did he just write: String x = ""; I want to now register that x is a type of String. And now everytime the user types x again, and I want to show him all the things he can do with x, basically like Visual Studio Intellisense. Will I need some lexers or parsers for this? Will that make it easier? I've heard VS 2010 has some features around this that Microsoft has released. Any ideas?

    Read the article

  • Nested parsers in happy / infinite loop?

    - by McManiaC
    I'm trying to write a parser for a simple markup language with happy. Currently, I'm having some issues with infinit loops and nested elements. My markup language basicly consists of two elements, one for "normal" text and one for bold/emphasized text. data Markup = MarkupText String | MarkupEmph [Markup] For example, a text like Foo *bar* should get parsed as [MarkupText "Foo ", MarkupEmph [MarkupText "bar"]]. Lexing of that example works fine, but the parsing it results in an infinite loop - and I can't see why. This is my current approach: -- The main parser: Parsing a list of "Markup" Markups :: { [Markup] } : Markups Markup { $1 ++ [$2] } | Markup { [$1] } -- One single markup element Markup :: { Markup } : '*' Markups1 '*' { MarkupEmph $2 } | Markup1 { $1 } -- The nested list inside *..* Markups1 :: { [Markup] } : Markups1 Markup1 { $1 ++ [$2] } | Markup1 { [$1] } -- Markup which is always available: Markup1 :: { Markup } : String { MarkupText $1 } What's wrong with that approach? How could the be resolved? Update: Sorry. Lexing wasn't working as expected. The infinit loop was inside the lexer. Sorry. :) Update 2: On request, I'm using this as lexer: lexer :: String -> [Token] lexer [] = [] lexer str@(c:cs) | c == '*' = TokenSymbol "*" : lexer cs -- ...more rules... | otherwise = TokenString val : lexer rest where (val, rest) = span isValidChar str isValidChar = (/= '*') The infinit recursion occured because I had lexer str instead of lexer cs in that first rule for '*'. Didn't see it because my actual code was a bit more complex. :)

    Read the article

  • Can a parser tell the lexer to ignore a newline?

    - by chollida
    I'm writing a preprocessor for my language. In the preprocessor I've output a line that wasn't in the source file. This causes any error messages that Anltr creates to be incremented by one line. The Lexer handles the line count so I'm wondering if there is a way for the parser to tell the lexer to decrement the line count, or to ignore a specific newline. I'm also open to other suggestions on how to work around this. The only constraint I have is putting the extra line inline with the existing code. I'd prefer to keep it on it's own line to keep my parsing sane.

    Read the article

  • Why doesn't my QsciLexerCustom subclass work in PyQt4 using QsciScintilla?

    - by Jon Watte
    My end goal is to get Erlang syntax highlighting in QsciScintilla using PyQt4 and Python 2.6. I'm running on Windows 7, but will also need Ubuntu support. PyQt4 is missing the necessary wrapper code for the Erlang lexer/highlighter that "base" scintilla has, so I figured I'd write a lightweight one on top of QsciLexerCustom. It's a little bit problematic, because the Qsci wrapper seems to really want to talk about line+index rather than offset-from-start when getting/setting subranges of text. Meanwhile, the lexer gets arguments as offset-from-start. For now, I get a copy of the entire text, and split that up as appropriate. I have the following lexer, and I apply it with setLexer(). It gets all the appropriate calls when I open a new file and sets this as the lexer, and prints a bunch of appropriate lines based on what it's doing... but there is no styling in the document. I tried making all the defined styles red, and the document is still stubbornly black-on-white, so apparently the styles don't really "take effect" What am I doing wrong? If nobody here knows, what's the appropriate discussion forum where people might actually know these things? (It's an interesting intersection between Python, Qt and Scintilla, so I imagine the set of people who would know is small) Let's assume prefs.declare() just sets up a dict that returns the value for the given key (I've verified this -- it's not the problem). Let's assume scintilla is reasonably properly constructed into its host window QWidget. Specifically, if I apply a bundled lexer (such as QsciLexerPython), it takes effect and does show styled text. prefs.declare('font.name.margin', "MS Dlg") prefs.declare('font.size.margin', 8) prefs.declare('font.name.code', "Courier New") prefs.declare('font.size.code', 10) prefs.declare('color.editline', "#d0e0ff") class LexerErlang(Qsci.QsciLexerCustom): def __init__(self, obj = None): Qsci.QsciLexerCustom.__init__(self, obj) self.sci = None self.plainFont = QtGui.QFont() self.plainFont.setPointSize(int(prefs.get('font.size.code'))) self.plainFont.setFamily(prefs.get('font.name.code')) self.marginFont = QtGui.QFont() self.marginFont.setPointSize(int(prefs.get('font.size.code'))) self.marginFont.setFamily(prefs.get('font.name.margin')) self.boldFont = QtGui.QFont() self.boldFont.setPointSize(int(prefs.get('font.size.code'))) self.boldFont.setFamily(prefs.get('font.name.code')) self.boldFont.setBold(True) self.styles = [ Qsci.QsciStyle(0, QtCore.QString("base"), QtGui.QColor("#000000"), QtGui.QColor("#ffffff"), self.plainFont, True), Qsci.QsciStyle(1, QtCore.QString("comment"), QtGui.QColor("#008000"), QtGui.QColor("#eeffee"), self.marginFont, True), Qsci.QsciStyle(2, QtCore.QString("keyword"), QtGui.QColor("#000080"), QtGui.QColor("#ffffff"), self.boldFont, True), Qsci.QsciStyle(3, QtCore.QString("string"), QtGui.QColor("#800000"), QtGui.QColor("#ffffff"), self.marginFont, True), Qsci.QsciStyle(4, QtCore.QString("atom"), QtGui.QColor("#008080"), QtGui.QColor("#ffffff"), self.plainFont, True), Qsci.QsciStyle(5, QtCore.QString("macro"), QtGui.QColor("#808000"), QtGui.QColor("#ffffff"), self.boldFont, True), Qsci.QsciStyle(6, QtCore.QString("error"), QtGui.QColor("#000000"), QtGui.QColor("#ffd0d0"), self.plainFont, True), ] print("LexerErlang created") def description(self, ix): for i in self.styles: if i.style() == ix: return QtCore.QString(i.description()) return QtCore.QString("") def setEditor(self, sci): self.sci = sci Qsci.QsciLexerCustom.setEditor(self, sci) print("LexerErlang.setEditor()") def styleText(self, start, end): print("LexerErlang.styleText(%d,%d)" % (start, end)) lines = self.getText(start, end) offset = start self.startStyling(offset, 0) print("startStyling()") for i in lines: if i == "": self.setStyling(1, self.styles[0]) print("setStyling(1)") offset += 1 continue if i[0] == '%': self.setStyling(len(i)+1, self.styles[1]) print("setStyling(%)") offset += len(i)+1 continue self.setStyling(len(i)+1, self.styles[0]) print("setStyling(n)") offset += len(i)+1 def getText(self, start, end): data = self.sci.text() print("LexerErlang.getText(): " + str(len(data)) + " chars") return data[start:end].split('\n') Applied to the QsciScintilla widget as follows: _lexers = { 'erl': (Q.SCLEX_ERLANG, LexerErlang), 'hrl': (Q.SCLEX_ERLANG, LexerErlang), 'html': (Q.SCLEX_HTML, Qsci.QsciLexerHTML), 'css': (Q.SCLEX_CSS, Qsci.QsciLexerCSS), 'py': (Q.SCLEX_PYTHON, Qsci.QsciLexerPython), 'php': (Q.SCLEX_PHP, Qsci.QsciLexerHTML), 'inc': (Q.SCLEX_PHP, Qsci.QsciLexerHTML), 'js': (Q.SCLEX_CPP, Qsci.QsciLexerJavaScript), 'cpp': (Q.SCLEX_CPP, Qsci.QsciLexerCPP), 'h': (Q.SCLEX_CPP, Qsci.QsciLexerCPP), 'cxx': (Q.SCLEX_CPP, Qsci.QsciLexerCPP), 'hpp': (Q.SCLEX_CPP, Qsci.QsciLexerCPP), 'c': (Q.SCLEX_CPP, Qsci.QsciLexerCPP), 'hxx': (Q.SCLEX_CPP, Qsci.QsciLexerCPP), 'tpl': (Q.SCLEX_CPP, Qsci.QsciLexerCPP), 'xml': (Q.SCLEX_XML, Qsci.QsciLexerXML), } ... inside my document window class ... def addContentsDocument(self, contents, title): handler = self.makeScintilla() handler.title = title sci = handler.sci sci.append(contents) self.tabWidget.addTab(sci, title) self.tabWidget.setCurrentWidget(sci) self.applyLexer(sci, title) EventBus.bus.broadcast('command.done', {'text': 'Opened ' + title}) return handler def applyLexer(self, sci, title): (language, lexer) = language_and_lexer_from_title(title) if lexer: l = lexer() print("making lexer: " + str(l)) sci.setLexer(l) else: print("setting lexer by id: " + str(language)) sci.SendScintilla(Qsci.QsciScintillaBase.SCI_SETLEXER, language) linst = sci.lexer() print("lexer: " + str(linst)) def makeScintilla(self): sci = Qsci.QsciScintilla() sci.setUtf8(True) sci.setTabIndents(True) sci.setIndentationsUseTabs(False) sci.setIndentationWidth(4) sci.setMarginsFont(self.smallFont) sci.setMarginWidth(0, self.smallFontMetrics.width('00000')) sci.setFont(self.monoFont) sci.setAutoIndent(True) sci.setBraceMatching(Qsci.QsciScintilla.StrictBraceMatch) handler = SciHandler(sci) self.handlers[sci] = handler sci.setMarginLineNumbers(0, True) sci.setCaretLineVisible(True) sci.setCaretLineBackgroundColor(QtGui.QColor(prefs.get('color.editline'))) return handler Let's assume the rest of the application works, too (because it does :-)

    Read the article

  • How to write a simple Lexer/Parser with antlr 2.7?

    - by Burkhard
    Hello, I have a complex grammar (in antlr 2.7) which I need to extend. Having never used antlr before, I wanted to write a very simple Lexer and Parser first. I found a very good explanation for antlr3 and tried to adapt it: header{ #include <iostream> using namespace std; } options { language="Cpp"; } class P2 extends Parser; /* This will be the entry point of our parser. */ eval : additionExp ; /* Addition and subtraction have the lowest precedence. */ additionExp : multiplyExp ( "+" multiplyExp | "-" multiplyExp )* ; /* Multiplication and addition have a higher precedence. */ multiplyExp : atomExp ( "*" atomExp | "/" atomExp )* ; /* An expression atom is the smallest part of an expression: a number. Or when we encounter parenthesis, we're making a recursive call back to the rule 'additionExp'. As you can see, an 'atomExp' has the highest precedence. */ atomExp : Number | "(" additionExp ")" ; /* A number: can be an integer value, or a decimal value */ number : ("0".."9")+ ("." ("0".."9")+)? ; /* We're going to ignore all white space characters */ protected ws : (" " | "\t" | "\r" | "\n") { newline(); } ; It does generate four files without errors: P2.cpp, P2.hpp, P2TokenTypes.hpp and P2TokenTypes.txt. But now what? How do I create a working programm with that? I tried to add these files to a VS2005-WinConsole-Project but it does not compile: p2.cpp(277) : fatal error C1010: unexpected end of file while looking for precompiled header. Did you forget to add '#include "stdafx.h"' to your source?

    Read the article

  • Boost Spirit and Lex parser problem

    - by bpw1621
    I've been struggling to try and (incrementally) modify example code from the documentation but with not much different I am not getting the behavior I expect. Specifically, the "if" statement fails when (my intent is that) it should be passing (there was an "else" but that part of the parser was removed during debugging). The assignment statement works fine. I had a "while" statement as well which had the same problem as the "if" statement so I am sure if I can get help to figure out why one is not working it should be easy to get the other going. It must be kind of subtle because this is almost verbatim what is in one of the examples. #include <iostream> #include <fstream> #include <string> #define BOOST_SPIRIT_DEBUG #include <boost/config/warning_disable.hpp> #include <boost/spirit/include/qi.hpp> #include <boost/spirit/include/lex_lexertl.hpp> #include <boost/spirit/include/phoenix_operator.hpp> #include <boost/spirit/include/phoenix_statement.hpp> #include <boost/spirit/include/phoenix_container.hpp> namespace qi = boost::spirit::qi; namespace lex = boost::spirit::lex; inline std::string read_from_file( const char* infile ) { std::ifstream instream( infile ); if( !instream.is_open() ) { std::cerr << "Could not open file: \"" << infile << "\"" << std::endl; exit( -1 ); } instream.unsetf( std::ios::skipws ); return( std::string( std::istreambuf_iterator< char >( instream.rdbuf() ), std::istreambuf_iterator< char >() ) ); } template< typename Lexer > struct LangLexer : lex::lexer< Lexer > { LangLexer() { identifier = "[a-zA-Z][a-zA-Z0-9_]*"; number = "[-+]?(\\d*\\.)?\\d+([eE][-+]?\\d+)?"; if_ = "if"; else_ = "else"; this->self = lex::token_def<> ( '(' ) | ')' | '{' | '}' | '=' | ';'; this->self += identifier | number | if_ | else_; this->self( "WS" ) = lex::token_def<>( "[ \\t\\n]+" ); } lex::token_def<> if_, else_; lex::token_def< std::string > identifier; lex::token_def< double > number; }; template< typename Iterator, typename Lexer > struct LangGrammar : qi::grammar< Iterator, qi::in_state_skipper< Lexer > > { template< typename TokenDef > LangGrammar( const TokenDef& tok ) : LangGrammar::base_type( program ) { using boost::phoenix::val; using boost::phoenix::ref; using boost::phoenix::size; program = +block; block = '{' >> *statement >> '}'; statement = assignment | if_stmt; assignment = ( tok.identifier >> '=' >> expression >> ';' ); if_stmt = ( tok.if_ >> '(' >> expression >> ')' >> block ); expression = ( tok.identifier[ qi::_val = qi::_1 ] | tok.number[ qi::_val = qi::_1 ] ); BOOST_SPIRIT_DEBUG_NODE( program ); BOOST_SPIRIT_DEBUG_NODE( block ); BOOST_SPIRIT_DEBUG_NODE( statement ); BOOST_SPIRIT_DEBUG_NODE( assignment ); BOOST_SPIRIT_DEBUG_NODE( if_stmt ); BOOST_SPIRIT_DEBUG_NODE( expression ); } qi::rule< Iterator, qi::in_state_skipper< Lexer > > program, block, statement; qi::rule< Iterator, qi::in_state_skipper< Lexer > > assignment, if_stmt; typedef boost::variant< double, std::string > expression_type; qi::rule< Iterator, expression_type(), qi::in_state_skipper< Lexer > > expression; }; int main( int argc, char** argv ) { typedef std::string::iterator base_iterator_type; typedef lex::lexertl::token< base_iterator_type, boost::mpl::vector< double, std::string > > token_type; typedef lex::lexertl::lexer< token_type > lexer_type; typedef LangLexer< lexer_type > LangLexer; typedef LangLexer::iterator_type iterator_type; typedef LangGrammar< iterator_type, LangLexer::lexer_def > LangGrammar; LangLexer lexer; LangGrammar grammar( lexer ); std::string str( read_from_file( 1 == argc ? "boostLexTest.dat" : argv[1] ) ); base_iterator_type strBegin = str.begin(); iterator_type tokenItor = lexer.begin( strBegin, str.end() ); iterator_type tokenItorEnd = lexer.end(); std::cout << std::setfill( '*' ) << std::setw(20) << '*' << std::endl << str << std::endl << std::setfill( '*' ) << std::setw(20) << '*' << std::endl; bool result = qi::phrase_parse( tokenItor, tokenItorEnd, grammar, qi::in_state( "WS" )[ lexer.self ] ); if( result ) { std::cout << "Parsing successful" << std::endl; } else { std::cout << "Parsing error" << std::endl; } return( 0 ); } Here is the output of running this (the file read into the string is dumped out first in main) ******************** { a = 5; if( a ){ b = 2; } } ******************** <program> <try>{</try> <block> <try>{</try> <statement> <try></try> <assignment> <try></try> <expression> <try></try> <success>;</success> <attributes>(5)</attributes> </expression> <success></success> <attributes>()</attributes> </assignment> <success></success> <attributes>()</attributes> </statement> <statement> <try></try> <assignment> <try></try> <fail/> </assignment> <if_stmt> <try> if(</try> <fail/> </if_stmt> <fail/> </statement> <fail/> </block> <fail/> </program> Parsing error

    Read the article

< Previous Page | 1 2 3 4 5 6  | Next Page >