Search Results

Search found 6 results on 1 pages for 'greenoldman'.

Page 1/1 | 1

How lookaheads are propagated in "channel" method of building LALR parser?

- by greenoldman

The method is described in Dragon Book, however I read about it in ""Parsing Techniques" by D.Grune and C.J.H.Jacobs". I start from my understanding of building channels for NFA: channels are built once, they are like water channels with current you "drop" lookahead symbols in right places (sources) of the channel, and they propagate with "current" when symbol propagates, there are no barriers (the only sufficient things for propagation are presence of channel and direction/current); i.e. lookahead cannot just die out of the blue Is that right? If I am correct, then eof lookahead should be present in all states, because the source of it is the start production, and all other production states are reachable from start state. How the DFA is made out of this NFA is not perfectly clear for me -- the authors of the mentioned book write about preserving channels, but I see no purpose, if you propagated lookaheads. If the channels have to be preserved, are they cut off from the source if the DFA state does not include source NFA state? I assume no -- the channels still runs between DFA states, not only within given DFA state. In the effect eof should still be present in all items in all states. But when you take a look at DFA presented in book (pdf is from errata): DFA for LALR (fig. 9.34 in the book, p.301) you will see there are items without eof in lookahead. The grammar for this DFA is: S -> E E -> E - T E -> T T -> ( E ) T -> n So how it was computed, when eof was dropped, and on what condition? Update It is textual pdf, so two interesting states (in DFA; # is eof): State 1: S--- >•E[#] E--- >•E-T[#-] E--- >•T[#-] T--- >•n[#-] T--- >•(E)[#-] State 6: T--- >(•E)[#-)] E--- >•E-T[-)] E--- >•T[-)] T--- >•n[-)] T--- >•(E)[-)] Arc from 1 to 6 is labeled (.

Read the article
How to keep AST for feature access?

- by greenoldman

Consider such code (let's say it is C++) Foo::Bar.get().X How one should keep the AST for this -- as "tree" with root at left Foo(Bar(get(X)), or with root at right (((Foo)Bar)get)X? Or maybe as a flat structure (list)? The first one seems more convenient when resolving names, the second when working with it as expression. I set tag parsing but I am asking from semantic analysis POV really (there is no such tag).

Read the article
How are generics implemented?

- by greenoldman

This is the question from compiler internals perspective. I am interested in generics, not templates (C++), so I marked the question with C#. Not Java, because AFAIK the generics in both languages differ in implementations. When I look at languages w/o generics it is pretty straightforward, you can validate the class definition, add it to hierarchy and that's it. But what to do with generic class, and more importantly how handle references to it? How to make sure that static fields are singular per instantiations (i.e. each time generic parameters are resolved). Let's say I see a call: var x = new Foo<Bar>(); Do I add new Foo_Bar class to hierarchy? Update: So far I found only 2 relevant posts, however even they don't go into much details in sense "how to do it by yourself": http://www.jprl.com/Blog/archive/development/2007/Aug-31.html http://www.artima.com/intv/generics2.html

Read the article
How to add precedence to LALR parser like in YACC?

- by greenoldman

Please note, I am asking about writing LALR parser, not writing rules for LALR parser. What I need is... ...to mimic YACC precedence definitions. I don't know how it is implemented, and below I describe what I've done and read so far. For now I have basic LALR parser written. Next step -- adding precedence, so 2+3*4 could be parsed as 2+(3*4). I've read about precedence parsers, however I don't see how to fit such model into LALR. I don't understand two points: how to compute when insert parenthesis generator how to compute how many parenthesis the generator should create I insert generators when the symbols is taken from input and put at the stack, right? So let's say I have something like this (| denotes boundary between stack and input): ID = 5 | + ..., at this point I add open, so it gives ID = < 5 | + ..., then I read more input ID = < 5 + | 5 ... and more ID = < 5 + 5 | ; ... and more ID = < 5 + 5 ; | ... At this point I should have several reduce moves in normal LALR, but the open parenthesis does not match so I continue reading more input. Which does not make sense. So this was when problem. And about count, let's say I have such data < 2 + < 3 * 4 >. As human I can see that the last generator should create 2 parenthesis, but how to compute this? After all there could be two scenarios: ( 2 + ( 3 *4 )) -- parenthesis is used to show the outcome of generator or (2 + (( 3 * 4 ) ^ 5) because there was more input Please note that in both cases before 3 was open generator, and after 4 there was close generator. However in both cases, after reading 4 I have to reduce, so I have to know what generator "creates".

Read the article
How to get lookahead symbol when constructing LR(1) NFA for parser?

- by greenoldman

I am reading an explanation (awesome "Parsing Techniques" by D.Grune and C.J.H.Jacobs; p.292 in the 2nd edition) about how to construct an LR(1) parser, and I am at the stage of building the initial NFA. What I don't understand is how to get/compute a lookahead symbol. Here is the example from the book, the grammar: S -> E E -> E - T E -> T T -> ( E ) T -> n n is terminal. The "weird" transitions for me are is the sequence: 1) S -> . E eof 2) E -> . E - T eof 3) E -> . E - T - 4) E -> E . - T - 5) E -> E - . T - (Note: In the above table, the state numbers are in front and the lookahead symbol is at the end.) What puzzles me is that transition from (4) to (5) means reading - token, right? So how is it that - is still a lookahead symbol and even more important why is it that eof is no longer a lookahead symbol? After all in an input such as n - n eof there is only one - symbol. My naive thinking tells me (5) should be written as: 5) E -> E - . T - eof And another thing -- n is terminal. Why it is not used at all as a lookahead symbol? I mean -- we expect to see - or (, it is ok, but lack of n means we are sure it won't appear in input? Update: after more reading I am only more confused ;-) I.e. what is really a lookahead? Because I see such state as (p.292, 2nd column, 2nd row): E -> E . - T eof Lookahead says eof but the incoming input says -. Isn't it a contradiction? And it is not only in this book.

Read the article
How to read reduce/shift conflicts in LR(1) DFA?

- by greenoldman

I am reading an explanation (awesome "Parsing Techniques" by D.Grune and C.J.H.Jacobs; p.293 in the 2nd edition) and I moved forward from my last question: How to get lookahead symbol when constructing LR(1) NFA for parser? Now I have such "problem" (maybe not a problem, but rather need of confirmation from some more knowledgeable people). The authors present state in LR(0) which has reduce/shift conflict. Then they build DFA for LR(1) for the same grammar. And now they say it does not have a conflict (lookaheads at the end): S -> E . eof E -> E . - T eof E -> E . - T - and there is an edge from this state labeled - but no labeled eof. Authors says, that on eof there will be reduce, on - there will be shift. However eof is for shift as well (as lookahead). So my personal understanding of LR(1) DFA is this -- you can drop lookaheads for shifts, because they serve no purpose now -- shifts rely on input, not on lookaheads -- and after that, remove duplicates. S -> E . eof E -> E . - T So the lookahead for reduce serves as input really, because at this stage (all required input is read) it is really incoming symbol right now. For shifts, the input symbols are on the edges. So my question is this -- am I actually right about dropping lookaheads for shifts (after fully constructing DFA)?

Read the article

1