How to write a bison grammar for WDI?
- by Rizo
I need some help constructing a bison grammar.
From another question of mine:
I'm trying to make a meta-language for writing markup code (such as XML and HTML) which can be directly embedded into C/C++ code.
Here is a simple sample written in this language, which I call WDI (Web Development Interface):
 /*
  * Simple wdi/html sample source code
  */
 #include <mySite>
 string name = "myName";
 string toCapital(string str);
 html
 {
  head {
   title { mySiteTitle; }
   link(rel="stylesheet", href="style.css");
  }
  body(id="default") {
   // Page content wrapper
   div(id="wrapper", class="some_class") {
    h1 { "Hello, " + toCapital(name) + "!"; }
    // Lists post
    ul(id="post_list") {
     for(post in posts) {
      li { a(href=post.getID()) { post.title; } }
     }
    }
   }
  }
 }
Basically it is C source with a user-friendly interface for HTML.
As you can see, the traditional tag-based style is replaced by a C-like one, with blocks delimited by curly braces.
I need to build an interpreter that translates this code to HTML and then inserts it into the C code, so that it can be compiled. The C part stays intact.
Inside the WDI source it is not necessary to use prints; every return statement will be used for output (in a printf call).
The program's output will be clean HTML code.
So, for example, a heading 1 tag would be transformed like this:
h1 { "Hello, " + toCapital(name) + "!"; }
// would become:
printf("<h1>Hello, %s!</h1>", toCapital(name));
My main goal is to create an interpreter that translates WDI source to HTML like this:
tag(attributes) {content} = <tag attributes>content</tag>
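To make the mapping above concrete, here is a minimal C sketch of the output side. The helper name wdi_element is hypothetical (it is not part of my flex/bison files), and it assumes the attributes have already been joined into a single space-separated string:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of the original flex/bison files.
 * Writes "<tag attrs>content</tag>" into buf, or "<tag>content</tag>"
 * when attrs is NULL or empty. Assumes attrs is already formatted the
 * HTML way (space-separated key="value" pairs).
 * Returns whatever snprintf returns. */
int wdi_element(char *buf, size_t size,
                const char *tag, const char *attrs, const char *content)
{
    if (attrs != NULL && attrs[0] != '\0')
        return snprintf(buf, size, "<%s %s>%s</%s>", tag, attrs, content, tag);
    return snprintf(buf, size, "<%s>%s</%s>", tag, content, tag);
}
```

A semantic action for a block could call something like this once the tag name, attribute string and content are known.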
Secondly, the HTML code returned by the interpreter has to be inserted into the C code with printfs. Variables and functions that occur inside WDI should also be sorted out so that they can be used as printf parameters (as with toCapital(name) in the sample source).
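As a rough illustration of this second step, the sketch below builds the printf call for the h1 example: string literals are folded into the format string and each C expression becomes a %s argument. The names wdi_part and wdi_printf_call are hypothetical, and the sketch assumes every expression yields a char * and that literals contain no quotes or backslashes:

```c
#include <stdio.h>
#include <string.h>

/* One piece of a tag body: a string literal or a C expression.
 * (Hypothetical type, for illustration only.) */
struct wdi_part {
    int is_expr;       /* 0: literal text, 1: C expression */
    const char *text;  /* literal contents without quotes, or expression source */
};

/* Appends src to the NUL-terminated dst; returns -1 if it does not fit. */
static int append(char *dst, size_t size, const char *src)
{
    size_t used = strlen(dst);
    size_t need = strlen(src);
    if (used + need + 1 > size)
        return -1;
    memcpy(dst + used, src, need + 1);
    return 0;
}

/* Builds the text of a printf call for <tag>parts...</tag> in buf.
 * Literals go into the format string ('%' is doubled), expressions
 * become %s arguments. Returns 0 on success, -1 if a buffer is too small. */
int wdi_printf_call(char *buf, size_t size, const char *tag,
                    const struct wdi_part *parts, size_t n)
{
    char fmt[256] = "";
    char args[256] = "";
    size_t i;
    int len;

    for (i = 0; i < n; i++) {
        if (parts[i].is_expr) {
            if (append(fmt, sizeof fmt, "%s") != 0 ||
                append(args, sizeof args, ", ") != 0 ||
                append(args, sizeof args, parts[i].text) != 0)
                return -1;
        } else {
            const char *p;
            for (p = parts[i].text; *p != '\0'; p++) {
                char piece[3] = { '\0', '\0', '\0' };
                piece[0] = *p;
                if (*p == '%')
                    piece[1] = '%';  /* escape for the generated format string */
                if (append(fmt, sizeof fmt, piece) != 0)
                    return -1;
            }
        }
    }
    len = snprintf(buf, size, "printf(\"<%s>%s</%s>\"%s);", tag, fmt, tag, args);
    return (len < 0 || (size_t)len >= size) ? -1 : 0;
}
```

Given the three parts "Hello, ", toCapital(name) and "!", with tag h1, this produces the printf line shown in the h1 example.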
Here are my flex/bison files:
%{
#include <stdlib.h>   /* atoi */
#include <string.h>   /* strdup */
#include "y.tab.h"    /* token codes and yylval; assumes bison was run with -d -y */
%}
id        [a-zA-Z_]([a-zA-Z0-9_])*
number    [0-9]+
string    \"([^"\\\n]|\\.)*\"
%%
{id} {
        yylval.string = strdup(yytext);
        return(ID);
    }
{number} {
        yylval.number = atoi(yytext);
        return(NUMBER);
    }
{string} {
        yylval.string = strdup(yytext);
        return(STRING);
    }
"(" { return(LPAREN); }
")" { return(RPAREN); }
"{" { return(LBRACE); }
"}" { return(RBRACE); }
"=" { return(ASSIGN); }
"," { return(COMMA);  }
";" { return(SEMICOLON); }
\n|\r|\f { /* ignore EOL */ }
[ \t]+   { /* ignore whitespace */ }
.        { /* return(CCODE); Find C source */ }
%%
%start wdi
%token LPAREN RPAREN LBRACE RBRACE ASSIGN COMMA SEMICOLON CCODE QUOTE
%union
{
    int number;
    char *string;
}
%token <string> ID STRING
%token <number> NUMBER
%type  <string> key value
%%
wdi
    : /* empty */
    | blocks
    ;
blocks
    : block
    | blocks block
    ;
block
    : head SEMICOLON
    | head body
    ;
head
    : ID
    | ID attributes
    ;
attributes
    : LPAREN RPAREN
    | LPAREN attribute_list RPAREN
    ;
attribute_list
    : attribute
    | attribute COMMA attribute_list
    ;
attribute
    : key ASSIGN value
    ;
key
    : ID { $$ = $1; }
    ;
value
    : STRING { $$ = $1; }
    /*| NUMBER*/
    /*| CCODE*/
    ;
body
    : LBRACE content RBRACE
    ;
content
    : /* empty */
    | blocks
    | STRING SEMICOLON
    | NUMBER SEMICOLON
    | CCODE
    ;
%%
I am having difficulty defining a proper grammar for the language, especially in separating WDI code from C code. I have just started learning language processing techniques, so I need some guidance.
Could someone correct my code or give some examples of the right way to solve this problem?