Search Results

Search found 67 results on 3 pages for 'tokenizer'.

Page 1/3 | 1 2 3  | Next Page >

  • Tokenizer for full-text

    - by user72185
    This should be an ideal case of not re-inventing the wheel, but so far my search has been in vain. Instead of writing one myself, I would like to use an existing C++ tokenizer. The tokens are to be used in an index for full text searching. Performance is very important, I will parse many gigabytes of text. Edit: Please note that the tokens are to be used in a search index. Creating such tokens is not an exact science (afaik) and requires some heuristics. This has been done a thousand time before, and probably in a thousand different ways, but I can't even find one of them :) Any good pointers? Thanks!

    Read the article

  • Is there a tokenizer for a cpp file

    - by AJ
    I have a cpp file with a huge class implementation. Now I have to modify the source file itself. For this, is there a library/api/tool that will tokenize this file for me and give me one token each time i request. My requirement is as below. OpenCPPFile() While (!EOF) token = GetNextToken(); process something based on this token EndWhile I am happy now Regards, AJ

    Read the article

  • How to implement tokenizer.rbegin() and rend() for boost::tokenizer ?

    - by Chan
    Hello everyone, I'm playing around with boost::tokenizer however I realize that it does not support rbegin() and rend(). I would like to ask how can I add these two functions to the existing class? This is from the boost site: #include <iostream> #include <string> #include <boost/tokenizer.hpp> using namespace std; using namespace boost; int main() { string str( "12/12/1986" ); typedef boost::tokenizer<boost::char_separator<char>> tokenizer; boost::char_separator<char> sep( "/" ); tokenizer tokens( str, sep ); cout << *tokens.begin() << endl; // cout << *tokens.rbegin() << endl; How could I implement this? return 0; }

    Read the article

  • Member initialization while using delegated constructor

    - by Anton
    I've started trying out the C++11 standard and i found this question which describes how to call your ctor from another ctor in the same class to avoid having a init method or the like. Now i'm trying the same thing with code that looks like this: hpp: class Tokenizer { public: Tokenizer(); Tokenizer(std::stringstream *lines); virtual ~Tokenizer() {}; private: std::stringstream *lines; }; cpp: Tokenizer::Tokenizer() : expected('=') { } Tokenizer::Tokenizer(std::stringstream *lines) : Tokenizer(), lines(lines) { } But this is giving me the error: In constructor ‘config::Tokenizer::Tokenizer(std::stringstream*)’: /path/Tokenizer.cpp:14:20: error: mem-initializer for ‘config::Tokenizer::lines’ follows constructor delegation I've tried moving the Tokenizer() part first and last in the list but that didn't help. What's the reason behind this and how should i fix it? I've tried moving the lines(lines) to the body with this->lines = lines; instead and it works fine. But i would really like to be able to use the initializer list. Thanks in advance!

    Read the article

  • String tokenizer for PHP

    - by Jack
    I have used the String Tokenizer in Java. I wish to know if there is similar functionality for PHP. I have a string and I want to extract individual words from it. eg. If the string is - Summer is doubtful #haiku #poetry #babel I want to know if it contains the hashtag #haiku.

    Read the article

  • Ant Tokenizer: Selecting an individual Token

    - by John Oxley
    I have the following ant task: <loadfile property="proj.version" srcfile="build.py"> <filterchain> <striplinecomments> <comment value="#"/> </striplinecomments> <linecontains> <contains value="Version" /> </linecontains> </filterchain> </loadfile> <echo message="${proj.version}" /> And the output is [echo] config ["Version"] = "v1.0.10-r4.2" How do I then use a tokenizer to get only v1.0.10-r4.2, the equivalent of | cut -d'"' -f4

    Read the article

  • Is it possible to create a single tokenizer to parse this?

    - by Adrian
    This extends off this other Q&A thread, but is going into details that are out of scope from the original question. I am generating a parser that is to parse a context-sensitive grammar which can take in the following subset of symbols: ,, [, ], {, }, m/[a-zA-Z_][a-zA-Z_0-9]*/, m/[0-9]+/ The grammar can take in the following string { abc[1] }, } and parse it as ({, abc[1], }, }). Another example would be to take: { abc[1] [, } and parse it as ({, abc[1], [,, }). This is similar to the grammar used in Perl for the qw() syntax. The braces indicate that the contents are to be whitespace tokenized. A closing brace must be on its own to indicate the end of the whitespace tokenized group. Can this be done using a single lexer/tokenizer, or would it be necessary to have a separate tokenizer when parsing this group?

    Read the article

  • Simple C# Tokenizer Using Regex

    - by Pete
    I'm looking to tokenize really simple strings,but struggling to get the right Regex. The strings might look like this: string1 = "{[Surname]}, some text... {[FirstName]}" string2 = "{Item}foo.{Item2}bar" And I want to extract the tokens in the curly braces (so string1 gets "{[Surname]}","{[FirstName]}" and string2 gets "{Item}" and "{Item2}") this question is quite good, but I can't get the regex right: poor mans lexer for c# Thanks for the help!

    Read the article

  • How to read values from file. tokenizer

    - by user69514
    I have a file in which each line contains two numbers. The problem is that the two number are separated by a space, but the space can be any number of blank spaces. either one, two, or more. I want to read the line and store each of the numbers in a variable, but I'm not sure how to tokenize it. i.e 1 5 3 2 5 6 3 4 83 54 23 23 32 88 8 203

    Read the article

  • How to make the tokenizer detect empty spaces while using strtok()

    - by Shadi Al Mahallawy
    I am designing a c++ program, somewhere in the program i need to detect if there is a blank(empty token) next to the token used know eg. if(token1==start) { token2=strtok(NULL," "); if(token2==NULL) {LCCTR=0;} else {LCCTR=atoi(token2);} so in the previous peice token1 is pointing to start , and i want to check if there is anumber next to the start , so I used token2=strtok(NULL," ") to point to the next token but unfortunattly the strtok function cannot detect empty spaces so it gives me an error at run time"INVALID NULL POINTER" how can i fix it or is there another function to use to detect empty spaces #include <iostream> #include<string> #include<map> #include<iomanip> #include<fstream> #include<ctype.h> using namespace std; const int MAX=300; int LCCTR; int START(char* token1); char* PASS1(char*token1); void tokinizer() { ifstream in; ofstream out; char oneline[MAX]; in.open("infile.txt"); out.open("outfile.txt"); if(in.is_open()) { char *token1; in.getline(oneline,MAX); token1 = strtok(oneline," \t"); START (token1); //cout<<'\t'; while(token1!=NULL) { //PASS1(token1); //cout<<token1<<" "; token1=strtok(NULL," \t"); if(NULL==token1) {//cout<<endl; //cout<<LCCTR<<'\t'; in.getline(oneline,MAX); token1 = strtok(oneline," \t"); } } } in.close(); out.close(); } int START(char* token1) { string start("START"); char*token2; if(token1 != start) {LCCTR=0;} else if(token1==start) { token2=strchr(token1+2,' '); cout<<token2; if(token2==NULL) {LCCTR=0;} else {LCCTR=atoi(token2); if(atoi(token2)>9999||atoi(token2)<0){cout<<"IVALID STARTING ADDRESS"<<endl;exit(1);} } } return LCCTR; } char* PASS1 (char*token1) { map<string,int> operations; map<string,int>symtable; map<string,int>::iterator it; pair<map<string,int>::iterator,bool> ret; char*token3=NULL; char*token2=NULL; string test; string comp(" "); string start("START"); string word("WORD"); string byte("BYTE"); string resb("RESB"); string resw("RESW"); string end("END"); operations["ADD"] = 18; operations["AND"] = 40; operations["COMP"] = 28; operations["DIV"] = 24; operations["J"] = 0X3c; operations["JEQ"] =30; operations["JGT"] =34; operations["JLT"] =38; operations["JSUB"] =48; operations["LDA"] =00; operations["LDCH"] =50; operations["LDL"] =55; operations["LDX"] =04; operations["MUL"] =20; operations["OR"] =44; operations["RD"] =0xd8; operations["RSUB"] =0x4c; operations["STA"] =0x0c; operations["STCH"] =54; operations["STL"] =14; operations["STSW"] =0xe8; operations["STX"] =10; operations["SUB"] =0x1c; operations["TD"] =0xe0; operations["TIX"] =0x2c; operations["WD"] =0xdc; if(operations.find("ADD")->first==token1) { token2=strtok(NULL," "); //test=token2; cout<<token2; //if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} //else{LCCTR=LCCTR+3;} } /*else if(operations.find("AND")->first==token1) { token2=strtok(NULL," "); test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("COMP")->first==token1) { token2=token1+5; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("DIV")->first==token1) { token2=token1+4; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("J")->first==token1) { token2=token1+2; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("JEQ")->first==token1) { token2=token1+5; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("JGT")->first==token1) { token2=strtok(NULL," "); test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("JLT")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("JSUB")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("LDA")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("LDCH")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("LDL")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("LDX")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("MUL")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("OR")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("RD")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("RSUB")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("STA")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("STCH")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("STL")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("STSW")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("STX")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("SUB")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("TD")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("TIX")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} } else if(operations.find("WD")->first==token1) { token2=token1+6; test=token2; if(test.empty()){cout<<"MISSING OPERAND"<<endl;exit(1);} else{LCCTR=LCCTR+3;} }*/ //else if( if(word==token1) {LCCTR=LCCTR+3;} else if(byte==token1) {string test; token2=token1+7; test=token2; if(test[0]=='C') {token3=token1+10; test=token3; if(test.length()>15) {cout<<"ERROR"<<endl; exit(1);} } else if(test[0]=='X') {token3=token1+10; test=token3; if(test.length()>14) {cout<<"ERROR"<<endl; exit(1);} } LCCTR=LCCTR+test.length(); } else if(resb==token1) {token3=token1+5; LCCTR=LCCTR+atoi(token3);} else if(resw==token1) {token3=token1+5; LCCTR=LCCTR+3*atoi(token3);} else if(end==token1) {exit(1);} /*else { test=token1; int last=test.length(); if(token1==start||test[0]=='C'||test[0]=='X'||ispunct(test[last])||isdigit(test[0])||isdigit(test[1])||isdigit(test[2])||isdigit(test[3])){} else { token2=strtok(NULL," "); //test=token2; cout<<token2; if(token2!=NULL) { symtable.insert( pair<string,int>(token1,LCCTR)); for(it=symtable.begin() ;it!=symtable.end() ;++it) {/*cout<<"symbol: "<<it->first<<" LCCTR: "<<it->second<<endl;} } else{} } }*/ return token3; } int main() { tokinizer(); return 0; }

    Read the article

  • PHP String tokenizer not working correctly

    - by asdadas
    I have no clue why strtok decided to break on me. Here is my code. I am tokenizing a string by dollar symbol $. echo 'Tokenizing this by $: ',$aliases,PHP_EOL; if(strlen($aliases) > 0) { //aliases check $token = strtok($aliases, '$'); while($token != NULL) { echo 'Found a token: ',$token,PHP_EOL; if(!isGoodLookup($token)) { echo 'ERROR: Invalid alias found.',PHP_EOL; stop($db); } $goodAliasesList[] = $token; $token = strtok('$'); } if($token == NULL) echo 'Found null token, moving on',PHP_EOL; } And this is my output: Tokenizing this by $: getaways$aaa Found a token: getaways Found null token, moving on str tok is not supposed to do this!! where is my aaa token!!

    Read the article

  • How to use NGramTokenizerFactory or NGramFilterFactory?

    - by user572485
    Hi, Recently, I am studying how to store and index using Solr. I want to do facet.prefix search. With whitespace tokenizer, "Where are you" will be splited into three words and indexed. If I search facet.prefix="where are", no result will be returned. I google and found NGramFilterFactory can help me. But when I apply this filter factory, I found the result is "w, h, e, ..., wh, ..", which split the sentence by character, not by token word. I use the parameters maxGramSize and minGramSize, set to 1 and 3. Does the NGramFilterFactory work right? Should I add some other parameters? Is there some other filter factories which can help me? Thanks!

    Read the article

  • Solr WordDelimiterFilter + Lucene Highlighter

    - by Lucas
    I am trying to get the Highlighter class from Lucene to work properly with tokens coming from Solr's WordDelimiterFilter. It works 90% of the time, but if the matching text contains a ',' such as "1,500" the output is incorrect: Expected: 'test 1,500 this' Observed: 'test 11,500 this' I am not currently sure whether it is Highlighter messing up the recombination or WordDelimiterFilter messing up the tokenization but something is unhappy. Here are the relevant dependencies from my pom: org.apache.lucene lucene-core 2.9.3 jar compile org.apache.lucene lucene-highlighter 2.9.3 jar compile org.apache.solr solr-core 1.4.0 jar compile And here is a simple JUnit test class demonstrating the problem: package test.lucene; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertTrue; import java.io.IOException; import java.io.Reader; import java.util.HashMap; import org.apache.lucene.analysis.Analyzer; import org.apache.lucene.analysis.TokenStream; import org.apache.lucene.queryParser.ParseException; import org.apache.lucene.queryParser.QueryParser; import org.apache.lucene.search.Query; import org.apache.lucene.search.highlight.Highlighter; import org.apache.lucene.search.highlight.InvalidTokenOffsetsException; import org.apache.lucene.search.highlight.QueryScorer; import org.apache.lucene.search.highlight.SimpleFragmenter; import org.apache.lucene.search.highlight.SimpleHTMLFormatter; import org.apache.lucene.util.Version; import org.apache.solr.analysis.StandardTokenizerFactory; import org.apache.solr.analysis.WordDelimiterFilterFactory; import org.junit.Test; public class HighlighterTester { private static final String PRE_TAG = "<b>"; private static final String POST_TAG = "</b>"; private static String[] highlightField( Query query, String fieldName, String text ) throws IOException, InvalidTokenOffsetsException { SimpleHTMLFormatter formatter = new SimpleHTMLFormatter( PRE_TAG, POST_TAG ); Highlighter highlighter = new Highlighter( formatter, new QueryScorer( query, fieldName ) ); highlighter.setTextFragmenter( new SimpleFragmenter( Integer.MAX_VALUE ) ); return highlighter.getBestFragments( getAnalyzer(), fieldName, text, 10 ); } private static Analyzer getAnalyzer() { return new Analyzer() { @Override public TokenStream tokenStream( String fieldName, Reader reader ) { // Start with a StandardTokenizer TokenStream stream = new StandardTokenizerFactory().create( reader ); // Chain on a WordDelimiterFilter WordDelimiterFilterFactory wordDelimiterFilterFactory = new WordDelimiterFilterFactory(); HashMap<String, String> arguments = new HashMap<String, String>(); arguments.put( "generateWordParts", "1" ); arguments.put( "generateNumberParts", "1" ); arguments.put( "catenateWords", "1" ); arguments.put( "catenateNumbers", "1" ); arguments.put( "catenateAll", "0" ); wordDelimiterFilterFactory.init( arguments ); return wordDelimiterFilterFactory.create( stream ); } }; } @Test public void TestHighlighter() throws ParseException, IOException, InvalidTokenOffsetsException { String fieldName = "text"; String text = "test 1,500 this"; String queryString = "1500"; String expected = "test " + PRE_TAG + "1,500" + POST_TAG + " this"; QueryParser parser = new QueryParser( Version.LUCENE_29, fieldName, getAnalyzer() ); Query q = parser.parse( queryString ); String[] observed = highlightField( q, fieldName, text ); for ( int i = 0; i < observed.length; i++ ) { System.out.println( "\t" + i + ": '" + observed[i] + "'" ); } if ( observed.length > 0 ) { System.out.println( "Expected: '" + expected + "'\n" + "Observed: '" + observed[0] + "'" ); assertEquals( expected, observed[0] ); } else { assertTrue( "No matches found", false ); } } } Anyone have any ideas or suggestions?

    Read the article

  • How do I read input character-by-character in Java?

    - by Jergason
    I am used to the c-style getchar(), but it seems like there is nothing comparable for java. I am building a lexical analyzer, and I need to read in the input character by character. I know I can use the scanner to scan in a token or line and parse through the token char-by-char, but that seems unwieldy for strings spanning multiple lines. Is there a way to just get the next character from the input buffer in Java, or should I just plug away with the Scanner class? Edit: forgot to say where the input is coming from. The input is a file, not the keyboard.

    Read the article

  • StringTokenizer problem of tokenizing

    - by Mr CooL
    String a ="the STRING TOKENIZER CLASS ALLOWS an APPLICATION to BREAK a STRING into TOKENS.  "; StringTokenizer st = new StringTokenizer(a); while (st.hasMoreTokens()){ System.out.println(st.nextToken()); Given above codes, the output is following, the STRING TOKENIZER CLASS ALLOWS an APPLICATION to BREAK a STRING into TOKENS.  My only question is why the "STRING TOKENIZER CLASS" has been combined into one token???????? When I try to run this code, System.out.println("STRING TOKENIZER CLASS".contains(" ")); It printed funny result, FALSE It sound not logical right? I've no idea what went wrong. I found out the reason, the space was not recognized as valid space by Java somehow. But, I don't know how it turned up to be like that from the front processing up to the code that I've posted.

    Read the article

  • Tokenize problem in Java with separator ". "

    - by user112976
    I need to split a text using the separator ". ". For example I want this string : Washington is the U.S Capital. Barack is living there. To be cut into two parts: Washington is the U.S Capital. Barack is living there. Here is my code : // Initialize the tokenizer StringTokenizer tokenizer = new StringTokenizer("Washington is the U.S Capital. Barack is living there.", ". "); while (tokenizer.hasMoreTokens()) { System.out.println(tokenizer.nextToken()); } And the output is unfortunately : Washington is the U S Capital Barack is living there Can someone explain what's going on?

    Read the article

  • Re: Help with Boost Grammar

    - by Decmac04
    I have redesigned and extended the grammar I asked about earlier as shown below: // BIFAnalyser.cpp : Defines the entry point for the console application. // // /*============================================================================= Copyright (c) Temitope Jos Onunkun 2010 http://www.dcs.kcl.ac.uk/pg/onun/ Use, modification and distribution is subject to the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) =============================================================================*/ //////////////////////////////////////////////////////////////////////////// // // // B Machine parser using the Boost "Grammar" and "Semantic Actions". // // // //////////////////////////////////////////////////////////////////////////// include include include include include include //////////////////////////////////////////////////////////////////////////// using namespace std; using namespace boost::spirit; //////////////////////////////////////////////////////////////////////////// // // Semantic Actions // //////////////////////////////////////////////////////////////////////////// // // namespace { //semantic action function on individual lexeme void do_noint(char const* start, char const* end) { string str(start, end); if (str != "NAT1") cout << "PUSH(" << str << ')' << endl; } //semantic action function on addition of lexemes void do_add(char const*, char const*) { cout << "ADD" << endl; // for(vector::iterator vi = strVect.begin(); vi < strVect.end(); ++vi) // cout << *vi << " "; } //semantic action function on subtraction of lexemes void do_subt(char const*, char const*) { cout << "SUBTRACT" << endl; } //semantic action function on multiplication of lexemes void do_mult(char const*, char const*) { cout << "\nMULTIPLY" << endl; } //semantic action function on division of lexemes void do_div(char const*, char const*) { cout << "\nDIVIDE" << endl; } // // vector flowTable; //semantic action function on simple substitution void do_sSubst(char const* start, char const* end) { string str(start, end); //use boost tokenizer to break down tokens typedef boost::tokenizer Tokenizer; boost::char_separator sep(" -+/*:=()",0,boost::drop_empty_tokens); // char separator definition Tokenizer tok(str, sep); Tokenizer::iterator tok_iter = tok.begin(); pair dependency; //create a pair object for dependencies //create a vector object to store all tokens vector dx; // int counter = 0; // tracks token position for(tok.begin(); tok_iter != tok.end(); ++tok_iter) //save all tokens in vector { dx.push_back(*tok_iter ); } counter = dx.size(); // vector d_hat; //stores set of dependency pairs string dep; //pairs variables as string object // dependency.first = *tok.begin(); vector FV; for(int unsigned i=1; i < dx.size(); i++) { // if(!atoi(dx.at(i).c_str()) && (dx.at(i) !=" ")) { dependency.second = dx.at(i); dep = dependency.first + "|-" + dependency.second + " "; d_hat.push_back(dep); vector<string> row; row.push_back(dependency.first); //push x_hat into first column of each row for(unsigned int j=0; j<2; j++) { row.push_back(dependency.second);//push an element (column) into the row } flowTable.push_back(row); //Add the row to the main vector } } //displays internal representation of information flow table cout << "\n****************\nDependency Table\n****************\n"; cout << "X_Hat\tDx\tG_Hat\n"; cout << "-----------------------------\n"; for(unsigned int i=0; i < flowTable.size(); i++) { for(unsigned int j=0; j<2; j++) { cout << flowTable[i][j] << "\t "; } if (*tok.begin() != "WHILE" ) //if there are no global flows, cout << "\t{}"; //display empty set cout << "\n"; } cout << "***************\n\n"; for(int unsigned j=0; j < FV.size(); j++) { if(FV.at(j) != dependency.second) dep = dependency.first + "|-" + dependency.second + " "; d_hat.push_back(dep); } cout << "PUSH(" << str << ')' << endl; cout << "\n*******\nDependency pairs\n*******\n"; for(int unsigned i=0; i < d_hat.size(); i++) cout << d_hat.at(i) << "\n...\n"; cout << "\nSIMPLE SUBSTITUTION\n\n"; } //semantic action function on multiple substitution void do_mSubst(char const* start, char const* end) { string str(start, end); cout << "PUSH(" << str << ')' << endl; //cout << "\nMULTIPLE SUBSTITUTION\n\n"; } //semantic action function on unbounded choice substitution void do_mChoice(char const* start, char const* end) { string str(start, end); cout << "PUSH(" << str << ')' << endl; cout << "\nUNBOUNDED CHOICE SUBSTITUTION\n\n"; } void do_logicExpr(char const* start, char const* end) { string str(start, end); //use boost tokenizer to break down tokens typedef boost::tokenizer Tokenizer; boost::char_separator sep(" -+/*=:()<",0,boost::drop_empty_tokens); // char separator definition Tokenizer tok(str, sep); Tokenizer::iterator tok_iter = tok.begin(); //pair dependency; //create a pair object for dependencies //create a vector object to store all tokens vector dx; for(tok.begin(); tok_iter != tok.end(); ++tok_iter) //save all tokens in vector { dx.push_back(*tok_iter ); } for(unsigned int i=0; i cout << "PUSH(" << str << ')' << endl; cout << "\nPREDICATE\n\n"; } void do_predicate(char const* start, char const* end) { string str(start, end); cout << "PUSH(" << str << ')' << endl; cout << "\nMULTIPLE PREDICATE\n\n"; } void do_ifSelectPre(char const* start, char const* end) { string str(start, end); //if cout << "PUSH(" << str << ')' << endl; cout << "\nPROTECTED SUBSTITUTION\n\n"; } //semantic action function on machine substitution void do_machSubst(char const* start, char const* end) { string str(start, end); cout << "PUSH(" << str << ')' << endl; cout << "\nMACHINE SUBSTITUTION\n\n"; } } //////////////////////////////////////////////////////////////////////////// // // Machine Substitution Grammar // //////////////////////////////////////////////////////////////////////////// // Simple substitution grammar parser with integer values removed struct Substitution : public grammar { template struct definition { definition(Substitution const& ) { machine_subst = ( (simple_subst) | (multi_subst) | (if_select_pre_subst) | (unbounded_choice) )[&do_machSubst] ; unbounded_choice = str_p("ANY") ide_list str_p("WHERE") predicate str_p("THEN") machine_subst str_p("END") ; if_select_pre_subst = ( ( str_p("IF") predicate str_p("THEN") machine_subst *( str_p("ELSIF") predicate machine_subst ) !( str_p("ELSE") machine_subst) str_p("END") ) | ( str_p("SELECT") predicate str_p("THEN") machine_subst *( str_p("WHEN") predicate machine_subst ) !( str_p("ELSE") machine_subst) str_p("END")) | ( str_p("PRE") predicate str_p("THEN") machine_subst str_p("END") ) )[&do_ifSelectPre] ; multi_subst = ( (machine_subst) *( ( str_p("||") (machine_subst) ) | ( str_p("[]") (machine_subst) ) ) ) [&do_mSubst] ; simple_subst = (identifier str_p(":=") arith_expr) [&do_sSubst] ; expression = predicate | arith_expr ; predicate = ( (logic_expr) *( ( ch_p('&') (logic_expr) ) | ( str_p("OR") (logic_expr) ) ) )[&do_predicate] ; logic_expr = ( identifier (str_p("<") arith_expr) | (str_p("<") arith_expr) | (str_p("/:") arith_expr) | (str_p("<:") arith_expr) | (str_p("/<:") arith_expr) | (str_p("<<:") arith_expr) | (str_p("/<<:") arith_expr) | (str_p("<=") arith_expr) | (str_p("=") arith_expr) | (str_p("=") arith_expr) | (str_p("=") arith_expr) ) [&do_logicExpr] ; arith_expr = term *( ('+' term)[&do_add] | ('-' term)[&do_subt] ) ; term = factor ( ('' factor)[&do_mult] | ('/' factor)[&do_div] ) ; factor = lexeme_d[( identifier | +digit_p)[&do_noint]] | '(' expression ')' | ('+' factor) ; ide_list = identifier *( ch_p(',') identifier ) ; identifier = alpha_p +( alnum_p | ch_p('_') ) ; } rule machine_subst, unbounded_choice, if_select_pre_subst, multi_subst, simple_subst, expression, predicate, logic_expr, arith_expr, term, factor, ide_list, identifier; rule<ScannerT> const& start() const { return predicate; //return multi_subst; //return machine_subst; } }; }; //////////////////////////////////////////////////////////////////////////// // // Main program // //////////////////////////////////////////////////////////////////////////// int main() { cout << "*********************************\n\n"; cout << "\t\t...Machine Parser...\n\n"; cout << "*********************************\n\n"; // cout << "Type an expression...or [q or Q] to quit\n\n"; string str; int machineCount = 0; char strFilename[256]; //file name store as a string object do { cout << "Please enter a filename...or [q or Q] to quit:\n\n "; //prompt for file name to be input //char strFilename[256]; //file name store as a string object cin strFilename; if(*strFilename == 'q' || *strFilename == 'Q') //termination condition return 0; ifstream inFile(strFilename); // opens file object for reading //output file for truncated machine (operations only) if (inFile.fail()) cerr << "\nUnable to open file for reading.\n" << endl; inFile.unsetf(std::ios::skipws); Substitution elementary_subst; // Simple substitution parser object string next; while (inFile str) { getline(inFile, next); str += next; if (str.empty() || str[0] == 'q' || str[0] == 'Q') break; parse_info< info = parse(str.c_str(), elementary_subst !end_p, space_p); if (info.full) { cout << "\n-------------------------\n"; cout << "Parsing succeeded\n"; cout << "\n-------------------------\n"; } else { cout << "\n-------------------------\n"; cout << "Parsing failed\n"; cout << "stopped at: " << info.stop << "\"\n"; cout << "\n-------------------------\n"; } } } while ( (*strFilename != 'q' || *strFilename !='Q')); return 0; } However, I am experiencing the following unexpected behaviours on testing: The text files I used are: f1.txt, ... containing ...: debt:=(LoanRequest+outstandingLoan1)*20 . f2.txt, ... containing ...: debt:=(LoanRequest+outstandingLoan1)*20 || newDebt := loanammount-paidammount || price := purchasePrice + overhead + bb . f3.txt, ... containing ...: yy < (xx+7+ww) . f4.txt, ... containing ...: yy < (xx+7+ww) & yy : NAT . When I use multi_subst as start rule both files (f1 and f2) are parsed correctly; When I use machine_subst as start rule file f1 parse correctly, while file f2 fails, producing the error: “Parsing failed stopped at: || newDebt := loanammount-paidammount || price := purchasePrice + overhead + bb” When I use predicate as start symbol, file f3 parse correctly, but file f4 yields the error: “ “Parsing failed stopped at: & yy : NAT” Can anyone help with the grammar, please? It appears there are problems with the grammar that I have so far been unable to spot.

    Read the article

  • CSV Parser works in windows, not linux.

    - by ladookie
    I'm parsing a CSV file that looks like this: E1,E2,E7,E8,,, E2,E1,E3,,,, E3,E2,E8,,, E4,E5,E8,E11,,, I store the first entry in each line in a string, and the rest go in a vector of strings: while (getline(file_input, line)) { stringstream tokenizer; tokenizer << line; getline(tokenizer, roomID, ','); vector<string> aVector; while (getline(tokenizer, adjRoomID, ',')) { if (!adjRoomID.empty()) { aVector.push_back(adjRoomID); } } Room aRoom(roomID, aVector); rooms.addToTail(aRoom); } In windows this works fine, however in Linux the first entry of each vector mysteriously loses the first character. For Example in the first iteration through the while loop: roomID would be E1 and aVector would be 2 E7 E8 then the second iteration: roomID would be E2 and aVector would be 1 E3 Notice the missing E's in the first entry of aVector. when I put in some debugging code it appears that it is initially being stored correctly in the vector, but then something overwrites it. Kudos to whoever figures this one out. Seems bizarre to me. rooms is declared as such: DLList<Room> rooms where DLList stands for Doubly-Linked list.

    Read the article

  • Help with Boost Grammar

    - by Decmanc04
    I have been using the following win32 console code to try to parse a B Machine Grammar embedded within C++ using Boost Spirit grammar template. I am a relatively new Boost user. The code compiles, but when I run the .exe file produced by VC++2008, the program partially parses the input file. I believe the problem is with my grammar definition or the functions attached as semantic atctions. The code is given below: // BIFAnalyser.cpp : Defines the entry point for the console application. // // /*============================================================================= Copyright (c) Temitope Jos Onunkun 2010 http://www.dcs.kcl.ac.uk/pg/onun/ Use, modification and distribution is subject to the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) =============================================================================*/ //////////////////////////////////////////////////////////////////////////// // // // B Machine parser using the Boost "Grammar" and "Semantic Actions". // // // //////////////////////////////////////////////////////////////////////////// #include <boost/spirit/core.hpp> #include <boost/tokenizer.hpp> #include <iostream> #include <string> #include <fstream> #include <vector> #include <utility> /////////////////////////////////////////////////////////////////////////////////////////// using namespace std; using namespace boost::spirit; /////////////////////////////////////////////////////////////////////////////////////////// // // Semantic actions // //////////////////////////////////////////////////////////////////////////// vector<string> strVect; namespace { //semantic action function on individual lexeme void do_noint(char const* str, char const* end) { string s(str, end); if(atoi(str)) { ; } else { strVect.push_back(s); cout << "PUSH(" << s << ')' << endl; } } //semantic action function on addition of lexemes void do_add(char const*, char const*) { cout << "ADD" << endl; for(vector<string>::iterator vi = strVect.begin(); vi < strVect.end(); ++vi) cout << *vi << " "; } //semantic action function on subtraction of lexemes void do_subt(char const*, char const*) { cout << "SUBTRACT" << endl; for(vector<string>::iterator vi = strVect.begin(); vi < strVect.end(); ++vi) cout << *vi << " "; } //semantic action function on multiplication of lexemes void do_mult(char const*, char const*) { cout << "\nMULTIPLY" << endl; for(vector<string>::iterator vi = strVect.begin(); vi < strVect.end(); ++vi) cout << *vi << " "; cout << "\n"; } //semantic action function on division of lexemes void do_div(char const*, char const*) { cout << "\nDIVIDE" << endl; for(vector<string>::iterator vi = strVect.begin(); vi < strVect.end(); ++vi) cout << *vi << " "; } //semantic action function on simple substitution void do_sSubst(char const* str, char const* end) { string s(str, end); //use boost tokenizer to break down tokens typedef boost::tokenizer<boost::char_separator<char> > Tokenizer; boost::char_separator<char> sep("-+/*:=()"); // default char separator Tokenizer tok(s, sep); Tokenizer::iterator tok_iter = tok.begin(); pair<string, string > dependency; //create a pair object for dependencies //save first variable token in simple substitution dependency.first = *tok.begin(); //create a vector object to store all tokens vector<string> dx; // for( ; tok_iter != tok.end(); ++tok_iter) //save all tokens in vector { dx.push_back(*tok_iter ); } vector<string> d_hat; //stores set of dependency pairs string dep; //pairs variables as string object for(int unsigned i=1; i < dx.size()-1; i++) { dependency.second = dx.at(i); dep = dependency.first + "|->" + dependency.second + " "; d_hat.push_back(dep); } cout << "PUSH(" << s << ')' << endl; for(int unsigned i=0; i < d_hat.size(); i++) cout <<"\n...\n" << d_hat.at(i) << " "; cout << "\nSIMPLE SUBSTITUTION\n"; } //semantic action function on multiple substitution void do_mSubst(char const* str, char const* end) { string s(str, end); //use boost tokenizer to break down tokens typedef boost::tokenizer<boost::char_separator<char> > Tok; boost::char_separator<char> sep("-+/*:=()"); // default char separator Tok tok(s, sep); Tok::iterator tok_iter = tok.begin(); // string start = *tok.begin(); vector<string> mx; for( ; tok_iter != tok.end(); ++tok_iter) //save all tokens in vector { mx.push_back(*tok_iter ); } mx.push_back("END\n"); //add a marker "end" for(unsigned int i=0; i<mx.size(); i++) { // if(mx.at(i) == "END" || mx.at(i) == "||" ) // break; // else if( mx.at(i) == "||") // do_sSubst(str, end); // else // { // do_sSubst(str, end); // } cout << "\nTokens ... " << mx.at(i) << " "; } cout << "PUSH(" << s << ')' << endl; cout << "MULTIPLE SUBSTITUTION\n"; } } //////////////////////////////////////////////////////////////////////////// // // Simple Substitution Grammar // //////////////////////////////////////////////////////////////////////////// // Simple substitution grammar parser with integer values removed struct Substitution : public grammar<Substitution> { template <typename ScannerT> struct definition { definition(Substitution const& ) { multi_subst = (simple_subst [&do_mSubst] >> +( str_p("||") >> simple_subst [&do_mSubst]) ) ; simple_subst = (Identifier >> str_p(":=") >> expression)[&do_sSubst] ; Identifier = alpha_p >> +alnum_p//[do_noint] ; expression = term >> *( ('+' >> term)[&do_add] | ('-' >> term)[&do_subt] ) ; term = factor >> *( ('*' >> factor)[&do_mult] | ('/' >> factor)[&do_div] ) ; factor = lexeme_d[( (alpha_p >> +alnum_p) | +digit_p)[&do_noint]] | '(' >> expression >> ')' | ('+' >> factor) ; } rule<ScannerT> expression, term, factor, Identifier, simple_subst, multi_subst ; rule<ScannerT> const& start() const { return multi_subst; } }; }; //////////////////////////////////////////////////////////////////////////// // // Main program // //////////////////////////////////////////////////////////////////////////// int main() { cout << "************************************************************\n\n"; cout << "\t\t...Machine Parser...\n\n"; cout << "************************************************************\n\n"; // cout << "Type an expression...or [q or Q] to quit\n\n"; //prompt for file name to be input cout << "Please enter a filename...or [q or Q] to quit:\n\n "; char strFilename[256]; //file name store as a string object cin >> strFilename; ifstream inFile(strFilename); // opens file object for reading //output file for truncated machine (operations only) Substitution elementary_subst; // Simple substitution parser object string str, next; // inFile.open(strFilename); while (inFile >> str) { getline(cin, next); str += next; if (str.empty() || str[0] == 'q' || str[0] == 'Q') break; parse_info<> info = parse(str.c_str(), elementary_subst, space_p); if (info.full) { cout << "\n-------------------------\n"; cout << "Parsing succeeded\n"; cout << "\n-------------------------\n"; } else { cout << "\n-------------------------\n"; cout << "Parsing failed\n"; cout << "stopped at: \": " << info.stop << "\"\n"; cout << "\n-------------------------\n"; } } cout << "Please enter a filename...or [q or Q] to quit\n"; cin >> strFilename; return 0; } The contents of the file I tried to parse, which I named "mf7.txt" is given below: debt:=(LoanRequest+outstandingLoan1)*20 || newDebt := loanammount-paidammount The output when I execute the program is: ************************************************************ ...Machine Parser... ************************************************************ Please enter a filename...or [q or Q] to quit: c:\tplat\mf7.txt PUSH(LoanRequest) PUSH(outstandingLoan1) ADD LoanRequest outstandingLoan1 MULTIPLY LoanRequest outstandingLoan1 PUSH(debt:=(LoanRequest+outstandingLoan1)*20) ... debt|->LoanRequest ... debt|->outstandingLoan1 SIMPLE SUBSTITUTION Tokens ... debt Tokens ... LoanRequest Tokens ... outstandingLoan1 Tokens ... 20 Tokens ... END PUSH(debt:=(LoanRequest+outstandingLoan1)*20) MULTIPLE SUBSTITUTION ------------------------- Parsing failedstopped at: ": " ------------------------- My intention is to capture only the variables in the file, which I managed to do up to the "||" string. Clearly, the program is not parsing beyond the "||" string in the input file. I will appreciate assistance to fix the grammar. SOS, please.

    Read the article

  • C# Sockets Buffer Overflow No Error

    - by Michael Covelli
    I have one thread that is receiving data over a socket like this: while (sock.Connected) { // Receive Data (Block if no data) recvn = sock.Receive(recvb, 0, rlen, SocketFlags.None, out serr); if (recvn <= 0 || sock == null || !sock.Connected) { OnError("Error In Receive, recvn <= 0 || sock == null || !sock.Connected"); return; } else if (serr != SocketError.Success) { OnError("Error In Receive, serr = " + serr); return; } // Copy Data Into Tokenizer tknz.Read(recvb, recvn); // Parse Data while (tknz.MoveToNext()) { try { ParseMessageAndRaiseEvents(tknz.Buffer(), tknz.Length); } catch (System.Exception ex) { string BadMessage = ByteArrayToStringClean(tknz.Buffer(), tknz.Length); string msg = string.Format("Exception in MDWrapper Parsing Message, Ex = {0}, Msg = {1}", ex.Message, BadMessage); OnError(msg); } } } And I kept seeing occasional errors in my parsing function indicating that the message wasn't valid. At first, I thought that my tokenizer class was broken. But after logging all the incoming bytes to the tokenizer, it turns out that the raw bytes in recvb weren't a valid message. I didn't think that corrupted data like this was possible with a tcp data stream. I figured it had to be some type of buffer overflow so I set sock.ReceiveBufferSize = 1024 * 1024 * 8; and the parsing error never, ever occurs in testing (it happens often enough to replicate if I don't change the ReceiveBufferSize). But my question is: why wasn't I seeing an exception or an error state or something if the socket's internal buffer was overflowing before I changed this buffer size?

    Read the article

  • .NET Sockets Buffer Overflow No Error

    - by Michael Covelli
    I have one thread that is receiving data over a socket like this: while (sock.Connected) { // Receive Data (Block if no data) recvn = sock.Receive(recvb, 0, rlen, SocketFlags.None, out serr); if (recvn <= 0 || sock == null || !sock.Connected) { OnError("Error In Receive, recvn <= 0 || sock == null || !sock.Connected"); return; } else if (serr != SocketError.Success) { OnError("Error In Receive, serr = " + serr); return; } // Copy Data Into Tokenizer tknz.Read(recvb, recvn); // Parse Data while (tknz.MoveToNext()) { try { ParseMessageAndRaiseEvents(tknz.Buffer(), tknz.Length); } catch (System.Exception ex) { string BadMessage = ByteArrayToStringClean(tknz.Buffer(), tknz.Length); string msg = string.Format("Exception in MDWrapper Parsing Message, Ex = {0}, Msg = {1}", ex.Message, BadMessage); OnError(msg); } } } And I kept seeing occasional errors in my parsing function indicating that the message wasn't valid. At first, I thought that my tokenizer class was broken. But after logging all the incoming bytes to the tokenizer, it turns out that the raw bytes in recvb weren't a valid message. I didn't think that corrupted data like this was possible with a tcp data stream. I figured it had to be some type of buffer overflow so I set sock.ReceiveBufferSize = 1024 * 1024 * 8; and the parsing error never, ever occurs in testing (it happens often enough to replicate if I don't change the ReceiveBufferSize). But my question is: why wasn't I seeing an exception or an error state or something if the socket's internal buffer was overflowing before I changed this buffer size?

    Read the article

1 2 3  | Next Page >