Search Results

Search found 37381 results on 1496 pages for 'string parsing'.

Page 85/1496

PHP: What is an efficient way to parse a text file containing very long lines?

- by Shaun

I'm working on a parser in PHP that is designed to extract MySQL records from a text file. A particular line might begin with a string naming the table the records (rows) need to be inserted into, followed by the records themselves. The records are delimited by a backslash and the fields (columns) are separated by commas. For the sake of simplicity, let's assume that we have a table representing people in our database, with the fields First Name, Last Name, and Occupation. Thus, one line of the file might be as follows:

    [People] = "\Han,Solo,Smuggler\Luke,Skywalker,Jedi..."

where the ellipsis (...) could be additional people.

One straightforward approach might be to use fgets() to extract a line from the file, and use preg_match() to extract the table name, records, and fields from that line. However, let's suppose that we have an awful lot of Star Wars characters to track. So many, in fact, that this line ends up being 200,000+ characters/bytes long. In such a case, taking the above approach to extract the database information seems a bit inefficient. You have to first read hundreds of thousands of characters into memory, then read back over those same characters to find regex matches.

Is there a way, similar to the Java String next(String pattern) method of the Scanner class constructed using a file, that allows you to match patterns in-line while scanning through the file? The idea is that you don't have to scan through the same text twice (to read it from the file into a string, and then to match patterns) or store the text redundantly in memory (in both the file line string and the matched patterns). Would this even yield a significant increase in performance? It's hard to tell exactly what PHP or Java is doing behind the scenes.
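
The single-pass idea in this question can be sketched as follows. This is a minimal illustration in Python rather than PHP, assuming backslash-delimited records and comma-separated fields as in the example line; the function name `iter_records` and the chunk size are hypothetical choices, not part of the original post. It reads fixed-size chunks and yields each record as soon as its closing delimiter has been seen, so the full 200,000-character line is never held in memory at once.

```python
import io

def iter_records(stream, chunk_size=4096, record_sep="\\", field_sep=","):
    """Yield each record as a list of fields, reading the stream in chunks."""
    pending = ""  # text read so far that has not yet been closed by a separator
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Everything before the last separator is complete; keep the tail for later.
        *complete, pending = pending.split(record_sep)
        for record in complete:
            if record:
                yield record.split(field_sep)
    if pending:
        yield pending.split(field_sep)

# Usage: the payload after the table name, with the surrounding quotes removed.
data = io.StringIO("\\Han,Solo,Smuggler\\Luke,Skywalker,Jedi")
records = list(iter_records(data, chunk_size=8))
```

Whether this is actually faster than reading the whole line and running one regular expression would need to be measured; the sketch mainly bounds memory use.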

xerces-c: Xml parsing multiple files

- by user459811

I'm attempting to learn xerces-c and was following this tutorial online: http://www.yolinux.com/TUTORIALS/XML-Xerces-C.html

I was able to get the tutorial to compile and run through a memory checker (valgrind) with no problems. However, when I altered the program slightly, the memory checker reported some potentially leaked bytes. I only added a few extra lines to main to allow the program to read two files instead of one.

    int main()
    {
        string configFile="sample.xml"; // stat file. Get ambiguous segfault otherwise.

        GetConfig appConfig;
        appConfig.readConfigFile(configFile);

        cout << "Application option A=" << appConfig.getOptionA() << endl;
        cout << "Application option B=" << appConfig.getOptionB() << endl;

        // Added code
        configFile = "sample1.xml";
        appConfig.readConfigFile(configFile);

        cout << "Application option A=" << appConfig.getOptionA() << endl;
        cout << "Application option B=" << appConfig.getOptionB() << endl;

        return 0;
    }

Why does adding the extra lines of code to read in another XML file result in the following output?

    ==776== Using Valgrind-3.6.0 and LibVEX; rerun with -h for copyright info
    ==776== Command: ./a.out
    ==776==
    Application option A=10
    Application option B=24
    Application option A=30
    Application option B=40
    ==776==
    ==776== HEAP SUMMARY:
    ==776==     in use at exit: 6 bytes in 2 blocks
    ==776==   total heap usage: 4,031 allocs, 4,029 frees, 1,092,045 bytes allocated
    ==776==
    ==776== 3 bytes in 1 blocks are definitely lost in loss record 1 of 2
    ==776==    at 0x4C28B8C: operator new(unsigned long) (vg_replace_malloc.c:261)
    ==776==    by 0x5225E9B: xercesc_3_1::MemoryManagerImpl::allocate(unsigned long) (MemoryManagerImpl.cpp:40)
    ==776==    by 0x53006C8: xercesc_3_1::IconvGNULCPTranscoder::transcode(unsigned short const*, xercesc_3_1::MemoryManager*) (IconvGNUTransService.cpp:751)
    ==776==    by 0x4038E7: GetConfig::readConfigFile(std::string&) (in /home/bonniehan/workspace/test/a.out)
    ==776==    by 0x403B13: main (in /home/bonniehan/workspace/test/a.out)
    ==776==
    ==776== 3 bytes in 1 blocks are definitely lost in loss record 2 of 2
    ==776==    at 0x4C28B8C: operator new(unsigned long) (vg_replace_malloc.c:261)
    ==776==    by 0x5225E9B: xercesc_3_1::MemoryManagerImpl::allocate(unsigned long) (MemoryManagerImpl.cpp:40)
    ==776==    by 0x53006C8: xercesc_3_1::IconvGNULCPTranscoder::transcode(unsigned short const*, xercesc_3_1::MemoryManager*) (IconvGNUTransService.cpp:751)
    ==776==    by 0x40393F: GetConfig::readConfigFile(std::string&) (in /home/bonniehan/workspace/test/a.out)
    ==776==    by 0x403B13: main (in /home/bonniehan/workspace/test/a.out)
    ==776==
    ==776== LEAK SUMMARY:
    ==776==    definitely lost: 6 bytes in 2 blocks
    ==776==    indirectly lost: 0 bytes in 0 blocks
    ==776==      possibly lost: 0 bytes in 0 blocks
    ==776==    still reachable: 0 bytes in 0 blocks
    ==776==         suppressed: 0 bytes in 0 blocks
    ==776==
    ==776== For counts of detected and suppressed errors, rerun with: -v
    ==776== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 2 from 2)

How do I get bison/flex to restart scanning after something like token substitution?

- by chucknelson

Is there a way to force bison and/or flex to restart scanning after I replace some token with something else? My particular example would be with replacement for a specific word/string. If I want a word of hello to be replaced by echo hello, how can I get flex or bison to replace hello and then start parsing again (to pick up 2 words instead of just one)? So it would be like:

- Get token WORD (which is a string type)
- If hello, replace token value with echo hello
- Restart parsing entire input (which is now echo hello)
- Get token WORD (echo)
- Get token WORD (hello)

I've seen very tempting functions like yyrestart(), but I don't really understand what that function in particular really accomplishes. Any help is greatly appreciated, thanks!
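
The substitute-and-rescan behaviour described above can be modelled outside flex/bison. The following is a small Python sketch, not flex code: the function `tokenize` and the `aliases` table are hypothetical, and the rule that pushed-back words are not expanded again is an assumption added here to keep "hello" from expanding forever.

```python
def tokenize(text, aliases):
    """Split text into words, re-scanning the replacement whenever a word is an alias."""
    # Stack of (word, may_expand) pairs; the top of the stack is the next word to scan.
    pending = [(w, True) for w in reversed(text.split())]
    tokens = []
    while pending:
        word, may_expand = pending.pop()
        if may_expand and word in aliases:
            # Push the replacement back onto the input so it is scanned again.
            # Pushed-back words are not expanded a second time, which avoids
            # looping forever on "hello" -> "echo hello".
            pending.extend((w, False) for w in reversed(aliases[word].split()))
        else:
            tokens.append(word)
    return tokens

tokens = tokenize("hello", {"hello": "echo hello"})  # ['echo', 'hello']
```

The design point is that the replacement goes back into the input rather than into the token stream, so the scanner sees two separate words.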

Calculating probability that a string has been randomized? - Python

- by RadiantHex

Hi folks, this is related to a question I asked earlier (question). I have a list of manually created strings such as:

    lucy87
    gordan_king
    fancy_unicorn77
    joplucky_kanga90
    base_belong_to_narwhals

and a list of randomized strings:

    johnkdf
    pancake90kgjd
    fancy_jagookfk
    manhattanljg

What gives away that the last set of strings is randomized is the presence of sequences such as 'kjg', 'jgf', 'lkd', ... Is there any clever way I could separate strings that contain these apparently randomized sequences from the crowd? I guess that this relies a lot on the fact that certain characters are more likely to be placed next to others (e.g. 'co', 'ka', 'ja', ...). Any ideas on this one? Kylotan mentioned Reverend, but I am not sure if it can be used for such a purpose. Help would be much appreciated!
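
The observation that some character pairs are more likely than others suggests a character-bigram score. Below is a minimal sketch under stated assumptions: the model is trained only on the five hand-made strings from the question (far too few for real use), add-one smoothing is used, and the names `train_bigrams` and `avg_log_prob` are made up for this illustration. A lower average log-probability means the string looks less like the training data.

```python
import math
from collections import Counter

def train_bigrams(samples):
    """Count character pairs and their left-hand contexts in the sample strings."""
    pairs, contexts, alphabet = Counter(), Counter(), set()
    for s in samples:
        alphabet.update(s)
        for a, b in zip(s, s[1:]):
            pairs[a, b] += 1
            contexts[a] += 1
    return pairs, contexts, len(alphabet)

def avg_log_prob(s, model):
    """Average add-one-smoothed log-probability of the bigrams in s."""
    pairs, contexts, vocab = model
    steps = list(zip(s, s[1:]))
    if not steps:
        return 0.0
    total = sum(math.log((pairs[a, b] + 1) / (contexts[a] + vocab)) for a, b in steps)
    return total / len(steps)

model = train_bigrams(["lucy87", "gordan_king", "fancy_unicorn77",
                       "joplucky_kanga90", "base_belong_to_narwhals"])
human_score = avg_log_prob("fancy_unicorn77", model)
random_score = avg_log_prob("kjgjgflkd", model)  # built from the suspicious sequences
```

In practice the model would be trained on a much larger list of real names, and a threshold on the score (or on the worst-scoring window of a string) would have to be chosen from labelled examples.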

Parsing log files in a folder in ColdFusion

- by Simon Guo

The problem: there is a folder ./log/ containing files like:

    jan2010.xml, feb2010.xml, mar2010.xml, jan2009.xml, feb2009.xml, mar2009.xml ...

Each XML file looks like:

    <root><record name="bob" spend="20"></record>...(more records)</root>

I want to write a piece of ColdFusion code (log.cfm) that simply parses those XML files. On the front end I would let the user choose a year and then click the submit button. All the content for that year will be shown in separate tables by month. Each table shows the total money spent by each person, like:

    person  cost
    bob     200
    mike    300
    Total   500

Thanks.

SimpleXML adding html into Hash tree

- by Miriam Raphael Roberts

Question: I have an XML file that I am pulling from the web and parsing. One of the items in the XML is a 'content' value that has HTML. I am using SimpleXML/XMLin to parse the file like so:

    $xml = eval { $data->XMLin($xmldata, forcearray => 1, suppressempty => +'') };

When I use Dumper to dump the hash, I discovered that SimpleXML is parsing the HTML into the hash tree.

    'content' => {
      'div' => [
        {
          'xmlns' => 'http://www.w3.org/1999/xhtml',
          'p' => [
            {
              'a' => [
                {
                  'href' => 'http://miamiherald.typepad.com/.a/6a00d83451b26169e20133ec6f4491970b-pi',
                  'style' => 'FLOAT: left',
                  'img' => [ etc.....

This is not what I want. I just want to grab the content inside this entry. How do I do this?

Java: conversion of byte[] into a string and then back to a byte[]

- by Sid

I am working on a proxy server. I am getting data in a byte[], which I convert into a string to perform certain operations. When I convert this new string back into a byte[], it causes unknown problems. So mainly I need to know how to correctly convert a byte[] into a string and then back into a byte[] again. I tried to just convert the byte[] to a string and then back to a byte[] again (to make sure that it's not my operations that are causing the problems). So it's like:

    // where reply is a byte[]
    String str = new String(reply, 0, bytesRead);
    streamToClient.write(str.getBytes(), 0, bytesRead);

is not equivalent to

    streamToClient.write(reply, 0, bytesRead);

My proxy works fine when I just send the byte[] without any conversion, but when I convert it from a byte[] to a string and then back to a byte[] it causes problems. Can someone please help? =]
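
The effect described here depends on the character encoding used for the round trip (in Java, `new String(bytes)` and `getBytes()` without a charset argument use the platform default). The following Python sketch is only an analogy, not the asker's Java code: it shows that a single-byte encoding such as ISO-8859-1 round-trips arbitrary bytes, while a decode as UTF-8 with replacement changes both the content and the length of the data.

```python
raw = bytes([0x48, 0x54, 0x54, 0x50, 0xFF, 0xFE, 0x00, 0x80])  # "HTTP" plus non-text bytes

# ISO-8859-1 (Latin-1) maps every byte value 0-255 to exactly one character,
# so decoding and re-encoding returns the original bytes.
round_trip_latin1 = raw.decode("latin-1").encode("latin-1")

# UTF-8 cannot represent arbitrary bytes: invalid sequences are replaced with
# U+FFFD, and the re-encoded result differs in both content and length.
round_trip_utf8 = raw.decode("utf-8", errors="replace").encode("utf-8")

print(round_trip_latin1 == raw)            # True
print(len(raw), len(round_trip_utf8))      # 8 14
```

Note that because the length can change, writing only the original `bytesRead` bytes of the re-encoded array may also truncate the data.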

Parsing getopts in bash

- by ABach

I've got a bash function that I'm trying to use getopts with and am having some trouble. The function is designed to be called by itself (getch), with an optional -s flag (getch -s), or with an optional string argument afterward (so getch master and getch -s master are both valid). The snippet below is where my problem lies - it isn't the entire function, but it's what I'm focusing on:

    getch() {
      if [ "$#" -gt 2 ] || [ "$1" = "-h" ] || [ "$1" = "--help" ]; then
        echo "Usage: $0 [-s] [branch-name]" >&2
        return 1
      fi

      while getopts "s" opt; do
        echo $opt # This line is here to test how many times we go through the loop
        case $opt in
          s)
            squash=true
            shift
            ;;
          *)
            ;;
        esac
      done
    }

The getch -s master case is where the strangeness happens. The above should spit out s once, but instead, I get this:

    [user@host:git-repositories/temp]$ getch -s master
    s
    s
    [user@host:git-repositories/temp]$

Why is it parsing the -s opt twice?

Parsing raw apache logs

- by MB34

I need some PHP code for parsing raw Apache logs. In particular, I want the number of times mode=search appears and the term used for each search. Here is an example:

    207.46.195.228 - - [30/Apr/2010:03:24:26 -0700] "GET /index.php?mode=search&term=AE1008787E0174 HTTP/1.1" 200 13047 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
    212.81.200.167 - - [30/Apr/2010:04:21:43 -0700] "GET /index.php?mode=search&term=WH2002D-YYH HTTP/1.1" 200 12079 "http://www.mysite.com/SearchGBY.php?page=81" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; GTB6.4; .NET CLR 1.1.4322; .NET CLR 2.0.50727; WinuE v6; InfoPath.2; WinuE v6)"
    212.81.200.167 - - [30/Apr/2010:04:21:44 -0700] "GET /file_uploads/banners/banner.swf HTTP/1.1" 200 50487 "-" "contype"
    66.249.68.168 - - [30/Apr/2010:04:21:45 -0700] "GET /index.php?mode=search&term=WH2002D-YYH HTTP/1.1" 200 12079 "-" "Mediapartners-Google"
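
One way to approach this is to pull the request target out of each line and parse its query string. The sketch below is in Python rather than the PHP asked for, and `REQUEST_RE` and `count_search_terms` are names invented for this illustration; it assumes log lines in the format shown in the example.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Matches the quoted request part of a log line: "METHOD target HTTP/x.y"
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+"')

def count_search_terms(lines):
    """Return a Counter of search terms for requests whose query has mode=search."""
    terms = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        query = parse_qs(urlsplit(match.group(1)).query)
        if query.get("mode") == ["search"]:
            for term in query.get("term", []):
                terms[term] += 1
    return terms

# Two of the lines from the example above, with the user agent shortened.
sample = [
    '212.81.200.167 - - [30/Apr/2010:04:21:44 -0700] "GET /file_uploads/banners/banner.swf HTTP/1.1" 200 50487 "-" "contype"',
    '66.249.68.168 - - [30/Apr/2010:04:21:45 -0700] "GET /index.php?mode=search&term=WH2002D-YYH HTTP/1.1" 200 12079 "-" "Mediapartners-Google"',
]
terms = count_search_terms(sample)
total_searches = sum(terms.values())
```

The total number of searches is the sum of the counter's values, and the counter itself gives the per-term breakdown.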

Parsing html for domain links

- by Hallik

I have a script that parses an HTML page for all the links within it. I am getting all of them fine, but I have a list of domains I want to compare them against. So a sample list contains:

    list=['www.domain.com', 'sub.domain.com']

But I may have a list of links that look like:

    http://domain.com
    http://sub.domain.com/some/other/page

I can strip off the http:// just fine, but the two example links I just posted should both match. The first I would like to match against www.domain.com, and the second I would like to match against the subdomain in the list. Right now I am using urllib2 for parsing the HTML. What are my options for completing this task?
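
A possible approach is to compare normalized host names rather than raw strings. The sketch below uses the Python 3 standard library (`urllib.parse`); the helper names `normalize` and `match_domain` are hypothetical, and treating a leading "www." as optional is an assumption taken from the example, where http://domain.com should match www.domain.com.

```python
from urllib.parse import urlsplit

def normalize(host):
    """Lower-case a host name and drop a leading 'www.' so domain.com == www.domain.com."""
    host = host.lower()
    return host[4:] if host.startswith("www.") else host

def match_domain(url, domains):
    """Return the entry of `domains` whose host matches the URL's host, or None."""
    host = normalize(urlsplit(url).hostname or "")
    for domain in domains:
        if normalize(domain) == host:
            return domain
    return None

domains = ["www.domain.com", "sub.domain.com"]
first = match_domain("http://domain.com", domains)                       # 'www.domain.com'
second = match_domain("http://sub.domain.com/some/other/page", domains)  # 'sub.domain.com'
```

Parsing the URL first avoids having to strip the scheme, port, or path by hand.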

Flex: convert VideoPlayer.currentTime to string "00:00:00:000"

- by numediaweb

Hi there! What about this one: I want to format the currentTime displayed by a VideoPlayer component inside Flex, from something like 8230.999 to something like 01:59:59:999, which is "hours:minutes:seconds:milliseconds". I tried different pieces of code but couldn't get them to work, because currentTime is not a plain millisecond value: it is given in seconds with a three-digit fractional part, so instead of 2000 ms it outputs 2.000, something people like me just can't understand! Thanks for any help :)

### UPDATE

I still have a problem with milliseconds. Here's the current MXML:

    <?xml version="1.0" encoding="utf-8"?>
    <s:Application xmlns:fx="http://ns.adobe.com/mxml/2009"
                   xmlns:s="library://ns.adobe.com/flex/spark"
                   xmlns:mx="library://ns.adobe.com/flex/mx"
                   minWidth="955" minHeight="600">
        <fx:Script>
            <![CDATA[
                protected function convert_clickHandler(event:MouseEvent):void
                {
                    var val:Number = new Number(inPut.text); //inPut.text = 1000.001
                    //val = val * 1000;
                    outPut.text = timeFormat(val);
                }

                public static function timeFormat(value:Number):String
                {
                    var milliseconds:Number = value % 1000;
                    var seconds:Number = Math.floor((value/1000) % 60);
                    var minutes:Number = Math.floor((value/60000) % 60);
                    var hours:Number = Math.floor((value/3600000) % 24);
                    var s_miliseconds:String = (milliseconds<10 ? "00" : (milliseconds<100 ? "0" : ""))+ String(milliseconds);
                    var s_seconds:String = seconds < 10 ? "0" + String(seconds) : String(seconds);
                    var s_minutes:String = minutes < 10 ? "0" + String(minutes) : String(minutes);
                    var s_hours:String = hours < 10 ? "0" + String(hours) : String(hours);
                    return s_hours + ":" + s_minutes + ":" + s_seconds + '.'+s_miliseconds;
                    // returns 00:00:01.000.0009999999999763531 should return 00:00:01.001
                    // I still have problem with milliseconds
                }
            ]]>
        </fx:Script>
        <fx:Declarations>
        </fx:Declarations>
        <s:TextInput x="240" y="72" id="inPut" text="1000.001"/>
        <s:TextInput x="240" y="140" id="outPut"/>
        <s:Button x="274" y="107" label="convert" id="convert" click="convert_clickHandler(event)"/>
    </s:Application>

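The formatting problem in the Flex question above can be sketched as follows, in Python rather than ActionScript. It assumes the player reports the time in seconds with a fractional part (as the 2.000 example in the question suggests); `format_time` is a name invented for this illustration. Converting to a whole number of milliseconds first avoids the floating-point residue seen in the asker's output.

```python
def format_time(seconds):
    """Format a time given in seconds (with a fractional part) as HH:MM:SS:mmm."""
    total_ms = round(seconds * 1000)  # work in whole milliseconds to avoid float residue
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}:{millis:03d}"

print(format_time(7199.999))  # 01:59:59:999
print(format_time(1.001))     # 00:00:01:001
```

Under this assumption, the 8230.999 value from the question corresponds to 02:17:10:999 rather than the 01:59:59:999 given there as a format illustration.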
CDATA xml parsing extra greater than problem

- by Ruchir Shah

Hi, I am creating XML using PHP and parsing that XML in iPhone application code. In the description field there are some HTML tags and text. I am using the following line to wrap these HTML tags in the XML using CDATA:

    $response .= '<desc><![CDATA['.trim($feed['fulltext']).']]></desc>';

Here my $feed['fulltext'] value is like this:

    <span class="ABC">...text...</span>

In the XML I am getting the following response:

    <desc><![CDATA[><span class"ABC">...text...</span>]]></desc>

As you can see, I am getting an extra greater-than symbol just before the value of $feed['fulltext'] starts (like this: ...text...). Any solution or suggestion for this? Thanks in advance. Cheers.

How to change the behavior of string objects in web service calls via Windows Communication Foundati

- by Geri Langlois

I have third-party APIs which require string values to be submitted as empty strings. On an ASP.NET page I can use this code (abbreviated here) and it works fine:

    public class Customer
    {
        private string addr1 = "";
        public string Addr1
        {
            get {return addr1;}
            set {addr1 = value;}
        }

        private string addr2 = "";
        public string Addr2
        {
            get {return addr2;}
            set {addr2 = value;}
        }

        private string city = "";
        public string City
        {
            get {return city;}
            set {city = value;}
        }
    }

    Customer cust = new Customer();
    cust.Addr1 = "1 Main St.";
    cust.City = "Hartford";
    int custno = CustomerController.InsertCustomer(cust);

The Addr2 field, which was not set explicitly, is still an empty string when inserted. However, when the same code is called through a web service based on Windows Communication Foundation, the Addr2 field is null. Is there a way (or setting) by which all string fields, even if uninitialized, would return an empty string (unless, of course, a value was set)?

Format string using RegEx

- by user99322

String to be formatted: "new Date(2009,0,1)"
String after formatting: "'01-Jan-2009'"
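
A regular-expression substitution for this conversion might look like the Python sketch below. The names `DATE_RE` and `reformat_dates` are hypothetical, and the month is assumed to be zero-based (as in JavaScript's Date constructor), which is consistent with 0 mapping to Jan in the example.

```python
import re

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Captures year, zero-based month, and day from new Date(y,m,d)
DATE_RE = re.compile(r"new Date\((\d{4}),\s*(\d{1,2}),\s*(\d{1,2})\)")

def reformat_dates(text):
    """Replace each new Date(y,m,d) with 'dd-Mon-yyyy', treating m as zero-based."""
    def repl(match):
        year, month, day = (int(g) for g in match.groups())
        return f"'{day:02d}-{MONTHS[month]}-{year}'"
    return DATE_RE.sub(repl, text)

result = reformat_dates("new Date(2009,0,1)")  # "'01-Jan-2009'"
```

A month value outside 0-11 raises an IndexError in this sketch; real input would need that case handled.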

What should I call the operation that limits a string's length?

- by egarcia

This is a language-agnostic question - unless you count English as a language. I've got this list of items which can have very long names. For aesthetic purposes, these names must be made shorter in some cases, adding dots (...) to indicate that the name is longer. So for example, if article.name returns this:

    lorem ipsum dolor sit amet

I'd like to get this other output:

    lorem ipsum dolor ...

I can program this quite easily. My question is: what should I call that shortening operation? I mean the name, not the implementation. Is there a standard English name for it?
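
For concreteness, the operation being named can be sketched as below. The name `truncate` is one commonly used choice (for example, Rails has a `truncate` text helper), not a claim about the only standard term; the function itself is a hypothetical Python illustration that cuts at a word boundary and appends " ...".

```python
def truncate(text, max_length, suffix=" ..."):
    """Shorten text to at most max_length characters, cutting at a word boundary."""
    if len(text) <= max_length:
        return text
    room = max_length - len(suffix)
    if room <= 0:
        return suffix.strip()[:max_length]
    cut = text[:room + 1]  # one extra char so a word ending exactly at `room` survives
    head = cut.rsplit(" ", 1)[0] if " " in cut else text[:room]
    return head.rstrip() + suffix

short = truncate("lorem ipsum dolor sit amet", 21)  # 'lorem ipsum dolor ...'
```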

Error Downloading Metadata from ASMX Service

- by michael.lukatchik

It should be possible (and it looks like it is), but assume I have the following functions in my ASMX web service:

    [WebMethod(MessageName = "CreateExternalRpt1")]
    public bool CreateExternalRpt(int iProductId, int iOrderProductId, DateTime dtReportTime,
        string strReportTitle, string strReportCategory, string strReportPrintType,
        out string strSnapshot, string strApplicantFirst, string strApplicantLast,
        string strApplicantMiddle, string strOwnerLast, string strOwnerMiddle,
        string strOwnerFirst, bool blnCurrentReport, int iOrderId, bool blnFlag,
        string strCustomerId, string strUserId)
    {
        …
    }

    [WebMethod(MessageName = "CreateExternalRpt2")]
    public bool CreateExternalRpt(int iProductId, int iOrderProductId, DateTime dtReportTime,
        string strReportTitle, string strReportCategory, string strReportPrintType,
        string strReportSnapshot, string strApplicantFirst, string strApplicantLast,
        string strApplicantMiddle, string strOwnerLast, string strOwnerMiddle,
        string strOwnerFirst, bool blnCurrentReport, int iOrderId, bool blnFlag)
    {
        …
    }

With both of these functions defined in my web service, my .NET client app can’t download the metadata and instead throws a generic error message “There was an error downloading…”. With one of the above methods removed, the .NET client app can successfully download the metadata. I’ve read that by decorating the WebMethod with a unique name, both functions should be exposed in the service metadata. This isn’t working. What am I missing?

golang dynamically parsing files

- by Brian Voelker

For parsing files I have set up a variable for template.ParseFiles, and I currently have to list each file manually. Two things:

- How would I be able to walk through a main folder and a multitude of subfolders and automatically add the files to ParseFiles, so I don't have to add each file individually?
- How would I be able to call a file with the same name in a subfolder? Currently I get an error at runtime if I add a file with the same name to ParseFiles.

    var templates = template.Must(template.ParseFiles(
        "index.html",           // main file
        "subfolder/index.html", // subfolder with same filename errors on runtime
        "includes/header.html",
        "includes/footer.html",
    ))

    func main() {
        // Walk and ParseFiles
        filepath.Walk("files", func(path string, info os.FileInfo, err error) error {
            if err != nil {
                return err // stop if the path could not be read
            }
            if !info.IsDir() {
                // Add path to ParseFiles
            }
            return nil
        })

        http.HandleFunc("/", home)
        http.ListenAndServe(":8080", nil)
    }

    func home(w http.ResponseWriter, r *http.Request) {
        render(w, "index.html")
    }

    func render(w http.ResponseWriter, tmpl string) {
        err := templates.ExecuteTemplate(w, tmpl, nil)
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
        }
    }

How to get Unicode values from a Google Translate output string

- by user270885

On the Google Translate web site, if I type any word in English and select any other foreign language, it shows the corresponding word in that language. I want the Unicode values of those foreign characters. How can I get them?

Defining tokens at runtime

- by Peter Crenshaw

I want to write a parser for EDIFACT messages with JavaCC. My problem is that I cannot define all terminal symbols before parsing a message, because at the beginning of each message there is a so-called "Advice Segment" ("UNA" segment) which defines things like the element separator symbol, escape symbol, segment terminator symbol and decimal notation (e.g. '.' or ','). So I think/guess the production rules need some kind of variables which must be set at runtime during parsing. Can this be done with JavaCC and if so how? Or is there another way I am missing?
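
One alternative to runtime-defined grammar tokens is a small hand-written pre-pass that reads the UNA segment and splits the message with whatever delimiters it declares. The Python sketch below illustrates that idea only; it is not JavaCC, the function names are hypothetical, and the sample message is made up. It relies on the UNA layout of six service characters after the tag: component separator, element separator, decimal mark, release (escape) character, a reserved character, and the segment terminator.

```python
def read_una(message):
    """Return (delimiters, rest) using a leading UNA segment, or the EDIFACT defaults."""
    defaults = {"component": ":", "element": "+", "decimal": ".",
                "release": "?", "segment": "'"}
    if not message.startswith("UNA"):
        return defaults, message
    chars = message[3:9]  # six service characters follow the "UNA" tag
    delimiters = {"component": chars[0], "element": chars[1], "decimal": chars[2],
                  "release": chars[3], "segment": chars[5]}  # chars[4] is reserved
    return delimiters, message[9:]

def split_escaped(text, separator, release):
    """Split text on separator, ignoring separators preceded by the release character."""
    parts, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == release and i + 1 < len(text):
            current.append(text[i:i + 2])  # keep the escaped pair intact
            i += 2
            continue
        if ch == separator:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    if current:
        parts.append("".join(current))
    return parts

# Made-up message: the second element of SEG contains an escaped '+'.
delims, body = read_una("UNA:+.? 'SEG+A?+B+C'END+1'")
segments = split_escaped(body, delims["segment"], delims["release"])
elements = split_escaped(segments[0], delims["element"], delims["release"])
```

The split pieces could then be handed to a grammar whose tokens no longer depend on the delimiter characters.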

How to download a webpage in php

- by Hugo

Hello! I was wondering how I could download a webpage in php for parsing?

How do I apply a string function to an array?

- by ggg

I was fortunate enough to receive this code (flips Lastname, Firstname) from an earlier post.

    $name = "Lastname, Firstname";
    $names = explode(", ", $name);
    $name = $names[1] . " " . $names[0];

How do I apply the function to each value in an array that is in the form: $ginfo ->$(LastName, FirstName). I tried the code below, but it doesn't work.

    $name1 =($ginfo->White);
    $name1 = explode(", ", $name1);
    $FLw = $name1[1] . " " . $name1[0];
    foreach ($name1 as ($ginfo->White)) {return($FLw);}

Why is passing a string literal into a char* argument only sometimes a compiler error?

- by Brian Postow

I'm working on a C and C++ program. We used to be compiling without the make-strings-writable option. But that was getting a bunch of warnings, so I turned it off. Then I got a whole bunch of errors of the form "Cannot convert const char* to char* in argument 3 of function foo". So, I went through and made a whole lot of changes to fix those. However, today, the program CRASHED because the literal "" was getting passed into a function that was expecting a char*, and was setting the 0th character to 0. It wasn't doing anything bad, just trying to edit a constant, and crashing. My question is, why wasn't that a compiler error? In case it matters, this was on a Mac, compiled with gcc-4.0.

How to compare string with float? in Objective C

- by David

How would you? I'm having problems. Thanks. I'm currently using

    if (myString == myfloat) {
        // do something but this won't work
    }

OR

    if ([myString == myFloat]) {
        // do something but this won't work
    }

Thanks!
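
The usual idea is to convert the string to a number first and then compare the two numbers with a tolerance (in Objective-C, NSString's `floatValue` or `doubleValue` would do the conversion). The sketch below shows that logic in Python only as an illustration; `string_equals_float` and its tolerance defaults are assumptions made for this example.

```python
import math

def string_equals_float(text, value, rel_tol=1e-9, abs_tol=1e-9):
    """Parse text as a number and compare it with value within a tolerance."""
    try:
        parsed = float(text.strip())
    except ValueError:
        return False  # not a number at all
    return math.isclose(parsed, value, rel_tol=rel_tol, abs_tol=abs_tol)

same = string_equals_float(" 3.14 ", 3.14)      # True
different = string_equals_float("abc", 3.14)    # False
```

Comparing with a tolerance rather than `==` matters because a decimal string and a binary float rarely represent exactly the same value after arithmetic.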

Parsing and validating arbitrary date formats in ruby (on rails)

- by Matt Briggs

I have a requirement to handle custom date formats in an existing app. The idea is that the users have to deal with multiple formats from outside sources they have very little control over. We will need to be able to take the format and both validate dates against it and parse strings specifically in that format. The other thing is that these can be completely arbitrary, like JA == January, FE == February, etc.

To my understanding, chronic only handles parsing (and does it in a more magical way than I can use), and DateTime#strptime comes close, but doesn't really handle the whole two-character month scenario, even with custom formatters. The 'nuclear' option is to write in custom support for edge cases like this, but I would prefer to use a library if something like this exists.
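
The 'nuclear' option of custom support can be fairly small: translate the custom month code into a standard abbreviation, then hand the result to a strict parser, which also gives validation. The following is a Python sketch rather than Ruby, and apart from JA and FE the two-letter codes in `MONTH_CODES` are invented for illustration; it also assumes an English/C locale for `%b`.

```python
import re
from datetime import datetime

# Hypothetical two-letter month codes; only JA and FE come from the question.
MONTH_CODES = {"JA": "Jan", "FE": "Feb", "MR": "Mar", "AP": "Apr", "MY": "May", "JN": "Jun",
               "JL": "Jul", "AU": "Aug", "SE": "Sep", "OC": "Oct", "NO": "Nov", "DE": "Dec"}
CODE_RE = re.compile("|".join(MONTH_CODES))

def parse_custom_date(text, fmt="%d-%b-%Y"):
    """Translate a custom month code to a standard abbreviation, then parse strictly."""
    normalized, replaced = CODE_RE.subn(lambda m: MONTH_CODES[m.group(0)], text, count=1)
    if replaced != 1:
        return None  # no known month code found
    try:
        return datetime.strptime(normalized, fmt).date()
    except ValueError:
        return None  # wrong layout or impossible date

valid = parse_custom_date("15-JA-2010")
invalid = parse_custom_date("31-FE-2010")  # February has no 31st, so this is rejected
```

Returning None for both unknown codes and impossible dates lets the same function serve as the validator.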

Video editing language

- by wvd

Hi folks, My next project will be all about language tools, parsing and such. For that reason I've decided to write a simple language which can be used for video editing. So instead of those desktop applications (Sony Vegas, Adobe Premiere, ...) it's basically a language where you define the effects and so on, and it will generate a video for you. Since I've got no experience in this kind of business, I need some help. The goal of the project is to create a simple language which is able to do some basic things (such as text fading in, etc.). I am looking for articles/projects/blogs/whatever related to this which could help me write this language. (Note that I don't need articles about language parsing since I'm pretty familiar with that, just the video editing part.) Thanks, William v. Doorn
