Search Results

Search found 7251 results on 291 pages for 'pdf parsing'.

Page 77/291 | < Previous Page | 73 74 75 76 77 78 79 80 81 82 83 84 | Next Page >

Parsing Wiki XML Dumps ver0.4 just got tough

- by syed

Hello, I am trying to parse Wikipedia XML Dump using "Parse-MediaWikiDump-1.0.4" along with "Wikiprep.pl" script. I guess this script works fine with ver0.3 Wiki XML Dumps but not with the latest ver0.4 Dumps. I get the following error. Can't locate object method "page" via package "Parse::MediaWikiDump::Pages" at wikiprep.pl line 390. Also, under the "Parse-MediaWikiDump-1.0.4" documentation @ http://search.cpan.org/~triddle/Parse-MediaWikiDump-1.0.4/lib/Parse/MediaWikiDump/Pages.pm, I read "LIMITATIONS Version 0.4 This class was updated to support version 0.4 dump files from a MediaWiki instance but it does not currently support any of the new information available in those files." Any work arounds would help me get to the next level. Note: one may wonder why cannot we directly use SAX or STAX parser instead, wikipedia dump is a 25GB plus single file, stack/memory issues are obvious. Hence, the above perl script resolves this issue but currently I am stuck with this version problem.

Read the article
Parsing srt subtitles

- by Vojtech R.

Hi, I want to parse srt subtitles: 1 00:00:12,815 --> 00:00:14,509 Chlapi, jak to jde s tema pracovníma svetlama?. 2 00:00:14,815 --> 00:00:16,498 Trochu je zesilujeme. 3 00:00:16,934 --> 00:00:17,814 Jo, sleduj. Every item into structure. With this regexs: A: RE_ITEM = re.compile(r'''(?P<index>\d+).(?P<start>\d{2}:\d{2}:\d{2},\d{3}) --> (?P<end>\d{2}:\d{2}:\d{2},\d{3}).(?P<text>.*?)''', re.DOTALL) B: RE_ITEM = re.compile(r'''(?P<index>\d+).(?P<start>\d{2}:\d{2}:\d{2},\d{3}) --> (?P<end>\d{2}:\d{2}:\d{2},\d{3}).(?P<text>.*)''', re.DOTALL) And this code: for i in Subtitles.RE_ITEM.finditer(text): result.append((i.group('index'), i.group('start'), i.group('end'), i.group('text'))) With code B I have only one item in array (because of greedy .*) and with code A I have empty 'text' because of no-greedy .*? How to cure this? Thanks

Read the article
Trouble parsing some RSS feeds using Java and Sax

- by brockoli

I've written an RSS feed parser in Java (running on Android) and it parses some feeds perfectly, and others not at all. I get the following error when it tries to parse Slashdot (http://rss.slashdot.org/Slashdot/slashdot) org.apache.harmony.xml.ExpatParser$ParseException: At line 1, column 0: unbound prefix If I try to parse Wired (http://feeds.wired.com/wired/index) org.apache.harmony.xml.ExpatParser$ParseException: At line 1, column 0: syntax error If I try to parse AndroidGuys (http://feeds.feedburner.com/androidguyscom) org.apache.harmony.xml.ExpatParser$ParseException: At line 1, column 0: syntax error Here is some code for my parser. public void updateArticles(Context ctx, Feed feed, int numDaysToGet) { try { targetFlag = TARGET_ARTICLES; tweetDB = new TweetMonsterDBAdapter(ctx); tweetDB.open(); currentFeed = feed; TimeZone.setDefault(TimeZone.getTimeZone("UTC")); // or "Etc/GMT-1" Date currentDate = new Date(); long dateInMillis = currentDate.getTime(); oldestDate.setTime(dateInMillis-(dayInMillis*numDaysToGet)); SAXParserFactory spf = SAXParserFactory.newInstance(); SAXParser sp = spf.newSAXParser(); XMLReader xr = sp.getXMLReader(); xr.setContentHandler(this); xr.parse(new InputSource(currentFeed.url.openStream())); } catch (IOException e) { Log.e("TweetMonster", e.toString()); } catch (SAXException e) { tweetDB.close(); Log.e("TweetMonster", e.toString()); } catch (ParserConfigurationException e) { Log.e("TweetMonster", e.toString()); } tweetDB.close(); } It doesn't even get into my startElement method.

Read the article
JASON parsing (soap based)

- by hardik

i have my soap message request i want to parse the data using Jason parser how i would do that ?

Read the article
C# Regular Expression for Regular Expression Parsing

- by Chris

I want to returns matches from a regular expression string. The regex string is: (?<TICKER>[A-Z]+)(?<SPACE>\\s)(?<MONTH_ALPHA_ABBREV>Jan|Feb|Mar|Apr|May|Jun|Jul|Sep|Oct|Nov|Dec)(?<SPACE>\\s)(?<DAY>\\d+)(?<SPACE>\\s)(?<YEAR_LONG>[2][0][0-9][0-9])(?<SPACE>\\s)(?<STRIKE_DOLLAR>\\d+(?=[.]))[.](?<STRIKE_DECIMAL>(?<=[.])\\d+)(?<SPACE>\\s)(?<PUTCALL_LONG>Call|Put) And I want to get matches for all of the group names and all of the items within square brackets (including the square brackets) outside of open and closed parenthesis. I have this regex: ((?<=[<])([A-Z]|[_])+(?=[>]))|(\\[.\\]) But this returns square bracket items within the parenthesis. To be more specific these are the matches I want from the regex at the top (keep in mind this needs to be flexible for any regex): TICKER SPACE MONTH_ALPHA_ABBREV SPACE DAY SPACE YEAR_LONG SPACE STRIKE_DOLLAR [.] STRIKE_DECIMAL SPACE PUTCALL_LONG

Read the article
ruby regex, parsing html

- by danwoods

Hello all, I'm trying to parse some returned html to look for currently playing movies. The pattern I'm trying to match looks like: <span dir=ltr>Clash of the Titans</span> Of which there are several in the returned html. (the html is huge, I've posted a sample at the bottom) I'm trying get an array of the movie titles with the following command: titles = listings_html.split(/(<span dir=ltr>).*(<\/span>)/) But I'm not getting the results I'm expecting. Can anyone see a problem with my approach or regex? Returned html (I believe the 'markdown'formating will render the some of the html, but this is just an example): <script>window.gbar={};(function(){function h(a,b,d){var c="on"+b;if(a.addEventListener)a.addEventListener(b,d,false);else if(a.attachEvent)a.attachEvent(c,d);else{var f=a[c];a[c]=function(){var e=f.apply(this,arguments),g=d.apply(this,arguments);return e==undefined?g:g==undefined?e:g&&e}}};var i=window.gbar,k,l,m;function n(a){var b=window.encodeURIComponent&&(document.forms[0].q||"").value;if(b)a.href=a.href.replace(/([?&])q=[^&]*|$/,function(d,c){return(c||"&")+"q="+encodeURIComponent(b)})}i.qs=n;function o(a,b,d,c,f,e){var g=document.getElementById(a);if(g){var j=g.style;j.left=c?"auto":b+"px";j.right=c?b+"px":"auto";j.top=d+"px";j.visibility=l?"hidden":"visible";if(f&&e){j.width=f+"px";j.height=e+"px"}else{o(k,b,d,c,g.offsetWidth,g.offsetHeight);l=l?"":a}}}i.tg=function(a){a=a||window.event;var b,d=a.target||a.srcElement;a.cancelBubble=true;if(k!=null)p(d);else{b=document.createElement(Array.every||window.createPopup?"iframe":"div");b.frameBorder="0";k=b.id="gbs";b.src="javascript:''";d.parentNode.appendChild(b);h(document,"click",i.close);p(d);i.alld&&i.alld(function(){var c=document.getElementById("gbli");if(c){var f=c.parentNode;q(f,c);var e=c.prevSibling;f.removeChild(c);i.removeExtraDelimiters(f,e);b.style.height=f.offsetHeight+"px"}})}};function r(a){var b,d=document.defaultView;if(d&&d.getComputedStyle){if(a=d.getComputedStyle(a,""))b=a.direction}else b=a.currentStyle?a.currentStyle.direction:a.style.direction;return b=="rtl"}function p(a){var b=0;if(a.className!="gb3")a=a.parentNode;var d=a.getAttribute("aria-owns")||"gbi",c=a.offsetWidth,f=a.offsetTop>20?46:24,e=false;do b+=a.offsetLeft||0;while(a=a.offsetParent);a=(document.documentElement.clientWidth||document.body.clientWidth)-b-c;c=r(document.body);if(d=="gbi"){var g=document.getElementById("gbi");q(g,document.getElementById("gbli")||g.firstChild);if(c){b=a;e=true}}else if(!c){b=a;e=true}l!=d&&i.close();o(d,b,f,e)}i.close=function(){l&&o(l,0,0)};function s(a,b,d){if(!m){m="gb2";if(i.alld){var c=i.findClassName(a);if(c)m=c}}a.insertBefore(b,d).className=m}function q(a,b){for(var d,c=window.navExtra;c&&(d=c.pop());)s(a,d,b)}i.addLink=function(a,b,d){if((b=document.getElementById(b))&&a){a.className="gb4";var c=document.createElement("span");c.appendChild(a);c.appendChild(document.createTextNode(" | "));c.id=d;b.appendChild(c)}}})();if(!window.google)window.google={};if(!window.google.movies)window.google.movies={};window.google.movies.registerFixdir=function(){var c="[\u0000- !-@[-{-\u00bf\u00d7\u00f7\u02b9-\u02ff\u2000-\u2bff]",g=new RegExp("^"+c+"([0-9]"+c+"$|[A-Za-z\u00c0-\u00d6\u00d8-\u00f6\u00f8-\u02b8\u0300-\u0590\u0800-\u1fff\u2c00-\ufb1c\ufdfe-\ufe6f\ufefd-\uffff])"),h=new RegExp("^"+c+"$");function e(d,a){if(!a)a=d&&d.target?d.target:window.event.srcElement;a.dir=g.test(a.value)?"ltr":(h.test(a.value)?"":"rtl")} var i=[document.getElementsByName("q")[0],document.getElementById("mtq")];for(var f=0,b;b=i[f];f++)if(b){b.onkeyup=e;e(null,b)}}; Movie Showtimes - Google Search.fl:link{}a:link,.w,a.w:link,.w a:link{color:#00c}a:visited{color:#551a8b}a:active{color:red}.t a:link,.t a:active,.t a:visited,.t{color:#000}.left{width:12em}.box{background:#fff}.nopadding{padding:0}.k{background:#36c}.z{display:none}.x{width:3em}.y{width:23em}.b{color:#00c;font-size:12pt;font-weight:bold}.i,.i:link{color:#a90a08}.n a{color:#000;font-size:10pt}.n .b a{color:#00c}.n .i{font-size:10pt;font-weight:bold}.h{cursor:pointer}body{background:#fff;font:82% Arial,Helvetica,sans-serif;margin:3px 0 0;padding:0}table{border-collapse:collapse;border-spacing:0}img{border:0}td,th{vertical-align:top}h1,h2{font-size:100%;margin:0}a{color:#00c}/ CSS for page */#title_bar{background:#f0f7f9;border-top:1px solid #6b90da;padding-bottom:4px;padding-left:8px;padding-top:4px}#google_bar{margin:3px 10px}#search_form{margin:3px 10px}#left_nav{border-right:1px solid #c9d7f1;margin-top:11px;position:absolute;left:9px;width:13.4em}#left_nav .section{margin-bottom:1.2em}.hidden{visibility:hidden}#results{height:auto !important;height:350px;margin-left:15em;min-height:350px;min-width:800px;width:expression(document.body.clientWidth<1000?"800px":"99.9%")}.name{font-size:124%;margin:0}.times{clear:both;margin:0}.address{margin:0}#movie_results{overflow:auto}.movie_results{margin-top:11px}.movie{clear:both;margin-bottom:40px}.movie .header{padding-left:8px}.movie .img{border:1px solid #ccc;float:left;margin-bottom:10px}.movie .desc{margin-bottom:15px;max-width:42em}.movie h2{font-size:124%;margin-bottom:2px}.movie .info{margin-bottom:10px}.movie .syn{margin-bottom:10px}.movie .section_title{background:#f0f7f9;clear:both;font-size:108%;margin-bottom:11px;margin-top:11px;padding-bottom:4px;padding-left:8px;padding-top:5px}.movie .showtimes{margin-bottom:8px;padding-left:8px}.movie .show_left{width:49%}.movie .show_right{width:49%}.movie .theater{padding-bottom:15px}.theater{clear:both;padding-bottom:1px}.theater_after_icon{padding-left:25px}.theater .show_left{width:49%}.theater .show_right{width:49%}.theater h2{font-size:124%;margin-bottom:2px}.theater .icon{float:left;height:3em;margin-right:5px}.theater .closure{font-size:100%}.theater .info{font-size:100%;padding-bottom:5px;padding-top:5px}.theater .movie{margin-bottom:8px;margin-right:8px;max-width:42em}.theater .movie .desc{margin-bottom:5px;margin-left:0}.theater .movie .info{margin-top:0}.theater .showtimes{margin-bottom:40px;margin-top:8px}#theater_map{right:0;left:0;position:relative;top:0}#theater_static_map{border:1px solid #c9d7f1;margin:10px}.map_marker .name{margin-top:10px}.photo{border:1px solid #ccc;margin-bottom:20px;margin-left:8px}.show_left{float:left;margin:0;width:49.999%}.show_right{float:right;margin:0;width:50%}.show_more{clear:both;font-size:124%;margin:0}.show_more a{color:#77c}.reviews{margin-bottom:8px;padding-left:8px}.review{margin-bottom:5px}.review .publisher{color:green}.review .date{color:#6f6f6f}.trailer{margin-bottom:8px;padding-left:8px}.clear{clear:both}.iconA{background:url(http://maps.gstatic.com/mapfiles/red_icons_A_J.png) repeat 0 0}.iconB{background:url(http://maps.gstatic.com/mapfiles/red_icons_A_J.png) repeat 0 -38px}.iconC{background:url(http://maps.gstatic.com/mapfiles/red_icons_A_J.png) repeat 0 -76px}.iconD{background:url(http://maps.gstatic.com/mapfiles/red_icons_A_J.png) repeat 0 -114px}.iconE{background:url(http://maps.gstatic.com/mapfiles/red_icons_A_J.png) repeat 0 -152px}.iconF{background:url(http://maps.gstatic.com/mapfiles/red_icons_A_J.png) repeat 0 -190px}.iconG{background:url(http://maps.gstatic.com/mapfiles/red_icons_A_J.png) repeat 0 -228px}.iconH{background:url(http://maps.gstatic.com/mapfiles/red_icons_A_J.png) repeat 0 -266px}.iconI{background:url(http://maps.gstatic.com/mapfiles/red_icons_A_J.png) repeat 0 -304px}.iconJ{background:url(http://maps.gstatic.com/mapfiles/red_icons_A_J.png) repeat 0 -342px}.iconK{background:url(http://maps.gstatic.com/mapfiles/red_icons_K_Z.png) repeat 0 0}.iconL{background:url(http://maps.gstatic.com/mapfiles/red_icons_K_Z.png) repeat 0 -38px}.iconM{background:url(http://maps.gstatic.com/mapfiles/red_icons_K_Z.png) repeat 0 -76px}.iconN{background:url(http://maps.gstatic.com/mapfiles/red_icons_K_Z.png) repeat 0 -114px}.iconO{background:url(http://maps.gstatic.com/mapfiles/red_icons_K_Z.png) repeat 0 -152px}.iconP{background:url(http://maps.gstatic.com/mapfiles/red_icons_K_Z.png) repeat 0 -190px}.iconQ{background:url(http://maps.gstatic.com/mapfiles/red_icons_K_Z.png) repeat 0 -228px}.iconR{background:url(http://maps.gstatic.com/mapfiles/red_icons_K_Z.png) repeat 0 -266px}.iconS{background:url(http://maps.gstatic.com/mapfiles/red_icons_K_Z.png) repeat 0 -304px}.iconT{background:url(http://maps.gstatic.com/mapfiles/red_icons_K_Z.png) repeat 0 -342px}.iconU{background:url(http://maps.gstatic.com/mapfiles/red_icons_K_Z.png) repeat 0 -380px}.iconV{background:url(http://maps.gstatic.com/mapfiles/red_icons_K_Z.png) repeat 0 -418px}.iconW{background:url(http://maps.gstatic.com/mapfiles/red_icons_K_Z.png) repeat 0 -456px}.iconX{background:url(http://maps.gstatic.com/mapfiles/red_icons_K_Z.png) repeat 0 -494px}.iconY{background:url(http://maps.gstatic.com/mapfiles/red_icons_K_Z.png) repeat 0 -532px}.iconZ{background:url(http://maps.gstatic.com/mapfiles/red_icons_K_Z.png) repeat 0 -570px}#gbar,#guser{font-size:13px;padding-top:1px !important}#gbar{float:left;height:22px}#guser{padding-bottom:7px !important;text-align:right}.gbh,.gbd{border-top:1px solid #c9d7f1;font-size:1px}.gbh{height:0;position:absolute;top:24px;width:100%}#gbs,.gbm{background:#fff;left:0;position:absolute;text-align:left;visibility:hidden;z-index:1000}.gbm{border:1px solid;border-color:#c9d7f1 #36c #36c #a2bae7;z-index:1001}.gb1{margin-right:.5em}.gb1,.gb3{zoom:1}.gb2{display:block;padding:.2em .5em;}.gb2,.gb3{text-decoration:none;border-bottom:none}a.gb1,a.gb2,a.gb3,a.gb4{color:#00c !important}.gbi .gb3,.gbi .gb2,.gbi .gb4{color:#dd8e27 !important}.gbf .gb3,.gbf .gb2,.gbf .gb4{color:#900 !important}a.gb2:hover{background:#36c;color:#fff !important}Web Images Videos Maps News Shopping Gmail more ▼Books Finance Translate Scholar Blogs YouTube Calendar Photos Documents Reader Sites Groups even more » [email protected] | Google Account settings | Sign out Advanced Search PreferencesShowtimes for Murfreesboro, TN 37130Change Location› Today › Tomorrow › Monday › Tuesday› Theaters › Movies› Show list view › Show map viewPremiere 6 Theater810 Northwest Broad Street, Murfreesboro, TN - (615) 896-4100Clash of the Titans? - 1hr 50min?? - Rated PG-13?? - Action/Adventure? - Trailer - IMDb2:10 4:15 6:15 8:20 10:25pmDiary of a Wimpy Kid? - 1hr 33min?? - Rated PG?? - Comedy/Drama? - Trailer - IMDb2:00 3:50 6:00 7:50 9:40pmHow to Train Your Dragon?1hr 38min?? - Rated PG?? - Family/Animation? - IMDb2:00 3:55 6:00 7:55 9:50pmThe Bounty Hunter? - 1hr 46min?? - Rated PG-13?? - Action/Adventure/Comedy/Romance? - Trailer - IMDb2:15 4:15 6:25 8:25 10:30pmThe Last Song? - 1hr 47min?? - Rated PG?? - Drama? - Trailer - IMDb2:20 4:15 6:30 8:35 10:35pmTyler Perry's Why Did I Get Married Too?2hr 1min?? - Rated PG-13?? - Comedy?2:20 4:35 7:30 9:45pmContinental Cinema 5450 US Highway 231 N, Troy, AL - (334) 808-4225Clash of the Titans 3D? - 1hr 50min?? - Rated PG-13?? - Action/Adventure? - IMDb1:00 4:00 7:00 9:30pmHow to Train Your Dragon 3D? - 1hr 38min?? - Rated PG?? - Family/Animation? - IMDb1:05 4:05 7:05 9:25pmThe Bounty Hunter? - 1hr 46min?? - Rated PG-13?? - Action/Adventure/Comedy/Romance? - Trailer - IMDb1:00 4:00 7:00 9:30pmThe Last Song? - 1hr 47min?? - Rated PG?? - Drama? - Trailer - IMDb1:05 4:05 7:05 9:25pmTyler Perry's Why Did I Get Married Too?2hr 1min?? - Rated PG-13?? - Comedy?12:55 3:55 6:55 9:35pmMall Cinema - Hartford KYUS Hwy 231 South 62 East, Hartford, KY - (270) 298-3315Clash of the Titans? - 1hr 50min?? - Rated PG-13?? - Action/Adventure? - Trailer - IMDb5:00 7:00 9:00pmHow to Train Your Dragon?1hr 38min?? - Rated PG?? - Family/Animation? - IMDb5:00 7:00 9:00pmCarmike Wynnsong 16 - Murfreesboro2626 Cason Square Boulevard, Murfreesboro, TN - (615) 893-2253The Last Song? - 1hr 47min?? - Rated PG?? - Drama? - Trailer - IMDb12:15 1:00 2:45 4:00 5:15 7:00 7:45

Read the article
Simple Xml list parsing problem

- by Hubidubi

I have this xml: <root> <fruitlist> <apple>4</apple> <apple>5</apple> <orange>2</orange> <orange>6</orange> </fruitlist> </root> I'm writing a parser class, although I can't figure out how to deal with multiple node types. I can easily parse a list that contains only one node type (eg. just apples, not oranges) @ElementList(name = "fruitlist") private List<Apple> exercises; with more than one node type it also wants so parse non Apple nodes which doesn't work. I also tried to make another list for oranges, but it doen't work, I can't use fruitlist name more than once. @ElementList(name = "fruitlist", entry = "orange") private List<Orange> exercises; The ideal would be two seperate list for both node types. Hubi EDIT: After more searching & fiddling this question is a duplicate: Inheritance with Simple XML Framework

Read the article
Parsing unicode XML with Python SAX on App Engine

- by Derek Dahmer

I'm using xml.sax with unicode strings of XML as input, originally entered in from a web form. On my local machine (python 2.5, using the default xmlreader expat, running through app engine), it works fine. However, the exact same code and input strings on production app engine servers fail with "not well-formed". For example, it happens with the code below: from xml import sax class MyHandler(sax.ContentHandler): pass handler = MyHandler() # Both of these unicode strings return 'not well-formed' # on app engine, but work locally xml.parseString(u"<a>b</a>",handler) xml.parseString(u"<!DOCTYPE a[<!ELEMENT a (#PCDATA)> ]><a>b</a>",handler) # Both of these work, but output unicode xml.parseString("<a>b</a>",handler) xml.parseString("<!DOCTYPE a[<!ELEMENT a (#PCDATA)> ]><a>b</a>",handler) resulting in the error: File "<string>", line 1, in <module> File "/base/python_dist/lib/python2.5/xml/sax/__init__.py", line 49, in parseString parser.parse(inpsrc) File "/base/python_dist/lib/python2.5/xml/sax/expatreader.py", line 107, in parse xmlreader.IncrementalParser.parse(self, source) File "/base/python_dist/lib/python2.5/xml/sax/xmlreader.py", line 123, in parse self.feed(buffer) File "/base/python_dist/lib/python2.5/xml/sax/expatreader.py", line 211, in feed self._err_handler.fatalError(exc) File "/base/python_dist/lib/python2.5/xml/sax/handler.py", line 38, in fatalError raise exception SAXParseException: <unknown>:1:1: not well-formed (invalid token) Any reason why app engine's parser, which also uses python2.5 and expat, would fail when inputting unicode?

Read the article
Parsing RFC 2822 date in JAVA

- by DutrowLLC

I need to parse an RFC 2822 string representation of a date in Java. An example string is here: Sat, 13 Mar 2010 11:29:05 -0800 It looks pretty nasty so I wanted to make sure I was doing everything right and would run into weird problems later with the date being interpreted wrong either through AM-PM/Military time problems, UTC time problems, problems I don't anticipate, etc... Thanks!

Read the article
Parsing XML using xml.etree.cElementTree

- by Andre

I have the following XML in a string named 'xml': <?xml version="1.0" encoding="ISO-8859-1"?> <Book> <Page> <Text>Blah</Text> </Page> </Book> I'm trying to get the value Blah out of it but I'm having trouble with xml.etree.cElementTree. I've tried the find() and findtext() methods but nothing. Eventually I did this: import xml.etree.cElementTree as ET ... root = ET.fromstring(xml) element = root.getchildren()[0].getchildren()[0] Element now equals the element, which is what I want (for this solution anyway), but how do I get the inner text from it? element.text does not work. Any ideas? PS: I am using Python 2.5 atm. As an extra question: what is a better way to parse xml strings in python?

Read the article
Xml parsing in Objective c

- by Raju

NSString *myxml=@"<students> <student><name>Raju</name><age>25</age><address>abcd</address> </student></students>"; in iPhone programming how to parse this XML by each node. ....

Read the article
How to parse malformed HTML in python, using standard libraries

- by bukzor

There are so many html and xml libraries built into python, that it's hard to believe there's no support for real-world HTML parsing. I've found plenty of great third-party libraries for this task, but this question is about the python standard library. Requirements: Use only Python standard library components (I'm currently using v2.6) DOM support Handle HTML entities ( ) Handle partial documents (like: Hello, <iWorld</i!) Bonus points: XPATH support Handle unclosed/malformed tags. (<bigdoes anyone here know <html ???

Read the article
yaml-cpp parsing strings

- by Jason

Am i missing some thing or isnt it possible to parse YAML formatted strings with yaml-cpp? at least there isnt a YAML::Parser::Parser(std::string&) constructor. i get a yaml-string via libcurl from a http-server.

Read the article
problems with zend_pdf and right to left language and unicode?

- by user1400

hi all, i am using zend_pdf to create pdf files for Ritgh to left language as arabic or east asia langauges i use of this code $pdfDoc = new Zend_Pdf(); $pdfPage = $pdfDoc->newPage(Zend_Pdf_Page::SIZE_A4); $font = Zend_Pdf_Font::fontWithPath(APPLICATION_PATH.'/Fonts/arial.ttf'); $pdfPage->setFont($font, 36); $pdfDoc->pages[] = $pdfPage; $unicodeString = '????'; $pdfPage->drawText($unicodeString, 72, 720, 'UTF-8'); $pdfDoc->save('utf8.pdf'); but there are two problem 1- how can i change document's direction to right to left ? 2- characters are separated one by one. but in some laguages characters of a word are joint together. any idea? or i may use tcpdf? thanks

Read the article
Extracting and Parsing data with soapUI

- by James G

Hi. So I am in need to learn how to use soapUI pretty quick. I'm finding it pretty tedious to start so I was hoping I might be able to get some help here. Here's what I need to do. Lets say we have Company A and Company B which is a subset of Company B. Now Company A offers a webservice accessible by Company B such that Company B can gather daily aggregated data from Company A's database. Now Company B wants to take this data and publish it on their website. What I'd like is a very basic overview of what I need to do to extract and parse the data onto a website. Just the outline of the process so I can get started. What languages should I be using at what stages and what not. Any help would be highly appreciated.

Read the article
Parsing string, with Boost Spirit 2, to fill data in user defined struct

- by Surya

I'm using Boost.Spirit which was distributed with Boost-1.42.0 with VS2005. My problem is like this. I've this string which was delimted with commas. The first 3 fields of it are strings and rest are numbers. like this. String1,String2,String3,12.0,12.1,13.0,13.1,12.4 My rule is like this qi::rule<string::iterator, qi::skip_type> stringrule = *(char_ - ',') qi::rule<string::iterator, qi::skip_type> myrule= repeat(3)[*(char_ - ',') >> ','] >> (double_ % ',') ; I'm trying to store the data in a structure like this. struct MyStruct { vector<string> stringVector ; vector<double> doubleVector ; } ; MyStruct var ; I've wrapped it in BOOST_FUSION_ADAPT_STRUCTURE to use it with spirit. BOOST_FUSION_ADAPT_STRUCT (MyStruct, (vector<string>, stringVector) (vector<double>, doubleVector)) My parse function parses the line and returns true and after qi::phrase_parse (iterBegin, iterEnd, myrule, boost::spirit::ascii::space, var) ; I'm expecting var.stringVector and var.doubleVector are properly filled. but it is not the case. What is going wrong ? Thanks in advance, Surya

Read the article
Parsing plain Win32 PE File (Exe/DLL) in C#.NET

- by Usman

Hello, I need to parse plain Win32 DLL/Exe and need to get all imports and exports from it and to show it on console or GUI(say Win Forms). Is it possible to parse Win32 DLL/Exe in C#.NET, read its export table,import table and get managed types from it. As its unmanaged PE(.NET doesn't allows you to convert unmanaged PE files to managed .NET assemblies, only it generates COM managed assemblies). So how to parse export and import tables of PE files and take all methods(signatures from it) in managed form.(e.g if char* as argument, it should display as IntPtr) Regards Usman

Read the article
FasterCSV Parsing issue?

- by Schroedinger

G'day guys, I'm currently using fastercsv to construct ActiveRecord elements and I can't for the life of me see this bug (tired), but for some reason when it creates, if in the rake file i output the column I want to save as the element value, it puts out correctly, as either a Trade or a Quote but when I try to save it into the activerecord, it won't work. FasterCSV.foreach("input.csv", :headers => true) do |row| d = DateTime.parse(row[1]+" "+row[2]) offset = Rational(row[3].to_i,24) o = d.new_offset(offset) t = Trade.create( :name => row[0], :type => row[4], :time => o, :price => row[6].to_f, :volume => row[7].to_i, :bidprice => row[10].to_f, :bidsize => row[11].to_i, :askprice => row[14].to_f, :asksize => row[15].to_i ) end Ideas? Name and Type are both strings, every other value works except for type. Have I missed something really simple?

Read the article
NULL value when using NSDateFormatter after setting NSDate property via XML parsing

- by David A Gibson

Hello, I am using the following code to try and display in a time in a table cell. TimeSlot *timeSlot = [timeSlots objectAtIndex:indexPath.row]; NSDateFormatter *timeFormat = [[NSDateFormatter alloc] init]; [timeFormat setDateFormat:@"HH:mm:ss"]; NSLog(@"Time: %@", timeSlot.time); NSDate *mydate = timeSlot.time; NSLog(@"Time: %@", mydate); NSString *theTime = [timeFormat stringFromDate:mydate]; NSLog(@"Time: %@", theTime); The log output is this: 2010-04-14 10:23:54.626 MyApp[1080:207] Time: 2010-04-14T10:23:54 2010-04-14 10:23:54.627 MyApp[1080:207] Time: 2010-04-14T10:23:54 2010-04-14 10:23:54.627 MyApp[1080:207] Time: (null) I am new to developing for the iPhone and as it all compiles with no errors or warnings I am at a loss as to why I am getting NULL in the log. Is there anything wrong with this code? Thanks Further Info I used the code exactly from your answer lugte098 just to check and I was getting dates which leads me to believe that my TimeSlot class can't have a date correctly set in it's NSDate property. So my question becomes - how from XML do I set a NSDate property? I have this code (abbreviated): -(void)parser:(NSXMLParser *)parser foundCharacters:(NSString *) string { if ([currentElement isEqualToString:@"Time"]) { currentTimeSlot.time = string } } Thanks

Read the article
String parsing help

- by Click Upvote

I have a string like the following: $string = " <paragraph>apples are red...</paragraph> <paragraph>john is a boy..</paragraph> <paragraph>this is dummy text......</paragraph> "; I would like to split this string into an array contanining the text found between the <paragraph></paragraph> tags. E.g something like this: $string = " <paragraph>apples are red...</paragraph> <paragraph>john is a boy..</paragraph> <paragraph>this is dummy text......</paragraph> "; $paragraphs = splitParagraphs($string); /* $paragraphs now contains: $paragraphs[0] = apples are red... $paragraphs[1] = john is a boy... $paragraphs[1] = this is dummy text... */ Any ideas? P.S it should be case insensitive, <paragraph>, <PARAGRAPH>, <Paragraph> should all be treated the same way.

Read the article
How to parse text fragments located after tag by simplehtmldom?

- by moogeek

Hello! I'm using simplehtmldom to parse html and i'm stuck in parsing plaintext that is located outside of any tag (but between two different tags): <div class="text_small"> <b>?dress:</b> 7 Hange Road<br> <b>Phone:</b> 415641587484<br> <b>Contact:</b> Alex<br> <b>Meeting Time:</b> 12:00-13:00<br> </div> Is it possible to get this values of Adress, Phone, Contact, Meeting Time? I wonder if there is a opportunity to pass CSS Selectors into nextSibling/nextSibling functions...

Read the article
Extracting information from PDFs of research papers

- by Christopher Gutteridge

I need a mechanism for extracting bibliographic metadata from PDF documents, to save people entering it by hand or cut-and-pasting it. At the very least, the title and abstract. The list of authors and their affiliations would be good. Extracting out the references would be amazing. Ideally this would be an open source solution. The problem is that not all PDF's encode the text, and many which do fail to preserve the logical order of the text, so just doing pdf2text gives you line 1 of column 1, line 1 of column 2, line 2 of column 1 etc. I know there's a lot of libraries. It's identifying the abstract, title authors etc. on the document that I need to solve. This is never going to be possible every time, but 80% would save a lot of human effort.

Read the article
Parsing an RFC822-Datetime in .NETMF 4.0

- by chris12892

I have an application written in .NETMF that requires that I be able to parse an RFC822-Datetime. Normally, this would be easy, but NETMF does not have a DateTime.parse() method, nor does it have some sort of a pattern matching implementation, so I'm pretty much stuck. Any ideas? EDIT: "Intelligent" solutions are probably needed. Part of the reason this is difficult is that the datetime in question has a tendency to have extra spaces in it (but only sometimes). A simple substring solution might work one day, but fail the next when the datetime has an extra space somewhere between the parts. I do not have control over the datetime, it is from the NOAA.

Read the article
Parsing JSON into XML using Windows Phone

- by Henry Edwards

I have this code, but can't get it all working. I am trying to get a json string into xml. So that I can get a list of items when i parse the data. Is there a better way to parse json into xml. If so what's the best way to do it, and if possible could you give me a working example? The URL that is in the code is not the URL that i am using using System; using System.Collections.Generic; using System.Linq; using System.Net; using System.Windows; using System.Windows.Controls; using System.Windows.Documents; using System.Windows.Input; using System.Windows.Media; using System.Windows.Media.Animation; using System.Windows.Shapes; using Microsoft.Phone.Controls; using Newtonsoft.Json; using Newtonsoft.Json.Serialization; using Newtonsoft.Json.Converters; using Newtonsoft.Json.Utilities; using Newtonsoft.Json.Linq; using Newtonsoft.Json.Schema; using Newtonsoft.Json.Bson; using System.Xml; using System.Xml.Serialization; using System.Xml.Linq; using System.Xml.Linq.XDocument; using System.IO; namespace WindowsPhonePanoramaApplication3 { public partial class Page2 : PhoneApplicationPage { public Page2() { InitializeComponent(); } private void Form1_Load(object sender, EventArgs e1) { /* because the origional JSON string has multiple root's this needs to be added */ string json = "{BFBC2_GlobalStats:"; json += DownlodUrl("http://api.bfbcs.com/api/xbox360?globalstats"); json += "}"; XmlDocument doc = (XmlDocument)JsonConvert.DeserializeObject(json); textBox1.Text = GetXmlString(doc); } private string GetXmlString() { throw new NotImplementedException(); } private string DownlodUrl(string url) { string result = null; try { WebClient client = new WebClient(); result = client.DownloadString(url); } catch (Exception ex) { // handle error result = ex.Message; } return result; } private string GetXmlString(XmlDocument xmlDoc) { sw = new StringWriter(); XmlTextWriter xw = new XmlTextWriter(sw); xw.Formatting = System.Xml.Formatting.Indented; xmlDoc.WriteTo(xw); return sw.ToString(); } } } The URL outputs the following code: {"StopName":"Race Hill", "stopId":7553, "NaptanCode":"bridwja", "LongName":"Race Hill", "OperatorsCode1":" 5", "OperatorsCode2":" ", "OperatorsCode3":" ", "OperatorsCode4":"bridwja", "Departures":[ { "ServiceName":"", "Destination":"", "DepartureTimeAsString":"", "DepartureTime":"30/01/2012 00:00:00", "Notes":""}` Thanks for your responses. So Should i just leave the data a json and then view the data via that??? Is this a way to show the data from a json string. public void Load() { // form the URI UriBuilder uri = new UriBuilder("http://mysite.com/events.json"); WebClient proxy = new WebClient(); proxy.OpenReadCompleted += new OpenReadCompletedEventHandler(OnReadCompleted); proxy.OpenReadAsync(uri.Uri); } void OnReadCompleted(object sender, OpenReadCompletedEventArgs e) { if (e.Error == null) { var serializer = new DataContractJsonSerializer(typeof(EventList)); var events = (EventList)serializer.ReadObject(e.Result); foreach (var ev in events) { Items.Add(ev); } } } public ObservableCollection<EventDetails> Items { get; private set; } Edit: Have now kept the url as json and have now got it working by using the json way.

Read the article
parsing html pages from tcl

- by mithunmo

Hello , I using tdom version 0.8.2 to parse html pages. From the help pages I found the following commands to get the ElementById TCL code set html {<html> <head> </head> <body> <div id="m"> </div> </body> </html> } package require tdom set doc [ dom parse -html $html ] set node [ $doc getElementById m] But when I execute the second set command I get a empty string . But cleary the tag has an id of m . Can someone tell where am I going wrong ? Regards, Mithun

Read the article

< Previous Page | 73 74 75 76 77 78 79 80 81 82 83 84 | Next Page >