Search Results

Search found 70351 results on 2815 pages for 'file parsing'.

Page 145/2815 | < Previous Page | 141 142 143 144 145 146 147 148 149 150 151 152  | Next Page >

  • How to parse ISO formatted date in python?

    - by Big 40wt Svetlyak
    I need to parse strings like that "2008-09-03T20:56:35.450686Z" into the python's datetime? I have found only strptime in the python 2.5 std lib, but it not so convinient. Which is the best way to do that? Update: It seems, that python-dateutil works very well. I have found that solution: d1 = '2008-09-03T20:56:35.450686Z' d2 = dateutil.parser.parse(d1) d3 = d2.astimezone(dateutil.tx.tzutc())

    Read the article

  • Regex for retrieving the parameter of the css url function...

    - by Kieron
    Hi, I'm trying to get the url portion of the following string: url(images/ui-bg_highlight-soft_75_cccccc_1x100.png) So the required part if images/ui-bg_highlight-soft_75_cccccc_1x100.png. Currently I've got this: url\((?<url>.*)\) But it seems to be choking on the following example: url(images/ui-bg_flat_0_aaaaaa_40x100.png) 50% 50% repeat-x; opacity: .30;filter:Alpha(Opacity=30) Which results in images/ui-bg_flat_0_aaaaaa_40x100.png) 50% 50% repeat-x; opacity: .30;filter:Alpha(Opacity=30... I'd like to make sure that it supports as many variations as possible (additional whitespace etc). Thanks! Kieron

    Read the article

  • .NET regex: Match.nextMatch() never returns

    - by Jimmy
    I have a regex that seems to have worked fine for the past year or so, and all of a sudden today with a new slightly different text to match against, Match.nextMatch() never returns. I'm no regex expert and I'm sure the regex can be optimized, but previous data sets weren't much more complex than what I've tried today. Furthermore, the regex works fine against the offending data set in a tool like RegexBuddy; it's only in .net (running in debug in Visual Studio) that it seems to hang. Nevertheless, if anyone can figure out how to tweak the regex to make it work, I'd really appreciate it. This is the regex: <tr>(<td[^>]*><a[^>]*>(?<callOptionTicker>[A-Z]{1,5}\d{6}C\d{8})</a></td>)(<td[^>]*>.*?</td>){6}(<td[^>]*><b><a[^>]*>(?<strikePrice>\d*\.\d*)</a></b></td>)(<td[^>]*><a[^>]*>(?<putOptionTicker>[A-Z]{1,5}\d{6}P\d{8})</a></td>) It's meant to extract put and call option tickers from a Yahoo option chain page (i.e., raw HTML). It works fine for IBM http://finance.yahoo.com/q/os?s=IBM&m=2010-05-21 It doesn't work for SPX options (this is the offending data set) http://finance.yahoo.com/q/os?s=I:SPX.W&m=2010-05

    Read the article

  • How do I process a nested list?

    - by ddbeck
    Suppose I have a bulleted list like this: * list item 1 * list item 2 (a parent) ** list item 3 (a child of list item 2) ** list item 4 (a child of list item 2 as well) *** list item 5 (a child of list item 4 and a grand-child of list item 2) * list item 6 I'd like to parse that into a nested list or some other data structure which makes the parent-child relationship between elements explicit (rather than depending on their contents and relative position). For example, here's a list of tuples containing an item and a list of its children (and so forth): [('list item 1',), ('list item 2', [('list item 3',), [('list item 4', [('list item 5'),]] ('list item 6',)] I've attempted to do this with plain Python and some experimentation with Pyparsing, but I'm not making progress. I'm left with two major questions: What's the strategy I need to employ to make this work? I know recursion is part of the solution, but I'm having a hard time making the connection between this and, say, a Fibonacci sequence. I'm certain I'm not the first person to have done this, but I don't know the terminology of the problem to make fruitful searches for more information on this topic. What problems are related to this so that I can learn more about solving these kinds of problems in general?

    Read the article

  • Any Java library for address extraction from emails?

    - by Hans Klock
    I'm looking for an Java open-source library which is able to extract address information from a (German) email (signature). The library should find name street city, city code/postal code email tel/fax address-parser.com is an commercial product, but a free (albeit simple) library would be great. stackoverflow.com/questions/16413/parse-usable-street-address-city-state-zip-from-a-string is asking for something similar, but my problem is broader because the address information is hidden in a complete email. And there isn't a solution either... Any ideas?

    Read the article

  • Create PHP DOM xml file and create a save file link/prompt without writing the file to the server wh

    - by Reed Richards
    I've created a PHP DOM xml piece and saved it to a string like this: <?php // create a new XML document $doc = new DomDocument('1.0'); ... ... ... $xmldata = $doc->saveXML(); ?> Now I can't use the headers to send a file download prompt and I can't write the file to the server, or rather I don't want the file laying around on it. Something like a save this file link or a download prompt would be good. How do I do it?

    Read the article

  • Problem with eastern european characters when scraping data from the European Parliaments Website

    - by Thomas Jensen
    Dear Experts I am trying to scrape a lot of data from the European Parliament website for a research project. Ther first step is the create a list if all parliamentarians, however due to the many Eastern European names and the accents they use i get a lot of missing entries. Here is an example of what is giving me troubles (notice the accents at the end of the family name): ANDRIKIENE, Laima Liucija Group of the European People's Party (Christian Democrats) So far I have been using PyParser and the following code: parser_names name = Word(alphanums + alphas8bit) begin, end = map(Suppress, "<") names = begin + ZeroOrMore(name) + "," + ZeroOrMore(name) + end for name in names.searchString(page): print(name) However this does not catch the name from the html above. Any advice in how to proceed? Best, Thomas

    Read the article

  • pyparsing ambiguity

    - by Claudiu
    I'm trying to parse some text using PyParser. The problem is that I have names that can contain white spaces. So my input might look like this: Joe Bob Jimmy Foo Joe decides to eat. Bob decides to not eat. Jimmy Foo decides to eat. How can I create a parser for the decides to eat line? If I create my name parser naively, meaning with alphabetic characters plus space characters, then it will match the entire line.

    Read the article

  • Extracting email addresses in an html block in ruby/rails

    - by corroded
    I am creating a parser that wards off against spamming and harvesting of emails from a block of text that comes from tinyMCE (so it may or may not have html tags in it) I've tried regexes and so far this has been successful: /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i problem is, i need to ignore all email addresses with mailto hrefs. for example: <a href="mailto:[email protected]">[email protected]</a> should only return the second email add. To get a background of what im doing, im reversing the email addresses in a block so the above example would look like this: <a href="mailto:[email protected]">moc.liam@tset</a> problem with my current regex is that it also replaces the one in href. Is there a way for me to do this with a single regex? Or do i have to check for one then the other? Is there a way for me to do this just by using gsub or do I have to use some nokogiri/hpricot magicks and whatnot to parse the mailtos? Thanks in advance! Here were my references btw: so.com/questions/504860/extract-email-addresses-from-a-block-of-text so.com/questions/1376149/regexp-for-extracting-a-mailto-address im also testing using this: http://rubular.com/ edit here's my current helper code: def email_obfuscator(text) text.gsub(/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i) { |m| m = "<span class='anti-spam'>#{m.reverse}</span>" } end which results in this: <a target="_self" href="mailto:<span class='anti-spam'>moc.liamg@tset</span>"><span class="anti-spam">moc.liamg@tset</span></a>

    Read the article

  • Objective-C Implementation Pointers

    - by Dwaine Bailey
    Hi, I am currently writing an XML parser that parses a lot of data, with a lot of different nodes (the XML isn't designed by me, and I have no control over the content...) Anyway, it currently takes an unacceptably long time to download and read in (about 13 seconds) and so I'm looking for ways to increase the efficiency of the read. I've written a function to create hash values, so that the program no longer has to do a lot of string comparison (just NSUInteger comparison), but this still isn't reducing the complexity of the read in... So I thought maybe I could create an array of IMPs so that, I could then go something like: for(int i = 0; i < [hashValues count]; i ++) { if(currHash == [[hashValues objectAtIndex:i] unsignedIntValue]) { [impArray objectAtIndex:i]; } } Or something like that. The only problem is that I don't know how to actually make the call to the IMP function? I've read that I perform the selector that an IMP defines by going IMP tImp = [impArray objectAtIndex:i]; tImp(self, @selector(methodName)); But, if I need to know the name of the selector anyway, what's the point? Can anybody help me out with what I want to do? Or even just some more ways to increase the efficiency of the parser...

    Read the article

  • Accessing items separated by -componentsSeparatedByString

    - by Graeme
    Hi, I have an array gathered by componentsSeparatedByString: that looks like the following when I use po in the GDB after the array has gone through componentsSeparatedByString: "\n\t\t <b>Suburb, </b> BAIRNSDALE", "\n\t\t <b>Address, </b> 15K NW BAIRNSDALE", "\n\t\t <b>Reference, </b> MELWOOD/SCHOOL ROAD", "\n\t\t <b>Last Changed, </b> 09/04/10 05, 29, 00 PM", "\n\t\t <b>Type, </b> HOME", "\n\t\t <b>Status, </b> BUILT", "\n\t\t <b>Property Size, </b> 2.00 HA.", "\n\t\t <b>Residents, </b> 2", "\n\t\t <b>First Added Date/Time, </b> 09/04/10 03, 15, 00 PM", "\n\t\t\t" Only problem is, I now can't figure out where to go from here. I need to be able to access each of these items (i.e. type, status, property size) separately rather than just calling the entire array (i.e. currentProperty.status). How do I do this? Also what's with all the n\t\t\t things - how do I get rid of them? Thanks.

    Read the article

  • Getting content of the node which has childs via DOMDocument

    - by altern
    I have following html: <html ><body >Body text <div >div content</div></body></html> How could I get content of body without nested <div>? I need to get 'Body text', but do not have a clue how to do this. result of running $domhtml = DOMDocument::loadHTML($html); print $domhtml->getElementsByTagName('body')->item(0)->nodeValue; is 'Body textdiv content', which is not exactly what I want to get

    Read the article

  • parse search string

    - by Benjamin Ortuzar
    I have search strings, similar to the one bellow: energy food "olympics 2010" Terrorism OR "government" OR cups NOT transport and I need to parse it with PHP5 to detect if the content belongs to any of the following clusters: AllWords array AnyWords array NotWords array These are the rules i have set: If it has OR before or after the word or quoted words if belongs to AnyWord. If it has a NOT before word or quoted words it belongs to NotWords If it has 0 or more more spaces before the word or quoted phrase it belongs to AllWords. So the end result should be something similar to: AllWords: (energy, food, "olympics 2010") AnyWords: (terrorism, "government", cups) NotWords: (Transport) What would be a good way to do this?

    Read the article

  • boost::Spirit Grammar for unsorted schema

    - by Hassan Syed
    I have a section of a schema for a model that I need to parse. Lets say it looks like the following. { type = "Standard"; hostname="x.y.z"; port="123"; } The properties are: The elements may appear unordered. All elements that are part of the schema must appear, and no other. All of the elements' synthesised attributes go into a struct. (optional) The schema might in the future depend on the type field -- i.e., different fields based on type -- however I am not concerned about this at the moment.

    Read the article

  • why does b'(and sometimes b' ') show up when I split some HTML source[Python]

    - by Oliver
    I'm fairly new to Python and programming in general. I have done a few tutorials and am about 2/3 through a pretty good book. That being said I've been trying to get more comfortable with Python and proggramming by just trying things in the std lib out. that being said I have recently run into a wierd quirk that I'm sure is the result of my own incorrect or un-"pythonic" use of the urllib module(with Python 3.2.2) import urllib.request HTML_source = urllib.request.urlopen(www.somelink.com).read() print(HTML_source) when this bit is run through the active interpreter it returns the HTML source of somelink, however it prefixes it with b' for example b'<HTML>\r\n<HEAD> (etc). . . . if I split the string into a list by whitespace it prefixes every item with the b' I'm not really trying to accomplish something specific just trying to familiarize myself with the std lib. I would like to know why this b' is getting prefixed also bonus -- Is there a better way to get HTML source WITHOUT using a third party module. I know all that jazz about not reinventing the wheel and what not but I'm trying to learn by "building my own tools" Thanks in Advance!

    Read the article

  • How do I get 3 lines of text from a paragraph in C#

    - by Keltex
    I'm trying to create an "snippet" from a paragraph. I have a long paragraph of text with a word hilighted in the middle. I want to get the line containing the word before that line and the line after that line. I have the following piece of information: The text (in a string) The lines are deliminated by a NEWLINE character \n I have the index into the string of the text I want to hilight A couple other criteria: If my word falls on first line of the paragraph, it should show the 1st 3 lines If my word falls on the last line of the paragraph, it should show the last 3 lines Should show the entire paragraph in the degenative cases (the paragraph only has 1 or 2 lines) Here's an example: This is the 1st line of CAT text in the paragraph This is the 2nd line of BIRD text in the paragraph This is the 3rd line of MOUSE text in the paragraph This is the 4th line of DOG text in the paragraph This is the 5th line of RABBIT text in the paragraph Example, if my index points to BIRD, it should show lines 1, 2, & 3 as one complete string like this: This is the 1st line of CAT text in the paragraph This is the 2nd line of BIRD text in the paragraph This is the 3rd line of MOUSE text in the paragraph If my index points to DOG, it should show lines 3, 4, & 5 as one complete string like this: This is the 3rd line of MOUSE text in the paragraph This is the 4th line of DOG text in the paragraph This is the 5th line of RABBIT text in the paragraph etc. Anybody want to help tackle this?

    Read the article

  • How to parse a custom XML-style error code response from a website

    - by user1870127
    I'm developing a program that queries and prints out open data from the local transit authority, which is returned in the form of an XML response. Normally, when there are buses scheduled to run in the next few hours (and in other typical situations), the XML response generated by the page is handled correctly by the java.net.URLConnection.getInputStream() function, and I am able to print the individual results afterwards. The problem is when the buses are NOT running, or when some other problem with my queries develops after it is sent to the transit authority's web server. When the authority developed their service, they came up with their own unique error response codes, which are also sent as XMLs. For example, one of these error messages might look like this: <Error xmlns:i="http://www.w3.org/2001/XMLSchema-instance"> <Code>3005</Code> <Message>Sorry, no stop estimates found for given values.</Message> </Error> (This code and similar is all that I receive from the transit authority in such situations.) However, it appears that URLConnection.getInputStream() and some of its siblings are unable to interpret this custom code as a "valid" response that I can handle and print out as an error message. Instead, they give me a more generic HTTP/1.1 404 Not Found error. This problem cascades into my program which then prints out a java.io.FileNotFoundException error pointing to the offending input stream. My question is therefore two-fold: 1. Is there a way to retrieve, parse, and print a custom XML-formatted error code sent by a web service using the plugins that are available in Java? 2. If the above is not possible, what other tools should I use or develop to handle such custom codes as described?

    Read the article

  • PHP Simple_html_dom issue

    - by stef
    The snippet below loops through some web pages, grabs the html and then looks for table.results and gets the plaintext out of the tags contained in each . $results is ok. Now I'm trying to get the href value of an tag that is found in the second of each . I'd like to include this in the $results array, but I'm not sure how to do this. The third foreach statement gets them but then I need to merge $links with $results. Ideally I'd also get the links in the second foreach statement. Does anyone know how? $i = 0; foreach( $urls as $u ) { $html = file_get_html($u); foreach($html->find('.results tbody tr') as $element) { $result[$i] = $this->extract($element->plaintext); $i++; } foreach($html->find('.results tbody tr a') as $element) { $links[$i] = $element->href; $i++; } } print_r($result); print_r($links); die;

    Read the article

  • Why does ANTLR not parse the entire input?

    - by Martin Wiboe
    Hello, I am quite new to ANTLR, so this is likely a simple question. I have defined a simple grammar which is supposed to include arithmetic expressions with numbers and identifiers (strings that start with a letter and continue with one or more letters or numbers.) The grammar looks as follows: grammar while; @lexer::header { package ConFreeG; } @header { package ConFreeG; import ConFreeG.IR.*; } @parser::members { } arith: term | '(' arith ( '-' | '+' | '*' ) arith ')' ; term returns [AExpr a]: NUM { int n = Integer.parseInt($NUM.text); a = new Num(n); } | IDENT { a = new Var($IDENT.text); } ; fragment LOWER : ('a'..'z'); fragment UPPER : ('A'..'Z'); fragment NONNULL : ('1'..'9'); fragment NUMBER : ('0' | NONNULL); IDENT : ( LOWER | UPPER ) ( LOWER | UPPER | NUMBER )*; NUM : '0' | NONNULL NUMBER*; fragment NEWLINE:'\r'? '\n'; WHITESPACE : ( ' ' | '\t' | NEWLINE )+ { $channel=HIDDEN; }; I am using ANTLR v3 with the ANTLR IDE Eclipse plugin. When I parse the expression (8 + a45) using the interpreter, only part of the parse tree is generated: http://imgur.com/iBaEC.png Why does the second term (a45) not get parsed? The same happens if both terms are numbers. Thank you, Martin Wiboe

    Read the article

< Previous Page | 141 142 143 144 145 146 147 148 149 150 151 152  | Next Page >