pdf parsing - Page 101 - Developer IT

Objective-C Implementation Pointers

- by Dwaine Bailey

Hi, I am currently writing an XML parser that parses a lot of data, with a lot of different nodes (the XML isn't designed by me, and I have no control over the content...) Anyway, it currently takes an unacceptably long time to download and read in (about 13 seconds) and so I'm looking for ways to increase the efficiency of the read. I've written a function to create hash values, so that the program no longer has to do a lot of string comparison (just NSUInteger comparison), but this still isn't reducing the complexity of the read in... So I thought maybe I could create an array of IMPs so that, I could then go something like: for(int i = 0; i < [hashValues count]; i ++) { if(currHash == [[hashValues objectAtIndex:i] unsignedIntValue]) { [impArray objectAtIndex:i]; } } Or something like that. The only problem is that I don't know how to actually make the call to the IMP function? I've read that I perform the selector that an IMP defines by going IMP tImp = [impArray objectAtIndex:i]; tImp(self, @selector(methodName)); But, if I need to know the name of the selector anyway, what's the point? Can anybody help me out with what I want to do? Or even just some more ways to increase the efficiency of the parser...

Read the article

Right recursive grammar or left recursive?

- by user2485710

I have little to no knowledge of what I'm about to ask, so I would like a suggestion based on the level of skills required to implemented a parser for the given grammar ( since I'm a beginner in this kind of formal approach to parsers and languages ). Just by going back of a couple of years, this situation reminds me a little of Pascal grammar vs C/C++ grammar, this left vs right stuff. But I'm not going to do any of that, my purpose is to implement a simple parser for a markup language for documents like Markdown. So considering that I'm starting with a markup language in mind, I want to keep things simple, which is the easiest one to handle between this 2 options and why . Another kind of grammar could be an easier option for me ? If yes which one do you suggest ?

Read the article

Is it possible to set attribute to NULL in XML so that it can be parsed and recognised as NULL in sq

- by Shantanu Gupta

How to parse attributes in sql server so that it can recognize NULL values and consider it as a NULL in sql server. I am using Sql server 2005

Read the article

How do I get 3 lines of text from a paragraph in C#

- by Keltex

I'm trying to create an "snippet" from a paragraph. I have a long paragraph of text with a word hilighted in the middle. I want to get the line containing the word before that line and the line after that line. I have the following piece of information: The text (in a string) The lines are deliminated by a NEWLINE character \n I have the index into the string of the text I want to hilight A couple other criteria: If my word falls on first line of the paragraph, it should show the 1st 3 lines If my word falls on the last line of the paragraph, it should show the last 3 lines Should show the entire paragraph in the degenative cases (the paragraph only has 1 or 2 lines) Here's an example: This is the 1st line of CAT text in the paragraph This is the 2nd line of BIRD text in the paragraph This is the 3rd line of MOUSE text in the paragraph This is the 4th line of DOG text in the paragraph This is the 5th line of RABBIT text in the paragraph Example, if my index points to BIRD, it should show lines 1, 2, & 3 as one complete string like this: This is the 1st line of CAT text in the paragraph This is the 2nd line of BIRD text in the paragraph This is the 3rd line of MOUSE text in the paragraph If my index points to DOG, it should show lines 3, 4, & 5 as one complete string like this: This is the 3rd line of MOUSE text in the paragraph This is the 4th line of DOG text in the paragraph This is the 5th line of RABBIT text in the paragraph etc. Anybody want to help tackle this?

Read the article

Getting content of the node which has childs via DOMDocument

- by altern

I have following html: <html ><body >Body text <div >div content</div></body></html> How could I get content of body without nested <div>? I need to get 'Body text', but do not have a clue how to do this. result of running $domhtml = DOMDocument::loadHTML($html); print $domhtml->getElementsByTagName('body')->item(0)->nodeValue; is 'Body textdiv content', which is not exactly what I want to get

Read the article

parse search string

- by Benjamin Ortuzar

I have search strings, similar to the one bellow: energy food "olympics 2010" Terrorism OR "government" OR cups NOT transport and I need to parse it with PHP5 to detect if the content belongs to any of the following clusters: AllWords array AnyWords array NotWords array These are the rules i have set: If it has OR before or after the word or quoted words if belongs to AnyWord. If it has a NOT before word or quoted words it belongs to NotWords If it has 0 or more more spaces before the word or quoted phrase it belongs to AllWords. So the end result should be something similar to: AllWords: (energy, food, "olympics 2010") AnyWords: (terrorism, "government", cups) NotWords: (Transport) What would be a good way to do this?

Read the article

why does b'(and sometimes b' ') show up when I split some HTML source[Python]

- by Oliver

I'm fairly new to Python and programming in general. I have done a few tutorials and am about 2/3 through a pretty good book. That being said I've been trying to get more comfortable with Python and proggramming by just trying things in the std lib out. that being said I have recently run into a wierd quirk that I'm sure is the result of my own incorrect or un-"pythonic" use of the urllib module(with Python 3.2.2) import urllib.request HTML_source = urllib.request.urlopen(www.somelink.com).read() print(HTML_source) when this bit is run through the active interpreter it returns the HTML source of somelink, however it prefixes it with b' for example b'<HTML>\r\n<HEAD> (etc). . . . if I split the string into a list by whitespace it prefixes every item with the b' I'm not really trying to accomplish something specific just trying to familiarize myself with the std lib. I would like to know why this b' is getting prefixed also bonus -- Is there a better way to get HTML source WITHOUT using a third party module. I know all that jazz about not reinventing the wheel and what not but I'm trying to learn by "building my own tools" Thanks in Advance!

Read the article

Why does ANTLR not parse the entire input?

- by Martin Wiboe

Hello, I am quite new to ANTLR, so this is likely a simple question. I have defined a simple grammar which is supposed to include arithmetic expressions with numbers and identifiers (strings that start with a letter and continue with one or more letters or numbers.) The grammar looks as follows: grammar while; @lexer::header { package ConFreeG; } @header { package ConFreeG; import ConFreeG.IR.*; } @parser::members { } arith: term | '(' arith ( '-' | '+' | '*' ) arith ')' ; term returns [AExpr a]: NUM { int n = Integer.parseInt($NUM.text); a = new Num(n); } | IDENT { a = new Var($IDENT.text); } ; fragment LOWER : ('a'..'z'); fragment UPPER : ('A'..'Z'); fragment NONNULL : ('1'..'9'); fragment NUMBER : ('0' | NONNULL); IDENT : ( LOWER | UPPER ) ( LOWER | UPPER | NUMBER )*; NUM : '0' | NONNULL NUMBER*; fragment NEWLINE:'\r'? '\n'; WHITESPACE : ( ' ' | '\t' | NEWLINE )+ { $channel=HIDDEN; }; I am using ANTLR v3 with the ANTLR IDE Eclipse plugin. When I parse the expression (8 + a45) using the interpreter, only part of the parse tree is generated: http://imgur.com/iBaEC.png Why does the second term (a45) not get parsed? The same happens if both terms are numbers. Thank you, Martin Wiboe

Read the article

What is the most efficient way to parse json in jquery?

- by Pandiya Chendur

I am using json data and iterating it through jquery and displaying my results... Using var jsonObj = JSON.parse(HfJsonValue); works in firefox but not in IE6.... HfjsonValue is a json string which is returned from my aspx code behind page... SO i dont use ajax... Any suggestion to get my json parsed better and cross browser one...

Read the article

How to extract paragaph and selected lines with Perl

- by neversaint

I have a text that looks like this. What I want to do is to extract the whole paragraph under the section "Aceview summary" until the line that starts with "Please quote". extract the line that starts with "The closest human gene". And store them into array with two elements. However I am stuck with the following script logic. What's the right way to achieve that? #!/usr/bin/perl -w my $INFILE_file_name = $file; # input file name open ( INFILE, '<', $INFILE_file_name ) or croak "$0 : failed to open input file $INFILE_file_name : $!\n"; my @allsum; while ( <INFILE> ) { chomp; my $line = $_; my @temp1 = (); if ( $line =~ /^ AceView summary/ ) { print "$line\n"; push @temp1, $line; } elsif( $line =~ /Please quote/) { push @allsum, [@temp1]; @temp1 = (); } } close ( INFILE ); # close input file

Read the article

XpathQuery to retrive particular data from html pahe in iphone.

- by Warrior

I am new to iphone development.I want to retrieve a particular data from a HTML page (Data inside a particular tag like This is title).I have the link of the web page.How should i use xpathQuery to retrieve "This is title" inside the tags.Please help me out.Thanks.

Read the article

Input string was not in the correct format using int.Parse

- by JDWebs

I have recently been making a login 'representation' which is not secure. So before answering, please note I am aware of security risks etc., and this will not be on a live site. Also note I am a beginner :P. For my login representation, I am using LINQ to compare values of a DDL to select a username and a Textbox to enter a password, when a login button is clicked. However, an error is thrown 'Input string was not in the correct format', when using int.Parse. Front End: <%@ Page Language="C#" AutoEventWireup="true" CodeFile="Login_Test.aspx.cs" Inherits="Login_Login_Test" %> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head runat="server"> <title>Login Test</title> </head> <body> <form id="LoginTest" runat="server"> <div> <asp:DropDownList ID="DDL_Username" runat="server" Height="20px" DataTextField="txt"> </asp:DropDownList> <br /> <asp:TextBox ID="TB_Password" runat="server" TextMode="Password"></asp:TextBox> <br /> <asp:Button ID="B_Login" runat="server" onclick="B_Login_Click" Text="Login" /> <br /> <asp:Literal ID="LI_Result" runat="server"></asp:Literal> </div> </form> </body> </html> Back End: using System; using System.Collections.Generic; using System.Linq; using System.Web; using System.Web.UI; using System.Web.UI.WebControls; public partial class Login_Login_Test : System.Web.UI.Page { protected void Page_Load(object sender, EventArgs e) { if (!Page.IsPostBack) { Binder(); } } private void Binder() { using (DataClassesDataContext db = new DataClassesDataContext()) { DDL_Username.DataSource = from x in db.DT_Honeys select new { id = x.UsernameID, txt = x.Username }; DDL_Username.DataValueField = "id"; DDL_Username.DataTextField = "txt"; DDL_Username.DataBind(); } } protected void B_Login_Click(object sender, EventArgs e) { if (TB_Password.Text != "") { using (DataClassesDataContext db = new DataClassesDataContext()) { DT_Honey blah = new DT_Honey(); blah = db.DT_Honeys.SingleOrDefault(x => x.UsernameID == int.Parse(DDL_Username.SelectedValue.ToString())); if (blah == null) { LI_Result.Text = "Something went wrong :/"; } if (blah.Password == TB_Password.Text) { LI_Result.Text = "Credentials recognised :-)"; } else { LI_Result.Text = "Error with credentials :-("; } } } } } I am aware this problem is very common, but none of the help I have found online is useful/relevant. Any help/suggestions appreciated; thank you for your time :-).

Read the article

How to parse a string into a nullable int in C# (.NET 3.5)

- by Glenn Slaven

I'm wanting to parse a string into a nullable int in C#. ie. I want to get back either the int value of the string or null if it can't be parsed. I was kind of hoping that this would work int? val = stringVal as int?; But that won't work, so the way I'm doing it now is I've written this extension method public static int? ParseNullableInt(this string value) { if (value == null || value.Trim() == string.Empty) { return null; } else { try { return int.Parse(value); } catch { return null; } } } Is there a better way of doing this? EDIT: Thanks for the TryParse suggestions, I did know about that, but it worked out about the same. I'm more interested in knowing if there is a built-in framework method that will parse directly into a nullable int?

Read the article

PHP Simple_html_dom issue

- by stef

The snippet below loops through some web pages, grabs the html and then looks for table.results and gets the plaintext out of the tags contained in each . $results is ok. Now I'm trying to get the href value of an tag that is found in the second of each . I'd like to include this in the $results array, but I'm not sure how to do this. The third foreach statement gets them but then I need to merge $links with $results. Ideally I'd also get the links in the second foreach statement. Does anyone know how? $i = 0; foreach( $urls as $u ) { $html = file_get_html($u); foreach($html->find('.results tbody tr') as $element) { $result[$i] = $this->extract($element->plaintext); $i++; } foreach($html->find('.results tbody tr a') as $element) { $links[$i] = $element->href; $i++; } } print_r($result); print_r($links); die;

Read the article

translating play in HTML to python

- by aharon

So, I'd like to represent one of Shakespeare's plays, Hamlet, into the following objects (maybe this isn't the best representation, if so please tell me): class Play(): acts = [] ... def add_act(self, act): acts.append(act) class Act(): scenes = [] ... def add_scene(self, scene): scenes.append(scene) class Scene(): elems = [] def __init__(self, title, setting=""): ... def add_elem(self, elem): elems.append(elem) ... class StageDirection(): # elem def __init__(self, text): ... class Line(): # elem def __init__(self, id, text, character = None): ... # A None character represents a continuation from the previous line # id could be, for example, 1.1.1 There are other methods, of course, for printing and such in each of the classes. The question is, how do I get a structure based on these classes (or something like them) from HTML 4 code that looks like this: <H3>ACT I</h3> <h3>SCENE I. Elsinore. A platform before the castle.</h3> <p><blockquote> <i>FRANCISCO at his post. Enter to him BERNARDO</i> </blockquote> <A NAME=speech1><b>BERNARDO</b></a> <blockquote> <A NAME=1.1.1>Who's there?</A><br> </blockquote> <A NAME=speech2><b>FRANCISCO</b></a> <blockquote> <A NAME=1.1.2>Nay, answer me: stand, and unfold yourself.</A><br> </blockquote> <A NAME=speech3><b>BERNARDO</b></a> <blockquote> <A NAME=1.1.3>Long live the king!</A><br> </blockquote> <A NAME=speech4><b>FRANCISCO</b></a> <blockquote> <A NAME=1.1.4>Bernardo?</A><br> </blockquote> <A NAME=speech5><b>BERNARDO</b></a> <blockquote> <A NAME=1.1.5>He.</A><br> </blockquote>  translating that into something like this: play = Play() actI = Act() sceneI = Scene("Scene I", "Elsinore. A platform before the castle.") sceneI.add_elem(StageDirection("Francisco at his post. Enter to him Bernardo.")) sceneI.add_elem(Line("Bernardo", "Who's there?")) ... Of course, I don't expect all the code—but what libraries and, when there aren't libraries, logic should I use? Thanks. (This is for a future opensource project and me learning Python for fun—not homework.)

Read the article

writing header in csv python with DictWriter

- by user248237

assume I have a csv.DictReader object and I want to write it out as a csv file. How can I do this? I thought of the following: dr = csv.DictReader(open(f), delimiter='\t') # process my dr object # ... # write out object output = csv.DictWriter(open(f2, 'w'), delimiter='\t') for item in dr: output.writerow(item) Is that the best way? More importantly, how can I make it so a header is written out too, in this case the object "dr"s .fieldnames property? thanks.

Read the article

python getelementbyid from string

- by matthewgall

Hey, I have the following program, that is trying to upload a file (or files) to an image upload site, however I am struggling to find out how to parse the returned HTML to grab the direct link (contained in a ). I have the code below: #!/usr/bin/python # -*- coding: utf-8 -*- import pycurl import urllib import urlparse import xml.dom.minidom import StringIO import sys import gtk import os import imghdr import locale import gettext try: import pynotify except: print "Please install pynotify." APP="Uploadir Uploader" DIR="locale" locale.setlocale(locale.LC_ALL, '') gettext.bindtextdomain(APP, DIR) gettext.textdomain(APP) _ = gettext.gettext ##STRINGS uploading = _("Uploading image to Uploadir.") oneimage = _("1 image has been successfully uploaded.") multimages = _("images have been successfully uploaded.") uploadfailed = _("Unable to upload to Uploadir.") class Uploadir: def __init__(self, args): self.images = [] self.urls = [] self.broadcasts = [] self.username="" self.password="" if len(args) == 1: return else: for file in args: if file == args[0] or file == "": continue if file.startswith("-u"): self.username = file.split("-u")[1] #print self.username continue if file.startswith("-p"): self.password = file.split("-p")[1] #print self.password continue self.type = imghdr.what(file) self.images.append(file) for file in self.images: self.upload(file) self.setClipBoard() self.broadcast(self.broadcasts) def broadcast(self, l): try: str = '\n'.join(l) n = pynotify.Notification(str) n.set_urgency(pynotify.URGENCY_LOW) n.show() except: for line in l: print line def upload(self, file): #Try to login cookie_file_name = "/tmp/uploadircookie" if ( self.username!="" and self.password!=""): print "Uploadir authentication in progress" l=pycurl.Curl() loginData = [ ("username",self.username),("password", self.password), ("login", "Login") ] l.setopt(l.URL, "http://uploadir.com/user/login") l.setopt(l.HTTPPOST, loginData) l.setopt(l.USERAGENT,"User-Agent: Uploadir (Python Image Uploader)") l.setopt(l.FOLLOWLOCATION,1) l.setopt(l.COOKIEFILE,cookie_file_name) l.setopt(l.COOKIEJAR,cookie_file_name) l.setopt(l.HEADER,1) loginDataReturnedBuffer = StringIO.StringIO() l.setopt( l.WRITEFUNCTION, loginDataReturnedBuffer.write ) if l.perform(): self.broadcasts.append("Login failed. Please check connection.") l.close() return loginDataReturned = loginDataReturnedBuffer.getvalue() l.close() #print loginDataReturned if loginDataReturned.find("<li>Your supplied username or password is invalid.</li>")!=-1: self.broadcasts.append("Uploadir authentication failed. Username/password invalid.") return else: self.broadcasts.append("Uploadir authentication successful.") #cookie = loginDataReturned.split("Set-Cookie: ")[1] #cookie = cookie.split(";",0) #print cookie c = pycurl.Curl() values = [ ("file", (c.FORM_FILE, file)) ] buf = StringIO.StringIO() c.setopt(c.URL, "http://uploadir.com/file/upload") c.setopt(c.HTTPPOST, values) c.setopt(c.COOKIEFILE, cookie_file_name) c.setopt(c.COOKIEJAR, cookie_file_name) c.setopt(c.WRITEFUNCTION, buf.write) if c.perform(): self.broadcasts.append(uploadfailed+" "+file+".") c.close() return self.result = buf.getvalue() #print self.result c.close() doc = urlparse.urlparse(self.result) self.urls.append(doc.getElementsByTagName("download")[0].childNodes[0].nodeValue) def setClipBoard(self): c = gtk.Clipboard() c.set_text('\n'.join(self.urls)) c.store() if len(self.urls) == 1: self.broadcasts.append(oneimage) elif len(self.urls) != 0: self.broadcasts.append(str(len(self.urls))+" "+multimages) if __name__ == '__main__': uploadir = Uploadir(sys.argv) Any help would be gratefully appreciated. Warm regards,

Read the article

How to implement a left recursion eliminator?

- by Mahdi

How can i implement an eliminator for this? A := AB | AC | D | E ;

Read the article

how to read an address in multiple formats like google maps

- by ratan

notice that on google maps you can input the address any way you like. as long as it is a valid address...google maps will read it. In some ruby book I had seen code snippet for something like this, but with phone numbers. Any ideas how this could be done for addresses? in language of your choice. EDIT: i dont care about a "valid" address. I just want to parse an address. so that 123 fake street, WA, 34223 would be an address and so will 123 fake street WA 34223

Read the article

A good pdf for HTML5

- by Jean

Hello, I did google, but could not find a good ebook on HTML5, let me in on good resources. Thanks Jean

Read the article

Perl - Read XML

- by chinna_82

XML <?xml version='1.0'?> <employee> <name>Smith</name> <age>43</age> <sex>M</sex> <department role='manager'>Operations</department> </employee> Perl use XML::Simple; use Data::Dumper; $xml = new XML::Simple; foreach my $data1 ($data = $xml->XMLin("test.xml")) { print Dumper($data1); } Above code managed to all the xml value like this. Output $VAR1 = { 'department' => { 'content' => 'Operations', 'role' => 'manager' }, 'name' => 'John Doe', 'sex' => 'M', 'age' => '43' }; How do I do, if I only want to get the role value. For this example I need to get Role = manager. Any advice or reference link is highly appreciated.

Read the article

boost::Spirit Grammar for unsorted schema

- by Hassan Syed

I have a section of a schema for a model that I need to parse. Lets say it looks like the following. { type = "Standard"; hostname="x.y.z"; port="123"; } The properties are: The elements may appear unordered. All elements that are part of the schema must appear, and no other. All of the elements' synthesised attributes go into a struct. (optional) The schema might in the future depend on the type field -- i.e., different fields based on type -- however I am not concerned about this at the moment.

Read the article

CSSOMParser in gwt client side

- by Zoja

What i would like to do is to read an css file from a GET request on the client side, and then i would like to parse it to check all the classes. The problem is that I need to implement CSSOMParser for that, and here are the imports import org.w3c.dom.css.CSSRule; import org.w3c.dom.css.CSSRuleList; import org.w3c.dom.css.CSSStyleRule; import org.w3c.dom.css.CSSStyleSheet; import com.steadystate.css.parser.CSSOMParser; the problem is that none of those classes ale probably javascript compilant, so they don't want to compile if they're on the client side. Is there a way to get it done ?

Read the article

`strip`ing the results of a split in python

- by Igor

i'm trying to do something pretty simple: line = "name : bob" k, v = line.lower().split(':') k = k.strip() v = v.strip() is there a way to combine this into one line somehow? i found myself writing this over and over again when making parsers, and sometimes this involves way more than just two variables. i know i can use regexp, but this is simple enough to not really have to require it...

Read the article

Recognize Dates In A String

- by Tim Scott

I want a class something like this: public interface IDateRecognizer { DateTime[] Recognize(string s); } The dates might exist anywhere in the string and might be any format. For now, I could limit to U.S. culture formats. The dates would not be delimited in any way. They might have arbitrary amounts of whitespace between parts of the date. The ideas I have are: ANTLR Regex Hand rolled I have never used ANTLR, so I would be learning from scratch. I wonder if there are libraries or code samples out there that do something similar that could jump start me. Is ANTLR too heavy for such a narrow use? I have used Regex a lot before, but I hate it for all the reasons that most people hate it. I could certainly hand roll it but I'd rather not re-solve a solved problem. Suggestions? UPDATE: Here is an example. Given this input: This is a date 11/3/63. Here is another one: November 03, 1963; and another one Nov 03, 63 and some more (11/03/1963). The dates could be in any U.S. format. They might have dashes like 11-2-1963 or weird extra whitespace inside like this: Nov 3, 1963, and even maybe the comma is missing like [Nov 3 63] but that's an edge case. The output should be an array of seven DateTimes. Each date would be the same: 11/03/1963 00:00:00.

Search Results

Search found 7251 results on 291 pages for 'pdf parsing'.

Page 101/291 | < Previous Page | 97 98 99 100 101 102 103 104 105 106 107 108 | Next Page >

- by Dwaine Bailey

- by user2485710

- by Shantanu Gupta

- by Keltex

- by altern

- by Benjamin Ortuzar

- by Oliver

- by Martin Wiboe

- by Pandiya Chendur

- by neversaint

- by Warrior

- by JDWebs

- by Glenn Slaven

- by stef

- by aharon

- by user248237

- by matthewgall

- by Mahdi

- by ratan

- by Jean

- by chinna_82

- by Hassan Syed

- by Zoja

- by Igor

- by Tim Scott

< Previous Page | 97 98 99 100 101 102 103 104 105 106 107 108 | Next Page >