Search Results

Search found 7251 results on 291 pages for 'pdf parsing'.

Page 103/291 | < Previous Page | 99 100 101 102 103 104 105 106 107 108 109 110  | Next Page >

  • how to detect an escape sequence in a string

    - by mix
    Given a string named line whose raw version has this value: \rRAWSTRING how can I detect if it has the escape character \r? What I've tried is: if repr(line).startswith('\r'): blah... but it doesn't catch it. I also tried find, such as: if repr(line).find('\r') != -1: blah doesn't work either. What am I missing? thx! EDIT: thanks for all the replies and the corrections re terminolgy and sorry for the confusion. OK, if i do this print repr(line) then what it prints is: '\rSET ENABLE ACK\n' (including the single quotes). i have tried all the suggestions, including: line.startswith(r'\r') line.startswith('\\r') each of which returns False. also tried: line.find(r'\r') line.find('\\r') each of which returns -1

    Read the article

  • How to write a bison grammer for WDI?

    - by Rizo
    I need some help in bison grammar construction. From my another question: I'm trying to make a meta-language for writing markup code (such as xml and html) wich can be directly embedded into C/C++ code. Here is a simple sample written in this language, I call it WDI (Web Development Interface): /* * Simple wdi/html sample source code */ #include <mySite> string name = "myName"; string toCapital(string str); html { head { title { mySiteTitle; } link(rel="stylesheet", href="style.css"); } body(id="default") { // Page content wrapper div(id="wrapper", class="some_class") { h1 { "Hello, " + toCapital(name) + "!"; } // Lists post ul(id="post_list") { for(post in posts) { li { a(href=post.getID()) { post.tilte; } } } } } } } Basically it is a C source with a user-friendly interface for html. As you can see the traditional tag-based style is substituted by C-like, with blocks delimited by curly braces. I need to build an interpreter to translate this code to html and posteriorly insert it into C, so that it can be compiled. The C part stays intact. Inside the wdi source it is not necessary to use prints, every return statement will be used for output (in printf function). The program's output will be clean html code. So, for example a heading 1 tag would be transformed like this: h1 { "Hello, " + toCapital(name) + "!"; } // would become: printf("<h1>Hello, %s!</h1>", toCapital(name)); My main goal is to create an interpreter to translate wdi source to html like this: tag(attributes) {content} = <tag attributes>content</tag> Secondly, html code returned by the interpreter has to be inserted into C code with printfs. Variables and functions that occur inside wdi should also be sorted in order to use them as printf parameters (the case of toCapital(name) in sample source). Here are my flex/bison files: id [a-zA-Z_]([a-zA-Z0-9_])* number [0-9]+ string \".*\" %% {id} { yylval.string = strdup(yytext); return(ID); } {number} { yylval.number = atoi(yytext); return(NUMBER); } {string} { yylval.string = strdup(yytext); return(STRING); } "(" { return(LPAREN); } ")" { return(RPAREN); } "{" { return(LBRACE); } "}" { return(RBRACE); } "=" { return(ASSIGN); } "," { return(COMMA); } ";" { return(SEMICOLON); } \n|\r|\f { /* ignore EOL */ } [ \t]+ { /* ignore whitespace */ } . { /* return(CCODE); Find C source */ } %% %start wdi %token LPAREN RPAREN LBRACE RBRACE ASSIGN COMMA SEMICOLON CCODE QUOTE %union { int number; char *string; } %token <string> ID STRING %token <number> NUMBER %% wdi : /* empty */ | blocks ; blocks : block | blocks block ; block : head SEMICOLON | head body ; head : ID | ID attributes ; attributes : LPAREN RPAREN | LPAREN attribute_list RPAREN ; attribute_list : attribute | attribute COMMA attribute_list ; attribute : key ASSIGN value ; key : ID {$$=$1} ; value : STRING {$$=$1} /*| NUMBER*/ /*| CCODE*/ ; body : LBRACE content RBRACE ; content : /* */ | blocks | STRING SEMICOLON | NUMBER SEMICOLON | CCODE ; %% I am having difficulties on defining a proper grammar for the language, specially in splitting WDI and C code . I just started learning language processing techniques so I need some orientation. Could someone correct my code or give some examples of what is the right way to solve this problem?

    Read the article

  • SQL error - Cannot convert nvarchar to decimal

    - by jakesankey
    I have a C# application that simply parses all of the txt documents within a given network directory and imports the data to a SQL server db. Everything was cruising along just fine until about the 1800th file when it happend to have a few blanks in columns that are called out as DBType.Decimal (and the value is usually zero in the files, not blank). So I got this error, "cannot convert nvarchar to decimal". I am wondering how I could tell the app to simply skip the lines that have this issue?? Perhaps I could even just change the column type to varchar even tho values are numbers (what problems could this create?) Thanks for any help! using System; using System.Data; using System.Data.SQLite; using System.IO; using System.Text.RegularExpressions; using System.Threading; using System.Collections.Generic; using System.Linq; using System.Data.SqlClient; namespace JohnDeereCMMDataParser { internal class Program { public static List<string> GetImportedFileList() { List<string> ImportedFiles = new List<string>(); using (SqlConnection connect = new SqlConnection(@"Server=FRXSQLDEV;Database=RX_CMMData;Integrated Security=YES")) { connect.Open(); using (SqlCommand fmd = connect.CreateCommand()) { fmd.CommandText = @"SELECT FileName FROM CMMData;"; fmd.CommandType = CommandType.Text; SqlDataReader r = fmd.ExecuteReader(); while (r.Read()) { ImportedFiles.Add(Convert.ToString(r["FileName"])); } } } return ImportedFiles; } private static void Main(string[] args) { Console.Title = "John Deere CMM Data Parser"; Console.WriteLine("Preparing CMM Data Parser... done"); Console.WriteLine("Scanning for new CMM data..."); Console.ForegroundColor = ConsoleColor.Gray; using (SqlConnection con = new SqlConnection(@"Server=FRXSQLDEV;Database=RX_CMMData;Integrated Security=YES")) { con.Open(); using (SqlCommand insertCommand = con.CreateCommand()) { Console.WriteLine("Connecting to SQL server..."); SqlCommand cmdd = con.CreateCommand(); string[] files = Directory.GetFiles(@"C:\Documents and Settings\js91162\Desktop\CMM WENZEL\", "*_*_*.txt", SearchOption.AllDirectories); List<string> ImportedFiles = GetImportedFileList(); insertCommand.Parameters.Add(new SqlParameter("@FeatType", DbType.String)); insertCommand.Parameters.Add(new SqlParameter("@FeatName", DbType.String)); insertCommand.Parameters.Add(new SqlParameter("@Axis", DbType.String)); insertCommand.Parameters.Add(new SqlParameter("@Actual", DbType.Decimal)); insertCommand.Parameters.Add(new SqlParameter("@Nominal", DbType.Decimal)); insertCommand.Parameters.Add(new SqlParameter("@Dev", DbType.Decimal)); insertCommand.Parameters.Add(new SqlParameter("@TolMin", DbType.Decimal)); insertCommand.Parameters.Add(new SqlParameter("@TolPlus", DbType.Decimal)); insertCommand.Parameters.Add(new SqlParameter("@OutOfTol", DbType.Decimal)); foreach (string file in files.Except(ImportedFiles)) { var FileNameExt1 = Path.GetFileName(file); cmdd.Parameters.Clear(); cmdd.Parameters.Add(new SqlParameter("@FileExt", FileNameExt1)); cmdd.CommandText = @" IF (EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'RX_CMMData' AND TABLE_NAME = 'CMMData')) BEGIN SELECT COUNT(*) FROM CMMData WHERE FileName = @FileExt; END"; int count = Convert.ToInt32(cmdd.ExecuteScalar()); con.Close(); con.Open(); if (count == 0) { Console.WriteLine("Preparing to parse CMM data for SQL import..."); if (file.Count(c => c == '_') > 5) continue; insertCommand.CommandText = @" INSERT INTO CMMData (FeatType, FeatName, Axis, Actual, Nominal, Dev, TolMin, TolPlus, OutOfTol, PartNumber, CMMNumber, Date, FileName) VALUES (@FeatType, @FeatName, @Axis, @Actual, @Nominal, @Dev, @TolMin, @TolPlus, @OutOfTol, @PartNumber, @CMMNumber, @Date, @FileName);"; string FileNameExt = Path.GetFullPath(file); string RNumber = Path.GetFileNameWithoutExtension(file); int index2 = RNumber.IndexOf("~"); Match RNumberE = Regex.Match(RNumber, @"^(R|L)\d{6}(COMP|CRIT|TEST|SU[1-9])(?=_)", RegexOptions.IgnoreCase); Match RNumberD = Regex.Match(RNumber, @"(?<=_)\d{3}[A-Z]\d{4}|\d{3}[A-Z]\d\w\w\d(?=_)", RegexOptions.IgnoreCase); Match RNumberDate = Regex.Match(RNumber, @"(?<=_)\d{8}(?=_)", RegexOptions.IgnoreCase); string RNumE = Convert.ToString(RNumberE); string RNumD = Convert.ToString(RNumberD); if (RNumberD.Value == @"") continue; if (RNumberE.Value == @"") continue; if (RNumberDate.Value == @"") continue; if (index2 != -1) continue; DateTime dateTime = DateTime.ParseExact(RNumberDate.Value, "yyyyMMdd", Thread.CurrentThread.CurrentCulture); string cmmDate = dateTime.ToString("dd-MMM-yyyy"); string[] lines = File.ReadAllLines(file); bool parse = false; foreach (string tmpLine in lines) { string line = tmpLine.Trim(); if (!parse && line.StartsWith("Feat. Type,")) { parse = true; continue; } if (!parse || string.IsNullOrEmpty(line)) { continue; } Console.WriteLine(tmpLine); foreach (SqlParameter parameter in insertCommand.Parameters) { parameter.Value = null; } string[] values = line.Split(new[] { ',' }); for (int i = 0; i < values.Length - 1; i++) { if (i = "" || i = null) continue; SqlParameter param = insertCommand.Parameters[i]; if (param.DbType == DbType.Decimal) { decimal value; param.Value = decimal.TryParse(values[i], out value) ? value : 0; } else { param.Value = values[i]; } } insertCommand.Parameters.Add(new SqlParameter("@PartNumber", RNumE)); insertCommand.Parameters.Add(new SqlParameter("@CMMNumber", RNumD)); insertCommand.Parameters.Add(new SqlParameter("@Date", cmmDate)); insertCommand.Parameters.Add(new SqlParameter("@FileName", FileNameExt)); insertCommand.ExecuteNonQuery(); insertCommand.Parameters.RemoveAt("@PartNumber"); insertCommand.Parameters.RemoveAt("@CMMNumber"); insertCommand.Parameters.RemoveAt("@Date"); insertCommand.Parameters.RemoveAt("@FileName"); } } } Console.WriteLine("CMM data successfully imported to SQL database..."); } con.Close(); } } } }

    Read the article

  • Counting total sum of each value in one column w.r.t another in Perl

    - by sfactor
    I have tab delimited data with multiple columns. I have OS names in column 31 and data bytes in columns 6 and 7. What I want to do is count the total volume of each unique OS. So, I did something in Perl like this: #!/usr/bin/perl use warnings; my @hhfilelist = glob "*.txt"; my %count = (); for my $f (@hhfilelist) { open F, $f || die "Cannot open $f: $!"; while (<F>) { chomp; my @line = split /\t/; # counting volumes in col 6 and 7 for 31 $count{$line[30]} = $line[5] + $line[6]; } close (F); } my $w = 0; foreach $w (sort keys %count) { print "$w\t$count{$w}\n"; } So, the result would be something like Windows 100000 Linux 5000 Mac OSX 15000 Android 2000 But there seems to be some error in this code because the resulting values I get aren't as expected. What am I doing wrong?

    Read the article

  • Sanitize Content: removing markup from Amazon's content

    - by StackOverflowNewbie
    I'm using Amazon Web Service to get product descriptions of various items. The problem is that Amazon's content contains mark up that is sometimes destructive to the layout of my web page (e.g. unclosed DIVs, etc.). I want to sanitize the content I get from Amazon. My solution would be to do the following (my initial list so far): Remove unnecessary tags such as div, span, etc. while keeping tags like p, ul, ol, etc. Remove all attributes from all the tags (e.g. seems like there are style attributes in some of the tags) Remove excess white space (e.g. multiple spaces, carriage returns, new lines, tabs, etc.) Etc. Before I go off trying to build my solution, I'm wondering if anyone has a better idea (or an already existing solution). Thanks.

    Read the article

  • Parse Formulae in C#

    - by Cool
    Hello All, I am trying to parse formula in C# language like "5*3 + 2" "(3*4 - 2)/5" Is it possible to do in C# or scripts like VBScript, JavaScript (which will be called in c# program). Thanks a lot!.

    Read the article

  • Parse HTML with CSS or XPath selectors?

    - by ovolko
    My goal is to parse HTML with lxml, which supports both XPath and CSS selectors. I can tie my model properties either to CSS or XPath, but I'm not sure which one would be the best, e.g. less fuss when HTML layout is changed, simpler expressions, greater extraction speed. What would you choose in such a situation?

    Read the article

  • How do I get Bison/YACC to not recognize a command until it parses the whole string?

    - by chucknelson
    I have some bison grammar: input: /* empty */ | input command ; command: builtin | external ; builtin: CD { printf("Changing to home directory...\n"); } | CD WORD printf("Changing to directroy %s\n", $2); } ; I'm wondering how I get Bison to not accept (YYACCEPT?) something as a command until it reads ALL of the input. So I can have all these rules below that use recursion or whatever to build things up, which either results in a valid command or something that's not going to work. One simple test I'm doing with the code above is just entering "cd mydir mydir". Bison parses CD and WORD and goes "hey! this is a command, put it to the top!". Then the next token it finds is just WORD, which has no rule, and then it reports an error. I want it to read the whole line and realize CD WORD WORD is not a rule, and then report an error. I think I'm missing something obvious and would greatly appreciate any help - thanks! Also - I've tried using input command NEWLINE or something similar, but it still pushes CD WORD to the top as a command and then parses the extra WORD separately.

    Read the article

  • Linux: shell builtin string matching

    - by gmatt
    I am trying to become more familiar with using the builtin string matching stuff available in shells in linux. I came across this guys posting, and he showed an example a="abc|def" echo ${a#*|} # will yield "def" echo ${a%|*} # will yield "abc" I tried it out and it does what its advertised to do, but I don't understand what the $,{},#,*,| are doing, I tried looking for some reference online or in the manuals but I couldn't find anything. Can anyone explain to me what's going on here?

    Read the article

  • Jquery to find a name on html page and add hyperlink

    - by mikejones12
    Here is my example: I have a a website that contains the following: <body> Jim Nebraska zipcode 65437 Tony lives in California his zipcode is 98708 </body> I would like to be able to search for zip codes on the page and wrap them with hyperlinks like: <body> Jim Nebraska zipcode <a href="/65437.htm">65437</a> Tony lives in California his zipcode is <a href="/65437.htm">98708</a> </body> Could I use a regex selector to find the string and then wrap the string, or replace it with the new hyperlink? I am new to Jquery and looking for someone to point me in the right direction. Thank you, Mike

    Read the article

  • Parsec Haskell Lists

    - by Martin
    I'm using Text.ParserCombinators.Parsec and Text.XHtml to parse an input and get a HTML output. If my input is: * First item, First level ** First item, Second level ** Second item, Second level * Second item, First level My output should be: <ul><li>First item, First level <ul><li>First item, Second level </li><li>Second item, Second level </li></ul></li><li>Second item, First level</li></ul> I wrote this, but obviously does not work recursively list= do{ s <- many1 item;return (olist << s) } item= do{ (count 1 (char '*')) ;s <- manyTill anyChar newline ;return ( li << s) } Any ideas? the recursion can be more than two levels Thanks!

    Read the article

  • How to Parse Html Website in Perl?

    - by Nano HE
    Hi, Could you please give me some suggestions on how to parse HTML in Perl? BTW, Do I need download some website pages to local harddirver with some Offline Explorer Tool? If I need, Could you give me a download URL link to a good Offline Explorer Tool. I plan to parse the keywords(including URL links) and save them to MySQL. Thanks a lot. WinXP used

    Read the article

  • Parse usable Street Address, City, State, Zip from a string

    - by Rob Allen
    Problem: I have an address field from an Access database which has been converted to Sql Server 2005. This field has everything all in one field. I need to parse out the individual sections of the address into their appropriate fields in a normalized table. I need to do this for approximately 4,000 records and it needs to be repeatable. Here are the rules for this exercise: 1 - no whining about how this should have been separate fields in the first place, we are often confronted with less than ideal situations and have to make the best of them 2- for this post, use any language you want 3- feel free to play code golf 4 - Assume an address in the US (for now) 5 - assume that the input string will sometimes contain an addressee (the person being addressed) and/or a second street address (i.e. Suite B) 6 - states may be abbreviated 7 - zip code could be standard 5 digit or zip+4 8 - there are typos in some instances UPDATE: In response to the questions posed, standards were not universally followed, I need need to store the individual values, not just geocode and errors means typo (corrected above) Sample Data: A. P. Croll & Son 2299 Lewes-Georgetown Hwy, Georgetown, DE 19947 11522 Shawnee Road, Greenwood DE 19950 144 Kings Highway, S.W. Dover, DE 19901 Intergrated Const. Services 2 Penns Way Suite 405 New Castle, DE 19720 Humes Realty 33 Bridle Ridge Court, Lewes, DE 19958 Nichols Excavation 2742 Pulaski Hwy Newark, DE 19711 2284 Bryn Zion Road, Smyrna, DE 19904 VEI Dover Crossroads, LLC 1500 Serpentine Road, Suite 100 Baltimore MD 21 580 North Dupont Highway Dover, DE 19901 P.O. Box 778 Dover, DE 19903

    Read the article

  • Pros and Cons of Java HTML to XML cleaners

    - by cjavapro
    I am looking to allow HTML emails (and other HTML uploads) without letting in scripts and stuff. I plan to have a white list of safe tags and attributes as well as a whitelist of CSS tags and value regexes (to prevent automatic return receipt). I asked a question: Parse a badly formatted XML document (like an HTML file) I found there are many many ways to do this. Some systems have built in sanitizers (which I don't care so much about). I will post some answers and say Community Wiki. Please post any other options you like and say Community Wiki so they can be voted on. Also any comments or wiki edits on what part of a certain product is better and what is not would be greatly appreciated. This page is a very nice listing page but I get kinda lost http://java-source.net/open-source/html-parsers

    Read the article

  • Python: Is there a way to get HTML that was dynamically created by Javascript?

    - by Joschua
    As far as I can tell, this is the case for LyricWikia. The lyrics (example) can be accessed from the browser, but can't be found in the source code (can be opened with CTRL + U in most browsers) or reading the contents of the site with Python: from urllib.request import urlopen URL = 'http://lyrics.wikia.com/Billy_Joel:Piano_Man' r = urlopen(URL).read().decode('utf-8') And the test: >>> 'Now John at the bar is a friend of mine' in r False >>> 'John' in r False But when you select and look at the source code of the box in which the lyrics are displayed, you can see that there is: <div class="lyricbox">[...]</div> Is there a way to get the contents of that div-element with Python?

    Read the article

  • how can I parse and group/do stats on user agent strings?

    - by user151841
    I have a database that has the various user-agent strings of visitors to our site. I'd like to do a 'survey' of them to see what browsers our users are using, so that I can know what features I can use in future development. Is there a tool to parse and run statistics on user-agent strings, or a bunch of strings like this? Ideally, I'd like to see a hierarchical grouping of the stats. For instance: Opera/9.80 (Windows Mobile; WCE; Opera Mobi/WMD-50301; U; en) Presto/2.4.13 Version/10.00 Opera/9.80 (J2ME/MIDP; Opera Mini/4.2.14912/1280; U; en) Presto/2.2.0 Opera/9.80 (J2ME/MIDP; Opera Mini/4.2.13918/812; U; en) Presto/2.2.0 Opera/9.64 (Macintosh; Intel Mac OS X; U; en) Presto/2.1.1 Opera/9.60 (J2ME/MIDP; Opera Mini/4.2.13918/786; U; en) Presto/2.2.0 Opera/9.60 (J2ME/MIDP; Opera Mini/4.0.10992/432; U; en) Presto/2.2.0 I'd like to see 6 entries for Opera, broken down into 3 for 9.80, 1 for 9.64 and 2 for 9.60, and so forth for all browsers. Other dimensions, such as OS, would cross the boundaries of the browser version hierarchy, but it might be nice to see also.

    Read the article

  • How to get Nokogiri to ignore HTML elements that doesn't exist

    - by user296507
    any idea how i can get the code below to produce this output? 1 - 2 - B i'm getting this error "undefined method `text' for nil:NilClass (NoMethodError)", because i think table 1 does not have the element 'td class=r2' in it. require 'rubygems' require 'nokogiri' require 'open-uri' doc = Nokogiri::HTML.parse(<<-eohtml) <table class="t1"> <tbody> <tr> <td class="r1">1</td> </tr> </tbody> </table> <table class="t2"> <tbody> <tr> <td class="r1">2</td> <td class="r2">B</td> </tr> </tbody> </table> eohtml doc.css('tbody > tr').each do |n| r1 = n.at_css(".r1").text r2 = n.at_css(".r2").text puts "#{r1} - #{r2}" end

    Read the article

  • PHP Regex to match lines with all-caps with occaisional hyphens.

    - by Yaaqov
    I'm trying to to convert an existing PHP Regular Expression match case to apply to a slightly different style of document. Here's the original style of the document: **FOODS - TYPE A** ___________________________________ **PRODUCT** 1) Mi Pueblito Queso Fresco Authentic Mexican Style Fresh Cheese; 2) La Fe String Cheese **CODE** Sell by date going back to February 1, 2009 And the successfully-running PHP Regex match code that only returns "true" if the line is surrounded by asterisks, and stores each side of the "-" as $m[1] and $m[2], respectively. if ( preg_match('#^\*\*([^-]+)(?:-(.*))?\*\*$#', $line, $m) ) { // only for **header - subheader** $m[2] is set. if ( isset($m[2]) ) { return array(TYPE_HEADER, array(trim($m[1]), trim($m[2]))); } else { return array(TYPE_KEY, array($m[1])); } } So, for line 1: $m[1] = "FOODS" AND $m[2] = "TYPE A"; Line 2 would be skipped; Line 3: $m[1] = "PRODUCT", etc. The question: How would I re-write the above regex match if the headers did not have the asterisks, but still was all-caps, and was at least 4 characters long? For example: FOODS - TYPE A ___________________________________ PRODUCT 1) Mi Pueblito Queso Fresco Authentic Mexican Style Fresh Cheese; 2) La Fe String Cheese CODE Sell by date going back to February 1, 2009 Thank you.

    Read the article

  • Evaluating mathematical expressions in Python

    - by vander
    Hi, I want to tokenize a given mathematical expression into a binary tree like this: ((3 + 4 - 1) * 5 + 6 * -7) / 2 '/' / \ + 2 / \ * * / \ / \ - 5 6 -7 / \ + 1 / \ 3 4 Is there any pure Python way to do this? Like passing as a string to Python and then get back as a tree like mentioned above. Thanks.

    Read the article

  • Trying to parse xml, but xmldocument.loadxml() is trying to download?

    - by maxp
    I have a string input that i do not know whether or not is valid xml. I think the simplest aprroach is to wrap new XmlDocument().LoadXml(strINPUT); In a try/catch. The problem im facing is, sometimes strINPUT is an html file, if the header of this file contains <!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 Transitional//EN"" ""http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd""> <html xml:lang=""en-GB"" xmlns=""http://www.w3.org/1999/xhtml"" lang=""en-GB""> ...like many do, it actually tries to make a connection to the w3.org url, which i really dont want it doing. Anyone know if its possible to just parse the string without trying to be clever and checking external urls? Failing that is there an alternative to xmldocument?

    Read the article

< Previous Page | 99 100 101 102 103 104 105 106 107 108 109 110  | Next Page >