Search Results

Search found 17966 results on 719 pages for 'xml parsing'.

  • Parsing multiple files at a time in Perl

    - by sfactor
    I have a large data set (around 90GB) to work with. There are tab-delimited data files for each hour of each day, and I need to perform operations on the entire data set, for example getting the share of each OS listed in one of the columns. I tried merging all the files into one huge file and performing a simple count, but it was too large for the server's memory. So I guess I need to process one file at a time and add up the results at the end. I am new to Perl and especially naive about performance issues. How do I do such operations in a case like this? As an example, two columns of the file are: ID OS 1 Windows 2 Linux 3 Windows 4 Windows Let's do something simple: counting the share of the OSes in the data set. Each .txt file has millions of these lines and there are many such files. What would be the most efficient way to operate on all the files?
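
The one-file-at-a-time approach described above can be sketched as follows. This is a Python illustration, not the Perl the asker wants, and it assumes every file starts with a header row naming the columns (`ID`, `OS`); the function names `count_column` and `shares` are made up for the example.

```python
import csv
from collections import Counter


def count_column(paths, column="OS"):
    """Tally the values of one column across many tab-delimited files."""
    totals = Counter()
    for path in paths:
        with open(path, newline="") as handle:
            # DictReader yields one row at a time, so only the running
            # totals are kept in memory, never a whole file.
            for row in csv.DictReader(handle, delimiter="\t"):
                totals[row[column]] += 1
    return totals


def shares(totals):
    """Convert raw counts into fractions of the grand total."""
    grand_total = sum(totals.values())
    return {value: count / grand_total for value, count in totals.items()}
```

Because only the counter is held in memory, the total size of the data set should matter far less than the number of distinct values in the column.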

  • Parsing a simple file

    - by Mike Graham
    I have a file consisting of lines of the form Foo="Some information" Bar="More" Starting with such a string, what is the best way to extract "Some information" and "More" as strings? Foo and Bar are always exactly those names.
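
One way to pull the quoted values out is a single regular expression over each line. The sketch below assumes the values never contain an escaped double quote; `parse_line` is a hypothetical helper name.

```python
import re

# A key made of word characters, an equals sign, then a double-quoted value.
PAIR = re.compile(r'(\w+)="([^"]*)"')


def parse_line(line):
    """Return a dict mapping each key on the line to its quoted value."""
    return dict(PAIR.findall(line))
```

For the line in the question, `parse_line(...)["Foo"]` and `parse_line(...)["Bar"]` would then give the two strings.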

  • Parsing HTML "Visually"

    - by Midhat
    Okay, I am at a loss how to name this question. I have some HTML files, probably written by Lord Lucifer himself, that I need to parse. They consist of many segments like this, among other HTML tags: <p>HeadingNumber</p> <p style="text-indent:number;margin-top:neg_num ">Heading Text</p> <p>Body</p> Notice that the heading number and text are in separate p tags, aligned in a horizontal line by CSS. The CSS may be whatever Lucifer fancies: a mixture of indents, paddings, margins and positions. However, that line is a single object in my business model and should be kept as such. So how do I detect whether two p elements are visually on a single line and process them accordingly? I believe the HTML files are well formed, if that helps.

  • Parsing numbers at PreviewTextInput

    - by Nitin Chaudhari
    I have a WPF application with a hook on PreviewTextInput, through which I get the newly entered character along with the string already entered. Given this, I need to write the following function: bool ShouldAccept(char newChar, string existingText). existingText can be comma-separated valid numbers (including exponential notation), and the function should return false when an invalid character is pressed. My current if/else-based code has a lot of flaws; I wanted to know if there is a smarter way to do it.

  • parsing python to csv

    - by user185955
    I'm trying to download some game stats to do some analysis; the only problem is that the data isn't 100% consistent from season to season. I grab the json file from the site, then wish to save it to a csv with the first line in the csv containing the heading for that column, so the heading would be essentially the key from the python data type. #!/usr/bin/env python import requests import json import csv base_url = 'http://www.afl.com.au/api/cfs/afl/' token_url = base_url + 'WMCTok' player_url = base_url + 'matchItems/round' def printPretty(data): print(json.dumps(data, sort_keys=True, indent=2, separators=(',', ': '))) session = requests.Session() # session makes it simple to use the token across the requests token = session.post(token_url).json()['token'] # get the token session.headers.update({'X-media-mis-token': token}) # set the token Season = 2014 Roundno = 4 if Roundno<10: strRoundno = '0'+str(Roundno) else: strRoundno = str(Roundno) # get some data (could easily be a for loop, might want to put in a delay using Sleep so that you don't get IP blocked) data = session.get(player_url + '/CD_R'+str(Season)+'014'+strRoundno) # print everything printPretty(data.json()) with open('stats_game_test.csv', 'w', newline='') as csvfile: spamwriter = csv.writer(csvfile, delimiter="'",quotechar='|', quoting=csv.QUOTE_ALL) for profile in data.json()['items']: spamwriter.writerow(['%s' %(profile)]) #for key in data.json().keys(): # print("key: %s , value: %s" % (key, data.json()[key])) The above code grabs the json and writes it to a csv, but it puts the key in each individual cell next to the value (eg 'venueId': 'CD_V190'), the key needs to be just across the first row as a heading. 
It gives me a csv file with data in the cells like this Column A B 'tempInCelsius': 17.0 'totalScore': 32 'tempInCelsius': 16.0 'totalScore': 28 What I want is the data like this tempInCelsius totalScore 17 32 16 28 As I mentioned up the top, the data isn't always consistent so if I define what fields to grab with spamwriter.writerow([profile['tempInCelsius'], profile['totalScore']]) then it will error out on certain data grabs. This is why I'm now trying the above method so it just grabs everything regardless of what data is there.
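
A header row that copes with inconsistent records can be built by collecting the union of keys first and letting `csv.DictWriter` fill the gaps. This is a minimal sketch that assumes each item is a flat dict; nested values would be written as their string form. `items_to_csv` is a hypothetical name.

```python
import csv


def items_to_csv(items, out):
    """Write dicts with differing keys as CSV with a single header row."""
    # Collect every key that appears in any item, in first-seen order.
    fieldnames = []
    for item in items:
        for key in item:
            if key not in fieldnames:
                fieldnames.append(key)
    # restval fills the cell when an item lacks one of the columns.
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(items)
    return fieldnames
```

With the code in the question, `out` would be the open `csvfile` and `items` would be `data.json()['items']`.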

  • Parsing HTML: Call to a member function > children() on a non-object

    - by sm56d
    Hello all, I was just helped with this question but I can't get it to move to the next block of HTML. $html = file_get_html('http://music.banadir24.com/singer/aasha_abdoo/247.html'); $urls = $html->find('table[width=100%] table tr'); foreach($urls as $url){ $song_name = $url->children(2)->plaintext; $url = $url->children(6)->children(0)->href; } It returns the list of the names of the first album (Deesco) but it does not continue to the next album (The Best Of Aasha). It just gives me this error: Notice: Trying to get property of non-object in C:\wamp\www\test3.php on line 26 Fatal error: Call to a member function children() on a non-object in C:\wamp\www\test3.php on line 28 Why is this, and how can I get it to continue to the next table element? I appreciate any help on this! Please note: this is legal, as the songs are not bound by copyright and are available to download freely; it's just that I need to download a lot of them and I can't sit there clicking a button all day. Having said that, it's taken me an hour to get this far.

  • Parsing String to TreeNode

    - by Krusu70
    Anyone have a good algorithm how to parse a String to TreeNode in Java? Let's say we have a string s which says how to build a TreeNode. A(B,C) means that A is the name (String) of TreeNode, B is child of A (Treenode), C is sibling of A (TreeNode). So if I call function with string A(B(D,E(F,G)),C) (just a example), then I get a TreeNode equals to: level A (String: name), B - Child (TreeNode), C - Sibling (TreeNode) level B (String: name), D - Child of B (TreeNode), E - Sibling of B (TreeNode) level E (String: name), F - Child of E (TreeNode), G - Sibling of E (TreeNode) The name may not be 1 letter, it could be like real name (many letters).
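
The notation above maps naturally onto a small recursive-descent parser. The sketch below is in Python, not Java, and assumes the grammar `node := name [ "(" child "," sibling ")" ]`, where both the child and the sibling are present whenever the parentheses are; the `TreeNode` class here is a stand-in for the asker's own type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TreeNode:
    name: str
    child: Optional["TreeNode"] = None
    sibling: Optional["TreeNode"] = None


def parse_tree(text):
    """Parse strings such as 'A(B(D,E(F,G)),C)' into TreeNode objects."""
    pos = 0

    def expect(char):
        nonlocal pos
        if pos >= len(text) or text[pos] != char:
            raise ValueError("expected %r at offset %d" % (char, pos))
        pos += 1

    def parse_node():
        nonlocal pos
        start = pos
        # A name runs until the next structural character.
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        name = text[start:pos].strip()
        if not name:
            raise ValueError("expected a name at offset %d" % start)
        node = TreeNode(name)
        if pos < len(text) and text[pos] == "(":
            pos += 1
            node.child = parse_node()
            expect(",")
            node.sibling = parse_node()
            expect(")")
        return node

    root = parse_node()
    if pos != len(text):
        raise ValueError("unexpected input at offset %d" % pos)
    return root
```

Names of any length work because a name simply runs up to the next `(`, `,` or `)`.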

  • Python + Expat: Error on &#0; entities

    - by clacke
    I have written a small function, which uses ElementTree and xpath to extract the text contents of certain elements in an xml file: #!/usr/bin/env python2.5 import doctest from xml.etree import ElementTree from StringIO import StringIO def parse_xml_etree(sin, xpath): """ Takes as input a stream containing XML and an XPath expression. Applies the XPath expression to the XML and returns a generator yielding the text contents of each element returned. >>> parse_xml_etree( ... StringIO('<test><elem1>one</elem1><elem2>two</elem2></test>'), ... '//elem1').next() 'one' >>> parse_xml_etree( ... StringIO('<test><elem1>one</elem1><elem2>two</elem2></test>'), ... '//elem2').next() 'two' >>> parse_xml_etree( ... StringIO('<test><null>&#0;</null><elem3>three</elem3></test>'), ... '//elem3').next() 'three' """ tree = ElementTree.parse(sin) for element in tree.findall(xpath): yield element.text if __name__ == '__main__': doctest.testmod(verbose=True) The third test fails with the following exception: ExpatError: reference to invalid character number: line 1, column 13 Is the &#0; entity illegal XML? Regardless of whether it is or not, the files I want to parse contain it, and I need some way to parse them. Any suggestions for a parser other than Expat, or settings for Expat, that would allow me to do that?

  • parsing of mathematical expressions

    - by gcc
    (in C90, on Linux) input: sqrt(2 - sin(3*A/B)^2.5) + 0.5*(C*~(D) + 3.11 +B) a b /* there are values for a, b, c, d */ c d input: cos(2 - asin(3*A/B)^2.5) +cos(0.5*(C*~(D)) + 3.11 +B) a b /* there are values for a, b, c, d */ c d input: sqrt(2 - sin(3*A/B)^2.5)/(0.5*(C*~(D)) + sin(3.11) +ln(B)) /* max length of formula is 250 characters */ a b /* there are values for a, b, c, d */ c /* each variable comes with a set of floating-point numbers */ d As you can see, the infix formula in the input depends on the user. My program will take a formula and n-tuples of values, then calculate the result for each value of a, b, c and d. In case you are wondering, the outcome of the program is a graph. Sometimes I think I will take the input and store it in a string; then another idea arises, "I should store the formula in a struct", but I don't know how to build the code around a struct. Really, I don't know how to store the formula in the program so that I can do my job. Can you show me? /* a, b, c, d are letters; cos, sin, sqrt, ln are functions */

  • Dealing with infinite loops when constructing states for LR(1) parsing

    - by Bruce
    I'm currently constructing LR(1) states from the following grammar. S->AS S->c A->aA A->b where A,S are nonterminals and a,b,c are terminals. This is the construction of I0 I0: S' -> .S, epsilon --------------- S -> .AS, epsilon S -> .c, epsilon --------------- S -> .AS, a S -> .c, c A -> .aA, a A -> .b, b And I1. From S, I1: S' -> S., epsilon //DONE And so on. But when I get to constructing I4... From a, I4: A -> a.A, a ----------- A -> .aA, a A -> .b, b The problem is A -> .aA. When I attempt to construct the next state from a, I'm going to once again get the exact same content as I4, and this continues infinitely. A similar loop occurs with S -> .AS. So, what am I doing wrong? There has to be some detail that I'm missing, but I've browsed my notes and my book and either can't find or just don't understand what's wrong here. Any help?

  • Ignoring characters in a file while parsing

    - by sfactor
    I need to parse a text file and process the data. Valid data is denoted either by a timestamp, TS followed by 10 digits (TS1040501134), or by a value, a letter followed by nine digits (A098098098), so it looks like TS1040501134A111111111B222222222...........TS1020304050A000000000........ However, there are cases where filler 0s appear when there is no data, for example 00000000000000000000TS1040501134A111111111B2222222220000000000TS1020304050A000000000........ As we can see, I need to ignore these zeros. How might I do this? I am using GNU C.
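
Since both record types have a fixed shape, one option is to match the records and let everything else, including the filler zeros, fall through unmatched. The sketch below shows the idea in Python, not C. It assumes the value letter is an uppercase ASCII letter, and `extract_records` is a hypothetical name.

```python
import re

# Either "TS" plus 10 digits, or one uppercase letter plus 9 digits.
# The timestamp alternative is listed first so "TS..." is never split.
TOKEN = re.compile(r"TS\d{10}|[A-Z]\d{9}")


def extract_records(stream_text):
    """Return every timestamp and value record, skipping filler characters."""
    return TOKEN.findall(stream_text)
```

The same pattern could be expressed in C with a small hand-written scanner that skips characters until it sees a letter and then checks the digit count that follows.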

  • Parsing XML with Ruby and Nokogiri

    - by Chip Castle
    I have the following XML structure: <charsets> <charset> <name>ANSI_X3.4-1968</name> <aliases> <alias>iso-ir-6</alias> <alias>ANSI_X3.4-1986</alias> <alias>ISO_646.irv:1991</alias> <alias>ASCII</alias> <alias>ISO646-US</alias> <alias>US-ASCII</alias> <alias>us</alias> <alias>IBM367</alias> <alias>cp367</alias> <alias>csASCII</alias> </aliases> </charset> <charset> <name>ISO-10646-UTF-1</name> <aliases> <alias>csISO10646UTF1</alias> </aliases> </charset> </charsets> I can grab the text contents of the the name nodes using Ruby and Nokogiri using: require 'nokogiri' require 'open-uri' doc = Nokogiri::XML(File.open("StandardCharsets.xml")) @charsets = doc.css("charsets name").map {|node| node.children.text } But, what I want is the text contents of all name and alias nodes in the order as they are shown in the source document. Everything I try fails. Does anyone have a good example of how to do this?
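
The requirement is a single traversal that visits `name` and `alias` elements together, so that document order is preserved. The sketch below illustrates that idea with Python's standard `xml.etree.ElementTree` and does not use Nokogiri; `names_and_aliases` is a hypothetical helper.

```python
from xml.etree import ElementTree


def names_and_aliases(xml_text):
    """Return the text of all name and alias elements in document order."""
    root = ElementTree.fromstring(xml_text)
    # iter() walks the tree depth-first, which matches source order.
    return [el.text for el in root.iter() if el.tag in ("name", "alias")]
```

Selecting both element types in one pass avoids having to merge two separately ordered result lists afterwards.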

  • How to deal with unknown entity references?

    - by Chris
    I'm parsing (a lot of) XML files that contain entity references which I don't know in advance (I can't change that fact). For example: xml = "<tag>I'm content with &funny; &entity; &references;.</tag>" When I try to parse this using the following code: final DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance(); final DocumentBuilder db = dbf.newDocumentBuilder(); final InputSource is = new InputSource(new StringReader(xml)); final Document d = db.parse(is); I get the following exception: org.xml.sax.SAXParseException: The entity "funny" was referenced, but not declared. What I want to achieve is that the parser replaces every entity that is not declared (unknown to the parser) with an empty String ''. Or even better, is there a way to pass a map to the parser like: Map<String,String> entityMapping = ... entityMapping.put("funny","very"); entityMapping.put("entity","important"); entityMapping.put("references","stuff"); so that I could do the following: final DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance(); final DocumentBuilder db = dbf.newDocumentBuilder(); final InputSource is = new InputSource(new StringReader(xml)); db.setEntityResolver(entityMapping); final Document d = db.parse(is); If I then obtained the text from the document using this example code, I should receive: I'm content with very important stuff. Any suggestions? Of course, I would already be happy to just replace the unknown entities with empty strings. Thanks,
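
One parser-independent workaround is to rewrite the entity references in the raw text before parsing. The sketch below is in Python, not Java, and assumes the files declare no entities of their own in a DTD. It leaves the five predefined XML entities alone, and `resolve_entities` is a hypothetical name.

```python
import re
from xml.etree import ElementTree
from xml.sax.saxutils import escape

# The five entities every XML parser already understands.
PREDEFINED = {"lt", "gt", "amp", "apos", "quot"}
# Named references only; numeric ones such as &#65; start with "#".
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.\-]*);")


def resolve_entities(xml_text, mapping=None):
    """Replace named entities using mapping; unknown ones become ''."""
    mapping = mapping or {}

    def replace(match):
        name = match.group(1)
        if name in PREDEFINED:
            return match.group(0)
        # Escape the replacement so it cannot introduce markup.
        return escape(mapping.get(name, ""))

    return ENTITY_REF.sub(replace, xml_text)


def parse_with_entities(xml_text, mapping=None):
    """Parse XML after substituting its undeclared entities."""
    return ElementTree.fromstring(resolve_entities(xml_text, mapping))
```

Calling `parse_with_entities(xml)` without a mapping gives the "replace with empty string" behaviour, while passing the map from the question substitutes the given words.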

  • Manually extracting portions of strings contained in a list (parsing)

    - by user1652011
    I'm aware that there are modules that fully simplify this function, but saying that I am running from a base install of python (standard modules only), how would I extract the following: I have a list. This list is the contents, line by line, of a webpage. Here is a mock up list (unformatted) for informative purposes: <script> link = "/scripts/playlists/1/" + a.id + "/0-5417069212.asx"; <script> "<a href="/apps/audio/?feedId=11065"><span class="px13">Eastern Metro Area Fire</span>" From the above string, I need the following extracted. The feedId (11065), which is incidentally a.id in the code above., "/scripts/playlists/1/" and "/0-5417069212.asx". Remembering that each of these lines is just contents from objects in a list, how would I go about extracting that data? Here is the full list: contents = urllib2.urlopen("http://www.radioreference.com/apps/audio/?ctid=5586") Pseudo: from urllib2 import urlopen as getpage page_contents = getpage("http://www.radioreference.com/apps/audio/?ctid=5586") feedID = % in (page_contents.search() for "/apps/audio/?feedId=%") titleID = % in (page_contents.search() for "<span class="px13">%</span>") playlistID = % in (page_contents.search() for "link = "%" + a.id + "*.asx";") asxID = * in (page_contents.search() for "link = "*" + a.id + "%.asx";") streamURL = "http://www.radioreference.com/" + playlistID + feedID + asxID + ".asx" I plan to format it as such that streamURL should = : http://www.radioreference.com/scripts/playlists/1/11065/0-5417067072.asx
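
The pseudocode above can be approximated with the standard `re` module alone. The sketch below is written for Python 3 and assumes the real page lines look exactly like the mock-up in the question, which has not been verified against the site; `extract_streams` and both patterns are illustrative.

```python
import re

# Matches: <a href="/apps/audio/?feedId=11065"><span class="px13">Title</span>
FEED = re.compile(r'feedId=(\d+)"><span class="px13">([^<]*)</span>')
# Matches: link = "/scripts/playlists/1/" + a.id + "/0-5417069212.asx";
LINK = re.compile(r'link = "([^"]*)" \+ a\.id \+ "([^"]*)";')


def extract_streams(lines, base="http://www.radioreference.com"):
    """Return (title, stream URL) pairs found in the page lines."""
    prefix = suffix = None
    feeds = []
    for line in lines:
        link = LINK.search(line)
        if link:
            prefix, suffix = link.groups()
        feeds.extend(FEED.findall(line))
    if prefix is None:
        return []
    return [(title, base + prefix + feed_id + suffix)
            for feed_id, title in feeds]
```

In Python 3 the page lines would come from `urllib.request.urlopen(...)` and need decoding to `str` before matching; under Python 2 the `urllib2` call in the question returns byte strings that can be matched directly.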

  • parsing command option with default values and range constrains in C

    - by agramfort
    Hi, I need to parse command line arguments in C. My arguments are basically int or float with default values and range constraints. I've started to implement something that looks like this: option_float(float* out, int argc, char* argv, char* name, description, float default_val, int is_optional, float min_value, float max_value) which I call, for example, with: float* pct; option_float(pct, argc, argv, "pct", "My super percentage option", 50, 1, FALSE, 0, 100) However, I don't want to reinvent the wheel! My objective is to have error checking of range constraints and to throw an error when a non-optional option is not set, and to generate the help message usually given by a usage() function. The usage text would look like this: --pct My super percentage option (default : 50). Should be in [0, 100] I've started with getopt, but it is too limited for what I want to do and I feel it still requires me to write too much code for a simple use case like this. Thanks

  • Parsing a Multi-Index Excel File in Pandas

    - by rhaskett
    I have a time series Excel file with a tri-level column MultiIndex that I would like to parse if possible. There are some results on Stack Overflow on how to do this for an index but not for the columns, and the parse function has a header parameter that does not seem to take a list of rows. The Excel file looks like the following: Column A is all the time series dates starting on A4 Column B has top_level1 (B1) mid_level1 (B2) low_level1 (B3) data (B4-B100+) Column C has null (C1) null (C2) low_level2 (C3) data (C4-C100+) Column D has null (D1) mid_level2 (D2) low_level1 (D3) data (D4-D100+) Column E has null (E1) null (E2) low_level2 (E3) data (E4-E100+) ... So there are two low_level values, many mid_level values and a few top_level values, but the trick is that the null top and mid level values are assumed to be the value to their left. So, for instance, all the columns above would have top_level1 as the top multi-index value. My best idea so far is to use transpose, but then it fills in Unnamed: # everywhere and doesn't seem to work. In Pandas 0.13 read_csv seems to have a header parameter that can take a list, but this doesn't seem to work with parse.
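
The "null means same as the value to the left" rule can be applied to the header rows before any index is built. The sketch below uses plain Python lists and assumes the three header rows have already been read as lists with `None` for empty cells, excluding the date column; `fill_header_levels` is a hypothetical helper.

```python
def fill_header_levels(header_rows):
    """Forward-fill each header row, then return one tuple per column."""
    filled = []
    for row in header_rows:
        last = None
        new_row = []
        for cell in row:
            # An empty cell inherits the nearest value to its left.
            if cell is not None:
                last = cell
            new_row.append(last)
        filled.append(new_row)
    # Transpose so each column becomes a (top, mid, low) tuple.
    return list(zip(*filled))
```

The resulting tuples have the shape that `pandas.MultiIndex.from_tuples` accepts, so they could be assigned to the columns of a frame read without a header.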

  • Parsing timestamp with Python2.4

    - by jellybean
    I want to parse a timestamp from a log file that has been written via datetime.datetime.now().strftime('%Y%m%d%H%M%S') and then compute the number of seconds that have passed since this timestamp. I know I could do it with datetime.datetime.strptime to get back a datetime object and then compute a timedelta. Problem is, the strptime function has been introduced with Python 2.5 and I'm using Python2.4.4 (an upgrade is not possible in my context). Any easy way to do this?
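
One commonly suggested workaround is to parse with `time.strptime`, which as far as I know is already available in Python 2.4, and feed the first six fields to the `datetime` constructor. The sketch below avoids `timedelta.total_seconds()`, which only arrived in later versions; the helper names are hypothetical.

```python
import datetime
import time

FORMAT = "%Y%m%d%H%M%S"


def parse_timestamp(stamp):
    """Build a datetime from the log format without datetime.strptime."""
    # time.strptime returns a struct_time; its first six fields are
    # year, month, day, hour, minute, second.
    return datetime.datetime(*time.strptime(stamp, FORMAT)[:6])


def seconds_since(stamp, now=None):
    """Whole seconds elapsed between the timestamp and now."""
    if now is None:
        now = datetime.datetime.now()
    delta = now - parse_timestamp(stamp)
    return delta.days * 86400 + delta.seconds
```

The optional `now` argument exists only to make the function easy to check with a fixed reference time.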

  • Postback of delimited text from javascript and parsing on server side

    - by Alt_Doru
    In my ASP.NET page, I have a Javascript object, like this: var args = new Object(); args.Data1 = document.getElementById("Data1").value; args.Data2 = document.getElementById("Data2").value; args.Data3 = document.getElementById("Data3").value; The object is populated on the client side using user input data. I am passing the data to a C# method through an Ajax request: someObj.AjaxRequest(args.Data1 + "|" + args.Data2 + "|" + args.Data3) Finally, I need to obtain the data in my C# code: string data1 = [JS args.Data1] string data2 = [JS args.Data2] string data3 = [JS args.Data3] My question is: what's the best solution for this? As I am concatenating bits of user input, I don't think it's best to use "|" as a delimiter. Also, it's not clear to me how to actually parse the data in my C# code to populate the three variables with the original data.

  • Groovy and XML: Not able to insert processing instruction

    - by rhellem
    Scenario Need to update some attributes in an existing XML-file. The file contains a XSL processing instruction, so when the XML is parsed and updated I need to add the instruction before writing it to a file again. Problem is - whatever I do - I'm not able to insert the processing instruction Based on the Java-example found at rgagnon.com I have created the code below Example code ## import groovy.xml.* def xml = '''|<something> | <Settings> | </Settings> |</something>'''.stripMargin() def document = DOMBuilder.parse( new StringReader( xml ) ) def pi = document.createProcessingInstruction('xml-stylesheet', 'type="text/xsl" href="Bp8DefaultView.xsl"'); document.insertBefore(pi, document.documentElement) println document.documentElement Creates output <?xml version="1.0" encoding="UTF-8"?> <something> <Settings> </Settings> </something> What I want <?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet type="text/xsl" href="Bp8DefaultView.xsl"?> <something> <Settings> </Settings> </something>
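
A processing instruction inserted before the root element is a child of the document, not of the root, so it only shows up when the whole document is serialized. That may be what happens in the Groovy snippet, which prints `document.documentElement`, though this has not been confirmed there. The sketch below shows the same DOM calls with Python's `xml.dom.minidom`; `add_stylesheet` is a hypothetical name.

```python
from xml.dom import minidom


def add_stylesheet(xml_text, href):
    """Insert an xml-stylesheet instruction before the root element."""
    document = minidom.parseString(xml_text)
    pi = document.createProcessingInstruction(
        "xml-stylesheet", 'type="text/xsl" href="%s"' % href
    )
    document.insertBefore(pi, document.documentElement)
    # Serialize the document itself; the root element alone would not
    # include its preceding sibling.
    return document.toxml()
```

Serializing `document.documentElement` in this sketch would likewise omit the instruction, which is why the function returns `document.toxml()`.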
