Search Results

Search found 37 results on 2 pages for 'minidom'.

Page 2/2 | < Previous Page | 1 2 

  • Cleaning an XML file in Python before parsing

    - by Sam
    I'm using minidom to parse an xml file and it threw an error indicating that the data is not well formed. I figured out that some of the pages have characters like ไอเฟล &, causing the parser to hiccup. Is there an easy way to clean the file before I start parsing it? Right now I'm using a regular expressing to throw away anything that isn't an alpha numeric character and the </> characters, but it isn't quite working.

    Read the article

  • Python modify an xml file

    - by michele
    I have this xml model. link text So I have to add some node (see the text commented) to this file. How I can do it? I have writed this partial code but it doesn't work: xmldoc=minidom.parse(directory) child = xmldoc.createElement("map") for node in xmldoc.getElementsByTagName("Environment"): node.appendChild(child) Thanks in advance.

    Read the article

  • Python: Pretty printing a xml file directly from a tar.gz package

    - by EddyR
    This is the first Python script I've tried to create. I'm reading a xml file from a tar.gz package and then I want to pretty print it. However I can't seem to turn it from a file-like object to a string. I've tried to do it a few different ways including str(), tostring(), etc but nothing is working for me. For testing I just tried to print the string at "print myfile[0:200]" and it always generates "<tarfile.ExFileObject object at 0x10053df10>" import os import sys import tarfile from xml.dom.minidom import parseString tar = tarfile.open("data/ucd.all.flat.tar.gz", "r") getfile = tar.extractfile("ucd.all.flat.xml") myfile = str(getfile) print myfile[0:200] output = parseString(getfile).toprettyxml() print output tar.close()

    Read the article

  • Python regex on list

    - by Peter Nielsen
    Hi there I am trying to build a parser and save the results as an xml file but i have problems.. For instance i get a TypeError: expected string or buffer when i try to run the code.. Would you experts please have a look at my code ? import urllib2, re from xml.dom.minidom import Document from BeautifulSoup import BeautifulSoup as bs osc = open('OSCTEST.html','r') oscread = osc.read() soup=bs(oscread) doc = Document() root = doc.createElement('root') doc.appendChild(root) countries = doc.createElement('countries') root.appendChild(countries) findtags1 = re.compile ('<h1 class="title metadata_title content_perceived_text(.*?)</h1>', re.DOTALL | re.IGNORECASE).findall(soup) findtags2 = re.compile ('<span class="content_text">(.*?)</span>', re.DOTALL | re.IGNORECASE).findall(soup) for header in findtags1: title_elem = doc.createElement('title') countries.appendChild(title_elem) header_elem = doc.createTextNode(header) title_elem.appendChild(header_elem) for item in findtags2: art_elem = doc.createElement('artikel') countries.appendChild(art_elem) s = item.replace('<P>','') t = s.replace('</P>','') text_elem = doc.createTextNode(t) art_elem.appendChild(text_elem) print doc.toprettyxml()

    Read the article

  • xml parsing takes extra spaces.

    - by SDK
    I have been trying to write/parse xml file in python. The tags are very simple <main> <data> abcdef </data> </main> I have written this xml using xml document writer from xml.dom.minidom. How-ever when i try to parse this and try to fetch data-text value, i get 'abcdef' with spaces/carriage return/newline characters in beg and end. Does parsing does not take of indenting spaces? Following is the parsing snippet (ref from net) dom = parseString(data) clipTag = dom.getElementsByTagName('clipdata')[0].toxml() clipData=clipTag.replace('<clipdata>','').replace('</clipdata>','') Kindly suggest.

    Read the article

  • GAE AttributeError

    - by awegawef
    My GAE app runs fine from my computer, but when I upload it, I start getting an AttributeError, specifically: AttributeError: 'dict' object has no attribute 'item' I am using the pylast interface (an API for last.fm--link). Specifically, I am accessing a list of variables of this type: SimilarItem = _namedtuple("SimilarItem", ["item", "match"]) I have a variable of this type, call it sim, and I am trying to access sim.item when I get the attribute error. I should note that I am using Python 2.6 on my computer, and I understand that GAE runs on Python 2.5. Would that make a difference here? I thought they were backwards-compatible. Lastly, I think it could be a possible problem with the modules that pylast imports--maybe they don't work with GAE or something? I did some research but I didn't get any results. Here are the imports: import hashlib import httplib import urllib import threading from xml.dom import minidom import xml.dom import time import shelve import tempfile import sys import htmlentitydefs I would appreciate any help with this frustrating issue. Thanks in advance.

    Read the article

  • Reading UTF-8 XML and writing it to a file with Python

    - by Harri
    I'm trying to parse UTF-8 XML file and save some parts of it to another file. Problem is, that this is my first Python script ever and I'm totally confused about the character encoding problems I'm finding. My script fails immediately when it tries to write non-ascii character to a file, but it can print it to command prompt (at least in some level) Here's the XML (from the parts that matter at least, it's a *.resx file which contains UI strings) <?xml version="1.0" encoding="utf-8"?> <root> <resheader name="foo"> <value>bar</value> </resheader> <data name="lorem" xml:space="preserve"> <value>ipsum öä</value> </data> </root> And here's my python script from xml.dom.minidom import parse names = [] values = [] def getStrings(path): dom = parse(path) data = dom.getElementsByTagName("data") for i in range(len(data)): name = data[i].getAttribute("name") names.append(name) value = data[i].getElementsByTagName("value") values.append(value[0].firstChild.nodeValue.encode("utf-8")) def writeToFile(): with open("uiStrings-fi.py", "w") as f: for i in range(len(names)): line = names[i] + '="'+ values[i] + '"' #varName='varValue' f.write(line) f.write("\n") getStrings("ResourceFile.fi-FI.resx") writeToFile() And here's the traceback: Traceback (most recent call last): File "GenerateLanguageFiles.py", line 24, in writeToFile() File "GenerateLanguageFiles.py", line 19, in writeToFile line = names[i] + '="'+ values[i] + '"' #varName='varValue' UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in ran ge(128) How should I fix my script so it would read and write UTF-8 characters properly? The files I'm trying to generate would be used in test automation with Robots Framework.

    Read the article

  • A doubt on DOM parser used with Python

    - by fixxxer
    I'm using the following python code to search for a node in an XML file and changing the value of an attribute of one of it's children.Changes are happening correctly when the node is displayed using toxml().But, when it is written to a file, the attributes rearrange themselves(as seen in the Source and the Final XML below). Could anyone explain how and why this happen? Python code: #!/usr/bin/env python import xml from xml.dom.minidom import parse dom=parse("max.xml") #print "Please enter the store name:" for sku in dom.getElementsByTagName("node"): if sku.getAttribute("name") == "store": sku.childNodes[1].childNodes[5].setAttribute("value","Delhi,India") print sku.toxml() xml.dom.ext.PrettyPrint(dom, open("new.xml", "w")) a part of the Source XML: <node name='store' node_id='515' module='mpx.lib.node.simple_value.SimpleValue' config_builder='' inherant='false' description='Configurable Value'> <match> <property name='1' value='point'/> <property name='2' value='0'/> <property name='val' value='Store# 09204 Staten Island, NY'/> <property name='3' value='str'/> </match> </node> Final XML : <node config_builder="" description="Configurable Value" inherant="false" module="mpx.lib.node.simple_value.SimpleValue" name="store" node_id="515"> <match> <property name="1" value="point"/> <property name="2" value="0"/> <property name="val" value="Delhi,India"/> <property name="3" value="str"/> </match> </node>

    Read the article

  • Can ElementTree be told to preserve the order of attributes?

    - by dmckee
    I've written a fairly simple filter in python using ElementTree to munge the contexts of some xml files. And it works, more or less. But it reorders the attributes of various tags, and I'd like it to not do that. Does anyone know a switch I can throw to make it keep them in specified order? Context for this I'm working with and on a particle physics tool that has a complex, but oddly limited configuration system based on xml files. Among the many things setup that way are the paths to various static data files. These paths are hardcoded into the existing xml and there are no facilities for setting or varying them based on environment variables, and in our local installation they are necessarily in a different place. This isn't a disaster because the combined source- and build-control tool we're using allows us to shadow certain files with local copies. But even thought the data fields are static the xml isn't, so I've written a script for fixing the paths, but with the attribute rearrangement diffs between the local and master versions are harder to read than necessary. This is my first time taking ElementTree for a spin (and only my fifth or sixth python project) so maybe I'm just doing it wrong. Abstracted for simplicity the code looks like this: tree = elementtree.ElementTree.parse(inputfile) i = tree.getiterator() for e in i: e.text = filter(e.text) tree.write(outputfile) Reasonable or dumb? Related links: How can I get the order of an element attribute list using Python xml.sax? Preserve order of attributes when modifying with minidom

    Read the article

  • A question about DOM parser used with Python

    - by fixxxer
    I'm using the following python code to search for a node in an XML file and changing the value of an attribute of one of it's children.Changes are happening correctly when the node is displayed using toxml().But, when it is written to a file, the attributes rearrange themselves(as seen in the Source and the Final XML below). Could anyone explain how and why this happen? Python code: #!/usr/bin/env python import xml from xml.dom.minidom import parse dom=parse("max.xml") #print "Please enter the store name:" for sku in dom.getElementsByTagName("node"): if sku.getAttribute("name") == "store": sku.childNodes[1].childNodes[5].setAttribute("value","Delhi,India") print sku.toxml() xml.dom.ext.PrettyPrint(dom, open("new.xml", "w")) a part of the Source XML: <node name='store' node_id='515' module='mpx.lib.node.simple_value.SimpleValue' config_builder='' inherant='false' description='Configurable Value'> <match> <property name='1' value='point'/> <property name='2' value='0'/> <property name='val' value='Store# 09204 Staten Island, NY'/> <property name='3' value='str'/> </match> </node> Final XML : <node config_builder="" description="Configurable Value" inherant="false" module="mpx.lib.node.simple_value.SimpleValue" name="store" node_id="515"> <match> <property name="1" value="point"/> <property name="2" value="0"/> <property name="val" value="Delhi,India"/> <property name="3" value="str"/> </match> </node>

    Read the article

  • 'NoneType' object has no attribute 'data'

    - by Bill Jordan
    Hello guys, I am sending a SOAP request to my server and getting the response back. sample of the response string is shown below: <?xml version = '1.0' ?> <env:Envelope xmlns:env=http:////www.w3.org/2003/05/soap-envelop . .. .. <env:Body> <epas:get-all-config-resp xmlns:epas="urn:organization:epas:soap"> ^M ... ... <epas:property name="Tom">12</epas:property> > > <epas:property name="Alice">34</epas:property> > > <epas:property name="John">56</epas:property> > > <epas:property name="Danial">78</epas:property> > > <epas:property name="George">90</epas:property> > > <epas:property name="Luise">11</epas:property> ... ^M </env:Body? </env:Envelop> What I noticed in the response is that there is an extra character shown in the body which is "^M". Not sure if this could be the issue. Note the ^M shown! when I tried parsing the string returned from the server to get the names and values using the code sample: elements = minidom.parseString(xmldoc).getElementsByTagName("property") myDict = {} for element in elements: myDict[element.getAttribute('name')] = element.firstChild.data But, I am getting this error: 'NoneType' object has no attribute 'data'. May be its something to do with the "^M" shown on the xml response back! Any ideas/comments would be appreciated, Cheers

    Read the article

  • python getelementbyid from string

    - by matthewgall
    Hey, I have the following program, that is trying to upload a file (or files) to an image upload site, however I am struggling to find out how to parse the returned HTML to grab the direct link (contained in a ). I have the code below: #!/usr/bin/python # -*- coding: utf-8 -*- import pycurl import urllib import urlparse import xml.dom.minidom import StringIO import sys import gtk import os import imghdr import locale import gettext try: import pynotify except: print "Please install pynotify." APP="Uploadir Uploader" DIR="locale" locale.setlocale(locale.LC_ALL, '') gettext.bindtextdomain(APP, DIR) gettext.textdomain(APP) _ = gettext.gettext ##STRINGS uploading = _("Uploading image to Uploadir.") oneimage = _("1 image has been successfully uploaded.") multimages = _("images have been successfully uploaded.") uploadfailed = _("Unable to upload to Uploadir.") class Uploadir: def __init__(self, args): self.images = [] self.urls = [] self.broadcasts = [] self.username="" self.password="" if len(args) == 1: return else: for file in args: if file == args[0] or file == "": continue if file.startswith("-u"): self.username = file.split("-u")[1] #print self.username continue if file.startswith("-p"): self.password = file.split("-p")[1] #print self.password continue self.type = imghdr.what(file) self.images.append(file) for file in self.images: self.upload(file) self.setClipBoard() self.broadcast(self.broadcasts) def broadcast(self, l): try: str = '\n'.join(l) n = pynotify.Notification(str) n.set_urgency(pynotify.URGENCY_LOW) n.show() except: for line in l: print line def upload(self, file): #Try to login cookie_file_name = "/tmp/uploadircookie" if ( self.username!="" and self.password!=""): print "Uploadir authentication in progress" l=pycurl.Curl() loginData = [ ("username",self.username),("password", self.password), ("login", "Login") ] l.setopt(l.URL, "http://uploadir.com/user/login") l.setopt(l.HTTPPOST, loginData) l.setopt(l.USERAGENT,"User-Agent: Uploadir (Python Image Uploader)") l.setopt(l.FOLLOWLOCATION,1) l.setopt(l.COOKIEFILE,cookie_file_name) l.setopt(l.COOKIEJAR,cookie_file_name) l.setopt(l.HEADER,1) loginDataReturnedBuffer = StringIO.StringIO() l.setopt( l.WRITEFUNCTION, loginDataReturnedBuffer.write ) if l.perform(): self.broadcasts.append("Login failed. Please check connection.") l.close() return loginDataReturned = loginDataReturnedBuffer.getvalue() l.close() #print loginDataReturned if loginDataReturned.find("<li>Your supplied username or password is invalid.</li>")!=-1: self.broadcasts.append("Uploadir authentication failed. Username/password invalid.") return else: self.broadcasts.append("Uploadir authentication successful.") #cookie = loginDataReturned.split("Set-Cookie: ")[1] #cookie = cookie.split(";",0) #print cookie c = pycurl.Curl() values = [ ("file", (c.FORM_FILE, file)) ] buf = StringIO.StringIO() c.setopt(c.URL, "http://uploadir.com/file/upload") c.setopt(c.HTTPPOST, values) c.setopt(c.COOKIEFILE, cookie_file_name) c.setopt(c.COOKIEJAR, cookie_file_name) c.setopt(c.WRITEFUNCTION, buf.write) if c.perform(): self.broadcasts.append(uploadfailed+" "+file+".") c.close() return self.result = buf.getvalue() #print self.result c.close() doc = urlparse.urlparse(self.result) self.urls.append(doc.getElementsByTagName("download")[0].childNodes[0].nodeValue) def setClipBoard(self): c = gtk.Clipboard() c.set_text('\n'.join(self.urls)) c.store() if len(self.urls) == 1: self.broadcasts.append(oneimage) elif len(self.urls) != 0: self.broadcasts.append(str(len(self.urls))+" "+multimages) if __name__ == '__main__': uploadir = Uploadir(sys.argv) Any help would be gratefully appreciated. Warm regards,

    Read the article

< Previous Page | 1 2