Search Results

Search found 2 results on 1 pages for 'coquelicot'.

Page 1/1 | 1 

  • Writing more efficient xquery code (avoiding redundant iteration)

    - by Coquelicot
    Here's a simplified version of a problem I'm working on: I have a bunch of xml data that encodes information about people. Each person is uniquely identified by an 'id' attribute, but they may go by many names. For example, in one document, I might find <person id=1>Paul Mcartney</person> <person id=2>Ringo Starr</person> And in another I might find: <person id=1>Sir Paul McCartney</person> <person id=2>Richard Starkey</person> I want to use xquery to produce a new document that lists every name associated with a given id. i.e.: <person id=1> <name>Paul McCartney</name> <name>Sir Paul McCartney</name> <name>James Paul McCartney</name> </person> <person id=2> ... </person> The way I'm doing this now in xquery is something like this (pseudocode-esque): let $ids := distinct-terms( [all the id attributes on people] ) for $id in $ids return <person id={$id}> { for $unique-name in distinct-values ( for $name in ( [all names] ) where $name/@id=$id return $name ) return <name>{$unique-name}</name> } </person> The problem is that this is really slow. I imagine the bottleneck is the innermost loop, which executes once for every id (of which there are about 1200). I'm dealing with a fair bit of data (300 MB, spread over about 800 xml files), so even a single execution of the query in the inner loop takes about 12 seconds, which means that repeating it 1200 times will take about 4 hours (which might be optimistic - the process has been running for 3 hours so far). Not only is it slow, it's using a whole lot of virtual memory. I'm using Saxon, and I had to set java's maximum heap size to 10 GB (!) to avoid getting out of memory errors, and it's currently using 6 GB of physical memory. So here's how I'd really like to do this (in Pythonic pseudocode): persons = {} for id in ids: person[id] = set() for person in all_the_people_in_my_xml_document: persons[person.id].add(person.name) There, I just did it in linear time, with only one sweep of the xml document. Now, is there some way to do something similar in xquery? Surely if I can imagine it, a reasonable programming language should be able to do it (he said quixotically). The problem, I suppose, is that unlike Python, xquery doesn't (as far as I know) have anything like an associative array. Is there some clever way around this? Failing that, is there something better than xquery that I might use to accomplish my goal? Because really, the computational resources I'm throwing at this relatively simple problem are kind of ridiculous.

    Read the article

  • String comparison in Python: is vs. ==

    - by Coquelicot
    I noticed a Python script I was writing was acting squirrelly, and traced it to an infinite loop, where the loop condition was "while line is not ''". Running through it in the debugger, it turned out that line was in fact ''. When I changed it to != rather than 'is not', it worked fine. I did some searching, and found this question, the top answer to which seemed to be just what I needed. Except the answer it gave was counter to my experience. Specifically, the answerer wrote: For all built-in Python objects (like strings, lists, dicts, functions, etc.), if x is y, then x==y is also True. I double-checked the type of the variable, and it was in fact of type str (not unicode or something). Is his answer just wrong, or is there something else afoot? Also, is it generally considered better to just use '==' by default, even when comparing int or Boolean values? I've always liked to use 'is' because I find it more aesthetically pleasing and pythonic (which is how I fell into this trap...), but I wonder if it's intended to just be reserved for when you care about finding two objects with the same id.

    Read the article

1