Search Results

Search found 4256 results on 171 pages for 'mark henderson'.

Page 66/171 | < Previous Page | 62 63 64 65 66 67 68 69 70 71 72 73  | Next Page >

  • Optimizing python code performance when importing zipped csv to a mongo collection

    - by mark
    I need to import a zipped csv into a mongo collection, but there is a catch - every record contains a timestamp in Pacific Time, which must be converted to the local time corresponding to the (longitude,latitude) pair found in the same record. The code looks like so: def read_csv_zip(path, timezones): with ZipFile(path) as z, z.open(z.namelist()[0]) as input: csv_rows = csv.reader(input) header = csv_rows.next() check,converters = get_aux_stuff(header) for csv_row in csv_rows: if check(csv_row): row = { converter[0]:converter[1](value) for converter, value in zip(converters, csv_row) if allow_field(converter) } ts = row['ts'] lng, lat = row['loc'] found_tz_entry = timezones.find_one(SON({'loc': {'$within': {'$box': [[lng-tz_lookup_radius, lat-tz_lookup_radius],[lng+tz_lookup_radius, lat+tz_lookup_radius]]}}})) if found_tz_entry: tz_name = found_tz_entry['tz'] local_ts = ts.astimezone(timezone(tz_name)).replace(tzinfo=None) row['tz'] = tz_name else: local_ts = (ts.astimezone(utc) + timedelta(hours = int(lng/15))).replace(tzinfo = None) row['local_ts'] = local_ts yield row def insert_documents(collection, source, batch_size): while True: items = list(itertools.islice(source, batch_size)) if len(items) == 0: break; try: collection.insert(items) except: for item in items: try: collection.insert(item) except Exception as exc: print("Failed to insert record {0} - {1}".format(item['_id'], exc)) def main(zip_path): with Connection() as connection: data = connection.mydb.data timezones = connection.timezones.data insert_documents(data, read_csv_zip(zip_path, timezones), 1000) The code proceeds as follows: Every record read from the csv is checked and converted to a dictionary, where some fields may be skipped, some titles be renamed (from those appearing in the csv header), some values may be converted (to datetime, to integers, to floats. etc ...) For each record read from the csv, a lookup is made into the timezones collection to map the record location to the respective time zone. If the mapping is successful - that timezone is used to convert the record timestamp (pacific time) to the respective local timestamp. If no mapping is found - a rough approximation is calculated. The timezones collection is appropriately indexed, of course - calling explain() confirms it. The process is slow. Naturally, having to query the timezones collection for every record kills the performance. I am looking for advises on how to improve it. Thanks. EDIT The timezones collection contains 8176040 records, each containing four values: > db.data.findOne() { "_id" : 3038814, "loc" : [ 1.48333, 42.5 ], "tz" : "Europe/Andorra" } EDIT2 OK, I have compiled a release build of http://toblerity.github.com/rtree/ and configured the rtree package. Then I have created an rtree dat/idx pair of files corresponding to my timezones collection. So, instead of calling collection.find_one I call index.intersection. Surprisingly, not only there is no improvement, but it works even more slowly now! May be rtree could be fine tuned to load the entire dat/idx pair into RAM (704M), but I do not know how to do it. Until then, it is not an alternative. In general, I think the solution should involve parallelization of the task. EDIT3 Profile output when using collection.find_one: >>> p.sort_stats('cumulative').print_stats(10) Tue Apr 10 14:28:39 2012 ImportDataIntoMongo.profile 64549590 function calls (64549180 primitive calls) in 1231.257 seconds Ordered by: cumulative time List reduced from 730 to 10 due to restriction <10> ncalls tottime percall cumtime percall filename:lineno(function) 1 0.012 0.012 1231.257 1231.257 ImportDataIntoMongo.py:1(<module>) 1 0.001 0.001 1230.959 1230.959 ImportDataIntoMongo.py:187(main) 1 853.558 853.558 853.558 853.558 {raw_input} 1 0.598 0.598 370.510 370.510 ImportDataIntoMongo.py:165(insert_documents) 343407 9.965 0.000 359.034 0.001 ImportDataIntoMongo.py:137(read_csv_zip) 343408 2.927 0.000 287.035 0.001 c:\python27\lib\site-packages\pymongo\collection.py:489(find_one) 343408 1.842 0.000 274.803 0.001 c:\python27\lib\site-packages\pymongo\cursor.py:699(next) 343408 2.542 0.000 271.212 0.001 c:\python27\lib\site-packages\pymongo\cursor.py:644(_refresh) 343408 4.512 0.000 253.673 0.001 c:\python27\lib\site-packages\pymongo\cursor.py:605(__send_message) 343408 0.971 0.000 242.078 0.001 c:\python27\lib\site-packages\pymongo\connection.py:871(_send_message_with_response) Profile output when using index.intersection: >>> p.sort_stats('cumulative').print_stats(10) Wed Apr 11 16:21:31 2012 ImportDataIntoMongo.profile 41542960 function calls (41542536 primitive calls) in 2889.164 seconds Ordered by: cumulative time List reduced from 778 to 10 due to restriction <10> ncalls tottime percall cumtime percall filename:lineno(function) 1 0.028 0.028 2889.164 2889.164 ImportDataIntoMongo.py:1(<module>) 1 0.017 0.017 2888.679 2888.679 ImportDataIntoMongo.py:202(main) 1 2365.526 2365.526 2365.526 2365.526 {raw_input} 1 0.766 0.766 502.817 502.817 ImportDataIntoMongo.py:180(insert_documents) 343407 9.147 0.000 491.433 0.001 ImportDataIntoMongo.py:152(read_csv_zip) 343406 0.571 0.000 391.394 0.001 c:\python27\lib\site-packages\rtree-0.7.0-py2.7.egg\rtree\index.py:384(intersection) 343406 379.957 0.001 390.824 0.001 c:\python27\lib\site-packages\rtree-0.7.0-py2.7.egg\rtree\index.py:435(_intersection_obj) 686513 22.616 0.000 38.705 0.000 c:\python27\lib\site-packages\rtree-0.7.0-py2.7.egg\rtree\index.py:451(_get_objects) 343406 6.134 0.000 33.326 0.000 ImportDataIntoMongo.py:162(<dictcomp>) 346 0.396 0.001 30.665 0.089 c:\python27\lib\site-packages\pymongo\collection.py:240(insert) EDIT4 I have parallelized the code, but the results are still not very encouraging. I am convinced it could be done better. See my own answer to this question for details.

    Read the article

  • Why does order matter for ImageMagick's -colorspace operation?

    - by Mark Trapp
    Starting with ImageMagick 6, the command style changed to solve a bunch of problems outlined in the Basic Usage document. That document does imply that for simple operations, one should only need to move the options from before the source file to between the source and output files to convert from the "old" style to the "new" style. However, this doesn't seem to work for the -colorspace operation. When I use the following command, I get an output file with the correct colors: convert -colorspace rgb input.pdf output.png But when I try to use the new command style, the -colorspace operation is never applied: convert input.pdf -colorspace rgb output.png Samples: Why does this occur?

    Read the article

  • C# XML export for Excel table using XmlDocument

    - by Mark
    I am trying to write the following into an XML document: <head> <xml> <x:ExcelWorkbook> <x:ExcelWorksheets> <x:ExcelWorksheet> other code here </x:ExcelWorksheet> </x:ExcelWorksheets> </x:ExcelWorkbook> </xml> </head> However, if I use the following code, it strips out the 'x:'. System.Xml.XmlDocument document = new System.Xml.XmlDocument(); System.Xml.XmlElement htmlNode = document.CreateElement("html"); htmlNode.SetAttribute("xmlns:o", "urn:schemas-microsoft-com:office:office"); htmlNode.SetAttribute("xmlns:x", "urn:schemas-microsoft-com:office:excel"); htmlNode.SetAttribute("xmlns", "http://www.w3.org/TR/REC-html40"); document.AppendChild(htmlNode); System.Xml.XmlElement headNode = document.CreateElement("head"); htmlNode.AppendChild(headNode); headNode.AppendChild( document.CreateElement("xml")).AppendChild( document.CreateElement("x:ExcelWorkbook"))).AppendChild( document.CreateElement("x:ExcelWorksheets")).AppendChild( document.CreateElement("x:ExcelWorksheet")).InnerText="other code here"; How can I stop this from happening? Thanks!

    Read the article

  • how do you create a simple obejctice-c command line project in xcode

    - by Mark
    im moving from a c# VS2008 world into the mac world and I just wanted to know how I can create a quick little command line based application so that I can write many little objective-c apps without worrying about creating an iPhone app or whatever. Which projects do I create in xcode? I can see the Command Line Tool under "Mac os x" but the only options for the type is "C", "C++", "Core Data", "Core Foundation", "Core Services" and "Foundation" but no simple objective c project? Thanks

    Read the article

  • Is it possible to refer to metadata of the target from within the target implementation in MSBuild?

    - by mark
    Dear ladies and sirs. My msbuild targets file contains the following section: <ItemGroup> <Targets Include="T1"> <Project>A\B.sln"</Project> <DependsOnTargets>The targets T1 depends on</DependsOnTargets> </Targets> <Targets Include="T2"> <Project>C\D.csproj"</Project> <DependsOnTargets>The targets T2 depends on</DependsOnTargets> </Targets> ... </ItemGroup> <Target Name="T1" DependsOnTargets="The targets T1 depends on"> <MSBuild Projects="A\B.sln" Properties="Configuration=$(Configuration)" /> </Target> <Target Name="T2" DependsOnTargets="The targets T2 depends on"> <MSBuild Projects="C\D.csproj" Properties="Configuration=$(Configuration)" /> </Target> As you can see, A\B.sln appears twice: As Project metadata of T1 in the ItemGroup section. In the Target statement itself passed to the MSBuild task. I am wondering whether I can remove the second instance and replace it with the reference to the Project metadata of the target, which name is given to the Target task? Exactly the same question is asked for the (Targets.DependsOnTargets) metadata. It is mentioned twice much like the %(Targets.Project) metadata. Thanks. EDIT: I should probably describe the constraints, which must be satisfied by the solution: I want to be able to build individual projects with ease. Today I can simply execute msbuild file.proj /t:T1 to build the T1 target and I wish to keep this ability. I wish to emphasize, that some projects depend on others, so the DependsOnTargets attribute is really necessary for them.

    Read the article

  • How can I add a VS 2010 .Net 4.0 build agent to TFS 2008

    - by Mark Arnott
    My company has two development teams using TFS 2008. My team would like to migrate our .Net 3.5 app to the .Net 4.0 framework, but the company is not ready to upgrade TFS to TFS 2010. Can we still use TFS 2008's team build system but with a Visual Studio 2010 solution/project structure that targets the .Net 4.0 framework? I am thinking we would need to add a new build agent to TFS 2008 that would have VS 2010 installed. But I am not finding any information on how to do this. Is this possible? Are there any articles explaining how to do this?

    Read the article

  • Java/JAXB: Accessing property of object in a list

    - by Mark Lewis
    Hello Using JAXB I've created a series of classes which represent my XML schema. Validating against the schema an XML file has thus become a 'tree' of java objects representing the XML. Now I'd like to access, delete and add an object of one the created types in my tree. If I've got classes' methods arranged like this: RootType class has: public List<FQType> getFq() { // and setter return fq; } FQType class has: public RemapType getRemap() { // and setter return remap; } RemapType class has: public String getSource() { // and setter return source; } What's the most concise way to code reading and writing of the 'source' member of a RemapType instance in an FQType instance with, say, fqtypeID=1, in an array of type RootType (in which RootType instances also each have rootID)? Currently I'm using a for loop Iterator in which is an if rootID = mySelectedRootID. In the if I nest a second for loop Iterator over the contained FQType instances and in that a second if fqTypeID = mySelectedFQTypeID. IE for loop iterator/if statement pairs to recognise the object of desire. With all the bells and whistles this way is nearly 15 lines of code to access a data type - can I do this in one line? Thanks

    Read the article

  • jquery load php file - result is not complete

    - by Mark Nolan
    I'm trying to load or better reload a DIV with content from an included php file. so the file is included in the webadmin.php from the location webadmin/pages.php. Then i alter some data in the DB through serializing. Now I would like to reload the pages.php from the callback of the serialize POST with load();. This all works fine up until the moment the data is supposed to be displayed in the div - i believe its because the php file is loaded from a different location, so the include paths for the DB Connection etc are probably wrong... Should I really write an extra PHP File for jquery or is there a way to tell jquery where to load it from? Its the first time I'm doing this - so bear with me for a moment on this one... Thanks! I guess it wont be much use, but heres the load code: $("#right").load("webadmin/pages.php");

    Read the article

  • What is the cheapest non-colocation way to serve about 10 static files at a rate of 100 megabits per

    - by Mark Maunder
    I've looked at Amazon S3 and it costs roughly $4746 per month for 100 megabits/s (which translates into 31,640 Gigabytes of data transferred. That's at a rate of $0.15 per gig.) I haven't found a cheaper "cloud" option. I'm curious if there's any other cloud hosting option out there cheaper than S3. Uptime is not an issue because I can build failover for most things into the browser. e.g. I can use javascript to say "if the image didn't load then go to this other URL instead." FYI I'm currently using a colocation facility which is about 30% cheaper than S3 and I'm familiar with colo prices - so this question is really about "cloud" services and by that I mean services where I don't have to worry about the infrastructure.

    Read the article

  • Different results coming out of an init method than those expected. Why does this happen and how can

    - by Mark Reid
    When I run this method the two properties I have are set to (NULL) when I try and access them outside of the if statement. But they are set to 0 and NO if I check them inside the for loop. -(id) init { NSLog(@"Jumping into the init method!"); if (self = [super init]) { NSLog(@"Running the init method extras"); accumulator = 0; NSLog(@"self.accumulator is %g", accumulator); decimal = NO; } NSLog(@"Calc after init is: %@ and %@", self.accumulator, self.decimal); return self; } Any suggestions as to why what comes out is different from what's done in the for loop?

    Read the article

  • How do I search & replace all occurrences of a string in a ms word doc with python?

    - by Mark
    Hello there, I am pretty stumped at the moment. Based on http://stackoverflow.com/questions/1045628/can-i-use-win32-com-to-replace-text-inside-a-word-document I was able to code a simple template system that generates word docs out of a template word doc (in Python). My problem is that text in "Text Fields" is not find that way. Even in Word itself there is no option to search everything - you actually have to choose between "Main Document" and "Text Fields". Being new to the Windows world I tried to browse the VBA docs for it but found no help (probably due to "text field" being a very common term). word.Documents.Open(f) wdFindContinue = 1 wdReplaceAll = 2 find_str = '\{\{(*)\}\}' find = word.Selection.Find find.Execute(find_str, False, False, True, False, False, \ True, wdFindContinue, False, False, False) while find.Found: t = word.Selection.Text.__str__() r = process_placeholder(t, answer_data, question_data) if type(r) == dict: errors.append(r) else: find.Execute(t, False, True, False, False, False, \ True, False, False, r, wdReplaceAll) This is the relevant portion of my code. I was able to get around all problems by myself by now (hint: if you want to replace strings with more than 256 chars, you have to do it via clipboard, etc ...) Hope, someone can help me.

    Read the article

  • Collections not read from hibernate/ehcache second-level-cache

    - by Mark van Venrooij
    I'm trying to cache lazy loaded collections with ehcache/hibernate in a Spring project. When I execute a session.get(Parent.class, 123) and browse through the children multiple times a query is executed every time to fetch the children. The parent is only queried the first time and then resolved from the cache. Probably I'm missing something, but I can't find the solution. Please see the relevant code below. I'm using Spring (3.2.4.RELEASE) Hibernate(4.2.1.Final) and ehcache(2.6.6) The parent class: @Entity @Table(name = "PARENT") @Cacheable @Cache(usage = CacheConcurrencyStrategy.READ_WRITE, include = "all") public class ServiceSubscriptionGroup implements Serializable { /** The Id. */ @Id @Column(name = "ID") private int id; @OneToMany(fetch = FetchType.LAZY, mappedBy = "parent") @Cache(usage = CacheConcurrencyStrategy.READ_WRITE) private List<Child> children; public List<Child> getChildren() { return children; } public void setChildren(List<Child> children) { this.children = children; } @Override public boolean equals(Object o) { if (this == o) return true; if (o == null || getClass() != o.getClass()) return false; Parent that = (Parent) o; if (id != that.id) return false; return true; } @Override public int hashCode() { return id; } } The child class: @Entity @Table(name = "CHILD") @Cacheable @Cache(usage = CacheConcurrencyStrategy.READ_WRITE, include = "all") public class Child { @Id @Column(name = "ID") private int id; @ManyToOne(fetch = FetchType.LAZY, cascade = CascadeType.ALL) @JoinColumn(name = "PARENT_ID") @Cache(usage = CacheConcurrencyStrategy.READ_WRITE) private Parent parent; public int getId() { return id; } public void setId(final int id) { this.id = id; } private Parent getParent(){ return parent; } private void setParent(Parent parent) { this.parent = parent; } @Override public boolean equals(final Object o) { if (this == o) { return true; } if (o == null || getClass() != o.getClass()) { return false; } final Child that = (Child) o; return id == that.id; } @Override public int hashCode() { return id; } } The application context: <bean id="sessionFactory" class="org.springframework.orm.hibernate4.LocalSessionFactoryBean"> <property name="dataSource" ref="dataSource" /> <property name="annotatedClasses"> <list> <value>Parent</value> <value>Child</value> </list> </property> <property name="hibernateProperties"> <props> <prop key="hibernate.dialect">org.hibernate.dialect.SQLServer2008Dialect</prop> <prop key="hibernate.hbm2ddl.auto">validate</prop> <prop key="hibernate.ejb.naming_strategy">org.hibernate.cfg.ImprovedNamingStrategy</prop> <prop key="hibernate.connection.charSet">UTF-8</prop> <prop key="hibernate.show_sql">true</prop> <prop key="hibernate.format_sql">true</prop> <prop key="hibernate.use_sql_comments">true</prop> <!-- cache settings ehcache--> <prop key="hibernate.cache.use_second_level_cache">true</prop> <prop key="hibernate.cache.use_query_cache">true</prop> <prop key="hibernate.cache.region.factory_class"> org.hibernate.cache.ehcache.SingletonEhCacheRegionFactory</prop> <prop key="hibernate.generate_statistics">true</prop> <prop key="hibernate.cache.use_structured_entries">true</prop> <prop key="hibernate.cache.use_query_cache">true</prop> <prop key="hibernate.transaction.factory_class"> org.hibernate.engine.transaction.internal.jta.JtaTransactionFactory</prop> <prop key="hibernate.transaction.jta.platform"> org.hibernate.service.jta.platform.internal.JBossStandAloneJtaPlatform</prop> </props> </property> </bean> The testcase I'm running: @Test public void testGetParentFromCache() { for (int i = 0; i <3 ; i++ ) { getEntity(); } } private void getEntity() { Session sess = sessionFactory.openSession() sess.setCacheMode(CacheMode.NORMAL); Transaction t = sess.beginTransaction(); Parent p = (Parent) s.get(Parent.class, 123); Assert.assertNotNull(p); Assert.assertNotNull(p.getChildren().size()); t.commit(); sess.flush(); sess.clear(); sess.close(); } In the logging I can see that the first time 2 queries are executed getting the parent and getting the children. Furthermore the logging shows that the child entities as well as the collection are stored in the 2nd level cache. However when reading the collection a query is executed to fetch the children on second and third attempt.

    Read the article

  • WCF Rest services for use with the repository pattern?

    - by mark smith
    Hi there, I am considering moving my Service Layer and my data layer (repository pattern) to a WCF Rest service. So basically i would have my software installed locally (WPF client) which would call the Service Layer that exists via a Rest Service... The service layer would then call my data layer using a WCF Rest Service also OR maybe just call it via the DLL assembly I was hoping to understand what the performance would be like. Currently I have my datalayer and servicelayer installed locally via DLL Assemblies locally on the pc. Also i presume the WCF REST services won't support method overloading hance the same name but with a different signature?? I would really appreciate any feedback anyone can give. Thanks

    Read the article

  • iPhone 4.0 Beta compile for 3.1.3

    - by Mark
    I downloaded the iPhone 4.0 Beta. But for my projects I need to compile for 3.1.3 to be able to still submit my projects to the App Store. If I run an old project that isn't a problem I can see all the versions, but when I start a new project I can only pick the 4.0 beta, how can I fix this?

    Read the article

  • release vs setting-to-nil to free memory

    - by Dan Ray
    In my root view controller, in my didReceiveMemoryWarning method, I go through a couple data structures (which I keep in a global singleton called DataManager), and ditch the heaviest things I've got--one or maybe two images associated with possibly twenty or thirty or more data records. Right now I'm going through and setting those to nil. I'm also setting myself a boolean flag so that various view controllers that need this data can easily know to reload. Thusly: DataManager *data = [DataManager sharedDataManager]; for (Event *event in data.eventList) { event.image = nil; event.thumbnail = nil; } for (WondrMark *mark in data.wondrMarks) { mark.image = nil; } [DataManager sharedDataManager].cleanedMemory = YES; Today I'm thinking, though... and I'm not actually sure all that allocated memory is really being freed when I do that. Should I instead release those images and maybe hit them with a new alloc and init when I need them again later?

    Read the article

  • When to use \A in regex?

    - by S.Mark
    End of line anchor $ match even there is extra trailing \n in matched string, so we use \Z instead of $ For example ^\w+$ will match the string abcd\n but ^\w+\Z is not How about \A and when to use?

    Read the article

  • WPF update binding in a background thread

    - by Mark
    I have a control that has its data bound to a standard ObservableCollection, and I have a background task that calls a service to get more data. I want to, then, update my backing data behind my control, while displaying a "please wait" dialog, but when I add the new items to the collection, the UI thread locks up while it re-binds and updates my controls. Can I get around this so that my animations and stuff keep running on my "please wait" dialog? Or at least give the "appearance" to the user that its not locked up?

    Read the article

  • ASP.net MVC Routing on Postback

    - by Mark Kadlec
    In my ASP.net MVC View I have a dropdown that I want to get details on selection and asynchronously update a div. My aspx is as follows: <% using (Html.BeginForm("Index", "Portal", FormMethod.Post, new { id = "TheForm" })) {%> <h2>Index</h2> <% using (Ajax.BeginForm("Details", new AjaxOptions { UpdateTargetId = "mpkResults" })) { %> <%=Html.DropDownList("Docs", (IEnumerable<SelectListItem>)ViewData["Docs"], new { onchange = "document.getElementById('TheForm').submit();" })%> <p><input type="submit" value="Details" /></p> <% } %> <div id="mpkResults" style="margin:10px 0px 0px 0px;"></div> ... The onchange event fires correctly on selection of the dropdown, but instead of the Details method in my code behind firing, it hits my Index method. Why is the details method not getting hit on the onchange event? My Details() method in the controller is: public ActionResult Details() { ... < It never gets here, just goes to the index() method } It's a little frustrating right now since I'm sure it is a simple mistake but not sure what it could be. I looked at the Source of my page and sure enough, the form looks like it should be routing to the Details Action: <form action="/Portal/Details" method="post" ... Any help would be appreciated.

    Read the article

< Previous Page | 62 63 64 65 66 67 68 69 70 71 72 73  | Next Page >