Search Results

Search found 5 results on 1 pages for 'pynewbie'.

Page 1/1 | 1 

  • Problems with Program startup in WIn 7 this week

    - by PyNEwbie
    I have a program (ISYS) that I have been using since 2006. It migrated successfully to Windows 7. Just yesterday it started manifesting this strange behavior. The program has a selector to allow you to select a directory that might have relevant files for the program. As of yesterday the program will only let me select folders on the C drive. For example I have a folder on D that I need to access with this program - from the GUI when I select the D drive the directory list does not load, instead it churns away using 17% of the CPU cycles. I have let it run for an hour several times. I have found that I can get the directory I want by using a batch file to start the program but this limits my ability to do certain things I really need to use the GUI. I did a number of reboots and other tests - I disconnected drives but once I try to select some directory on a drive other than C it churns away. I have experimented quite a bit and am convinced (which means I am wrong) that this has something to do with some setting change on my computer that I can't figure out as I don't see any updates. Since ISYS has not been updated I feel confident it is something internal. Any suggestions would be appreciated.

    Read the article

  • Python's FTPLib too slow ?

    - by PyNEwbie
    I have been playing around with Python's FTP library and am starting to think that it is too slow as compared to using a script file in DOS? I run sessions where I download thousands of data files (I think I have over 8 million right now). My observation is that the download process seems to take five to ten times as long in Python than it does as compared to using the ftp commands in the DOS shell. Since I don't want anyone to fix my code I have not included any. I am more interested in understanding if my observation is valid or if I need to tinker more with the arguments.

    Read the article

  • Source for Names to use in web scraping

    - by PyNEwbie
    Can anyone suggest a good source of names that I can use to help analyze some tables on web pages. The first column of the tables I am scraping have names alone, names and titles or just titles. The names can be as varied as John Smith to Vikram Saksena. I have been poking around for a compiled list of words that can be found in proper names.

    Read the article

  • Should I use Python or Assembly for a super fast copy program

    - by PyNEwbie
    As a maintenance issue I need to routinely (3-5 times per year) copy a repository that is now has over 20 million files and exceeds 1.5 terabytes in total disk space. I am currently using RICHCOPY, but have tried others. RICHCOPY seems the fastest but I do not believe I am getting close to the limits of the capabilities of my XP machine. I am toying around with using what I have read in The Art of Assembly Language to write a program to copy my files. My other thought is to start learning how to multi-thread in Python to do the copies. I am toying around with the idea of doing this in Assembly because it seems interesting, but while my time is not incredibly precious it is precious enough that I am trying to get a sense of whether or not I will see significant enough gains in copy speed. I am assuming that I would but I only started really learning to program 18 months and it is still more or less a hobby. Thus I may be missing some fundamental concept of what happens with interpreted languages. Any observations or experiences would be appreciated. Note, I am not looking for any code. I have already written a basic copy program in Python 2.6 that is no slower than RICHCOPY. I am looking for some observations on which will give me more speed. Right now it takes me over 50 hours to make a copy from a disk to a Drobo and then back from the Drobo to a disk. I have a LogicCube for when I am simply duplicating a disk but sometimes I need to go from a disk to Drobo or the reverse. I am thinking that given that I can sector copy a 3/4 full 2 terabyte drive using the LogicCube in under seven hours I should be able to get close to that using Assembly, but I don't know enough to know if this is valid. (Yes, sometimes ignorance is bliss) The reason I need to speed it up is I have had two or three cycles where something has happened during copy (fifty hours is a long time to expect the world to hold still) that has caused me to have to trash the copy and start over. For example, last week the water main broke under our building and shorted out the power.

    Read the article

  • Best way to get back to using the power of lxml after having to use a regex to find something in an

    - by PyNEwbie
    I am trying to rip some text out of a large number of html documents (numbers in the hundreds of thousands). The documents are really forms but they are prepared by a very large group of different organizations so there is significant variation in how they create the document. For example, the documents are divided into chapters. I might want to extract the contents of Chapter 5 from every document so I can analyze the content of the chapter. Initially I thought this would be easy but it turns out that the authors might use a set of non-nested tables throughout the document to hold the content so that Chapter n could be displayed using td tags inside a table. Or they might use other elements such as p tags H tags, div tags or any other block level element. After trying repeatedly to use lxml to help me identify the beginning and end of each chapter I have determined that it is a lot cleaner to use a regular expression because in every case, no matter what the enclosing html element is the chapter label is always in the form of >Chapter # It is a little more complicated in that there might be some white space or non-breaking space represented in different ways (  or   or just spaces). Nonetheless it was trivial to write a regular expression to identify the beginning of each section. (The beginning of one section is the end of the previous section.) But now I want to use lxml to get the text out. My thought is that I have really no choice but to walk along my string to find the close tag for the element that encloses the text I am using to find the relevant section. That is here is one example where the element holding the Chapter name is a div <div style="DISPLAY: block; MARGIN-LEFT: 0pt; TEXT-INDENT: 0pt; MARGIN-RIGHT: 0pt" align="left"><font style="DISPLAY: inline; FONT-WEIGHT: bold; FONT-SIZE: 10pt; FONT-FAMILY: Times New Roman">Chapter 1.&#160;&#160;&#160;Our Beginnings.</font></div> So I am imagining that I would begin at the location where I found the match for chapter 1 and set up a regular expressions to find the next </div|</td|</p|</h1 . . . So at this point I have identified the type of element holding my chapter heading I can use the same logic to find all of the text that is within that element that is set up a regular expression to help me mark from >Chapter 1.&#160;&#160;&#160;Our Beginnings.< So I have identified where my Chapter 1 begins I can do the same for chapter 2 (which is where Chapter 1 ends) Now I am imagining that I am going to snip the document beginning at the opening of the element that I identified as the element the indicates where chapter 1 begins and ending just before the opening of the element that I identified as the element that indicates where Chapter 2 begins. The string that I have identified will then be fed to lxml to use its power to get the content. I am going to all of this trouble because I have read over and over - never use a regular expression to extract content from html documents and I have not hit on a way to be as accurate with lxml to identify the starting and ending locations for the text I want to extract. For example, I can never be certain that the subtitle of Chapter 1 is Our Beginnings it could be Our Red Canary. Let me say that I spent two solid days trying with lxml to be confident that I had the beginning and ending elements and I could only be accurate <60% of the time but a very short regular expression has given me better than 95% success. I have a tendency to make things more complicated than necessary so I am wondering if anyone has seen or solved a similar problems and if they had an approach (not the details mind you) that they would like to offer.

    Read the article

1