Search Results

Search found 64010 results on 2561 pages for 'google app engine python'.

Page 341/2561 | < Previous Page | 337 338 339 340 341 342 343 344 345 346 347 348  | Next Page >

  • (Python) Extracting Text from Source Code?

    - by zhuyxn
    Currently have a large webpage whose source code is ~200,000 lines of almost all (if not all) HTML. More specifically, it is a webpage whose content is a few thousand blocks of paragraphs separated by line breaks (though a line break does not specifically mean there is a separation in content) My main objective is to extract text from the source code as if I were copying/pasting the webpage into a text editor. There is another parsing function I would like to use, which originally took in copied/pasted text rather than the source code. To do this, I'm currently using urllib2, and calling .get_text() in Beautiful Soup. The problem is, Beautiful Soup is leaving tremendous amounts of white space in my code, and it is difficult to pass the result into the second "text" parser. I have done quite a bit of research on parsing HTMLs, but I'm frankly not sure how to solve this problem easily. Furthermore, I'm a bit confused on how to use imports like lxml to extract text as if I were to simply copy and paste?

    Read the article

  • Python: Trouble with YACC

    - by Rosarch
    I'm parsing sentences like: "CS 2310 or equivalent experience" The desired output: [[("CS", 2310)], ["equivalent experience"]] YACC tokenizer symbols: tokens = [ 'DEPT_CODE', 'COURSE_NUMBER', 'OR_CONJ', 'MISC_TEXT', ] t_DEPT_CODE = r'[A-Z]{2,}' t_COURSE_NUMBER = r'[0-9]{4}' t_OR_CONJ = r'or' t_ignore = ' \t' terms = {'DEPT_CODE': t_DEPT_CODE, 'COURSE_NUMBER': t_COURSE_NUMBER, 'OR_CONJ': t_OR_CONJ} for name, regex in terms.items(): terms[name] = "^%s$" % regex def t_MISC_TEXT(t): r'\S+' for name, regex in terms.items(): # print "trying to match %s with regex %s" % (t.value, regex) if re.match(regex, t.value): t.type = name return t return t (MISC_TEXT is meant to match anything not caught by the other terms.) Some relevant rules from the parser: precedence = ( ('left', 'MISC_TEXT'), ) def p_statement_course_data(p): 'statement : course_data' p[0] = p[1] def p_course_data(p): 'course_data : course' p[0] = p[1] def p_course(p): 'course : DEPT_CODE COURSE_NUMBER' p[0] = make_course(p[1], int(p[2])) def p_or_phrase(p): 'or_phrase : statement OR_CONJ statement' p[0] = [[p[1]], [p[3]]] def p_misc_text(p): '''text_aggregate : MISC_TEXT MISC_TEXT | MISC_TEXT text_aggregate | text_aggregate MISC_TEXT ''' p[0] = "%s %s" % (p[0], [1]) def p_text_aggregate_statement(p): 'statement : text_aggregate' p[0] = p[1] Unfortunately, this fails: # works as it should >>> token_list("CS 2110 or equivalent experience") [LexToken(DEPT_CODE,'CS',1,0), LexToken(COURSE_NUMBER,'2110',1,3), LexToken(OR_CONJ,'or',1,8), LexToken(MISC_TEXT,'equivalent',1,11), LexToken(MISC_TEXT,'experience',1,22)] # fails. bummer. >>> parser.parse("CS 2110 or equivalent experience") Syntax error in input: LexToken(MISC_TEXT,'equivalent',1,11) What am I doing wrong? I don't fully understand how to set precedence rules. Also, this is my error function: def p_error(p): print "Syntax error in input: %s" % p Is there a way to see which rule the parser was trying when it failed? Or some other way to make the parser print which rules its trying?

    Read the article

  • Selecting dictionary items by key efficiently in Python

    - by user248237
    suppose I have a dictionary whose keys are strings. How can I efficiently make a new dictionary from that which contains only the keys present in some list? for example: # a dictionary mapping strings to stuff mydict = {'quux': ..., 'bar': ..., 'foo': ...} # list of keys to be selected from mydict keys_to_select = ['foo', 'bar', ...] The way I came up with is: filtered_mydict = [mydict[k] for k in mydict.keys() \ if k in keys_to_select] but I think this is highly inefficient because: (1) it requires enumerating the keys with keys(), (2) it requires looking up k in keys_to_select each time. at least one of these can be avoided, I would think. any ideas? I can use scipy/numpy too if needed.

    Read the article

  • python and regular expression with unicode

    - by bsn
    I need to delete some unicode symbols from the string '?????? ??????? ???????????? ??????????' I know they exist here for sure. I try: re.sub('([\u064B-\u0652\u06D4\u0670\u0674\u06D5-\u06ED]+)', '', '?????? ??????? ???????????? ??????????') but it doesn't work. String stays the same. ant suggestion what i do wrong?

    Read the article

  • Noob Python Question: List Confusion

    - by potatocubed
    I'm trying to transfer the contents of one list to another, but it's not working and I don't know why not. My code looks like this: list1 = [1, 2, 3, 4, 5, 6] list2 = [] for item in list1: list2.append(item) list1.remove(item) But if I run it my output looks like this: >>> list1 [2, 4, 6] >>> list2 [1, 3, 5] My question is threefold, I guess: Why is this happening, how do I make it work, and am I overlooking an incredibly simple solution like a 'move' statement or something?

    Read the article

  • Python 2.7 creating a multidimensional list

    - by poop
    I don't know why I am having so much trouble creating a 3 dimensional list. I need the program to create an empty n by n list. So for n = 4: x = [[[],[],[],[]],[[],[],[],[]],[[],[],[],[]],[[],[],[],[]]] I've tried using: y = [n*[n*[]]] y = [[[]]* n for i in range(n)] Which both appear to be creating copies of a reference. I've also tried naieve application of the list builder with little success: y = [[[]* n for i in range(n)]* n for i in range(n)] y = [[[]* n for i in range(1)]* n for i in range(n)] I've also tried building up the array iteratively using loops, with no success. In my rapid flurry of attempts to not post something stupidly easy to SO, I came upon a solution: y = [] for i in range(0,n): y.append([[]*n for i in range(n)]) Is there an easier/ more intuitive way of doing this?

    Read the article

  • Search inside dynamic array in python

    - by user2091683
    I want to implement a code that loops inside an array that its size is set by the user that means that the size isn't constant. for example: A=[1,2,3,4,5] then I want the output to be like this: [1],[2],[3],[4],[5] [1,2],[1,3],[1,4],[1,5] [2,3],[2,4],[2,5] [3,4],[3,5] [4,5] [1,2,3],[1,2,4],[1,2,5] [1,3,4],[1,3,5] and so on [1,2,3,4],[1,2,3,5] [2,3,4,5] [1,2,3,4,5] Can you help me implement this code?

    Read the article

  • How can i optimize this python code

    - by RandomVector
    def maxVote(nLabels): count = {} maxList = [] maxCount = 0 for nLabel in nLabels: if nLabel in count: count[nLabel] += 1 else: count[nLabel] = 1 #Check if the count is max if count[nLabel] > maxCount: maxCount = count[nLabel] maxList = [nLabel,] elif count[nLabel]==maxCount: maxList.append(nLabel) return random.choice(maxList) nLabels contains a list of integers. The above function returns the integer with highest frequency, if more than one have same frequency then a randomly selected integer from them is returned. E.g. maxVote([1,3,4,5,5,5,3,12,11]) is 5

    Read the article

  • Python: Determine whether list of lists contains a defined sequence

    - by duhaime
    I have a list of sublists, and I want to see if any of the integer values from the first sublist plus one are contained in the second sublist. For all such values, I want to see if that value plus one is contained in the third sublist, and so on, proceeding in this fashion across all sublists. If there is a way of proceeding in this fashion from the first sublist to the last sublist, I wish to return True; otherwise I wish to return False. In other words, for each value in sublist one, for each "step" in a "walk" across all sublists read left to right, if that value + n (where n = number of steps taken) is contained in the current sublist, the function should return True; otherwise it should return False. (Sorry for the clumsy phrasing--I'm not sure how to clean up my language without using many more words.) Here's what I wrote. a = [ [1,3],[2,4],[3,5],[6],[7] ] def find_list_traversing_walk(l): for i in l[0]: index_position = 0 first_pass = 1 walking_current_path = 1 while walking_current_path == 1: if first_pass == 1: first_pass = 0 walking_value = i if walking_value+1 in l[index_position + 1]: index_position += 1 walking_value += 1 if index_position+1 == len(l): print "There is a walk across the sublists for initial value ", walking_value - index_position return True else: walking_current_path = 0 return False print find_list_traversing_walk(a) My question is: Have I overlooked something simple here, or will this function return True for all true positives and False for all true negatives? Are there easier ways to accomplish the intended task? I would be grateful for any feedback others can offer!

    Read the article

  • Python 2.6 - I can not write dwords greater than 0x7fffffff into registry using _winreg.SetValueEx()

    - by stasizke
    using regedit.exe I have manually created a key in registry called HKEY_CURRENT_USER/00_Just_a_Test_Key and created two dword values dword_test_1 and dword_test_2 I am trying to write some values into those two keys using following program import _winreg aReg = _winreg.ConnectRegistry(None,_winreg.HKEY_CURRENT_USER) aKey = _winreg.OpenKey(aReg, r"00_Just_a_Test_Key", 0, _winreg.KEY_WRITE) _winreg.SetValueEx(aKey,"dword_test_1",0, _winreg.REG_DWORD, 0x0edcba98) _winreg.SetValueEx(aKey,"dword_test_2",0, _winreg.REG_DWORD, 0xfedcba98) _winreg.CloseKey(aKey) _winreg.CloseKey(aReg) I can write into the first key, dword_test_1, but when I attempt to write into the second, I get following message Traceback (most recent call last): File "D:/src/registry/question.py", line 7, in <module> _winreg.SetValueEx(aKey,"dword_test_2",0, _winreg.REG_DWORD, 0xfedcba98) ValueError: Could not convert the data to the specified type. How do I write the second value 0xfedcba98, or any value greater than 0x7fffffff as a dword value? Originally I was writing script to switch the "My documents" icon on or off by writing "0xf0500174" to hide or "0xf0400174" to display the icon into [HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Explorer\CLSID{450D8FBA-AD25-11D0-98A8-0800361B1103}\ShellFolder]

    Read the article

  • Python point lookup (coordinate binning?)

    - by Rince
    Greetings, I am trying to bin an array of points (x, y) into an array of boxes [(x0, y0), (x1, y0), (x0, y1), (x1, y1)] (tuples are the corner points) So far I have the following routine: def isInside(self, point, x0, x1, y0, y1): pr1 = getProduct(point, (x0, y0), (x1, y0)) if pr1 >= 0: pr2 = getProduct(point, (x1, y0), (x1, y1)) if pr2 >= 0: pr3 = getProduct(point, (x1, y1), (x0, y1)) if pr3 >= 0: pr4 = getProduct(point, (x0, y1), (x0, y0)) if pr4 >= 0: return True return False def getProduct(origin, pointA, pointB): product = (pointA[0] - origin[0])*(pointB[1] - origin[1]) - (pointB[0] - origin[0])*(pointA[1] - origin[1]) return product Is there any better way then point-by-point lookup? Maybe some not-obvious numpy routine? Thank you!

    Read the article

  • Python: Closing a for loop by reading stdout

    - by user1732102
    import os dictionaryfile = "/root/john.txt" pgpencryptedfile = "helloworld.txt.gpg" array = open(dictionaryfile).readlines() for x in array: x = x.rstrip('\n') newstring = "echo " + x + " | gpg --passphrase-fd 0 " + pgpencryptedfile os.popen(newstring) I need to create something inside the for loop that will read gpg's output. When gpg outputs this string gpg: WARNING: message was not integrity protected, I need the loop to close and print Success! How can I do this, and what is the reasoning behind it? Thanks Everyone!

    Read the article

  • A way to use Python which I don't know

    - by Konie
    In this quicksort function: def qsort2(list): if list == []: return [] else: pivot = list[0] # can't understand the following line lesser, equal, greater = partition(list[1:], [], [pivot], []) return qsort2(lesser) + equal + qsort2(greater) def partition(list, l, e, g): if list == []: return (l, e, g) else: head = list[0] if head < e[0]: return partition(list[1:], l + [head], e, g) elif head > e[0]: return partition(list[1:], l, e, g + [head]) else: return partition(list[1:], l, e + [head], g) I don't understand the sentence below the comment. Can someone tell me what is the meaning of this sentence here?

    Read the article

  • Python how to convert this for loop into a while loop

    - by user1690198
    I have this for a for loop which I made I was wondering how I would write so it would work with a while loop. def scrollList(myList): negativeIndices=[] for i in range(0,len(myList)): if myList[i]<0: negativeIndices.append(i) return negativeIndices So far I have this def scrollList2(myList): negativeIndices=[] i= 0 length= len(myList) while i != length: if myList[i]<0: negativeIndices.append(i) i=i+1 return negativeIndices

    Read the article

  • Recurrent yearly date alert in Python

    - by coulix
    Hello Hackerz, Here is the idea A user can set a day alert for a birthday. (We do not care about the year of birth) He also picks if he wants to be alerted 0, 1, 2, ou 7 days (Delta) before the D day. Users have a timezone setting. I want the server to send the alerts at 8 am on the the D day - deleta +- user timezone Example: 12 jun, with "alert me 3 days before" will give 9 of Jun. My idea was to have a trigger_datetime extra field saved on the 'recurrent event' object. Like this a cron Job running every hour on my server will just check for all events matching irs current time hour, day and month and send to the alert. The problem from a year to the next the trigger_date could change ! If the alert is set on 1st of March, with a one day delay that could be either 28 or 29 of February .. Maybe i should not use the trigger date trick and use some other kind of scheme. All plans are welcome.

    Read the article

< Previous Page | 337 338 339 340 341 342 343 344 345 346 347 348  | Next Page >