Search Results

Search found 64010 results on 2561 pages for 'google app engine python'.

(Python) Extracting Text from Source Code?

- by zhuyxn

I currently have a large webpage whose source code is ~200,000 lines of almost all (if not all) HTML. More specifically, it is a webpage whose content is a few thousand blocks of paragraphs separated by line breaks (though a line break does not necessarily mean there is a separation in content). My main objective is to extract text from the source code as if I were copying/pasting the webpage into a text editor. There is another parsing function I would like to use, which originally took in copied/pasted text rather than the source code. To do this, I'm currently using urllib2 and calling .get_text() in Beautiful Soup. The problem is that Beautiful Soup is leaving tremendous amounts of white space in my code, and it is difficult to pass the result into the second "text" parser. I have done quite a bit of research on parsing HTML, but I'm frankly not sure how to solve this problem easily. Furthermore, I'm a bit confused about how to use libraries like lxml to extract text as if I were simply copying and pasting.
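
One possible way to tame the leftover white space, sketched with the standard library only: post-process whatever text the extraction step returns. The helper name collapse_whitespace and the sample string are hypothetical, and the sketch assumes that blank lines carry no meaning for the second parser. It is written for Python 3.

```python
import re

def collapse_whitespace(text):
    """Strip each line, squeeze runs of spaces/tabs, and drop empty lines."""
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)

# Hypothetical stand-in for the text returned by the HTML extraction step.
raw = "  First   paragraph \n\n\n \t Second\tparagraph  \n"
cleaned = collapse_whitespace(raw)
```

If paragraph boundaries matter to the downstream parser, the blank-line filter would need to be relaxed.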

Python: Trouble with YACC

- by Rosarch

I'm parsing sentences like:

    "CS 2310 or equivalent experience"

The desired output:

    [[("CS", 2310)], ["equivalent experience"]]

YACC tokenizer symbols:

    tokens = [
        'DEPT_CODE',
        'COURSE_NUMBER',
        'OR_CONJ',
        'MISC_TEXT',
    ]

    t_DEPT_CODE = r'[A-Z]{2,}'
    t_COURSE_NUMBER = r'[0-9]{4}'
    t_OR_CONJ = r'or'
    t_ignore = ' \t'

    terms = {'DEPT_CODE': t_DEPT_CODE,
             'COURSE_NUMBER': t_COURSE_NUMBER,
             'OR_CONJ': t_OR_CONJ}

    for name, regex in terms.items():
        terms[name] = "^%s$" % regex

    def t_MISC_TEXT(t):
        r'\S+'
        for name, regex in terms.items():
            # print "trying to match %s with regex %s" % (t.value, regex)
            if re.match(regex, t.value):
                t.type = name
                return t
        return t

(MISC_TEXT is meant to match anything not caught by the other terms.)

Some relevant rules from the parser:

    precedence = (
        ('left', 'MISC_TEXT'),
    )

    def p_statement_course_data(p):
        'statement : course_data'
        p[0] = p[1]

    def p_course_data(p):
        'course_data : course'
        p[0] = p[1]

    def p_course(p):
        'course : DEPT_CODE COURSE_NUMBER'
        p[0] = make_course(p[1], int(p[2]))

    def p_or_phrase(p):
        'or_phrase : statement OR_CONJ statement'
        p[0] = [[p[1]], [p[3]]]

    def p_misc_text(p):
        '''text_aggregate : MISC_TEXT MISC_TEXT
                          | MISC_TEXT text_aggregate
                          | text_aggregate MISC_TEXT '''
        p[0] = "%s %s" % (p[0], [1])

    def p_text_aggregate_statement(p):
        'statement : text_aggregate'
        p[0] = p[1]

Unfortunately, this fails:

    # works as it should
    >>> token_list("CS 2110 or equivalent experience")
    [LexToken(DEPT_CODE,'CS',1,0), LexToken(COURSE_NUMBER,'2110',1,3), LexToken(OR_CONJ,'or',1,8), LexToken(MISC_TEXT,'equivalent',1,11), LexToken(MISC_TEXT,'experience',1,22)]

    # fails. bummer.
    >>> parser.parse("CS 2110 or equivalent experience")
    Syntax error in input: LexToken(MISC_TEXT,'equivalent',1,11)

What am I doing wrong? I don't fully understand how to set precedence rules. Also, this is my error function:

    def p_error(p):
        print "Syntax error in input: %s" % p

Is there a way to see which rule the parser was trying when it failed? Or some other way to make the parser print which rules it's trying?

Selecting dictionary items by key efficiently in Python

- by user248237

Suppose I have a dictionary whose keys are strings. How can I efficiently make a new dictionary from it that contains only the keys present in some list? For example:

    # a dictionary mapping strings to stuff
    mydict = {'quux': ..., 'bar': ..., 'foo': ...}

    # list of keys to be selected from mydict
    keys_to_select = ['foo', 'bar', ...]

The way I came up with is:

    filtered_mydict = [mydict[k] for k in mydict.keys() \
                       if k in keys_to_select]

but I think this is highly inefficient because: (1) it requires enumerating the keys with keys(), (2) it requires looking up k in keys_to_select each time. At least one of these can be avoided, I would think. Any ideas? I can use scipy/numpy too if needed.
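
A minimal sketch of one way to avoid both costs named above: iterate over the (usually shorter) list of wanted keys and test membership in the dictionary, which is a hash lookup. The function name select_keys and the sample values are hypothetical. The sketch builds a dictionary, whereas the list comprehension in the question builds a list of values.

```python
def select_keys(mapping, keys):
    """Return a new dict holding only the entries of `mapping` named in `keys`.

    Keys that are absent from `mapping` are silently skipped (an assumption;
    raising KeyError instead may suit some callers better).
    """
    return {k: mapping[k] for k in keys if k in mapping}

# Hypothetical sample data standing in for the elided values in the question.
mydict = {'quux': 1, 'bar': 2, 'foo': 3}
filtered = select_keys(mydict, ['foo', 'bar', 'missing'])
```

The work here is proportional to the length of the key list rather than to the size of the dictionary.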

python and regular expression with unicode

- by bsn

I need to delete some unicode symbols from the string '?????? ??????? ???????????? ??????????' I know they exist here for sure. I try: re.sub('([\u064B-\u0652\u06D4\u0670\u0674\u06D5-\u06ED]+)', '', '?????? ??????? ???????????? ??????????') but it doesn't work. The string stays the same. Any suggestion as to what I'm doing wrong?
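
A likely cause, offered as a guess since the Python version is not stated: in Python 2 a plain '...' literal is a byte string, so \u064B is not turned into a character and the pattern never matches; both pattern and subject would need to be unicode (u'...') there. The sketch below is Python 3, where str literals are already Unicode. The sample word is hypothetical, since the original string did not survive encoding.

```python
import re

# Same character class as in the question, compiled once.
MARKS = re.compile('[\u064B-\u0652\u06D4\u0670\u0674\u06D5-\u06ED]+')

def strip_marks(text):
    """Remove every character in the ranges above from `text`."""
    return MARKS.sub('', text)

# Hypothetical sample: three Arabic letters, each followed by U+064E (fatha).
sample = '\u0643\u064E\u062A\u064E\u0628\u064E'
stripped = strip_marks(sample)
```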

Map vs list comprehension in Python

- by hekevintran

When should you use map/filter instead of a list comprehension or generator expression?

Noob Python Question: List Confusion

- by potatocubed

I'm trying to transfer the contents of one list to another, but it's not working and I don't know why not. My code looks like this:

    list1 = [1, 2, 3, 4, 5, 6]
    list2 = []

    for item in list1:
        list2.append(item)
        list1.remove(item)

But if I run it my output looks like this:

    >>> list1
    [2, 4, 6]
    >>> list2
    [1, 3, 5]

My question is threefold, I guess: Why is this happening, how do I make it work, and am I overlooking an incredibly simple solution like a 'move' statement or something?
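
One reading of the output above: removing items from list1 while the for loop is iterating over it shifts the remaining items left, so every other element is skipped. Python has no 'move' statement, but a minimal sketch that avoids mutating the list during iteration could look like this:

```python
list1 = [1, 2, 3, 4, 5, 6]
list2 = []

# Copy everything across in one step, then empty the source in place.
list2.extend(list1)
del list1[:]
```

Iterating over a copy (for item in list1[:]) would also work if each item has to be handled one at a time.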

How can I programmatically determine (in Python) when someone connects into my windows 7 machine via

- by Jon Cage

This doesn't need to be a real time solution, but are there some log files or system messages that could be read to identify periods of time where someone was connected via RDP to a Windows 7 machine? I'm building a watchdog script for a computer which will be deployed in a remote place and would like to add this metric to a daily status update.

Python 2.7 creating a multidimensional list

- by poop

I don't know why I am having so much trouble creating a 3 dimensional list. I need the program to create an empty n by n list. So for n = 4:

    x = [[[],[],[],[]],[[],[],[],[]],[[],[],[],[]],[[],[],[],[]]]

I've tried using:

    y = [n*[n*[]]]
    y = [[[]]* n for i in range(n)]

Which both appear to be creating copies of a reference. I've also tried naive application of the list builder with little success:

    y = [[[]* n for i in range(n)]* n for i in range(n)]
    y = [[[]* n for i in range(1)]* n for i in range(n)]

I've also tried building up the array iteratively using loops, with no success. In my rapid flurry of attempts to not post something stupidly easy to SO, I came upon a solution:

    y = []
    for i in range(0,n):
        y.append([[]*n for i in range(n)])

Is there an easier/more intuitive way of doing this?
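
A minimal sketch of the nested-comprehension form: multiplying a list (as in [[]] * n) repeats references to the same inner list, whereas a comprehension evaluates [] afresh for every cell. Note also that [] * n is just [], so the * n in the working loop above has no effect. The function name empty_grid is hypothetical.

```python
def empty_grid(n):
    """Build an n-by-n grid in which every cell is its own empty list."""
    return [[[] for _ in range(n)] for _ in range(n)]

grid = empty_grid(4)
grid[0][0].append('x')  # touches only this one cell
```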

Search inside dynamic array in python

- by user2091683

I want to write code that loops over an array whose size is set by the user, which means the size isn't constant. For example:

    A=[1,2,3,4,5]

then I want the output to be like this:

    [1],[2],[3],[4],[5]
    [1,2],[1,3],[1,4],[1,5]
    [2,3],[2,4],[2,5]
    [3,4],[3,5]
    [4,5]
    [1,2,3],[1,2,4],[1,2,5]
    [1,3,4],[1,3,5]
    and so on
    [1,2,3,4],[1,2,3,5]
    [2,3,4,5]
    [1,2,3,4,5]

Can you help me implement this code?
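
The listing above looks like every non-empty combination of the elements, grouped by size. Assuming that reading is right, a minimal sketch with itertools.combinations from the standard library follows; the function name all_subsets is hypothetical.

```python
from itertools import combinations

def all_subsets(items):
    """Return every non-empty combination of `items`, smallest sizes first."""
    return [list(combo)
            for size in range(1, len(items) + 1)
            for combo in combinations(items, size)]

subsets = all_subsets([1, 2, 3, 4, 5])
```

For a list of n items this produces 2**n - 1 results, so it grows quickly with n.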

Using recursion to sum two numbers (python)

- by James

I need to write a recursive function that can add two numbers (x, y), assuming y is not negative. I need to do it using two functions which return x-1 and x+1, and I can't use + or - anywhere in the code. I have no idea how to start, any hints?
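
As a hint in code form, a minimal sketch: move one unit from y to x on each call until y reaches zero. The helper names inc and dec are hypothetical stand-ins for the two functions the assignment provides; they are the only place + and - appear.

```python
def inc(x):
    """Stand-in for the provided helper that returns x + 1."""
    return x + 1

def dec(x):
    """Stand-in for the provided helper that returns x - 1."""
    return x - 1

def add(x, y):
    """Add x and y recursively, assuming y is a non-negative integer."""
    if y == 0:
        return x
    return add(inc(x), dec(y))

total = add(3, 4)
```

Large values of y would hit Python's recursion limit, since each unit of y costs one call.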

How can I optimize this Python code

- by RandomVector

    def maxVote(nLabels):
        count = {}
        maxList = []
        maxCount = 0
        for nLabel in nLabels:
            if nLabel in count:
                count[nLabel] += 1
            else:
                count[nLabel] = 1
            #Check if the count is max
            if count[nLabel] > maxCount:
                maxCount = count[nLabel]
                maxList = [nLabel,]
            elif count[nLabel]==maxCount:
                maxList.append(nLabel)
        return random.choice(maxList)

nLabels contains a list of integers. The above function returns the integer with the highest frequency; if more than one has the same frequency, then a randomly selected integer from among them is returned. E.g. maxVote([1,3,4,5,5,5,3,12,11]) is 5
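
One alternative worth timing, sketched under the assumption that collections.Counter is available (Python 2.7 or later): count everything first, then pick among the labels that share the top count. Whether it is actually faster for a given input is not something this sketch establishes. The name max_vote is hypothetical.

```python
import random
from collections import Counter

def max_vote(labels):
    """Return a most frequent label, choosing at random among ties."""
    counts = Counter(labels)
    top = max(counts.values())
    return random.choice([label for label, n in counts.items() if n == top])

winner = max_vote([1, 3, 4, 5, 5, 5, 3, 12, 11])
```

Like the original, this raises an error on an empty input list.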

Python: Determine whether list of lists contains a defined sequence

- by duhaime

I have a list of sublists, and I want to see if any of the integer values from the first sublist plus one are contained in the second sublist. For all such values, I want to see if that value plus one is contained in the third sublist, and so on, proceeding in this fashion across all sublists. If there is a way of proceeding in this fashion from the first sublist to the last sublist, I wish to return True; otherwise I wish to return False. In other words, for each value in sublist one, for each "step" in a "walk" across all sublists read left to right, if that value + n (where n = number of steps taken) is contained in the current sublist, the function should return True; otherwise it should return False. (Sorry for the clumsy phrasing--I'm not sure how to clean up my language without using many more words.) Here's what I wrote.

    a = [ [1,3],[2,4],[3,5],[6],[7] ]

    def find_list_traversing_walk(l):
        for i in l[0]:
            index_position = 0
            first_pass = 1
            walking_current_path = 1
            while walking_current_path == 1:
                if first_pass == 1:
                    first_pass = 0
                    walking_value = i
                if walking_value+1 in l[index_position + 1]:
                    index_position += 1
                    walking_value += 1
                    if index_position+1 == len(l):
                        print "There is a walk across the sublists for initial value ", walking_value - index_position
                        return True
                else:
                    walking_current_path = 0
        return False

    print find_list_traversing_walk(a)

My question is: Have I overlooked something simple here, or will this function return True for all true positives and False for all true negatives? Are there easier ways to accomplish the intended task? I would be grateful for any feedback others can offer!
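
A shorter formulation of the same check, as a sketch: for each starting value in the first sublist, test whether start + step appears in sublist number step for every step. The name has_walk is hypothetical, and two edge cases are assumptions rather than anything stated above: an empty outer list returns False, and a single sublist counts as a trivial walk.

```python
def has_walk(sublists):
    """True if some value v in sublists[0] has v + k in sublists[k] for all k."""
    if not sublists:
        return False
    sets = [set(s) for s in sublists]  # set membership is a hash lookup
    return any(
        all(start + step in s for step, s in enumerate(sets))
        for start in sets[0]
    )

found = has_walk([[1, 3], [2, 4], [3, 5], [6], [7]])
```

Here the walk 3, 4, 5, 6, 7 exists, so the result for the sample list is True.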

Python 2.6 - I can not write dwords greater than 0x7fffffff into registry using _winreg.SetValueEx()

- by stasizke

Using regedit.exe I have manually created a key in the registry called HKEY_CURRENT_USER/00_Just_a_Test_Key and created two dword values, dword_test_1 and dword_test_2. I am trying to write some values into those two values using the following program:

    import _winreg

    aReg = _winreg.ConnectRegistry(None,_winreg.HKEY_CURRENT_USER)
    aKey = _winreg.OpenKey(aReg, r"00_Just_a_Test_Key", 0, _winreg.KEY_WRITE)
    _winreg.SetValueEx(aKey,"dword_test_1",0, _winreg.REG_DWORD, 0x0edcba98)
    _winreg.SetValueEx(aKey,"dword_test_2",0, _winreg.REG_DWORD, 0xfedcba98)
    _winreg.CloseKey(aKey)
    _winreg.CloseKey(aReg)

I can write into the first value, dword_test_1, but when I attempt to write into the second, I get the following message:

    Traceback (most recent call last):
      File "D:/src/registry/question.py", line 7, in <module>
        _winreg.SetValueEx(aKey,"dword_test_2",0, _winreg.REG_DWORD, 0xfedcba98)
    ValueError: Could not convert the data to the specified type.

How do I write the second value 0xfedcba98, or any value greater than 0x7fffffff, as a dword value? Originally I was writing a script to switch the "My documents" icon on or off by writing "0xf0500174" to hide or "0xf0400174" to display the icon into [HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Explorer\CLSID{450D8FBA-AD25-11D0-98A8-0800361B1103}\ShellFolder]
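
One workaround sometimes suggested for this error is to hand SetValueEx the signed 32-bit integer that has the same bit pattern. Whether the Python 2.6 _winreg module accepts that is an assumption here and is not verified; the sketch below only shows the conversion itself and makes no registry calls. The function name to_signed32 is hypothetical.

```python
import struct

def to_signed32(value):
    """Reinterpret an unsigned 32-bit integer as signed, keeping the same bits."""
    return struct.unpack('<i', struct.pack('<I', value))[0]

# Assumption (untested): this signed form could be passed as the REG_DWORD data.
signed_value = to_signed32(0xfedcba98)
```

Values at or below 0x7fffffff come back unchanged, so the conversion could be applied to every value written.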

Python point lookup (coordinate binning?)

- by Rince

Greetings, I am trying to bin an array of points (x, y) into an array of boxes [(x0, y0), (x1, y0), (x0, y1), (x1, y1)] (tuples are the corner points). So far I have the following routine:

    def isInside(self, point, x0, x1, y0, y1):
        pr1 = getProduct(point, (x0, y0), (x1, y0))
        if pr1 >= 0:
            pr2 = getProduct(point, (x1, y0), (x1, y1))
            if pr2 >= 0:
                pr3 = getProduct(point, (x1, y1), (x0, y1))
                if pr3 >= 0:
                    pr4 = getProduct(point, (x0, y1), (x0, y0))
                    if pr4 >= 0:
                        return True
        return False

    def getProduct(origin, pointA, pointB):
        product = (pointA[0] - origin[0])*(pointB[1] - origin[1]) - (pointB[0] - origin[0])*(pointA[1] - origin[1])
        return product

Is there any better way than point-by-point lookup? Maybe some not-obvious numpy routine? Thank you!
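
If the boxes are axis-aligned and form a grid with shared, sorted edges (an assumption; the question does not say so), each coordinate can be binned independently with a binary search instead of testing every box. A minimal standard-library sketch, with hypothetical names bin_index and bin_point:

```python
from bisect import bisect_right

def bin_index(value, edges):
    """Return i with edges[i] <= value < edges[i + 1], or None if outside."""
    i = bisect_right(edges, value) - 1
    if 0 <= i < len(edges) - 1:
        return i
    return None

def bin_point(point, x_edges, y_edges):
    """Return the (column, row) of the grid cell holding `point`, or None."""
    ix = bin_index(point[0], x_edges)
    iy = bin_index(point[1], y_edges)
    if ix is None or iy is None:
        return None
    return (ix, iy)

# Hypothetical 2-by-2 grid: x edges at 0, 1, 2 and y edges at 0, 10, 20.
cell = bin_point((0.5, 15), [0, 1, 2], [0, 10, 20])
```

Each lookup costs a binary search per axis rather than one test per box.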

[Python] Socket: Get user information

- by Giorgio

How can I get information about a user's PC connected to my socket?

How do I print a table in Python?

- by fluxus

using two columns

Python: Closing a for loop by reading stdout

- by user1732102

    import os

    dictionaryfile = "/root/john.txt"
    pgpencryptedfile = "helloworld.txt.gpg"

    array = open(dictionaryfile).readlines()

    for x in array:
        x = x.rstrip('\n')
        newstring = "echo " + x + " | gpg --passphrase-fd 0 " + pgpencryptedfile
        os.popen(newstring)

I need to create something inside the for loop that will read gpg's output. When gpg outputs this string gpg: WARNING: message was not integrity protected, I need the loop to close and print Success! How can I do this, and what is the reasoning behind it? Thanks Everyone!

How Similar are Java, C#, and Python?

- by Alex

I know it is kind of a broad question, but any answers are appreciated.

Python: printing a range with decimal points

- by jabo

I can print a range of numbers easily using range, but is it possible to print a range with 1 decimal place from -10 to 10? E.g. -10.0, -9.9, -9.8 all the way through to +10?
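
A minimal sketch of one approach: count in whole tenths with range and divide at the end, which avoids the drift that comes from repeatedly adding 0.1. The function name decimal_range and its steps_per_unit parameter are hypothetical, and the sketch assumes Python 3 division (in Python 2, divide by 10.0 instead).

```python
def decimal_range(start, stop, steps_per_unit=10):
    """Return values from start to stop inclusive in steps of 1/steps_per_unit."""
    first = int(round(start * steps_per_unit))
    last = int(round(stop * steps_per_unit))
    return [i / steps_per_unit for i in range(first, last + 1)]

# Format with one decimal place, ready for printing.
formatted = ["%.1f" % value for value in decimal_range(-10, 10)]
```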

A way to use Python which I don't know

- by Konie

In this quicksort function:

    def qsort2(list):
        if list == []:
            return []
        else:
            pivot = list[0]
            # can't understand the following line
            lesser, equal, greater = partition(list[1:], [], [pivot], [])
            return qsort2(lesser) + equal + qsort2(greater)

    def partition(list, l, e, g):
        if list == []:
            return (l, e, g)
        else:
            head = list[0]
            if head < e[0]:
                return partition(list[1:], l + [head], e, g)
            elif head > e[0]:
                return partition(list[1:], l, e, g + [head])
            else:
                return partition(list[1:], l, e + [head], g)

I don't understand the line below the comment. Can someone tell me what this line means here?
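
The line in question uses tuple unpacking: partition returns a 3-tuple, and the three names on the left are bound to its three elements in order. A small standalone sketch of the same pattern, using a hypothetical function split_by_sign:

```python
def split_by_sign(numbers):
    """Return a 3-tuple of lists: negatives, zeros, positives."""
    negatives = [n for n in numbers if n < 0]
    zeros = [n for n in numbers if n == 0]
    positives = [n for n in numbers if n > 0]
    return (negatives, zeros, positives)

# One assignment binds all three names, just like lesser, equal, greater above.
lesser, equal, greater = split_by_sign([3, -1, 0, 2, -5])
```

The number of names on the left must match the number of elements returned, or Python raises ValueError.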

Python how to convert this for loop into a while loop

- by user1690198

I have this for loop, and I was wondering how I would rewrite it so it works with a while loop.

    def scrollList(myList):
        negativeIndices=[]
        for i in range(0,len(myList)):
            if myList[i]<0:
                negativeIndices.append(i)
        return negativeIndices

So far I have this:

    def scrollList2(myList):
        negativeIndices=[]
        i= 0
        length= len(myList)
        while i != length:
            if myList[i]<0:
                negativeIndices.append(i)
            i=i+1
        return negativeIndices

Recurrent yearly date alert in Python

- by coulix

Hello Hackerz, here is the idea. A user can set a day alert for a birthday. (We do not care about the year of birth.) He also picks whether he wants to be alerted 0, 1, 2, or 7 days (delta) before the D day. Users have a timezone setting. I want the server to send the alerts at 8 am on the D day - delta, adjusted for the user's timezone. Example: 12 Jun, with "alert me 3 days before", will give 9 Jun. My idea was to have an extra trigger_datetime field saved on the 'recurrent event' object. That way a cron job running every hour on my server will just check for all events matching its current hour, day and month, and send the alert. The problem: from one year to the next the trigger date could change! If the alert is set on 1 March with a one-day delay, that could be either 28 or 29 February. Maybe I should not use the trigger date trick and should use some other kind of scheme. All plans are welcome.
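
One possible scheme, sketched here as a suggestion: store only month, day and delta, and compute the trigger date for the current year each time the job runs, so leap years are handled by the date arithmetic itself. The function name trigger_date is hypothetical; timezones and the 8 am send time are left out, and a 29 February birthday in a non-leap year is not handled.

```python
from datetime import date, timedelta

def trigger_date(year, month, day, days_before):
    """Return the date on which to alert for an event on (month, day) of `year`."""
    return date(year, month, day) - timedelta(days=days_before)

# 1 March with a one-day delta lands on different dates in different years.
alert_2023 = trigger_date(2023, 3, 1, 1)
alert_2024 = trigger_date(2024, 3, 1, 1)
```

The hourly job would then compare today's date in the user's timezone against this computed value rather than against a stored field.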

Best way to copy a list in Python

- by sheats

    lst1 = ['one', 2, 3]

    # What is the best way of the following -- or is there another way?

    lst2 = list(lst1)

    lst2 = lst1[:]

    import copy
    lst2 = copy.copy(lst1)
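
For comparison, a small sketch showing that all three forms above make a shallow copy: a new outer list whose elements are the same objects. The nested list in the sample data is added for illustration and is not in the question; copy.deepcopy is included as the variant that also copies nested objects.

```python
import copy

lst1 = ['one', 2, [3]]  # nested list added to make shallowness visible

shallow_copies = [list(lst1), lst1[:], copy.copy(lst1)]
deep = copy.deepcopy(lst1)

# Every shallow copy shares the inner list with lst1; the deep copy does not.
shares_inner = [c[2] is lst1[2] for c in shallow_copies]
```

Which of the three reads best is largely a matter of taste; they differ mainly when the source is not a plain list.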

Python execute a function for X seconds

- by theactiveactor

I'm looking for a way for a function to take actions based on how long it has been executing. For example, my function would loop continuously until 5 seconds have elapsed, at which point it returns immediately. Any suggestions?
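
A minimal sketch of the deadline-loop idea, assuming Python 3.3 or later for time.monotonic (on Python 2, time.time is the usual substitute). The names run_for and step are hypothetical. The loop only checks the clock between iterations, so a single slow step can overrun the limit.

```python
import time

def run_for(seconds, step):
    """Call step() repeatedly until `seconds` have elapsed; return the call count."""
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:
        step()
        iterations += 1
    return iterations

# Short demonstration with a do-nothing step and a 50 ms budget.
ticks = run_for(0.05, lambda: None)
```

Interrupting a step that is already running would need a different mechanism, such as a separate thread or process.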

Python list is not the same reference

- by Spì

This is the code:

    L=[1,2]
    L is L[:]
    False

Why is this False?
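
A short sketch of the distinction at play: a full slice builds a new list object, is compares object identity, and == compares contents. The variable names are hypothetical.

```python
L = [1, 2]
sliced = L[:]   # a new list with the same elements
alias = L       # a second name for the very same list

same_contents = (sliced == L)   # equal element by element
same_object = (sliced is L)     # different objects in memory
alias_is_same = (alias is L)    # same object under two names
```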

