Hi folks,
I'm looking for a way to read, in C++, a text file containing numpy arrays and put the data into a vector. Can anyone help me out, please?
Thanks a lot.
Archy
I currently have a large webpage whose source code is ~200,000 lines of almost all (if not all) HTML. More specifically, its content is a few thousand blocks of paragraphs separated by line breaks (though a line break does not necessarily mean there is a separation in content).
My main objective is to extract text from the source code as if I were copying/pasting the webpage into a text editor. There is another parsing function I would like to use, which originally took in copied/pasted text rather than the source code.
To do this, I'm currently using urllib2 and calling .get_text() in Beautiful Soup. The problem is that Beautiful Soup leaves tremendous amounts of white space in the result, and it is difficult to pass that into the second "text" parser. I have done quite a bit of research on parsing HTML, but I'm frankly not sure how to solve this problem easily. Furthermore, I'm a bit confused about how to use imports like lxml to extract text as if I were simply copying and pasting.
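One possible way to tame the white space, sketched here with the standard library only, is to post-process the string that .get_text() returns. The helper name collapse_whitespace is hypothetical, and dropping empty lines is an assumption that may not suit a parser that relies on blank lines between paragraphs.

```python
def collapse_whitespace(text):
    """Collapse runs of whitespace inside each line and drop empty lines."""
    # str.split() with no arguments splits on any whitespace run and
    # discards leading/trailing whitespace, so re-joining normalizes a line.
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    raw = "  Hello   world \n\n\n\t second\tline  \n"
    print(collapse_whitespace(raw))
```

The input here stands in for whatever text was extracted from the page; no HTML parsing happens in this sketch.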
I don't know why I am having so much trouble creating a 3 dimensional list.
I need the program to create an empty n by n list. So for n = 4:
x = [[[],[],[],[]],[[],[],[],[]],[[],[],[],[]],[[],[],[],[]]]
I've tried using:
y = [n*[n*[]]]
y = [[[]]* n for i in range(n)]
Which both appear to be creating copies of a reference.
I've also tried a naive application of list comprehensions, with little success:
y = [[[]* n for i in range(n)]* n for i in range(n)]
y = [[[]* n for i in range(1)]* n for i in range(n)]
I've also tried building up the array iteratively using loops, with no success. In my rapid flurry of attempts to not post something stupidly easy to SO, I came upon a solution:
y = []
for i in range(0,n):
    y.append([[]*n for i in range(n)])
Is there an easier/ more intuitive way of doing this?
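A nested list comprehension is probably the shortest way to get independent inner lists, because the innermost [] is evaluated afresh for every cell. Multiplying a list, by contrast, repeats references to the same object. A minimal sketch (the function name empty_grid is made up for illustration):

```python
def empty_grid(n):
    """Return an n-by-n grid in which every cell is its own empty list."""
    return [[[] for _ in range(n)] for _ in range(n)]


if __name__ == "__main__":
    y = empty_grid(4)
    y[0][0].append(1)   # only this one cell changes
    print(y[0])
```

Note that []*n is just [], so the multiplication in the attempts above has no effect on the innermost list.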
Suppose I have a dictionary whose keys are strings. How can I efficiently make a new dictionary from it that contains only the keys present in some list?
For example:
# a dictionary mapping strings to stuff
mydict = {'quux': ...,
'bar': ...,
'foo': ...}
# list of keys to be selected from mydict
keys_to_select = ['foo', 'bar', ...]
The way I came up with is:
filtered_mydict = [mydict[k] for k in mydict.keys() \
if k in keys_to_select]
but I think this is highly inefficient because: (1) it requires enumerating the keys with keys(), and (2) it requires looking up k in keys_to_select each time. At least one of these can be avoided, I would think. Any ideas? I can use scipy/numpy too if needed.
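One way to avoid both costs, as a sketch: iterate over the (usually shorter) list of wanted keys and test membership in the dictionary, which is a hash lookup. The helper name select_keys is hypothetical. Note that the snippet above builds a list of values, whereas this builds a dictionary.

```python
def select_keys(mapping, keys):
    """Return a new dict holding only those of `keys` that exist in `mapping`."""
    return {k: mapping[k] for k in keys if k in mapping}


if __name__ == "__main__":
    mydict = {'quux': 1, 'bar': 2, 'foo': 3}
    print(select_keys(mydict, ['foo', 'bar', 'missing']))
```

Keys that are requested but absent are silently skipped here; raising KeyError instead would be an equally reasonable choice.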
I'm parsing sentences like:
"CS 2310 or equivalent experience"
The desired output:
[[("CS", 2310)], ["equivalent experience"]]
YACC tokenizer symbols:
tokens = [
    'DEPT_CODE',
    'COURSE_NUMBER',
    'OR_CONJ',
    'MISC_TEXT',
]
t_DEPT_CODE = r'[A-Z]{2,}'
t_COURSE_NUMBER = r'[0-9]{4}'
t_OR_CONJ = r'or'
t_ignore = ' \t'
terms = {'DEPT_CODE': t_DEPT_CODE,
         'COURSE_NUMBER': t_COURSE_NUMBER,
         'OR_CONJ': t_OR_CONJ}
for name, regex in terms.items():
    terms[name] = "^%s$" % regex
def t_MISC_TEXT(t):
    r'\S+'
    for name, regex in terms.items():
        # print "trying to match %s with regex %s" % (t.value, regex)
        if re.match(regex, t.value):
            t.type = name
            return t
    return t
(MISC_TEXT is meant to match anything not caught by the other terms.)
Some relevant rules from the parser:
precedence = (
    ('left', 'MISC_TEXT'),
)
def p_statement_course_data(p):
    'statement : course_data'
    p[0] = p[1]
def p_course_data(p):
    'course_data : course'
    p[0] = p[1]
def p_course(p):
    'course : DEPT_CODE COURSE_NUMBER'
    p[0] = make_course(p[1], int(p[2]))
def p_or_phrase(p):
    'or_phrase : statement OR_CONJ statement'
    p[0] = [[p[1]], [p[3]]]
def p_misc_text(p):
    '''text_aggregate : MISC_TEXT MISC_TEXT
                      | MISC_TEXT text_aggregate
                      | text_aggregate MISC_TEXT '''
    # join the two matched pieces (p[1] and p[2]) into one string
    p[0] = "%s %s" % (p[1], p[2])
def p_text_aggregate_statement(p):
    'statement : text_aggregate'
    p[0] = p[1]
Unfortunately, this fails:
# works as it should
>>> token_list("CS 2110 or equivalent experience")
[LexToken(DEPT_CODE,'CS',1,0), LexToken(COURSE_NUMBER,'2110',1,3), LexToken(OR_CONJ,'or',1,8), LexToken(MISC_TEXT,'equivalent',1,11), LexToken(MISC_TEXT,'experience',1,22)]
# fails. bummer.
>>> parser.parse("CS 2110 or equivalent experience")
Syntax error in input: LexToken(MISC_TEXT,'equivalent',1,11)
What am I doing wrong? I don't fully understand how to set precedence rules.
Also, this is my error function:
def p_error(p):
    print "Syntax error in input: %s" % p
Is there a way to see which rule the parser was trying when it failed? Or some other way to make the parser print which rules it's trying?
I am hoping to write a script for Google Chrome that detects video on a URL and provides a download link to the *.flv file.
Does anyone have suggestions on where to start and get a footing?
I need to delete some unicode symbols from the string '?????? ??????? ???????????? ??????????'
I know for sure that they are in there. I tried:
re.sub('([\u064B-\u0652\u06D4\u0670\u0674\u06D5-\u06ED]+)', '', '?????? ??????? ???????????? ??????????')
but it doesn't work. The string stays the same. Any suggestions as to what I'm doing wrong?
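A likely cause, assuming this is Python 2: a plain '...' literal does not interpret \u escapes, so the pattern never contains the intended code points. Both the pattern and the subject would need to be unicode strings (u'...'). The sketch below is written for Python 3, where ordinary string literals are already Unicode. The character class is copied from the question, while the helper name strip_diacritics and the sample word are made up for illustration.

```python
import re

# Character class copied from the question; '\u....' escapes are resolved
# by the string literal itself in Python 3.
DIACRITICS = re.compile('[\u064B-\u0652\u06D4\u0670\u0674\u06D5-\u06ED]+')


def strip_diacritics(text):
    """Remove every character matched by the DIACRITICS class."""
    return DIACRITICS.sub('', text)


if __name__ == "__main__":
    # Sample word written with escapes: letters plus damma, fatha, shadda.
    sample = '\u0645\u064F\u062D\u064E\u0645\u0651\u064E\u062F'
    print(len(sample), len(strip_diacritics(sample)))
```

If the string really consists of literal question marks, as it appears above, the text was already damaged by an encoding step before the regex ever ran, and no pattern can recover it.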
This doesn't need to be a real time solution, but are there some log files or system messages that could be read to identify periods of time where someone was connected via RDP to a Windows 7 machine?
I'm building a watchdog script for a computer which will be deployed in a remote place and would like to add this metric to a daily status update.
I'm trying to transfer the contents of one list to another, but it's not working and I don't know why not. My code looks like this:
list1 = [1, 2, 3, 4, 5, 6]
list2 = []
for item in list1:
    list2.append(item)
    list1.remove(item)
But if I run it my output looks like this:
>>> list1
[2, 4, 6]
>>> list2
[1, 3, 5]
My question is threefold, I guess: Why is this happening, how do I make it work, and am I overlooking an incredibly simple solution like a 'move' statement or something?
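The skipping happens because the loop removes items from the very list it is iterating over: after each removal the remaining items shift left by one, so the iterator's next position lands one element further than intended. There is no 'move' statement, but copying everything first and then emptying the source avoids the problem. A minimal sketch (the function name move_all is hypothetical):

```python
def move_all(source, destination):
    """Append every item of `source` to `destination`, then empty `source`."""
    destination.extend(source)
    del source[:]          # empties the list in place, keeping the same object


if __name__ == "__main__":
    list1 = [1, 2, 3, 4, 5, 6]
    list2 = []
    move_all(list1, list2)
    print(list1, list2)
```

Iterating over a copy (for item in list1[:]) would also work, at the cost of repeated remove() calls.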
I have a list of sublists, and I want to see if any of the integer values from the first sublist plus one are contained in the second sublist. For all such values, I want to see if that value plus one is contained in the third sublist, and so on, proceeding in this fashion across all sublists. If there is a way of proceeding in this fashion from the first sublist to the last sublist, I wish to return True; otherwise I wish to return False. In other words, for each value in sublist one, for each "step" in a "walk" across all sublists read left to right, if that value + n (where n = number of steps taken) is contained in the current sublist, the function should return True; otherwise it should return False. (Sorry for the clumsy phrasing--I'm not sure how to clean up my language without using many more words.)
Here's what I wrote.
a = [ [1,3],[2,4],[3,5],[6],[7] ]
def find_list_traversing_walk(l):
    for i in l[0]:
        index_position = 0
        first_pass = 1
        walking_current_path = 1
        while walking_current_path == 1:
            if first_pass == 1:
                first_pass = 0
                walking_value = i
            if walking_value+1 in l[index_position + 1]:
                index_position += 1
                walking_value += 1
                if index_position+1 == len(l):
                    print "There is a walk across the sublists for initial value ", walking_value - index_position
                    return True
            else:
                walking_current_path = 0
    return False
print find_list_traversing_walk(a)
My question is: Have I overlooked something simple here, or will this function return True for all true positives and False for all true negatives? Are there easier ways to accomplish the intended task? I would be grateful for any feedback others can offer!
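The same check can arguably be written more directly: a walk exists if, for some start value in the first sublist, start + k is in the k-th sublist for every k. The sketch below uses any() and all() for that. The name has_walk is made up, and it assumes the outer list is non-empty.

```python
def has_walk(sublists):
    """True if some start in sublists[0] has start + k in sublists[k] for all k."""
    return any(
        all(start + step in sub for step, sub in enumerate(sublists))
        for start in sublists[0]
    )


if __name__ == "__main__":
    a = [[1, 3], [2, 4], [3, 5], [6], [7]]
    print(has_walk(a))
```

One difference from the loop version above: with a single sublist this returns True whenever that sublist is non-empty, whereas the original would index past the end of the list.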
I want to write code that loops over an array whose size is set by the user, which means the size isn't constant.
For example:
A=[1,2,3,4,5]
then I want the output to be like this:
[1],[2],[3],[4],[5]
[1,2],[1,3],[1,4],[1,5]
[2,3],[2,4],[2,5]
[3,4],[3,5]
[4,5]
[1,2,3],[1,2,4],[1,2,5]
[1,3,4],[1,3,5]
and so on
[1,2,3,4],[1,2,3,5]
[2,3,4,5]
[1,2,3,4,5]
Can you help me implement this code?
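The output listed above looks like every non-empty combination of the input, grouped by size. Assuming that reading is right, itertools.combinations from the standard library produces exactly those groups. A sketch (the wrapper name all_combinations is hypothetical):

```python
from itertools import combinations


def all_combinations(items):
    """Return every non-empty combination of `items`, shortest first."""
    result = []
    for size in range(1, len(items) + 1):
        # combinations() yields tuples in lexicographic order of position
        result.extend(list(combo) for combo in combinations(items, size))
    return result


if __name__ == "__main__":
    for combo in all_combinations([1, 2, 3, 4, 5]):
        print(combo)
```

The number of results is 2**n - 1, so this grows quickly with the size of the input.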
I need to write a recursive function that can add two numbers (x, y), assuming y is not negative. I need to do it using two functions which return x-1 and x+1, and I can't use + or - anywhere in the code. I have no idea how to start, any hints?
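As a hint in code form: adding y to x is the same as adding y - 1 to x + 1, and adding 0 leaves x unchanged, which gives the base case. In the sketch below, inc and dec are stand-ins for the two provided functions (their names and bodies are assumptions, and they are the only place + and - appear).

```python
def inc(x):
    """Stand-in for the provided function returning x + 1."""
    return x + 1


def dec(x):
    """Stand-in for the provided function returning x - 1."""
    return x - 1


def add(x, y):
    """Add x and y recursively, assuming y >= 0."""
    if y == 0:
        return x                    # base case: nothing left to add
    return add(inc(x), dec(y))      # move one unit from y over to x


if __name__ == "__main__":
    print(add(3, 4))
```

Because each call reduces y by one, very large values of y would hit Python's recursion limit.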
import random

def maxVote(nLabels):
    count = {}
    maxList = []
    maxCount = 0
    for nLabel in nLabels:
        if nLabel in count:
            count[nLabel] += 1
        else:
            count[nLabel] = 1
        #Check if the count is max
        if count[nLabel] > maxCount:
            maxCount = count[nLabel]
            maxList = [nLabel,]
        elif count[nLabel]==maxCount:
            maxList.append(nLabel)
    return random.choice(maxList)
nLabels contains a list of integers.
The above function returns the integer with the highest frequency; if more than one integer shares the highest frequency, a randomly selected one of them is returned.
E.g. maxVote([1,3,4,5,5,5,3,12,11]) is 5
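For comparison, here is a shorter sketch of the same behaviour using collections.Counter. The name max_vote is made up to keep it distinct from the function above; like the original, it fails on an empty list.

```python
import random
from collections import Counter


def max_vote(labels):
    """Return a most frequent label, choosing at random among ties."""
    counts = Counter(labels)
    top = max(counts.values())
    return random.choice([label for label, n in counts.items() if n == top])


if __name__ == "__main__":
    print(max_vote([1, 3, 4, 5, 5, 5, 3, 12, 11]))
```

This makes two passes (count, then filter) instead of tracking the maximum while counting, which is usually easier to read at a small cost.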
Using regedit.exe, I have manually created a key in the registry called
HKEY_CURRENT_USER/00_Just_a_Test_Key
and created two dword values
dword_test_1 and dword_test_2
I am trying to write some data into those two values using the following program
import _winreg
aReg = _winreg.ConnectRegistry(None,_winreg.HKEY_CURRENT_USER)
aKey = _winreg.OpenKey(aReg, r"00_Just_a_Test_Key", 0, _winreg.KEY_WRITE)
_winreg.SetValueEx(aKey,"dword_test_1",0, _winreg.REG_DWORD, 0x0edcba98)
_winreg.SetValueEx(aKey,"dword_test_2",0, _winreg.REG_DWORD, 0xfedcba98)
_winreg.CloseKey(aKey)
_winreg.CloseKey(aReg)
I can write into the first value, dword_test_1, but when I attempt to write into the second, I get the following message
Traceback (most recent call last):
  File "D:/src/registry/question.py", line 7, in <module>
    _winreg.SetValueEx(aKey,"dword_test_2",0, _winreg.REG_DWORD, 0xfedcba98)
ValueError: Could not convert the data to the specified type.
How do I write the second value 0xfedcba98, or any value greater than 0x7fffffff as a dword value?
Originally I was writing a script to switch the "My documents" icon on or off by writing "0xf0500174" to hide or "0xf0400174" to display the icon into [HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Explorer\CLSID{450D8FBA-AD25-11D0-98A8-0800361B1103}\ShellFolder]
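One workaround that may apply to Python 2's _winreg is to pass the same 32 bits reinterpreted as a signed integer, since values above 0x7fffffff do not fit in a signed 32-bit int. The sketch below only performs that conversion. The helper name to_signed32 is hypothetical, and whether SetValueEx accepts the result on a given Python version is an assumption that should be checked on the target machine.

```python
import struct


def to_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    # Pack as unsigned little-endian, unpack the same 4 bytes as signed.
    return struct.unpack('<i', struct.pack('<I', value))[0]


if __name__ == "__main__":
    print(to_signed32(0x0edcba98))   # fits already, unchanged
    print(to_signed32(0xfedcba98))   # becomes a negative number
```

The bit pattern stored would be identical either way; only the Python-side representation differs.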
Greetings,
I am trying to bin an array of points (x, y) into an array of boxes [(x0, y0), (x1, y0), (x0, y1), (x1, y1)] (tuples are the corner points)
So far I have the following routine:
def isInside(self, point, x0, x1, y0, y1):
    pr1 = getProduct(point, (x0, y0), (x1, y0))
    if pr1 >= 0:
        pr2 = getProduct(point, (x1, y0), (x1, y1))
        if pr2 >= 0:
            pr3 = getProduct(point, (x1, y1), (x0, y1))
            if pr3 >= 0:
                pr4 = getProduct(point, (x0, y1), (x0, y0))
                if pr4 >= 0:
                    return True
    return False
def getProduct(origin, pointA, pointB):
    product = (pointA[0] - origin[0])*(pointB[1] - origin[1]) - (pointB[0] - origin[0])*(pointA[1] - origin[1])
    return product
Is there any better way than point-by-point lookup? Maybe some not-obvious numpy routine?
Thank you!
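If the boxes are axis-aligned and form a regular grid (an assumption; the corner layout above suggests it but does not guarantee it), each point's box can be found by a binary search on the sorted x and y edges, with no cross products. A standard-library sketch, where bin_point, x_edges and y_edges are hypothetical names:

```python
from bisect import bisect_right


def bin_point(point, x_edges, y_edges):
    """Return (ix, iy) of the grid cell containing `point`, or None if outside.

    Cells are half-open: a point on a cell's upper edge belongs to the next
    cell, and a point on the grid's outermost upper edge counts as outside.
    """
    x, y = point
    ix = bisect_right(x_edges, x) - 1
    iy = bisect_right(y_edges, y) - 1
    if 0 <= ix < len(x_edges) - 1 and 0 <= iy < len(y_edges) - 1:
        return ix, iy
    return None


if __name__ == "__main__":
    edges = [0, 1, 2]
    print(bin_point((0.5, 1.5), edges, edges))
```

For large arrays the same idea can be vectorised, but this sketch stays with plain Python.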
import os
dictionaryfile = "/root/john.txt"
pgpencryptedfile = "helloworld.txt.gpg"
array = open(dictionaryfile).readlines()
for x in array:
    x = x.rstrip('\n')
    newstring = "echo " + x + " | gpg --passphrase-fd 0 " + pgpencryptedfile
    os.popen(newstring)
I need to create something inside the for loop that will read gpg's output. When gpg outputs this string gpg: WARNING: message was not integrity protected, I need the loop to close and print Success!
How can I do this, and what is the reasoning behind it?
Thanks Everyone!
In this quicksort function:
def qsort2(list):
    if list == []:
        return []
    else:
        pivot = list[0]
        # can't understand the following line
        lesser, equal, greater = partition(list[1:], [], [pivot], [])
        return qsort2(lesser) + equal + qsort2(greater)
def partition(list, l, e, g):
    if list == []:
        return (l, e, g)
    else:
        head = list[0]
        if head < e[0]:
            return partition(list[1:], l + [head], e, g)
        elif head > e[0]:
            return partition(list[1:], l, e, g + [head])
        else:
            return partition(list[1:], l, e + [head], g)
I don't understand the line below the comment. Can someone tell me what this line means?
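The line in question relies on tuple unpacking: partition returns one tuple of three lists, and writing three names on the left of = assigns one list to each name. Here is a smaller illustration of the same pattern, where split_around is a made-up, non-recursive stand-in for partition:

```python
def split_around(values, pivot):
    """Split values into those below, equal to, and above the pivot."""
    lesser = [v for v in values if v < pivot]
    equal = [v for v in values if v == pivot]
    greater = [v for v in values if v > pivot]
    return lesser, equal, greater   # one tuple holding three lists


# Three names on the left unpack the three elements of the returned tuple.
lesser, equal, greater = split_around([3, 1, 4, 1, 5, 9, 2, 6, 3], 3)

if __name__ == "__main__":
    print(lesser, equal, greater)
```

In the quicksort above, the arguments [], [pivot], [] are the starting values of those three lists before any element has been examined.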
I can print a range of numbers easily using range, but is it possible to print a range with 1 decimal place from -10 to 10?
e.g.
-10.0, -9.9, -9.8 all the way through to +10?
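range only works with integers, so one common approach is to count in tenths and divide. Dividing each integer by 10.0 avoids the rounding drift that comes from repeatedly adding 0.1. A minimal sketch (the variable names are arbitrary):

```python
# Count from -100 to 100 in whole steps, then scale down to tenths.
values = [i / 10.0 for i in range(-100, 101)]

# Format with exactly one decimal place for printing.
formatted = ["%.1f" % v for v in values]

if __name__ == "__main__":
    print(", ".join(formatted))
```

The upper bound is 101 because range excludes its stop value.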
lst1 = ['one', 2, 3]
# Which of the following is the best way to copy the list, or is there another way?
lst2 = list(lst1)
lst2 = lst1[:]
import copy
lst2 = copy.copy(lst1)
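All three forms make a shallow copy: a new outer list whose elements are the same objects as in the original. The difference only shows up with nested mutable elements, where copy.deepcopy behaves differently. In the sketch below the third element is changed to a nested list purely to make that visible, which is a deviation from the list in the question.

```python
import copy

lst1 = ['one', 2, [3]]        # nested list added for illustration

by_constructor = list(lst1)
by_slice = lst1[:]
by_copy = copy.copy(lst1)
by_deepcopy = copy.deepcopy(lst1)

lst1[2].append(4)             # mutate the shared inner list

if __name__ == "__main__":
    print(by_constructor, by_slice, by_copy, by_deepcopy)
```

For a flat list of immutable items like the one in the question, the three shallow forms are interchangeable, and the choice is mostly a matter of readability.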
Hello Hackerz,
Here is the idea
A user can set a day alert for a birthday. (We do not care about the year of birth.)
He also picks whether he wants to be alerted 0, 1, 2, or 7 days (delta) before D-day.
Users have a timezone setting.
I want the server to send the alerts at 8 am on D-day minus delta, adjusted for the user's timezone.
Example:
12 Jun with "alert me 3 days before" gives 9 Jun.
My idea was to have an extra trigger_datetime field saved on the 'recurrent event' object.
That way, a cron job running every hour on my server would just check for all events matching its current hour, day and month, and send the alert.
The problem is that, from one year to the next, the trigger_date could change!
If the alert is set on the 1st of March with a one-day delay, that could be either the 28th or the 29th of February.
Maybe I should not use the trigger date trick and use some other kind of scheme.
All plans are welcome.
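One possible scheme is to store only the birthday's month and day plus the delta, and compute the trigger date for the current year on demand, so that leap years are handled by date arithmetic rather than by a stored value. A sketch under stated assumptions: the function name trigger_date is hypothetical, a 29 Feb birthday is treated as 28 Feb in non-leap years (a policy choice, not something from the plan above), and the 8 am and timezone handling is left out.

```python
import calendar
from datetime import date, timedelta


def trigger_date(year, month, day, delta_days):
    """Return the date on which to alert for a birthday in the given year."""
    if month == 2 and day == 29 and not calendar.isleap(year):
        day = 28   # assumption: observe 29 Feb birthdays on 28 Feb
    return date(year, month, day) - timedelta(days=delta_days)


if __name__ == "__main__":
    print(trigger_date(2023, 3, 1, 1))
    print(trigger_date(2024, 3, 1, 1))
```

An hourly cron job could then compare today's date in each user's timezone against this computed value instead of a stored trigger date.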