Daily Archives

Articles indexed Wednesday March 24 2010

Screen scraping software that will traverse pages

- by nilbus

We're creating a mashup site that pulls information from many sources all over the web. Many of these sites don't provide RSS feeds or APIs to access the information they provide. This leaves us with screen scraping as our method for collecting the data. There are many screen-scraping tools out there, written in different scripting languages, that require you to write scraping scripts in the language the scraper was written in. Scrapy, scrAPI, and scrubyt are a few written in Ruby and Python. There are other web-based tools I've seen, like Dapper, that create XML or RSS feeds based on a webpage. Dapper has a beautiful web-based interface that requires no scripting skills to use. It would be a great tool if it were able to traverse multiple pages to gather data from hundreds of pages of results. We need something that will scrape information from paginated web sites, much like scrubyt, but with a user interface that a non-programmer could use. We'll script up our own solution if we need to, probably using scrubyt, but if there's a better solution out there, we want to use it. Does anything like this exist?
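For readers wondering what "traversing pages" amounts to when scripted by hand, here is a minimal Python sketch using only the standard library. It is not one of the tools named above. It assumes, purely for illustration, that each result is an `<li>` element and that the next page is linked with `rel="next"`; `scrape_all_pages`, `NextLinkParser`, and the injected `fetch` function are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkParser(HTMLParser):
    """Collects the text of <li> items and the href of a rel="next" link."""

    def __init__(self):
        super().__init__()
        self.items = []
        self.next_href = None
        self._in_item = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li":
            self._in_item = True
            self.items.append("")
        elif tag == "a" and attrs.get("rel") == "next":
            self.next_href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_item = False

    def handle_data(self, data):
        if self._in_item:
            self.items[-1] += data


def scrape_all_pages(start_url, fetch, max_pages=100):
    """Follow rel="next" links from start_url, returning every <li> text found.

    `fetch` is any callable that maps a URL to an HTML string (assumption:
    the caller supplies it, e.g. a wrapper around an HTTP client).
    """
    url, results, seen = start_url, [], set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        parser = NextLinkParser()
        parser.feed(fetch(url))
        results.extend(item.strip() for item in parser.items)
        # Resolve the next link relative to the current page; stop if absent.
        url = urljoin(url, parser.next_href) if parser.next_href else None
    return results
```

Passing `fetch` in as a parameter keeps the traversal logic separate from the network code, so the same loop can be exercised against canned pages.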

Read the article
Does "Noctua NH-U12-DX 1366" mount on Asus p6t 1366?

- by Andrea Ambu

On the Noctua site they state: "Caution: The NH-U12DX 1366 can only be used on mainboards that have a backplate with screw threads for CPU cooler installation (such as the Intel reference backplate for Xeon 5500). The cooler is thus incompatible with Xeon 3500 and Core i7 mainboards that don’t have such a backplate." How do I know whether the Asus P6T has it?

Read the article
Why allow concatenation of string literals?

- by Caspin

I recently got bit by a subtle bug.

    const char *int2str[] = {
        "zero",  // 0
        "one",   // 1
        "two"    // 2
        "three", // 3
        nullptr
    };
    assert( int2str[1] == "one"_s ); // passes
    assert( int2str[2] == "two"_s ); // fails

If you have godlike code review powers you'll notice I forgot the , after "two". After the considerable effort to find that bug I've got to ask: why would anyone ever want this behavior? I can see how this might be useful for macro magic, but then why is this a "feature" in a modern language like Python? Have you ever used string literal concatenation in production code?
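Since the question mentions Python, the same pitfall can be reproduced there. The sketch below shows the missing-comma case next to one deliberate use of adjacent-literal concatenation; the variable names are illustrative only.

```python
# Adjacent string literals are joined at compile time, so a missing comma
# silently merges two list elements instead of raising an error.
numbers = [
    "zero",
    "one",
    "two"    # comma missing here, as in the question
    "three",
]

# A deliberate use of the same feature: splitting a long literal across
# lines without a runtime "+" or a backslash continuation.
long_message = (
    "This sentence is written on two source lines "
    "but is stored as a single string."
)
```

Here `numbers` ends up with three elements rather than four, and the third one is `"twothree"`.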

Read the article
[JavaScript-CSS-Firefox] Cannot change borderColor of TD

- by Tadeus Prastowo

Using JS to set the background color of a TD is fine. But setting the border color is problematic in FF 3.0.18, although IE 6 doesn't experience this. FF is problematic in that it requires the TD element to have a style attribute initialized to border-style: solid. Without that, setting the border color of a TD won't work. Is this a known bug? How do I set the border color without having to set the style attribute as well as the initialization value? I know another trick of setting the class attribute instead of setting the border color directly. Is this an indication that somehow TD hates having its border color set dynamically? Is this known as well? The problematic code is below (the goal is to find out why setting the border color of simple truth 1 does not work while simple truth 3 works when I employ the trick described above):

    <html>
    <head>
    <title>Quirks FF 3.0.18</title>
    <style type="text/css">
    table { border-collapse: collapse; }
    </style>
    <script type="text/javascript">
    function changeBgColor() {
      document.getElementById('simple').style.backgroundColor='yellow';
      document.getElementById('simple2').style.backgroundColor='yellow';
      document.getElementById('simple3').style.backgroundColor='yellow';
    }
    function quirk(id) {
      var x = document.getElementById(id);
      x.style.border = '2px solid red';
    }
    </script>
    </head>
    <body>
    <input type="button" onclick="changeBgColor()" value="Change background color"/>
    <input type="button" onclick="quirk('simple')" value="Change border color 1"/>
    <input type="button" onclick="quirk('simple2')" value="Change border color 2"/>
    <input type="button" onclick="quirk('simple3')" value="Change border color 3"/>
    <table>
      <tr><td id="simple">Simple truth 1</td></tr>
    </table>
    <table>
      <tr><td><span id="simple2">Simple truth 2</span></td></tr>
    </table>
    <table>
      <tr><td id="simple3" style="border-style: solid">Simple truth 3</td></tr>
    </table>
    </body>
    </html>

Read the article
How can I make www.mywebapp.com/bin return a 404 in ASP.NET MVC?

- by Freewalker

I'm using ASP.NET MVC to develop a web application, deploying to IIS 7. I've hidden my Files and Views directories with web.config files in those directories (they just return a normal 404). However, I haven't been able to get the web.config method to work in hiding my bin directory. When I access www.mywebapp.com/bin, I instead get a too-revealing page with this message: HTTP Error 404.8 - Not Found The request filtering module is configured to deny a path in the URL that contains a hiddenSegment section. The page reveals part of my directory structure. I just want it to return my 404 page like the Files and Views directories do. How can I get this behavior?

Read the article
Looping to provide multiple lines in linechart (django-googlecharts)

- by mighty_bombero

Hi, I'm trying to generate some charts using django-googlecharts. This works fine for rather static data but in one case I would like to render a different number of lines, based on a variable. I tried this:

    {% chart %}
      {% for line in line_data %}
        {% chart-data line %}
      {% endfor %}
      {% chart-size "390x200" %}
      {% chart-type "line" %}
      {% chart-labels days %}
    {% endchart %}

Line data is a list containing lists. The template code fails with "Caught an exception while rendering: max() arg is an empty sequence". I guess the problem is that I try to loop over templatetags. What approach could be used here? Or am I completely missing something? Is this doable using inclusion tags? Thanks for your help.

Read the article
Word Spell Check pops up hidden and "freezes" my App

- by Refracted Paladin

I am using Word's Spell Check in my in house WinForm app. My clients are all XP machines with Office 2007 and randomly the spell check suggestion box pops up behind the App and makes everything "appear" frozen as you cannot get at it. Suggestions? What do other people do to work around this or stop it altogether? Thanks

Below is my code, for reference, though I am doubtful that this has anything to do with my code but I'll take anything.

    public class SpellCheckers
    {
        public string CheckSpelling(string text)
        {
            Word.Application app = new Word.Application();
            object nullobj = Missing.Value;
            object template = Missing.Value;
            object newTemplate = Missing.Value;
            object documentType = Missing.Value;
            object visible = false;
            object optional = Missing.Value;
            object savechanges = false;
            app.ShowMe();

            Word._Document doc = app.Documents.Add(ref template, ref newTemplate, ref documentType, ref visible);
            doc.Words.First.InsertBefore(text);
            Word.ProofreadingErrors errors = doc.SpellingErrors;
            var ecount = errors.Count;
            doc.CheckSpelling(ref optional, ref optional, ref optional, ref optional,
                              ref optional, ref optional, ref optional, ref optional,
                              ref optional, ref optional, ref optional, ref optional);

            object first = 0;
            object last = doc.Characters.Count - 1;
            var results = doc.Range(ref first, ref last).Text;
            doc.Close(ref savechanges, ref nullobj, ref nullobj);
            app.Quit(ref savechanges, ref nullobj, ref nullobj);
            Marshal.ReleaseComObject(doc);
            Marshal.ReleaseComObject(app);
            Marshal.ReleaseComObject(errors);
            return results;
        }
    }

And I call it from my WinForm app like so --

    public static void SpellCheckControl(Control control)
    {
        if (IsWord2007Available())
        {
            if (control.HasChildren)
            {
                foreach (Control ctrl in control.Controls)
                {
                    SpellCheckControl(ctrl);
                }
            }
            if (IsValidSpellCheckControl(control))
            {
                if (control.Text != String.Empty)
                {
                    control.BackColor = Color.FromArgb(180, 215, 195);
                    control.Text = Spelling.CheckSpelling(control.Text);
                    control.Text = control.Text.Replace("\r", "\r\n");
                    control.ResetBackColor();
                }
            }
        }
    }

Read the article
multiple key ranges as parameters to a couchdb view

- by kolosy

Is there a way to send multiple startKey/endKey pairs to a view, akin to the keys: [] array that can be posted for keys? The underlying problem: let's say my documents have "categories" and timestamps. If I want all documents in the "foo" category that have a timestamp within the last two hours, it's simple:

    function (doc) {
      emit([doc.category, doc.timestamp], null);
    }

and then query as

    GET server:5894/.../myview?startkey=[foo, |now - 2 hours|]&endkey=[foo, |now|]

The problem comes when I want something in categories foo or bar, within the last two hours. If I didn't care about time, I could just pull directly by key through the keys collection. Unfortunately, I have no such option with ranges. What I ended up doing in the meantime is rounding the timestamp to two-hour blocks, and then multiplexing the query out:

    POST server:5894/.../myview
    keys=[[foo, 0 hours], [foo, 2 hours], [bar, 0 hours], [bar, 2 hours]]

It works, but it will get messy if I want to go back a large amount of time (relative to the block size).
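The block-rounding workaround described above can be sketched as a small key generator. This is only an illustration in Python, under the assumptions that timestamps are integer seconds and that the view emits `[category, timestamp rounded down to the block start]`; `block_keys` and `request_body` are hypothetical names, not part of CouchDB.

```python
import json


def block_keys(categories, now, lookback_seconds, block_seconds=2 * 60 * 60):
    """Return [category, block_start] keys covering [now - lookback, now].

    Assumes the view key's second element is the timestamp rounded down
    to a multiple of block_seconds.
    """
    first_block = (now - lookback_seconds) // block_seconds * block_seconds
    last_block = now // block_seconds * block_seconds
    return [
        [category, block_start]
        for category in categories
        for block_start in range(first_block, last_block + 1, block_seconds)
    ]


def request_body(keys):
    """Serialize the keys as the JSON body of a POST to the view."""
    return json.dumps({"keys": keys})
```

The number of keys grows as (number of categories) x (lookback / block size), which is the messiness the question points out for long lookbacks. Because the blocks are coarser than the requested window, the results would still need filtering by exact timestamp on the client.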

Read the article
How expensive is it to create an NSAutoreleasePool

- by morgancodes

I have a method which needs to run in its own thread 88 times per second (it's a callback for an audio unit.) Should I avoid creating an NSAutoreleasePool each time it's called?

Read the article
Sites with 1-column css styles

- by user300413

Where can I find simple one-column CSS styles? EDIT: Templates like these, http://mashable.com/2007/09/13/one-column-website-templates/, but different ones. Sorry for my English. EDIT2: bignose, OK, thanks.

Read the article
Passing by reference in Java?

- by Mike

In C++, if you need to have 2 objects modified, you can pass by reference. How do you accomplish this in Java? Assume the 2 objects are primitive types such as int.

Read the article
[R] Merge multiple data frames - Error in match.names(clabs, names(xi)) : names do not match previou

- by Jasmine

Hi all- I'm getting some really bizarre stuff while trying to merge multiple data frames. Help! I need to merge a bunch of data frames by the columns 'RID' and 'VISCODE'. Here is an example of what it looks like:

    d1 = data.frame(ID = sample(9, 1:100), RID = c(2, 5, 7, 9, 12), VISCODE = rep('bl', 5), value1 = rep(16, 5))
    d2 = data.frame(ID = sample(9, 1:100), RID = c(2, 2, 2, 5, 5, 5, 7, 7, 7), VISCODE = rep(c('bl', 'm06', 'm12'), 3), value2 = rep(100, 9))
    d3 = data.frame(ID = sample(9, 1:100), RID = c(2, 2, 2, 5, 5, 5, 9,9,9), VISCODE = rep(c('bl', 'm06', 'm12'), 3), value3 = rep("a", 9), values3.5 = rep("c", 9))
    d4 = data.frame(ID =sample(8, 1:100), RID = c(2, 2, 5, 5, 5, 7, 7, 7, 9), VISCODE = c(c('bl', 'm12'), rep(c('bl', 'm06', 'm12'), 2), 'bl'), value4 = rep("b", 9))
    dataList = list(d1, d2, d3, d4)

I looked at the answers to the question titled "Merge several data.frames into one data.frame with a loop." I used the Reduce method suggested there as well as a loop I wrote:

    try1 = mymerge(dataList)
    try2 <- Reduce(function(x, y) merge(x, y, all= TRUE, by=c("RID", "VISCODE")), dataList, accumulate=F)

where dataList is a list of data frames and mymerge is:

    mymerge = function(dataList){
      L = length(dataList)
      mdat = dataList[[1]]
      for(i in 2:L){
        mdat = merge(mdat, dataList[[i]], by.x = c("RID", "VISCODE"), by.y = c("RID", "VISCODE"), all = TRUE)
      }
      mdat
    }

For my test data and subsets of my real data, both of these work fine and produce exactly the same results. However, when I use larger subsets of my data, they both break down and give me the following error:

    Error in match.names(clabs, names(xi)) : names do not match previous names.

The really weird thing is that using this works:

    dataList = list(demog[1:50,], neurobat[1:50,], apoe[1:50,], mmse[1:50,], faq[1:47, ])

And using this fails:

    dataList = list(demog[1:50,], neurobat[1:50,], apoe[1:50,], mmse[1:50,], faq[1:48, ])

As far as I can tell, there is nothing special about row 48 of faq. Likewise, using this works:

    dataList = list(demog[1:50,], neurobat[1:50,], apoe[1:50,], mmse[1:50,], pdx[1:47, ])

And using this fails:

    dataList = list(demog[1:50,], neurobat[1:50,], apoe[1:50,], mmse[1:50,], pdx[1:48, ])

Row 48 in faq and row 48 in pdx have the same values for RID and VISCODE, the same value for EXAMDATE (something I'm not matching on) and different values for ID (another thing I'm not matching on). Besides the matching RID and VISCODE, I don't see anything special about them. They don't share any other variable names. This same scenario occurs elsewhere in the data without problems. To add icing on the complication cake, this doesn't even work:

    dataList = list(demog[1:50,], neurobat[1:50,], apoe[1:50,], mmse[1:50,], faq[1:48, 2:3])

where columns 2 and 3 are "RID" and "VISCODE". 48 isn't even the magic number because this works:

    dataList = list(demog[1:500,], neurobat[1:500,], apoe[1:500,], mmse[1:457,])

while using mmse[1:458, ] fails. I can't seem to come up with test data that causes the problem. Has anyone had this problem before? Any better ideas on how to merge? Thanks for your help! Jasmine

Read the article
Join with three tables

- by John

Hello, for the join query below, I would like to pull some data from a third MySQL table called "comment." Each s.title has a corresponding s.submissionid. The field "submissionid" is also in the table "comment." For each "submissionid" in the table "comment," I would like to count a field called "commentid." How can I do this? Thanks in advance, John

    $sqlStr = "SELECT s.loginid, s.title, s.url, s.displayurl, l.username
                 FROM submission AS s, login AS l
                WHERE s.loginid = l.loginid
             ORDER BY s.datesubmitted DESC
                LIMIT 10";
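One common way to get a per-submission comment count is a LEFT JOIN onto the comment table combined with COUNT and GROUP BY. The sketch below is an illustration rather than a tested MySQL answer: it runs the query against an in-memory SQLite database from Python as a stand-in for MySQL, the sample rows are invented, and the alias `countComments` is a hypothetical name. The table and column names are the ones given in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE login (loginid INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE submission (
    submissionid INTEGER PRIMARY KEY, loginid INTEGER, title TEXT,
    url TEXT, displayurl TEXT, datesubmitted TEXT);
CREATE TABLE comment (commentid INTEGER PRIMARY KEY, submissionid INTEGER);

-- Invented sample data: one submission with two comments, one with none.
INSERT INTO login VALUES (1, 'ann'), (2, 'bob');
INSERT INTO submission VALUES
    (10, 1, 'First',  'http://a.example', 'a.example', '2010-03-01'),
    (11, 2, 'Second', 'http://b.example', 'b.example', '2010-03-02');
INSERT INTO comment VALUES (100, 10), (101, 10);
""")

query = """
SELECT s.loginid, s.title, s.url, s.displayurl, l.username,
       COUNT(c.commentid) AS countComments
  FROM submission AS s
 INNER JOIN login AS l ON s.loginid = l.loginid
  LEFT JOIN comment AS c ON c.submissionid = s.submissionid
 GROUP BY s.submissionid, s.loginid, s.title, s.url, s.displayurl,
          s.datesubmitted, l.username
 ORDER BY s.datesubmitted DESC
 LIMIT 10
"""
rows = conn.execute(query).fetchall()
```

The LEFT JOIN keeps submissions that have no comments, and `COUNT(c.commentid)` counts only non-NULL values, so those submissions report a count of 0 instead of disappearing from the result.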

Read the article
Jboss logging issue

- by balaji

Our application is deployed on JBoss AS 4.0x and we face some issues with JBoss logging. Whenever the server is restarted, JBoss stops logging and server.log is no longer updated. We then run the touch command on log4j.xml so that the log files are created again. Please help me fix this issue; we can't run touch every time. We face this issue on both nodes, and I could not figure out where the problem is. For any other issue we could check the log files, but if the log itself is not being updated, how can we analyze issues without recent logs?

Contents of log4j.xml, copied from the comments below:

    <appender name="FILE" class="org.jboss.logging.appender.DailyRollingFileAppender">
      <errorHandler class="org.jboss.logging.util.OnlyOnceErrorHandler"/>
      <param name="File" value="${jboss.server.log.dir}/server.log"/>
      <param name="Append" value="false"/>
      <param name="DatePattern" value="'.'yyyy-MM-dd"/>
      <layout class="org.apache.log4j.PatternLayout">
        <param name="ConversionPattern" value="%d %-5p [%c] %m%n"/>
      </layout>
    </appender>
    <appender name="CONSOLE" class="org.apache.log4j.ConsoleAppender">
      <errorHandler class="org.jboss.logging.util.OnlyOnceErrorHandler"/>
      <param name="Target" value="System.out"/>
      <param name="Threshold" value="INFO"/>
      <layout class="org.apache.log4j.PatternLayout">
        <param name="ConversionPattern" value="%d{ABSOLUTE} %-5p [%c{1}] %m%n"/>
      </layout>
    </appender>
    <root>
      <appender-ref ref="CONSOLE"/>
      <appender-ref ref="FILE"/>
    </root>
    <category name="org.apache">
      <priority value="INFO"/>
    </category>
    <category name="org.apache.axis">
      <priority value="INFO"/>
    </category>
    <category name="org.jgroups">
      <priority value="WARN"/>
    </category>
    <category name="jacorb">
      <priority value="WARN"/>
    </category>
    <category name="org.jboss.management">
      <priority value="INFO"/>
    </category>

Read the article
Java: ArrayList bottleneck

- by Jack

Hello, while profiling a Java application that calculates hierarchical clustering of thousands of elements, I realized that ArrayList.get takes about half of the CPU time needed in the clustering part of the execution. The algorithm searches for the two most similar elements (so it is O(n*(n+1)/2)); here's the pseudo code:

    float currentMax = 0.0f
    for (int i = 0 to n)
      for (int j = i to n)
        get content i-th and j-th
        if their similarity > currentMax
          update currentMax
    merge the two clusters

So effectively there are a lot of ArrayList.get calls involved. Is there a faster way? I thought that since ArrayList should be a linear array of references it should be the quickest way, and maybe I can't do anything since there are simply too many gets, but maybe I'm wrong. I don't think using a HashMap could work since I need to get them all on every iteration and map.values() should be backed by an ArrayList anyway. Otherwise, should I try other collection libraries that are more optimized, like Google's or Apache's? Thanks
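To make the loop in the pseudo code concrete, here is a rough Python sketch of the pair search. It is not the asker's Java code: `most_similar_pair` and the `similarity` callable are hypothetical names, and the inner loop starts at `i + 1` (an assumption) so that an element is never compared with itself.

```python
def most_similar_pair(items, similarity):
    """Return ((i, j), score) for the pair of distinct items with the highest similarity."""
    best_score, best_pair = float("-inf"), None
    n = len(items)
    for i in range(n):
        a = items[i]  # read the i-th element once per outer iteration
        for j in range(i + 1, n):
            score = similarity(a, items[j])
            if score > best_score:
                best_score, best_pair = score, (i, j)
    return best_pair, best_score
```

Hoisting the `items[i]` lookup out of the inner loop removes roughly half of the element reads, but the scan still performs n(n-1)/2 similarity evaluations, so whether the reads or the comparisons dominate is something only profiling can settle.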

Read the article
Using Gallio/Mbunit with TFS 2010 Team Build

- by David Gardiner

How do you configure a Team Build 2010 build process template to run MbUnit tests via Gallio?

Read the article
C# - Repeating a method call using timers

- by Jeremy Rudd

In a VSTO add-in I'm developing, I need to execute a method with a specific delay. The tricky part is that the method may take anywhere from 0.1 sec to 1 sec to execute. I'm currently using a System.Timers.Timer like this:

    private Timer tmrRecalc = new Timer();
    // tmrRecalc.Interval = 500 milliseconds

    private void tmrRecalc_Elapsed(object sender, System.Timers.ElapsedEventArgs e){
        // stop the timer, do the task
        tmrRecalc.Stop();
        Calc.recalcAll();
        // restart the timer to repeat after 500 ms
        tmrRecalc.Start();
    }

The timer basically starts, raises one Elapsed event, and is then stopped while the arbitrary-length task executes. But the UI thread seems to hang for 3-5 seconds between each task. Do Timers have a 'warm-up' time to start? Is that why it takes so long for its first (and last) elapse? Which type of timer should I use instead?

Read the article
Ajaxing a link in a table

- by Colin Desmond

I have a table of results in an ASP.NET MVC page where the last column is a View Details link. I want the user to click the View Details link and have an AJAX method called to open the results in a floating dialog. What I am struggling with is how to link the AJAX call to the link in the results table. I was using a link which embedded the ~/ControllerName/ViewDetails/InstanceId URL directly in it. Clicking it took the user to a new page, and it is this behaviour I want to replace with an AJAX call and a dialog window. Now I want to attach a jQuery handler to the link to trigger the AJAX call, and I can't see how to do this other than by writing a jQuery handler for each row in the results table. There must be a way to mark the link as a ViewDetails link (using a class?) and attach the jQuery click handler to all instances of class ViewDetails.

Read the article
jquery livequery event triggered on EVERYTHING, not just selected element.

- by phazei

I am attempting to use livequery. I unfortunately am stuck using jQuery 1.2.6. This is my code:

    $(document).ready(function() {
      $('a.sort').livequery('click', function(event) {
        alert('hello');
      });
    });

If I click ANYWHERE in the document, I get the alert 'hello'. What exactly is wrong there? Is it some bug with jQuery 1.2.6 and livequery 1.1.1? This same question was asked here, but the question wasn't clear and the answer didn't help.

Read the article
I'm getting a "Does not implement IController" error on images and robots.txt in MVC2

- by blesh

I'm getting a strange error on my webserver for seemingly every file but the .aspx files. Here is an example. Just replace '/robots.txt' with any .jpg name or .gif or whatever and you'll get the idea: The controller for path '/robots.txt' was not found or does not implement IController. I'm sure it has something to do with how I've set up routing, but I'm not sure what exactly I need to do about it. Also, this is a mixed MVC and WebForms site, if that makes a difference.

Read the article
Git subtree not properly using .gitignore when doing a partial clone

- by D W

I am a graduate student with many scripts, bibliography data in bibtex, thesis draft in latex, presentations in open office, posters in scribus, and figures and result data. I would like to put everything in one project under version control. Then when I need to work on a portion such as the bibliography data, I would like to check that subdirectory out, modify it as necessary and merge it back. I would like the ability to check out one version to my home computer, and a different one to my work computer, and make changes to each independently and eventually merge them back. I would also like to be able to check out a piece of code from this big project and import it with versioning into a separate project. If I make changes I'd like to be able to merge them back to the original project. Based on my understanding git subtree can do this. http://github.com/apenwarr/git-subtree There is an example that is along the lines of what I'm trying to do at: http://psionides.jogger.pl/2010/02/04/sharing-code-between-projects-with-git-subtree/

Say the trunk of my project contained the directories: (bib bin cfg data fig src todo). When I use

    git subtree split -P bib -b export
    git checkout export

I get the bib directory, plus all files that should have been ignored or considered binary based on .gitignore, such as the src directory and everything in it that ends in a tilde, or the ./data directory.

    dwickrama@DWwork:~/research/trunk$ ls * -r
    biblography.bib JabRef
    src: script1.sh~ README~ script2.sh~ script3.sh~ script4.R~ script5.awk~ script5.py~
    cfg: cfgFile1.ini~ cfgFile2.ini~ cfgFile3.ini~
    bin: bigBinaryPackage1 bigBinaryPackage2
    dwickrama@DWwork:~/research/trunk$

My .gitignore file is as follows:

    *.doc diff=word
    *.tex diff=tex
    *.bib diff=bibtex
    *.py diff=python
    *.eps binary
    *.jpg binary
    *.png binary
    ./bin/* binary
    *~

How do I prevent this?

Read the article
How can I walk through two files simultaneously in Perl?

- by Alex Reynolds

I have two text files that contain columnar data of the variety position-value, sorted by position. Here is an example of the first file (file A):

    100 1
    101 1
    102 0
    103 2
    104 1
    ...

Here is an example of the second file (B):

    20 0
    21 0
    ...
    100 2
    101 1
    192 3
    193 1
    ...

Instead of reading one of the two files into a hash table, which is prohibitive due to memory constraints, what I would like to do is walk through two files simultaneously, in a stepwise fashion. What this means is that I would like to stream through lines of either A or B and compare position values. If the two positions are equal, then I perform a calculation on the values associated with that position. Otherwise, if the positions are not equal, I move through lines of file A or file B until the positions are equal (when I again perform my calculation) or I reach EOF of both files. Is there a way to do this in Perl?
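The stepwise walk described above is a two-pointer merge over sorted inputs. The question asks for Perl; the sketch below shows the same idea in Python only to illustrate the control flow. It assumes every line is exactly "position value" with integer fields and no blank lines, and `walk_matching_positions` is a hypothetical name.

```python
def parse(line):
    """Split one 'position value' line into two integers."""
    position, value = line.split()
    return int(position), int(value)


def walk_matching_positions(file_a, file_b):
    """Yield (position, value_a, value_b) for positions present in both sorted inputs.

    Only one line from each file is held in memory at a time.
    """
    line_a, line_b = file_a.readline(), file_b.readline()
    while line_a and line_b:
        pos_a, val_a = parse(line_a)
        pos_b, val_b = parse(line_b)
        if pos_a == pos_b:
            yield pos_a, val_a, val_b
            line_a, line_b = file_a.readline(), file_b.readline()
        elif pos_a < pos_b:
            line_a = file_a.readline()  # A is behind, advance A
        else:
            line_b = file_b.readline()  # B is behind, advance B
```

The loop stops as soon as either file is exhausted, since no further matches are possible once one side has run out. The arguments can be any objects with a `readline` method, such as files opened with `open(...)`.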

Read the article
Data Web Controls Enhancements in ASP.NET 4.0

Traditionally, developers using Web controls enjoyed increased productivity but at the cost of control over the rendered markup. For instance, many ASP.NET controls automatically wrap their content in <table> for layout or styling purposes. This behavior runs counter to the web standards that have evolved over the past several years, which favor cleaner, terser HTML; sparing use of tables; and Cascading Style Sheets (CSS) for layout and styling. Furthermore, the <table> elements and other automatically-added content make it harder both to style the Web controls using CSS and to work with the controls from client-side script. One of the aims of ASP.NET version 4.0 is to give Web Form developers greater control over the markup rendered by Web controls. Last week's article, Take Control Of Web Control ClientID Values in ASP.NET 4.0, highlighted how new properties in ASP.NET 4.0 give the developer more say over how a Web control's ID property is translated into a client-side id attribute. In addition to these ClientID-related properties, many Web controls in ASP.NET 4.0 include properties that allow the page developer to instruct the control to not emit extraneous markup, or to use an HTML element other than <table>. This article explores a number of enhancements made to the data Web controls in ASP.NET 4.0. As you'll see, most of these enhancements give the developer greater control over the rendered markup. Read on to learn more!

Read the article
Linq, double left join and double count

- by Fabian Vilers

Hi! I'm looking to translate this SQL statement to a well working & performant LINQ command. I've managed to have the first count working using the grouping count and key members, but don't know how to get the second count.

    select main.title,
           count(details.id) as details,
           count(messages.id) as messages
      from main
      left outer join details on main.id = details.mainid
      left outer join messages on details.id = messages.detailid
     group by main.title

Any advice is welcome! Fabian

Read the article
Can URIs have non-ASCII characters?

- by Cheeso

I tried to find this in the relevant RFC, IETF RFC 3986, but couldn't figure it out. Do URIs for HTTP allow Unicode, or non-ASCII of any kind? Can you please cite the section and the RFC that supports your answer? NB: For those who might think this is not programming related - it is. It's related to an ISAPI filter I'm building.
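As background rather than a definitive answer: RFC 3986 builds URIs from a subset of US-ASCII characters and describes percent-encoding (section 2.1) for bytes outside that set, while RFC 3987 defines IRIs for identifiers that carry Unicode directly. A small Python illustration of percent-encoding follows; the sample path is invented, and using UTF-8 as the byte encoding is an assumption of this sketch (it is the default of `urllib.parse.quote`).

```python
from urllib.parse import quote, unquote

path = "/wiki/caf\u00e9"   # invented path ending in a non-ASCII character

# quote() percent-encodes the UTF-8 bytes of characters outside its safe set,
# leaving "/" untouched by default.
encoded = quote(path)

# unquote() reverses the transformation, decoding the bytes as UTF-8.
decoded = unquote(encoded)
```

Here `encoded` is `/wiki/caf%C3%A9`, which contains only ASCII characters, and `decoded` equals the original path.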

Read the article
