Search Results

Search found 346 results on 14 pages for 'scraping'.

Page 12 of 14

  • Finding if a sentence contains a specific phrase in Ruby

    - by TenJack
    Right now I am seeing if a sentence contains a specific word by splitting the sentence into an array and then doing an include to see if it contains the word. Something like: "This is my awesome sentence.".split(" ").include?('awesome') But I'm wondering what the fastest way to do this with a phrase is. Like if I wanted to see if the sentence "This is my awesome sentence." contains the phrase "my awesome sentence". I am scraping sentences and comparing a very large number of phrases, so speed is somewhat important.
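    A minimal sketch for reference: Ruby's String#include? already does substring search, so no split is needed for the phrase case, and for a large batch of phrases a single precompiled Regexp union is a common speed-up.

        # a minimal sketch: include? matches substrings directly
        sentence = "This is my awesome sentence."
        sentence.include?("my awesome sentence")   # => true

        # for many phrases, one precompiled union regex usually beats a loop
        phrases = Regexp.union("my awesome sentence", "another phrase")
        sentence =~ phrases                        # => 8 (match position) or nil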

    Read the article

  • file_get_contents returns 403 forbidden

    - by absk
    I am trying to make a site scraper. It works fine on my local machine, but when I run the same code on my server it fails with a 403 Forbidden error. I am using the PHP Simple HTML DOM Parser. The error I get on the server is this:

        Warning: file_get_contents(http://example.com/viewProperty.html?id=7715888) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden in /home/scraping/simple_html_dom.php on line 40

    The code triggering it is:

        $url = "http://www.example.com/viewProperty.html?id=" . $id;
        $html = file_get_html($url);

    I have checked php.ini on the server and allow_url_fopen is On. Falling back to curl is a possible solution, but I need to know where I am going wrong.
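    One hedged guess, given that the code works locally: some hosts reject requests carrying no browser User-Agent, and file_get_contents sends none by default. A sketch of supplying one via a stream context (the header value is illustrative):

        <?php
        // a sketch: give file_get_contents a browser-like User-Agent via a context
        $context = stream_context_create(array(
            'http' => array(
                'header' => "User-Agent: Mozilla/5.0 (compatible; MyScraper/1.0)\r\n",
            ),
        ));
        $html = file_get_contents($url, false, $context);
        ?>

    If the remote site has blacklisted the server's IP address instead, no header will help, and switching to curl won't either.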

    Read the article

  • Determine time zone based on phone number?

    - by Zachary Burt
    I have a (potentially international) phone number. It may or may not have a country-code prefix. Does anyone know of a database that will help me map a phone number to a time zone? Even mapping a phone number to a country would do, since I can probably build country-to-time-zone data by scraping existing sources on the web. This is a more complicated problem than it looks; for example, how do I know whether it's a US-based number -- e.g. is its prefix a USA area code, or an international country calling code? Any language is fine; I can port it.
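    For the country-detection step, longest-prefix matching over the ITU calling-code list is the usual approach. An illustrative sketch with a deliberately tiny sample table (the real table has around 200 entries):

        # an illustrative sketch: calling codes are 1-3 digits, so try longest first
        CALLING_CODES = {"1": "US/Canada (NANP)", "44": "United Kingdom", "353": "Ireland"}

        def country_for(number):
            digits = number.lstrip("+").replace(" ", "")
            for length in (3, 2, 1):
                if digits[:length] in CALLING_CODES:
                    return CALLING_CODES[digits[:length]]
            return None  # no known prefix: probably a national number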

    Read the article

  • Do people value the information or the aesthetic value of websites? [closed]

    - by fwfwfw
    I'm wondering why the web has to be so colorful. All the information is buried deep beneath layers of Flash, JavaScript, HTML and images. Sure, good positioning of these media files creates aesthetic value, but how important is that to the user? Aren't people looking for information, after all? Why can't the internet be a uniform-looking data warehouse? As it is, we have to dig through all the aesthetic junk using shady web-scraping techniques, unless an RSS feed or API is provided. Why can't we settle for a dull grey button and framesets for navigation? Why can't all sites have a navigation frame on the left and top? Why can't all sites always put their data in a normalized table tag?

    Read the article

  • Should I use a huge composite primary key or just a unique id?

    - by Jack
    I have been trying to scrape a particular site and store the results in a database. My original assumptions about the data allowed a schema where I could use fairly reasonable composite primary keys (usually containing only 2 or 3 fields), but as time went on I realized those assumptions were wrong and my primary keys were not as unique as I thought, so I have slowly been expanding them to contain more and more fields. In fact, I have recently come to believe that their database has no constraints whatsoever. Just today I finally expanded the primary key for one of my tables to contain every field in that table, and I thought now would be a good time to ask: is it better to add an auto-incrementing column as a unique id, or to leave a composite primary key on the entire table?
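    For what it's worth, a common pattern once a natural key grows this wide is both at once: a surrogate primary key for joins, with the old composite key demoted to a unique constraint so the real-world rule is still enforced. A sketch (MySQL-flavoured, all column names invented):

        CREATE TABLE scraped_item (
            id          INT AUTO_INCREMENT PRIMARY KEY,   -- compact key for joins/FKs
            source_url  VARCHAR(255) NOT NULL,
            listing_ref VARCHAR(64)  NOT NULL,
            scraped_at  DATETIME     NOT NULL,
            -- the former composite primary key survives as a uniqueness guarantee
            UNIQUE KEY natural_key (source_url, listing_ref, scraped_at)
        );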

    Read the article

  • Escaping a query string with special characters in Python

    - by that_guy
    I have some pretty messy URLs that I got via scraping. The problem is that they contain spaces and other special characters in the path and query string. Here are some examples:

        http://www.example.com/some path/to the/file.html
        http://www.example.com/some path/?file=path to/file name.png&name=name.me

    Is there an easy and robust way to escape these URLs so that I can pass them to urlopen? I tried urllib.quote, but it escapes the '?', '&' and '=' in the query string, and the protocol separator as well. Currently I am using a regex to split out the protocol, path and query string and escape them separately, but there are cases where they aren't separated properly. Any advice is appreciated.
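    A sketch of the component-wise approach without the hand-rolled regex, using the standard library's own splitter (Python 2, to match urlopen here):

        # a sketch: let urlparse split the URL, then quote each piece on its own terms
        import urllib
        import urlparse

        def escape_url(url):
            scheme, netloc, path, query, fragment = urlparse.urlsplit(url)
            path = urllib.quote(path)                            # '/' stays safe
            query = urllib.urlencode(urlparse.parse_qsl(query))  # re-encodes k=v pairs
            return urlparse.urlunsplit((scheme, netloc, path, query, fragment))

        print escape_url("http://www.example.com/some path/?file=path to/file name.png&name=name.me")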

    Read the article

  • How can I match everything in a string until the second occurrence of a delimiter with a regular expression?

    - by Steve
    I am trying to refine a preg_match_all call by matching everything up to the second occurrence of a period followed by a space:

        <?php
        $str = "East Winds 20 knots. Gusts to 25 knots. Waters a moderate chop. Slight chance of showers.";
        preg_match_all('/(^)((.|\n)+?)(\.\s{2})/', $str, $matches);
        $dataarray = $matches[2];
        foreach ($dataarray as $value) {
            echo $value;
        }
        ?>

    But it does not work: the {2} quantifier applies to \s (two whitespace characters in a row), not to the second occurrence of the delimiter. I have to use preg_match_all because I am scraping dynamic HTML. From the string above I want to capture: East Winds 20 knots. Gusts to 25 knots.
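    A sketch of one way to do it: repeat a group that consumes "anything up to the next period-space" exactly twice, so the {2} counts delimiters rather than whitespace characters.

        <?php
        // a sketch: the (?:...){2} group eats two ". "-terminated chunks
        $str = "East Winds 20 knots. Gusts to 25 knots. Waters a moderate chop. Slight chance of showers.";
        if (preg_match('/^((?:.*?\. ){2})/', $str, $m)) {
            echo rtrim($m[1]);   // East Winds 20 knots. Gusts to 25 knots.
        }
        ?>

    The same pattern drops into preg_match_all unchanged if multiple subject strings are involved.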

    Read the article

  • Silverlight Cream for May 04, 2010 -- #855

    - by Dave Campbell
    In this Issue: John Papa, Adam Kinney, Mike Taulty, Kirupa, Gunnar Peipman, Mike Snow (-2-, -3-), Jesse Liberty, and Lee.

    Shoutout: Jeff Wilcox announced Silverlight Unit Test Framework: New version in the April 2010 Silverlight Toolkit.

    From SilverlightCream.com:

    Silverlight TV 23: MVP Q&A with WWW (Wildermuth, Wahlin and Ward) -- John Papa has Silverlight TV 23 up, a panel discussion between Shawn Wildermuth, Dan Wahlin, Ward Bell and John... wow... what a crew!

    Design-time Resources in Expression Blend 4 RC -- Adam Kinney reports on the new Expression Blend RC feature for loading resources at design time. Adam also has a project available to demonstrate the concepts he's explaining.

    Silverlight and WCF RIA Services (1 - Overview) -- Mike Taulty is starting a series on WCF RIA Services. This first one is an overview, and it looks to be a good series as expected.

    Introduction to Sample Data - Page 1 -- Kirupa has a great 5-part post up about sample data in Expression Blend.

    Windows Phone 7 development: Using WebBrowser control -- Gunnar Peipman posted about using the web browser control in WP7 to display RSS data. Good stuff, and all the code too.

    Silverlight Tip of the Day #10 - Converting Client IP to Geographical Location -- Mike Snow's Tip #10 is about taking an IP address and getting a geographical location from it. Combine this with his Tip #9, which retrieves the IP address.

    Silverlight Tip of the Day #11 - Deploying Silverlight Applications with WCF web services -- Mike Snow's Tip #11 is much bigger than most... it's almost an end-to-end solution for creating and deploying a WCF service, including resolving problems.

    Silverlight Tip of the Day #12 - Getting an Image's Source File Name -- Mike Snow also has Tip #12 up, a quick one on getting the original source file name for an image you've loaded.

    Screen Scraping - When All You Have Is A Hammer... -- Jesse Liberty posted his solution to a self-imposed problem and ended up writing a 'mini tutorial on using Silverlight for creating desk-top utilities'... all with source.

    RIA services and combobox lookups -- Lee has a post up about RIA Services and setting up comboboxes for lookups. Lots of source in the post and a full project download.

    Stay in the 'Light! Twitter SilverlightNews | Twitter WynApse | WynApse.com | Tagged Posts | SilverlightCream Join me @ SilverlightCream | Phoenix Silverlight User Group Technorati Tags: Silverlight, Silverlight 3, Silverlight 4, Windows Phone, MIX10

    Read the article

  • Can't access IIS 7 server URL from the same IIS 7 server.

    - by Kevin Raffay
    We have an intranet site, i.e. xxx.yyy.com, that users access by entering http://xxx.yyy.com. Our problems started when we migrated to IIS 7 running on a new 2003 server. We got rid of our single-sign-on code and implemented a security model where we capture a user's domain credentials, which we then authenticate against a DB. In order to get the domain credentials passed to our ASP.NET app, we have the following settings:

        Anonymous Authentication: Disabled
        ASP.NET Impersonation: Enabled
        Basic/Digest/Forms Authentication: Disabled
        Windows Authentication: Enabled

    We allow "*" and deny "?" in the web.config. Browsing http://xxx.yyy.com from any client PC results in a domain login prompt, and if you enter a proper user/pwd, you can get in. However, browsing http://xxx.yyy.com while remoted into the server results in 3 domain login prompts and eventually a 401 Unauthorized error. We have traced this behavior to pages on our site doing "screen scraping" with HttpRequest against a URL on the same server. When doing an HttpRequest from any other client, using a test harness that passes authorized credentials, all is good. So internal HttpRequest calls on the server fail, just like attempts to browse that server's URL from within a remote session. Why would a request to http://xxx.yyy.com made on server xxx.yyy.com itself fail authentication?

    Read the article

  • Finding all IP ranges belonging to a specific ISP

    - by Jim Jim
    I'm having an issue with a certain individual who keeps scraping my site in an aggressive manner; wasting bandwidth and CPU resources. I've already implemented a system which tails my web server access logs, adds each new IP to a database, keeps track of the number of requests made from that IP, and then, if the same IP goes over a certain threshold of requests within a certain time period, it's blocked via iptables. It may sound elaborate, but as far as I know, there exists no pre-made solution designed to limit a certain IP to a certain amount of bandwidth/requests. This works fine for most crawlers, but an extremely persistent individual is getting a new IP from his/her ISP pool each time they're blocked. I would like to block the ISP entirely, but don't know how to go about it. Doing a whois on a few sample IPs, I can see that they all share the same "netname", "mnt-by", and "origin/AS". Is there a way I can query the ARIN/RIPE database for all subnets using the same mnt-by/AS/netname? If not, how else could I go about getting every IP belonging to this ISP? Thanks.
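    For RIPE-managed address space there is a ready-made answer, hedged here because ARIN's server does not support the same flags: the RIPE database accepts inverse queries, so something like the following (maintainer name invented) returns the objects tied to a given mnt-by, with -T route restricting the output to route objects, i.e. the announced prefixes.

        whois -h whois.ripe.net -- "-i mnt-by EXAMPLE-MNT -T route"

    The "--" passes the flags through the whois client to the server rather than having the client interpret them.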

    Read the article

  • Can I use Outlook 2010 (beta) with OWA account?

    - by Dan
    One of the new features of Outlook 2010 (beta) is support for multiple Exchange accounts. I'm wondering if there is any way to use this together with a (different) Outlook Web Access account to also get that email in Outlook. Specifically, in addition to my regular corporate (Exchange) account, I also use another corporate account through OWA. With this second account, the only supported access is through OWA; while POP3 access is available, it is not actually supported. I'm not very familiar with configuring Exchange servers, but in talking to those who are, it sounds like enabling Outlook Web Access is (slightly) different from allowing access from Outlook via HTTP(S). Is that correct? If so, it doesn't quite seem right, as, absolute worst-case, one could (theoretically) resort to screen-scraping OWA. Edit: this looks to be about the same question as Activesync/OWA Desktop Client. (This doesn't have anything to do with the question, but I'm currently using this second corporate account in Outlook by POP3'ing it to Gmail, and then IMAP4 from Gmail to Outlook. Obviously, it would be much nicer to add it as a second Exchange account.)

    Read the article

  • Can't get Mac mini to turn on - no screen, no beep, only the fan, power light, and optical drive noise

    - by pibyers
    I have an Intel-based Mac mini from mid-2007 (Model A1176). This is the computer my kids use, so I don't use it regularly. It had been working fine until one day my kids told me it no longer works. The computer will not boot up. When I turn it on, the fan turns, the white power light on the front comes on, and there is a sound that appears to come from the optical drive (rather than the hard drive). I get nothing on the monitor, nor any chimes or other startup sounds from the computer. Here is what I've tried so far, to no avail:

        1) Swapped out the monitor early on, since I figured that was my weak link - no change.
        2) Reset the PMU - no change.
        3) Tried to boot from the system disk - the mini loaded the DVD into the drive, but nothing else happened (and I can't eject the disk to get it back).
        4) Started the computer in target disk mode connected to another Mac - I never got a chime, and the disk never showed up on the other Mac.

    I'm about out of ideas short of scrapping the computer. Does anyone have any ideas I can try? Again, nothing has been done to the computer in at least 6 months, when I upgraded the RAM. I'm also still on Leopard. Thanks.

    Read the article

  • Disk Error on Boot (Possible boot sector issue)

    - by Choco
    I own a 4-5 year old Dell Dimension E510 with Windows XP Media Center Edition. I have 2 drives installed:

        C drive: Windows XP Media Center Edition
        G drive: 2 partitions, Windows 7 (beta) and Windows XP Professional

    That is also the order they are connected in; the C drive is my primary drive. When I boot the computer, the BIOS loading screen appears normally; the progress bar moves and it's fine. The very next screen, however, is supposed to show a boot selector. When I installed Windows 7 onto the G drive from within the C drive's Windows, it added a boot selector to the C drive's boot sequence, giving me the option of booting Windows 7 or Windows XP Media Center Edition. However, my problem is now this: after the BIOS screen I mentioned, instead of a boot selector, I get the following error:

        A disk read error occurred. Press CTRL+ALT+DEL to restart.

    The drive is spinning up normally. I hear no odd noises/clicks/scraping coming from it, even after disabling the other drive to listen to it carefully. My guess is that it's a boot sector issue. I have never experienced this before, but maybe during a recent shutdown Windows XP MCE errored out and ruined the boot sector. Dilemma! I don't have the Windows XP MCE disc, because it was installed at the factory. I have accessed the hidden partition on the drive before (you hit a key combination on the BIOS screen and it comes up with an interface to fix your drive). However, I don't want to reformat the drive (which is what the interface gives me the option to do). I want to fix the boot sector, if possible. How can I achieve that?

    Read the article

  • iptables, blocking large numbers of IP Addresses

    - by Twirrim
    I'm looking to block IP addresses in a relatively automated fashion if they look to be 'screen scraping' content from websites that we host. In the past this was achieved by some ingenious Perl scripts and OpenBSD's pf. pf is great in that you can feed it nice tables of IP addresses and it will handle blocking based on them efficiently. However, for various reasons (before my time) the decision was made to switch to CentOS. iptables doesn't natively provide the ability to block large numbers of addresses (I'm told it wasn't unusual to be blocking 5000+), and I'm a bit cautious about adding that many individual rules to an iptables chain. ipt_recent would be awesome for this, and it also provides a lot of flexibility for merely slowing access down severely, but there is a bug in the CentOS kernel that stops me from using it (reported, but awaiting a fix). Using ipset would entail compiling a more up-to-date version of iptables than ships with CentOS, which, whilst I'm perfectly capable of doing, I'd rather avoid from a patching, security and consistency perspective. Other than those two, nfblock looks like a reasonable alternative. Is anyone aware of other ways of achieving this? Are my concerns about several thousand IP addresses in iptables as individual rules unfounded?
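    For reference, if a newer iptables/ipset ever becomes an option, the set-based match keeps lookups hash-based rather than walking a linear rule chain, however many addresses are blocked. A sketch using the newer ipset syntax (older releases spell the first command "ipset -N scrapers iphash"):

        ipset create scrapers hash:ip          # hash:net would take whole ranges
        ipset add scrapers 192.0.2.15
        iptables -I INPUT -m set --match-set scrapers src -j DROP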

    Read the article

  • Monitoring instantaneous network throughput at one second intervals?

    - by Shaddi
    For a testing setup I have, I need to monitor the throughput through a "router"* at regular intervals of around 5 seconds or less (sub-second intervals would be very nice, but are not required). Ideally, I would generate a file containing both the number of bytes and the number of packets seen during each interval; I will eventually turn this data into a throughput time-series. On a previous setup using an older version of FreeBSD, a tool called "bpfmon" gave me this information. However, I now need to do this under a modern version of Linux (namely Ubuntu 11.04). I have looked at both iptraf and iftop, but neither appears to provide the resolution I need, nor an easy way to scrape out the data. I understand iptables statistics may give me what I'm after, but the examples I've seen rely on repeatedly reading and resetting the traffic counters, which seems like it could give inaccurate results, since read-and-reset is not an atomic operation. I already capture a tcpdump trace of the traffic I'm interested in on the link I want to monitor, so I am open to approaches that simply parse that. This feels like a common problem, though, so I am hoping there is a standard "best practice" tool for accomplishing it. *I say "router" in quotes because I am really talking about a machine with two bridged NICs through which all the traffic I'm interested in passes.
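    One low-tech possibility that sidesteps the read-and-reset race entirely: the kernel's per-interface counters under /sys are monotonic, so sampling and diffing them never loses traffic to a reset window. A sketch (the interface name and 1-second interval are assumptions):

        # a sketch: diff the kernel's monotonic per-interface counters each second
        import time

        def counter(iface, name):
            with open("/sys/class/net/%s/statistics/%s" % (iface, name)) as f:
                return int(f.read())

        prev = (counter("br0", "rx_bytes"), counter("br0", "rx_packets"))
        while True:
            time.sleep(1)
            now = (counter("br0", "rx_bytes"), counter("br0", "rx_packets"))
            print "%d bytes/s, %d packets/s" % (now[0] - prev[0], now[1] - prev[1])
            prev = now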

    Read the article

  • Convert a (nested)HTML unordered list of links to PHP array of links

    - by Klark
    Hi, I have a regular, nested HTML unordered list of links, and I'd like to scrape it with PHP and convert it to an array. The original list looks something like this:

        <ul>
          <li><a href="http://someurl.com">First item</a>
            <ul>
              <li><a href="http://someotherurl.com/">Child of First Item</a></li>
              <li><a href="http://someotherurl.com/">Second Child of First Item</a></li>
            </ul>
          </li>
          <li><a href="http://bogusurl.com">Second item</a></li>
          <li><a href="http://bogusurl.com">Third item</a></li>
          <li><a href="http://bogusurl.com">Fourth item</a></li>
        </ul>

    Any of the items can have children. (The actual screen scraping is not a problem; I can do that.) I'd like to turn this into a PHP array of just the links, while keeping the hierarchical structure of the list. Any ideas? I've looked at using Simple HTML DOM and phpQuery, which both use jQuery-like syntax, but I can't seem to get the syntax right. Thanks.
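    A sketch of the recursive walk using PHP's built-in DOM extension rather than a third-party parser; the array shape is an assumption:

        <?php
        // a sketch: recurse through <li> nodes, descending into any nested <ul>
        function list_to_array(DOMElement $ul) {
            $items = array();
            foreach ($ul->childNodes as $li) {
                if ($li->nodeName !== 'li') continue;
                $item = array('text' => null, 'href' => null, 'children' => array());
                foreach ($li->childNodes as $child) {
                    if ($child->nodeName === 'a') {
                        $item['text'] = trim($child->textContent);
                        $item['href'] = $child->getAttribute('href');
                    } elseif ($child->nodeName === 'ul') {
                        $item['children'] = list_to_array($child);
                    }
                }
                $items[] = $item;
            }
            return $items;
        }

        $doc = new DOMDocument();
        @$doc->loadHTML($html);   // @ silences warnings on sloppy markup
        $tree = list_to_array($doc->getElementsByTagName('ul')->item(0));
        ?>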

    Read the article

  • Implementing A Static NSOutlineView

    - by spamguy
    I'm having a hard time scraping together enough snippets of knowledge to implement an NSOutlineView with a static, never-changing structure defined in an NSArray. This link has been great, but it's not helping me grasp submenus. I'm thinking they're just nested NSArrays, but I have no clear idea. Let's say we have an NSArray inside an NSArray, defined as:

        NSArray *subarray = [[NSArray alloc] initWithObjects:@"2.1", @"2.2", @"2.3", @"2.4", @"2.5", nil];
        NSArray *ovStructure = [[NSArray alloc] initWithObjects:@"1", subarray, @"3", nil];

    The text is defined in outlineView:objectValueForTableColumn:byItem::

        - (id)outlineView:(NSOutlineView *)ov objectValueForTableColumn:(NSTableColumn *)tableColumn byItem:(id)ovItem {
            if ([[[tableColumn headerCell] stringValue] compare:@"Key"] == NSOrderedSame) {
                // Return the key for this item. First, get the parent array or dictionary.
                // If the parent is nil, then that must be root, so we'll get the root
                // dictionary.
                id parentObject = [ov parentForItem:ovItem] ? [ov parentForItem:ovItem] : ovStructure;
                if ([parentObject isKindOfClass:[NSArray class]]) {
                    // Arrays don't have keys (usually), so we have to use a name
                    // based on the index of the object.
                    NSLog([NSString stringWithFormat:@"%@", ovItem]);
                    //return [NSString stringWithFormat:@"Item %d", [parentObject indexOfObject:ovItem]];
                    return (NSString *)[ovStructure objectAtIndex:[ovStructure indexOfObject:ovItem]];
                }
            } else {
                // Return the value for the key. If this is a string, just return that.
                if ([ovItem isKindOfClass:[NSString class]]) {
                    return ovItem;
                } else if ([ovItem isKindOfClass:[NSDictionary class]]) {
                    return [NSString stringWithFormat:@"%d items", [ovItem count]];
                } else if ([ovItem isKindOfClass:[NSArray class]]) {
                    return [NSString stringWithFormat:@"%d items", [ovItem count]];
                }
            }
            return nil;
        }

    The result is '1', '(' (expandable), and '3'. NSLog shows the array starting with '(', hence the second item. Expanding it causes a crash due to going 'beyond bounds'. I tried using parentForItem: but couldn't figure out what to compare the result to. What am I missing?
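    A hedged sketch of the three datasource methods the nested-array model needs; the 'beyond bounds' crash suggests the child-of-item plumbing is indexing into ovStructure instead of into the sub-array itself:

        // a sketch: let each NSArray act as its own parent node
        - (NSInteger)outlineView:(NSOutlineView *)ov numberOfChildrenOfItem:(id)item {
            id node = item ? item : ovStructure;          // nil means the root
            return [node isKindOfClass:[NSArray class]] ? [node count] : 0;
        }

        - (BOOL)outlineView:(NSOutlineView *)ov isItemExpandable:(id)item {
            return [item isKindOfClass:[NSArray class]];
        }

        - (id)outlineView:(NSOutlineView *)ov child:(NSInteger)index ofItem:(id)item {
            id node = item ? item : ovStructure;
            return [node objectAtIndex:index];            // index within *this* array
        }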

    Read the article

  • How do I scrape information off ASP.NET websites when paging and JavaScript links are being used?

    - by Ian Roke
    I have been given a staff list which is supposed to be up to date, but it doesn't match an intranet People Finder written in ASP.NET. As the information is sensitive, I am not able to access the database the People Finder uses, so the only way I can get at the information is by scraping the structure, starting with the top brass and then working through each tier in turn. Each person has a staff number, which forms the URL http://intranet/peoplefinder/index.aspx?srn=ABC1234, and everyone who reports to them is listed underneath in the format:

        <a id="gvEmployees_ctl03_lnkFullName" href="index.aspx?srn=ABC4321" target="_self">

    where each URL carries the staff number and links to that person's team. The trouble arises when the teams are big, because the GridView then pages the results with links such as:

        <a href="javascript:__doPostBack('gvEmployees','Page$2')">2</a>

    How would I scrape such a page, capturing the SRN and other details along with the people who report to the person on all pages of the GridView, and then loop through each reportee, repeating the process until the whole list is complete?
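    A hedged sketch of faking the GridView postback: scrape the hidden __VIEWSTATE/__EVENTVALIDATION fields from the current page, then POST them back with the __EVENTTARGET and __EVENTARGUMENT that the JavaScript link would have sent. Values here are placeholders taken from the page above; HttpUtility lives in System.Web.

        // a sketch: replay __doPostBack('gvEmployees','Page$2') as a form POST
        string postData = string.Format(
            "__EVENTTARGET={0}&__EVENTARGUMENT={1}&__VIEWSTATE={2}&__EVENTVALIDATION={3}",
            HttpUtility.UrlEncode("gvEmployees"),
            HttpUtility.UrlEncode("Page$2"),
            HttpUtility.UrlEncode(viewState),        // scraped from the page's hidden fields
            HttpUtility.UrlEncode(eventValidation));

        var request = (HttpWebRequest)WebRequest.Create(pageUrl);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        using (var writer = new StreamWriter(request.GetRequestStream()))
            writer.Write(postData);
        string nextPageHtml = new StreamReader(
            ((HttpWebResponse)request.GetResponse()).GetResponseStream()).ReadToEnd();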

    Read the article

  • Help Needed With AJAX Script

    - by Brian
    Hello all, I am working on an AJAX script but am having difficulties. First, here is the script:

        var xmlHttp;

        function GetXmlHttpObject(){
            var objXMLHttp = null
            if (window.XMLHttpRequest){
                objXMLHttp = new XMLHttpRequest()
            }
            else if (window.ActiveXObject){
                objXMLHttp = new ActiveXObject("Microsoft.XMLHTTP")
            }
            return objXMLHttp
        }

        function retrieveData(){
            var jBossServer = new Array();
            jBossServer[0] = "d01";
            jBossServer[1] = "d02";
            jBossServer[2] = "p01";
            jBossServer[3] = "p02";
            for(var i=0; i<jBossServer.length; i++){
                xmlHttp = GetXmlHttpObject();
                if (xmlHttp == null){
                    alert("Something weird happened ...");
                    return;
                }
                var url = "./retrieveData.php";
                url = url + "?jBossID=" + jBossServer[i];
                url = url + "&sid=" + Math.random();
                xmlHttp.open("GET", url, true);
                xmlHttp.onreadystatechange = updateMemory;
                xmlHttp.send(null);
            }
        }

        function updateMemory(){
            if (xmlHttp.readyState == 4 || xmlHttp.readyState == "complete"){
                aggregateArray = new Array();
                aggregateArray = xmlHttp.responseText.split(',');
                for(var i=0; i<aggregateArray.length; i++){
                    alert(aggregateArray[i]);
                }
                return;
            }
        }

    The "retrieveData.php" page returns data that looks like this: d01,1 MB,2 MB,3 MB. The values relate to my 4 servers d01, d02, p01, and p02 (dev and prod). What I am doing is scraping the memory information from http://127.0.0.1:8080/web-console/ServerInfo.jsp. I want to save code, so I attempted to put the AJAX xmlHttp call into a loop, but I don't think it is working: all I ever get back is the information for "p02", four times. Can I do what I want here, or do I need four different functions, one per server (i.e., xmlHttpServer01, xmlHttpServer02, xmlHttpServer03, xmlHttpServer04)? Thank you all for reading, have a good day. :)
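    For what it's worth, the symptom (p02 four times) fits the shared global xmlHttp: each loop iteration overwrites it, so every callback ends up reading the last request. A sketch keeping one object per request via a closure:

        // a sketch: one XHR per server, captured in a closure instead of the global
        for (var i = 0; i < jBossServer.length; i++) {
            (function (server) {
                var xhr = GetXmlHttpObject();
                var url = "./retrieveData.php?jBossID=" + server + "&sid=" + Math.random();
                xhr.onreadystatechange = function () {
                    if (xhr.readyState == 4) {
                        var parts = xhr.responseText.split(',');
                        // ... update the display for this server ...
                    }
                };
                xhr.open("GET", url, true);
                xhr.send(null);
            })(jBossServer[i]);
        }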

    Read the article

  • Problem pulling data from website in .NET and C#

    - by Cptcecil
    I have written a web-scraping program to go to a list of pages and write all the HTML to a file. The problem is that when I pull a block of text, some of the characters get written as '?'. How do I pull those characters into my text file? Here is my code:

        string baseUri = String.Format("http://www.rogersmushrooms.com/gallery/loadimage.asp?did={0}&blockName={1}", id.ToString(), name.Trim());

        // our third request is for the actual webpage after the login.
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(baseUri);
        request.Method = "GET";
        request.UserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)";

        // get the response object, so that we may get the session cookie.
        HttpWebResponse response = (HttpWebResponse)request.GetResponse();
        StreamReader reader = new StreamReader(response.GetResponseStream());

        // and read the response
        string page = reader.ReadToEnd();

        StreamWriter SW;
        string filename = string.Format("{0}.txt", id.ToString());
        SW = File.AppendText("C:\\Share\\" + filename);
        SW.Write(page);
        reader.Close();
        response.Close();
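    A hedged guess at the '?' characters: StreamReader defaults to UTF-8, while the page may be served in another charset, so non-ASCII characters decode as replacement marks. A sketch honouring the response's declared encoding instead:

        // a sketch: pick the decoder from the response instead of assuming UTF-8
        Encoding enc = Encoding.UTF8;
        if (!string.IsNullOrEmpty(response.CharacterSet))
            enc = Encoding.GetEncoding(response.CharacterSet);

        using (var reader = new StreamReader(response.GetResponseStream(), enc))
        {
            string page = reader.ReadToEnd();
            File.WriteAllText(@"C:\Share\" + filename, page, enc);  // write it back out the same way
        }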

    Read the article

  • Libraries and pseudocode for physical Dashboard/Status board

    - by dani
    OK, so I bought a 46" screen for the office yesterday, and with the imminent risk of being accused of setting up an "elaborate World Cup procrastination scheme", I'd better show my colleagues what it's meant for ;) Looking at my simple sketch, and at these great projects from which I was inspired, I would like to get some input on the following:

        1. Pseudocode for the skeleton: As some methods should be called every 24 hours ("Today's date in the heading") and others at 60-second intervals ("Twitter results"), what would be a good approach using JavaScript (jQuery) and PHP? EDIT: Alsciende: I can agree that #1 and #8 are too vague. Therefore I remove #8 and try to clarify #1: by "Pseudocode for the skeleton" I basically mean, could this be done entirely using JavaScript timers, and how would you set up the various timers? (See the sketch after this list.)
        2. Library for Google Analytics: Which libraries support the Google Analytics API and can produce neat charts? Preferably HTML5 and JavaScript-based, like Protovis.
        3. Library for Twitter: Which libraries would you recommend for fetching Twitter search results and the latest tweets from profiles?
        4. Libraries for typography/CSS/HTML5: I'm trying to learn some HTML5 etc. in the process; please advise on any other typography/CSS libraries that could be of relevance.
        5. Scraping/parsing? A concrete example: trying to fetch today's menu from this restaurant's website, how would you go about it? (It's in Swedish - but you get the point - sorry ;) )
        6. Real-time stats? I'm using the WassUp plugin for WordPress to track real-time visitors on our website. Other logging software (AWStats etc.) is probably also installed on the webserver. Any ideas on how to extract information from these and present it in real time on the dashboard?
        7. Browser choice? Which browser and OS would you pick? Stable, full-screen, HTML5.
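    On the timer question, a minimal sketch, assuming jQuery and invented endpoint names: one setInterval per refresh rate is usually all the skeleton needs.

        // a sketch: one interval per refresh rate; the endpoints are placeholders
        $(function () {
            updateDate();
            updateTwitter();
            setInterval(updateTwitter, 60 * 1000);          // every minute
            setInterval(updateDate, 24 * 60 * 60 * 1000);   // daily
        });

        function updateDate()    { $('#heading').text(new Date().toDateString()); }
        function updateTwitter() { $.getJSON('/twitter.php', function (data) { /* render */ }); }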

    Read the article

  • Low Level Console Input

    - by Soulseekah
    I'm trying to send commands to the input of a cmd.exe application using the low-level read/write console functions. I have no trouble reading the text (scraping) using the ReadConsole...() and WriteConsole() functions after attaching to the process console, but I've not figured out how to write, for example, "dir" and have the console interpret it as a command. Here's a bit of my code:

        CreateProcess(NULL, "cmd.exe", NULL, NULL, FALSE, CREATE_NEW_CONSOLE, NULL, NULL, &si, &pi);
        AttachConsole(pi.dwProcessId);
        strcpy(buffer, "dir");
        WriteConsole(GetStdHandle(STD_INPUT_HANDLE), buffer, strlen(buffer), &charRead, NULL);

    The STARTUPINFO attributes of the process are all set to zero, except, of course, the .cb attribute. Nothing changes on the screen; however, I'm getting an Error 6: Invalid Handle returned from WriteConsole on STD_INPUT_HANDLE. If I write to STD_OUTPUT_HANDLE, I do get "dir" written on the screen, but of course nothing happens. I'm guessing SetConsoleMode() might be of help, but I've tried many mode combinations and nothing helped. I've also created a quick console application that waits for input (scanf()) and echoes back whatever comes in; that didn't work. I've also tried typing into the scanf() prompt and then peeking into the input buffer using PeekConsoleInput(); it returns 0, but the INPUT_RECORD array is empty. I'm aware that there is another way around this, using WriteConsoleInput() to inject INPUT_RECORD structured events directly into the console, but this would be way too long; I'd have to send each keypress into it. I hope the question is clear. Please let me know if you need any further information. Thanks for your help.
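    A hedged sketch of the two fixes that usually apply here: after AttachConsole, GetStdHandle still returns the parent's own handles, so the attached console's input must be opened explicitly via CONIN$; and since WriteConsole is only valid on output handles, the input buffer is fed key events instead:

        /* a sketch: inject "dir\r" into the attached console's input buffer */
        HANDLE hIn = CreateFile("CONIN$", GENERIC_READ | GENERIC_WRITE,
                                FILE_SHARE_READ | FILE_SHARE_WRITE,
                                NULL, OPEN_EXISTING, 0, NULL);
        const char *cmd = "dir\r";
        for (const char *p = cmd; *p; ++p) {
            INPUT_RECORD rec = {0};
            DWORD written;
            rec.EventType = KEY_EVENT;
            rec.Event.KeyEvent.bKeyDown = TRUE;
            rec.Event.KeyEvent.wRepeatCount = 1;
            rec.Event.KeyEvent.uChar.AsciiChar = *p;
            WriteConsoleInput(hIn, &rec, 1, &written);   /* key down */
            rec.Event.KeyEvent.bKeyDown = FALSE;
            WriteConsoleInput(hIn, &rec, 1, &written);   /* key up */
        }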

    Read the article

  • pyPDF - Retrieve page numbers from document

    - by SquidneyPoitier
    At the moment I'm looking into doing some PDF merging with pyPdf, but sometimes the inputs are not in the right order, so I'm looking into scraping each page for its page number to determine the order it should go in (e.g. if someone split a book into 20 ten-page PDFs and I want to put them back together). I have two questions:

        1) I know that sometimes the page number is stored in the document data somewhere, as I've seen PDFs that render in Adobe as something like [1243] (10 of 150), but when I read documents of this sort into pyPdf I can't find any information indicating the page number - where is this stored?
        2) If avenue #1 isn't available, I think I could iterate through the objects on a given page to try to find a page number - likely it would be its own object containing a single number. However, I can't seem to find a clear way to determine the contents of objects. If I run:

        pdf.getPage(0).getContents()

    this usually returns either {'/Filter': '/FlateDecode'} or a list of IndirectObject(num, num) objects. I don't really know what to do with either of these, and there's no real documentation on it as far as I can tell. Is anyone familiar with this kind of thing who could point me in the right direction?
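    On question 2, a hedged sketch with pyPdf: an IndirectObject is just a reference, so getObject() follows it to the real object, and for spotting a printed page number, extractText() is often the quickest crude probe (file name here is a placeholder):

        # a sketch: follow indirect references, then probe the page text
        from pyPdf import PdfFileReader

        reader = PdfFileReader(open("part03.pdf", "rb"))
        page = reader.getPage(0)

        contents = page.getContents().getObject()   # resolves IndirectObject(num, num)
        text = page.extractText()                   # crude, but a printed page number
        print text                                  # often shows up in here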

    Read the article

  • Getting Started with Facebook API

    - by Btibert3
    I have a friend who owns a small business and has a Page on Facebook. I want to help her manage it from a marketing perspective, and figure it may be best to do so through the API. I have skimmed the API documentation and have a basic working knowledge of Python. What I can't figure out is whether I can access the page's data with Python and grab the data on wall posts, who liked posts, etc. Is this possible? I can't find a decent tutorial for someone who is new to programming. For context, I have been scraping the Twitter Search API for some time now, and I am hoping there is something similar (request certain data elements, and have them returned as structured data I can analyze). I find that API extremely straightforward; with Facebook, I don't know where to begin. I don't want to create an application; I simply want to access the data related to my friend's page. I am hoping to find some decent tutorials and guidance on what I will need to get started. Any help you can provide will be greatly appreciated.
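    A hedged starting point: the Graph API serves a page's wall as JSON at graph.facebook.com/{page-id}/feed, which reads a lot like the Twitter search workflow. The page ID and token below are placeholders:

        # a sketch: read a page's wall from the Graph API (Python 2 era)
        import json
        import urllib2

        url = "https://graph.facebook.com/PAGE_ID/feed?access_token=ACCESS_TOKEN"
        feed = json.load(urllib2.urlopen(url))
        for post in feed.get("data", []):
            print post.get("from", {}).get("name"), "-", post.get("message")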

    Read the article
