Search Results

Search found 43006 results on 1721 pages for 'web scraping'.

Page 128/1721 | < Previous Page | 124 125 126 127 128 129 130 131 132 133 134 135 | Next Page >

Help converting code using httplib2 to use urllib2

- by ThinkCode

What am I trying to do? Visit a site, retrieve the cookie, then visit the next page by sending the cookie back. It all works, but httplib2 is giving me too many problems with a SOCKS proxy on one site.

    http = httplib2.Http()
    main_url = 'http://mywebsite.com/get.aspx?id=' + id + '&rows=25'
    response, content = http.request(main_url, 'GET', headers=headers)
    main_cookie = response['set-cookie']
    referer = 'http://google.com'
    headers = {'Content-type': 'application/x-www-form-urlencoded', 'Cookie': main_cookie, 'User-Agent': USER_AGENT, 'Referer': referer}

How can I do exactly the same thing using urllib2 (retrieving the cookie and passing it to the next page on the same site)? Thank you.
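One possible shape for the cookie round trip, as a minimal sketch rather than a drop-in replacement. It is written for Python 3, where urllib2's classes live in urllib.request and cookielib is http.cookiejar; under Python 2 the same pattern would use urllib2.build_opener, urllib2.HTTPCookieProcessor and cookielib.CookieJar. The names build_session and fetch and the USER_AGENT value are assumptions for illustration. The standard-library opener does not itself speak SOCKS, so the proxy part of the question is left out.

```python
import http.cookiejar
import urllib.request

USER_AGENT = "Mozilla/5.0"  # placeholder; the question does not show its value


def build_session(user_agent=USER_AGENT, referer="http://google.com"):
    """Return (opener, jar); the opener stores cookies and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # These headers are attached to every request made through this opener.
    opener.addheaders = [("User-Agent", user_agent), ("Referer", referer)]
    return opener, jar


def fetch(opener, url, timeout=30):
    """GET a URL through the shared opener and return the body as bytes."""
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

Calling fetch(opener, main_url) and then fetch(opener, next_url) with the same opener would send back any cookie set by the first response, so the Set-Cookie header does not have to be copied by hand.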

How to mock test a web service in PHPUnit across multiple tests?

- by scraton

I am attempting to test a web service interface class using PHPUnit. Basically, this class makes calls to a SoapClient object. I am attempting to test this class in PHPUnit using the "getMockFromWsdl" method described here: http://www.phpunit.de/manual/current/en/test-doubles.html#test-doubles.stubbing-and-mocking-web-services

However, since I want to test multiple methods from this same class, every time I set up the object I also have to set up the mock WSDL SoapClient object. This causes a fatal error to be thrown:

    Fatal error: Cannot redeclare class xxxx in C:\web\php5\PEAR\PHPUnit\Framework\TestCase.php(1227) : eval()'d code on line 15

How can I use the same mock object across multiple tests without having to regenerate it from the WSDL each time? That seems to be the problem.

Why Shouldn't I Programmatically Submit Username/Password to Facebook/Twitter/Amazon/etc?

- by viatropos

I wish there was a central, fully customizable, open source, universal login system that allowed you to log in to and manage all of your online accounts (maybe there is?)... I just found RPXNow today after starting to build a Sinatra app to log in to Google, Facebook, Twitter, Amazon, OpenID, and EventBrite, and it looks like it might save some time. But I keep wondering, not being an authentication guru: why couldn't I just have a sleek login page saying "Enter username and password, and check your login service", and then in the background either scrape the login page from, say, EventBrite and programmatically submit the form with Mechanize, or use an API if there was one? It would be so much cleaner and such a better user experience if users didn't have to go through popups and redirects and could use any previously existing accounts. My question is: what are the reasons why I shouldn't do something like that? I don't know much about the serious details of cookies/sessions/security, so if you could be descriptive or point me to some helpful links, that would be awesome. Thanks!

The type or namespace name 'Oledb' does not exist in the namespace 'System.Data' error on Web Servic

- by Pankaj Kumar

Hi everyone, I have a web service that I want to test by typing its URL into the browser's address bar: localhost:1981/myProject/admin/autocomplete.asmx. When I do this, I get this compilation error:

    CS0234: The type or namespace name 'Oledb' does not exist in the namespace 'System.Data' (are you missing an assembly reference?)

I know this is because we added the following to the namespaces section of our web.config:

    <add namespace="System.Data.Oledb"/>
    <add namespace ="System.Data"/>

When I call this web service through AJAX it works, but if I try to test it in the browser it gives this error. Is there any way to prevent this?

form submitting with mechanize and Python

- by MATELIN Alexis

I'm trying to scrape a website that requires submitting two forms: a first one to log in and a second one to specify my search. I'm using Python and the mechanize package. No problem with the first one, but I just can't figure out how to get through the second one. Here is the part of my code related to the form mentioned above:

    agemin=18
    agemax=25
    by='region'
    country='France'
    region=2
    newcustomers=1
    browser.select_form(nr=0)
    browser['age[min]']=agemin
    browser['age[max]']=agemax
    browser['country']=country
    browser['region']=region
    browser['by']=by
    browser['new-customers']=newcustomers
    response=browser.submit()
    content=response.read()

But when I submit the variable 'age[min]', for example, I get the following error message:

    TypeError: object of type 'int' has no len()

To give you some more information, here is what I get with 'print br.form':

    <POST http://www.adopteunmec.com/qsearch/ajax_quick application/x-www-form-urlencoded
      <SelectControl(age[min]=[, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, *30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])>
      <SelectControl(age[max]=[, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, *45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])>
      <SelectControl(by=[*region, distance])>
      <SelectControl(country=[*fr, be, ch, ca])>
      <SelectControl(region=[*1, 2, 3, 4, 5, 6, 7, 8, 22, 23, 9, 10, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 11])>
      <SelectControl(distance[min]=[*, 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450, 460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570, 580, 590, 600, 610, 620, 630, 640, 650, 660, 670, 680, 690, 700, 710, 720, 730, 740, 750, 760, 770, 780, 790, 800, 810, 820, 830, 840, 850, 860, 870, 880, 890, 900, 910, 920, 930, 940, 950, 960, 970, 980, 990, 1000])>
      <SelectControl(distance[max]=[, 0, 10, 20, 30, 40, 50, 60, 70, *80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450, 460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570, 580, 590, 600, 610, 620, 630, 640, 650, 660, 670, 680, 690, 700, 710, 720, 730, 740, 750, 760, 770, 780, 790, 800, 810, 820, 830, 840, 850, 860, 870, 880, 890, 900, 910, 920, 930, 940, 950, 960, 970, 980, 990, 1000])>
      <CheckboxControl(new=[*1])>>

My guess is that the form needs an object (like a list) containing all the variables in order to accept it; that's why it refuses the variables submitted one by one. Thank you in advance for any help! Alexis
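The TypeError above is consistent with mechanize's select and checkbox controls expecting a sequence of option strings rather than a bare int. The sketch below works under that assumption: it only prepares the values and leaves the mechanize calls in comments, so it runs without the package installed. The helper name as_control_value is hypothetical. The keys 'country': 'fr' and 'new' follow the printed form, which lists 'fr' (not 'France') and a checkbox named 'new' (not 'new-customers'). Whether select_form(nr=0) really picks the search form cannot be told from the excerpt.

```python
def as_control_value(value):
    """Wrap a value in the list-of-strings shape that list controls accept."""
    if isinstance(value, (list, tuple)):
        return [str(item) for item in value]
    return [str(value)]


# Values from the question, keyed by the control names shown in the form dump.
search_fields = {
    "age[min]": 18,
    "age[max]": 25,
    "by": "region",
    "country": "fr",
    "region": 2,
    "new": 1,
}

form_values = {name: as_control_value(value) for name, value in search_fields.items()}

# With mechanize, the prepared values would be applied roughly like this:
#     for name, value in form_values.items():
#         browser[name] = value
#     response = browser.submit()
```

Each control is still assigned one at a time; the change is only in the type of the value (for example ['18'] instead of 18).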

Reading Ontology with Jena, feeding it with RDF triples, and producing correct RDF string output.

- by JonB

Hi, I have an ontology, which I read in with Jena to help me scrape some RDFa triples from a website. I don't currently store these triples in a Jena model, but that is fairly straightforward to do; it's next on my to-do list. The area I am struggling with, though, is getting Jena to output correct RDF for the ontology I have. The ontology uses OWL and RDFS definitions, but when I pass some example triples into the model, they don't appear correctly, almost as if it doesn't know anything about the ontology. The output is still valid RDF; it's just not coming out in the form I was hoping for. Am I correct in thinking that Jena should be able to produce well-written RDF (not just valid) about the triples I have collected, based on the ontology, or does this exceed what it is capable of? Many thanks for any input.

How can I get all content within <td> tag using a HTML Agility Pack?

- by Bob Dylan

So I'm writing an application that will do a little screen scraping. I'm using the HTML Agility Pack to load an entire HTML page into an instance of HtmlDocument called doc. Now I want to parse that doc, looking for this:

    <table border="0" cellspacing="3">
      <tr><td>First rows stuff</td></tr>
      <tr>
        <td>
          The data I want is in here <br />
          and it's seperated by these annoying <br />
          's. No id's, classes, or even a single <p> tag. </p>
          Just a bunch of <br /> tags.
        </td>
      </tr>
    </table>

So I just need to get the data within the 2nd row. How can I do this? Should I use a regex or something else?

Python module for converting PDF to text

- by cnu

Is there any Python module to convert PDF files into text? I tried one piece of code found on ActiveState which uses pypdf, but the generated text had no spaces between words and was of no use.

Is selling a "website screen scraper" illegal?

- by Yatendra Goel

I have coded a "website screen scraper" and want to sell it commercially. I know that the webmaster of the target website does not permit its pages to be scraped: the site's robots.txt file says that its webpages must not be scraped. So my question is whether, in legal terms, selling that screen scraper is a crime, or whether using it is. I know this question is about law, but I thought the software experts on SO might also have an answer.

Is there a sample set of web log data available for testing analysis against?

- by Peter

Sorry if this isn't strictly speaking a programming question, but I figure my best chance of success would be to ask here. I'm developing some web log file analysis algorithms, but to date I only have access to a fairly small amount of web log data to process. One algorithm I want to use makes some assumptions about 'the shape' of typical web log data, and so I'd like to test it against a larger 'exemplar' - perhaps the logs of a busy site with a good distribution of traffic from different sources etc. Is there a set of such data available somewhere? Thanks for any help.

How can I download information from a website if it returns XML/JSON in its response?

- by Sergio Tapia

Does Python 3 have a built-in way to do this? Any guidance at all would be great! :) The website in question exposes all of its information and even gives you an API key to use.
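For the JSON case, the standard library's urllib.request and json modules are enough. The following is a minimal sketch: the function name fetch_json is made up for illustration, and the "X-Api-Key" header is an assumption, since the question does not say how the site expects the key to be passed (it might equally be a query parameter).

```python
import json
import urllib.request


def fetch_json(url, api_key=None, timeout=10):
    """Download a URL and decode its JSON body into Python objects."""
    headers = {"Accept": "application/json"}
    if api_key is not None:
        # Hypothetical header name; check the site's API documentation.
        headers["X-Api-Key"] = api_key
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return json.loads(response.read().decode(charset))
```

For an XML response, the decoded text could be handed to xml.etree.ElementTree.fromstring instead of json.loads.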

How can I isolate the form controls in a ASP Web User Control from the rest of the page's form contr

- by Justin808

I have a Web User Control I created for authentication. The web user control is inside the box below. Clicking either button (1 or 2) works correctly, in that it goes to the correct C# button click event in the code-behind file. If I press Enter in field a or b, it goes to the correct callback (button1's). If I press Enter in field c, it still goes to button1's callback, not button2's. How can I give my web user control a nice self-contained form, view state, etc., so it won't interfere with the rest of the page's form?

+--------------+
| User: __a___ |
| Pass: __b___ |
|     [button1]|
+--------------+
Prompt:______c______ [button2]

Java website on Tomcat PHP website on Apache - how to get PHP web pages into Java web pages?

- by Venkat

We have a Java web application deployed on Tomcat. We also set up Apache and mod_proxy_ajp to route web requests (port 80/443) to Tomcat. We would like to deploy a PHP application on the same Apache server, probably under a subdirectory (/var/www/ourapp). Now we would like to access and display web pages from the PHP application within web pages generated by the Java application. We are planning to implement single sign-on as well.

Example: a web page from Java has jQuery tabs, and we would like to display the PHP web page within a tab while all other HTML comes from the Java application.

Can you please give an overall picture of how to proceed? Mainly:

1. How should we install and set up our PHP application on the same Apache server that is used to route web requests to Tomcat, i.e. set up a subdomain or install in a subdirectory?
2. How do we bring PHP pages into the existing web pages (generated by Java)? Can we use AJAX requests, or should we go for applications such as Java PHP Bridge or Quercus?

Thank you in advance for your time. Regards.

CasperJS Load next page in loop

- by SquiresSquire

I've been working on a script which collates the scores for a list of users from a website. One problem, though: I'm trying to load the next page in the while loop, but the function is not being loaded...

    this.thenOpen("http://www.url.com?ul=" + currentName + "&sortdir=desc&sort=lastfound", function (id) {
      return function () {
        this.capture("Screenshots/" + json.username[id] + ".png");
        if (!casper.exists(x("//*[contains(text(), 'That username does not exist in the system')]"))) {
          if (casper.exists(x('//*[@id="ctl00_ContentBody_ResultsPanel"]/table[2]'))){
            this.thenEvaluate(tgsagc.tagNextLink);
            tgsagc.cacheCount = 0;
            tgsagc.continue = true;
            this.echo("------------ " + json.username[id] + " ------------");
            while (tgsagc.continue) {
              this.then(function(){
                this.evaluate(tgsagc.tagNextLink);
                var findDates, pageNumber;
                pageNumber = this.evaluate(tgsagc.pageNumber);
                findDates = this.evaluate(tgsagc.getFindDates);
                this.echo("Found " + findDates.length + " on page " + pageNumber);
                tgsagc.checkFinds(findDates);
                this.echo(tgsagc.cacheCount + " Caches for " + json.username[id]);
                this.echo("Continue? " + tgsagc["continue"]);
                return this.click("#tgsagc-link-next");
              });
            }
            leaderboard[json.username[id]] = tgsagc.cacheCount;
            console.log("Final Count: " + leaderboard[json.username[id]]);
            console.log(JSON.stringify(leaderboard));
          } else {
            this.echo("------------ " + json.username[id] + " ------------");
            this.echo("0 Caches Found");
            leaderboard[json.username[id]] = 0;
            console.log(JSON.stringify(leaderboard));
          }
        } else {
          this.echo("------------ " + json.username[id] + " ------------");
          this.echo("No User found with that Username");
          leaderboard[json.username[id]] = null;
          console.log(JSON.stringify(leaderboard));
        }

What ASP.NET Web Config entries could limit certain file access by date and time?

- by Dr. Zim

What entries in a web.config could allow certain files to become publicly accessible after a certain date and time? Specifically, we have files named AB_.jpg, where the _ could be anything. We put them in a folder on April 27th, for example, but they shouldn't be accessible until April 30th at 11:59:59 PM. I think the web.config in part works like an Apache .htaccess file on Unix to define file security. For example, this web.config entry allows directory browsing:

    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
      <system.webServer>
        <directoryBrowse enabled="true" />
      </system.webServer>
    </configuration>

How can I get all content within <table></table> tags using a regex?

- by Bob Dylan

So I'm writing an application that will do a little screen scraping. All the pages (about 1000 or so) contain this:

    <table border="0" cellspacing="3">
      <tr><td>First rows stuff</td></tr>
      <tr>
        <td>
          The data I want is in here <br />
          and it's seperated by these annoying <br />
          's. No id's, classes, or even a single <p> tag.
          Just a bunch of <br /> tags.
        </td>
      </tr>
    </table>

So I just need to get the data within the 2nd row out. How can I do this? Should I use a regex or something else?
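On the "regex or something else" question, an HTML parser tends to cope better with markup like this than a regular expression does. The sketch below uses Python's standard html.parser and assumes each page holds a single, non-nested table like the one quoted; the class name SecondRowText, the function second_row_lines and the shortened SAMPLE markup are made up for illustration.

```python
from html.parser import HTMLParser

# Shortened stand-in for the table quoted in the question.
SAMPLE = (
    '<table border="0" cellspacing="3">'
    "<tr><td>First rows stuff</td></tr>"
    "<tr><td>The data I want is in here <br /> and it is separated <br /> by breaks.</td></tr>"
    "</table>"
)


class SecondRowText(HTMLParser):
    """Collect the text of the cell in one chosen row, split at each <br>."""

    def __init__(self, target_row=2):
        super().__init__()
        self.target_row = target_row
        self.row = 0           # number of <tr> tags seen so far
        self.in_cell = False   # True while inside a <td> of the target row
        self.chunks = [""]

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row += 1
        elif tag == "td" and self.row == self.target_row:
            self.in_cell = True
        elif tag == "br" and self.in_cell:
            self.chunks.append("")  # a <br> starts a new chunk

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.chunks[-1] += data


def second_row_lines(html):
    """Return the <br>-separated pieces of the second row, whitespace-normalised."""
    parser = SecondRowText()
    parser.feed(html)
    parser.close()
    return [" ".join(chunk.split()) for chunk in parser.chunks if chunk.strip()]
```

Calling second_row_lines(SAMPLE) yields one string per <br>-separated piece of the second row. Nested tables would throw off the row counter, which is why the single-table assumption matters.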

C# WebClient - View source question

- by Jim

I'm using a C# WebClient to post login details to a page and read all the results. The page I am trying to load includes Flash (which, in the browser, translates into HTML). I'm guessing it's Flash to avoid being picked up by search engines? The Flash content I am interested in is just text (not an image or video), and when I "View Selection Source" in Firefox I do see the text I want, within HTML. (Interestingly, when I view the source for the whole page I do not see that text. Could this be related?) Currently, after I have posted my login details and loaded the HTML back, I get the page without the Flash HTML (as if I had viewed source for the whole page). Thanks in advance, Jim. PS: I should point out that the POST is actually working; my login is successful.

How do I send an arrow key in Perl using the Net::Telnet module?

- by pokstad

Using the Perl module Net::Telnet, how do you send an arrow key to a telnet session so that it is the same as a user pressing the down key on the keyboard?

    use Net::Telnet;
    my $t = new Net::Telnet();
    my $down_key=?; #How do you send a down key in a telnet session?
    $t->print($down_key);

Problems and solution for Developing a connected web and desktop application?

- by Taz

Hi, I am trying to develop a web application (using ASP.NET and C#) that uses a database hosted on a web server. I will have another desktop application that uses a local database. Both databases have the same structure and data at startup. The databases will then diverge as users add data through the web application and an employee adds data through the desktop application. After a while I have to sync both databases. What is the best way to do this? Is there any open-source example or starter kit to start with? Thanks.

How to handle redirects while parsing HTML? - Python

- by RadiantHex

Hi folks, I'm trying to submit a few forms through a Python script using the mechanize library. This is so I can implement a temporary API. The problem is that after submission a blank page is returned saying that the request is being processed, and after a few seconds the page is redirected to the final page. I understand this might sound a bit generic, but I'm not sure what is going on. :) Any ideas?

Webservice and ORM Framework?

- by Sebastian

Does anybody know a good web framework that includes an ORM mapper and allows straightforward implementation of web services? I'm looking for a framework written in PHP or C++. I'm looking for the following features (not all of them required; some will do nicely):

- data definition in one place, used by database and web service
- WSDL generation
- XML output/JSON output
- boilerplate code generation

So what I would like is a framework that lets me specify the objects and the web service functions on those objects, and then generates everything that is required, leaving me to fill in the business logic (connecting the database to the web service). Is there anything like that out there?

Background information on why I need this: I'm looking into creating a web project. The client is a rich web application that fetches all its data using AJAX. It will be completely custom made using only a low-level JavaScript library. The server back end is supposed to serve static content and JavaScript (basically the rich web application) and to provide a RESTful web service API (which I would like to implement using the aforementioned framework).

Nokogiri find only inbound links

- by astropanic

I have an HTML document located at http://somedomain.com/somedir/example.html

The document contains four links:

    http://otherdomain.com/other.html
    http://somedomain.com/other.html
    /only.html
    test.html

How can I get the full URLs for the links in the current domain? I mean, I should get:

    http://somedomain.com/other.html
    http://somedomain.com/only.html
    http://somedomain.com/somedir/test.html

The first link should be ignored because it doesn't match my domain.
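The resolution step does not depend on Nokogiri: each href is joined against the page URL, and only results on the same host are kept. The sketch below shows that logic in Python with urllib.parse (in Ruby, URI.join plays the same role as urljoin). The function name same_domain_links is made up for illustration, and the hrefs are assumed to have already been extracted from the document.

```python
from urllib.parse import urljoin, urlparse


def same_domain_links(page_url, hrefs):
    """Resolve each href against page_url and keep those on the page's own host."""
    host = urlparse(page_url).netloc
    absolute = (urljoin(page_url, href) for href in hrefs)
    return [url for url in absolute if urlparse(url).netloc == host]


# The page and the four links from the question.
PAGE_URL = "http://somedomain.com/somedir/example.html"
HREFS = [
    "http://otherdomain.com/other.html",
    "http://somedomain.com/other.html",
    "/only.html",
    "test.html",
]

links = same_domain_links(PAGE_URL, HREFS)
```

Comparing on netloc treats www.somedomain.com and somedomain.com as different hosts; whether that is desirable depends on the site.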

Programmatically login to a website and redirect the user to the logged in page?

- by Santhosh

Hi, right now all the employees of my company log in to an external website using the company ID, a username and a password. We are trying to integrate it into an intranet portal that should provide seamless access to this website without requiring users to enter these credentials. Is there any way of doing this programmatically (.NET, C#)? Much like screen scraping, can I simulate the appropriate POST action and then redirect the user to the logged-in page? Any help is appreciated. Thanks.

How does selectorgadget work?

- by andrisetiawan

How does selectorgadget.com work? Is there a link or page that explains the algorithm behind selectorgadget? Thanks.

How can I screen scrape with Perl?

- by Sakthivel

I need to display some values that are stored on a website. For that, I need to scrape the website and fetch the content from a table. Any ideas?

