Search Results

Search found 43006 results on 1721 pages for 'web scraping'.

  • Get Mechanize to handle cookies from an arbitrary POST (to log into a website programmatically)

    - by Horace Loeb
    I want to log into https://www.t-mobile.com/ programmatically. My first idea was to use Mechanize to submit the login form. However, it turns out that this isn't even a real form. Instead, when you click "Log in", some JavaScript grabs the values of the fields, creates a new form dynamically, and submits it. "Log in" button HTML:

        <button onclick="handleLogin(); return false;" class="btnBlue" id="myTMobile-login"><span>Log in</span></button>

    The handleLogin() function:

        function handleLogin() {
            if (ValidateMsisdnPassword()) { // client-side form validation logic
                var a = document.createElement("FORM");
                a.name = "form1";
                a.method = "POST";
                a.action = mytmoUrl; // defined elsewhere as https://my.t-mobile.com/Login/LoginController.aspx
                var c = document.createElement("INPUT");
                c.type = "HIDDEN";
                c.value = document.getElementById("myTMobile-phone").value; // the value of the phone number input field
                c.name = "txtMSISDN";
                a.appendChild(c);
                var b = document.createElement("INPUT");
                b.type = "HIDDEN";
                b.value = document.getElementById("myTMobile-password").value; // the value of the password input field
                b.name = "txtPassword";
                a.appendChild(b);
                document.body.appendChild(a);
                a.submit();
                return true
            } else {
                return false
            }
        }

    I could simulate this form submission by POSTing the form data to https://my.t-mobile.com/Login/LoginController.aspx with Net::HTTP#post_form, but I don't know how to get the resultant cookie into Mechanize so I can continue to scrape the UI available when I'm logged in. Any ideas?
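
    One possible approach, sketched in Ruby: Mechanize can issue the POST itself, so the resulting session cookie lands in its own cookie jar and later requests stay logged in. The two field names come from the handleLogin() source above; the credentials are placeholders.

        require 'mechanize'

        agent = Mechanize.new
        agent.get('https://www.t-mobile.com/')   # pick up any pre-login cookies

        # Mechanize#post stores response cookies in agent.cookie_jar,
        # so there is nothing to copy over from Net::HTTP.
        agent.post('https://my.t-mobile.com/Login/LoginController.aspx',
                   'txtMSISDN'   => '5551234567',   # placeholder phone number
                   'txtPassword' => 'secret')       # placeholder password

        # Subsequent requests reuse the session cookie automatically:
        page = agent.get('https://my.t-mobile.com/')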

  • Nokogiri and Special Characters

    - by Moe
    I'm using Nokogiri to grab the contents of the title tag on a webpage, but am having trouble with accented characters. What's the best way to deal with these? Here's what I'm doing:

        require 'open-uri'
        require 'nokogiri'

        doc = Nokogiri::HTML(open(link))
        title = doc.at_css("title")

    At this point, the title looks like this: Rag\303\271 instead of: Ragù. How can I have Nokogiri return the proper character (e.g. ù in this case)?
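
    A possible fix, sketched: Rag\303\271 is the UTF-8 byte sequence for "Ragù" shown byte by byte, so telling Nokogiri the document's encoding explicitly (the third argument to Nokogiri::HTML) is usually enough. The variable link is the same one used above.

        require 'open-uri'
        require 'nokogiri'

        # Read the raw bytes, then parse with an explicit encoding so
        # multi-byte sequences like \303\271 decode to "ù".
        html  = open(link).read
        doc   = Nokogiri::HTML(html, nil, 'UTF-8')
        title = doc.at_css('title').text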

  • How to submit a form automatically using HttpWebResponse

    I am looking for an application that can do the following:

    a) Programmatically auto-login to a page (login.aspx) using HttpWebResponse, with an already specified username and password.
    b) Detect the redirect URL if the login is successful.
    c) Submit another form (settings.aspx) to update certain fields in the database.

    The required coding needs to be in ASP.NET, and the application needs to complete this entire process in the same session cookie.
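
    Not the HttpWebRequest/ASP.NET answer itself, but the HTTP flow being asked about (POST the credentials, note the redirect, replay the session cookie on the next form) can be sketched in Ruby with Net::HTTP. All URLs and field names here are hypothetical.

        require 'net/http'
        require 'uri'

        login_uri = URI('https://example.com/login.aspx')
        res = Net::HTTP.post_form(login_uri,
                                  'username' => 'user', 'password' => 'pass')

        cookie   = res['Set-Cookie']   # session cookie issued at login (simplified)
        redirect = res['Location']     # b) the redirect URL on a successful login

        # c) submit the settings form within the same session
        settings_uri = URI('https://example.com/settings.aspx')
        req = Net::HTTP::Post.new(settings_uri)
        req['Cookie'] = cookie
        req.set_form_data('someField' => 'new value')

        Net::HTTP.start(settings_uri.host, settings_uri.port, use_ssl: true) do |http|
          http.request(req)
        end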

  • Why does Opera parse my web page as XML?

    - by Adrian Grigore
    I just tried viewing my website http://www.logmytime.de/ in Opera (version 10.50) for the first time, and for some reason it gives me an "XML parsing failed" error and refuses to display the web page. I can choose to "Reparse the document as HTML" and then the page works fine, but that's hardly a solution to my problem. The weird thing is that the error still occurs after setting an HTML (instead of XHTML) doctype:

        <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

    I checked the source output from the browser to make sure I did not make any mistake. I even viewed the same web page in Firebug, and it shows a Content-Type of text/html. So why does Opera still try to parse my web page as XML? Thanks, Adrian
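
    One thing worth ruling out, sketched in Ruby: Firebug only shows the Content-Type the server sent to Firefox. If the server varies the header by User-Agent, Opera may be receiving application/xhtml+xml, which would explain the XML parse. The User-Agent string below is an approximation of Opera 10.50's.

        require 'net/http'
        require 'uri'

        uri = URI('http://www.logmytime.de/')
        req = Net::HTTP::Get.new(uri)
        req['User-Agent'] = 'Opera/9.80 (Windows NT 6.1; U; en) Presto/2.5.22 Version/10.50'

        res = Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }
        puts res['Content-Type']   # application/xhtml+xml here would explain the error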

  • How can I access an ASP.NET 2.0 web service using VBScript?

    - by Steve Hiner
    I'm trying to find a way to access a web service from a VBScript .vbs file running under wscript.exe. I pulled some sample code from Microsoft, but it gives me an error.

        Dim SOAPClient, Response
        Set SOAPClient = CreateObject("MSSOAP.SOAPClient")
        SOAPClient.mssoapinit("https://www.domain.com/Folder/Service.asmx?WSDL")

    On that last line I get an error message:

        WSDLReader: No valid schema specification was found. This version of the SOAP Toolkit only supports 1999 and 2000 XSD schema specifications

    After getting that message I installed the SOAP 3.0 SDK to make sure I have the most recent version (since it's now deprecated for .NET), but I still get the same error. The reason it needs to be in VBScript is that it's going to be used in a program over which I have no control, and that program only supports VBScript. Is there a way to get VBScript to parse a newer WSDL file? I do have the source code for the web service. Is there something I can change in the web service to make it schema-compatible with the SOAP Toolkit?

  • Capture ASP output for monitoring

    - by scourge.zero
    How do I capture ASP.NET output and then store it in temporary memory, so that I can use it in an application to do comparisons? For example: there's this site which has ASP output. Sorry, I do not have server access; what I can do is view the output. The site, by the way, is a monitor of all users logged in, and in which channel. Output, e.g.:

        Channel 1
        Username     logged in (0 / 1)
        Username 1   1
        John Smith   1
        George B     0

        Channel 2
        Username     logged in (0 / 1)
        Username 1   1
        John Smith   0
        George B     0

    What I wanted to do is to capture this output and then show it this way:

        Username     Channel 1   Channel 2   Total
        Username 1   1           1           2
        John Smith   1           0           1
        George B     0           0           0

    I don't know where to start.
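
    A minimal scraping sketch in Ruby with Nokogiri, assuming each channel is rendered as its own table (the URL and selectors are guesses; adjust them to the real markup):

        require 'open-uri'
        require 'nokogiri'

        doc = Nokogiri::HTML(open('http://example.com/monitor'))   # hypothetical URL

        totals = Hash.new { |h, k| h[k] = [] }
        doc.css('table.channel').each do |channel|   # one table per channel (assumed)
          channel.css('tr').drop(1).each do |row|    # skip the header row
            name, flag = row.css('td').map { |td| td.text.strip }
            totals[name] << flag.to_i
          end
        end

        # print: username, one column per channel, then the total
        totals.each do |name, flags|
          puts ([name] + flags + [flags.inject(0, :+)]).join("\t")
        end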

  • Extracting Window Contents

    - by user293392
    I need to extract window content if it is text-based, or at least the file path associated with that window. To date, I have considered:

    1. win32api
    2. 3rd-party libraries
    3. wrapper classes

    However, I am not satisfied with these solutions. Any ideas how this can be done in a clean way?

  • Copying data across tabs

    - by Guillermo
    Hello, I have two different forms in two different tabs. One has data from our system, and the other is an interface to another, external system into which we need to copy data (XML or API integration is not an option here). The thing is that, with both forms open in two different tabs, I need a Greasemonkey script or something similar that allows me to copy data from one form to the other (using the getValue method in JavaScript). The problem right now is that I cannot figure out how to reference one particular tab or window (to read data from or write data to) with a Greasemonkey script. Do you think it would be possible to do what I'm thinking of? Thanks!

  • Why is Python's decode replacing more than the invalid bytes from an encoded string?

    - by dangra
    Trying to decode an invalidly encoded UTF-8 HTML page gives different results in Python, Firefox and Chrome. The invalid encoded fragment from the test page looks like 'PREFIX\xe3\xabSUFFIX':

        >>> fragment = 'PREFIX\xe3\xabSUFFIX'
        >>> fragment.decode('utf-8', 'strict')
        ...
        UnicodeDecodeError: 'utf8' codec can't decode bytes in position 6-8: invalid data

    What follows is a summary of the replacement policies used by Python, Firefox and Chrome to handle decoding errors. Note how the three differ, and especially how the Python builtin removes the valid S (plus the invalid sequence of bytes).

    By Python: the builtin replace error handler replaces the invalid \xe3\xab, plus the S from SUFFIX, with U+FFFD:

        >>> fragment.decode('utf-8', 'replace')
        u'PREFIX\ufffdUFFIX'
        >>> print _
        PREFIX?UFFIX

    A Python implementation of the builtin replace error handler looks like:

        >>> python_replace = lambda exc: (u'\ufffd', exc.end)

    As expected, trying this gives the same result as the builtin:

        >>> codecs.register_error('python_replace', python_replace)
        >>> fragment.decode('utf-8', 'python_replace')
        u'PREFIX\ufffdUFFIX'
        >>> print _
        PREFIX?UFFIX

    By Firefox: Firefox replaces each invalid byte with U+FFFD:

        >>> firefox_replace = lambda exc: (u'\ufffd', exc.start+1)
        >>> codecs.register_error('firefox_replace', firefox_replace)
        >>> fragment.decode('utf-8', 'firefox_replace')
        u'PREFIX\ufffd\ufffdSUFFIX'
        >>> print _
        PREFIX??SUFFIX

    By Chrome: Chrome replaces each invalid sequence of bytes with U+FFFD:

        >>> chrome_replace = lambda exc: (u'\ufffd', exc.end-1)
        >>> codecs.register_error('chrome_replace', chrome_replace)
        >>> fragment.decode('utf-8', 'chrome_replace')
        u'PREFIX\ufffdSUFFIX'
        >>> print _
        PREFIX?SUFFIX

    The main question is why the builtin replace error handler for str.decode removes the S from SUFFIX. Also, is there an official Unicode-recommended way of handling decoding replacements?

  • Nokogiri Doc Element Not Returning Correctly

    - by TenJack
    I am trying to scrape a Wiktionary entry:

        uri = URI.parse("http://en.wiktionary.org/wiki/" + CGI.escape('abjure'))
        doc = Nokogiri::HTML(open(uri, 'User-Agent' => 'ruby'))

    but the doc shows no elements for this word. Other words work fine, and this word used to work. I have no idea what changed. Anyone see anything wrong with this?
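
    A hedged debugging sketch: before suspecting Nokogiri, look at what the server actually returned and whether the parser complained. Wiktionary may serve an error page, a redirect, or blocked content to an unfamiliar User-Agent.

        require 'open-uri'
        require 'nokogiri'
        require 'cgi'

        uri  = URI.parse('http://en.wiktionary.org/wiki/' + CGI.escape('abjure'))
        html = open(uri, 'User-Agent' => 'ruby').read

        puts html.length            # tiny or zero => the server, not the parser
        doc = Nokogiri::HTML(html)
        puts doc.errors.first(5)    # libxml parse errors, if any
        puts doc.at_css('title')    # sanity check on the parsed tree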

  • Are there any free .NET OCR libraries that will perform OCR on an application window directly?

    - by Kelsey
    I am looking for a free .NET OCR library that will be able to do OCR on a given application window, or even an image in memory (I can take a snapshot of the application window myself). I have looked at tessnet2 and MODI, but both require an image located on disk. I need to use OCR because the application I am trying to write a script for does some wacky stuff that cannot be read using the Windows API, and I need to scrape data from the screen. I have tested both tessnet2 and MODI, and they can both mostly read the text, but because this has to run in an environment that will not be able to write to disk, I need it to be able to read directly from the application window or some type of memory stream. I am thinking OCR is my only solution, but there could be other methods that I am not thinking of. Suggestions?

  • Is it a good idea to cache data from web services into a database?

    - by Thierry Lam
    Let's assume that Stack Overflow offers web services where you can retrieve all the questions asked by a specific user. A request to get all questions from user A can result in the following JSON output:

        [
          {
            "question": "What is rest?",
            "date_created": "20/02/2010",
            "votes": 1
          },
          {
            "question": "Which database to use for ...",
            "date_created": "20/07/2009",
            "votes": 5
          }
        ]

    If I want to manipulate and present the data in any way that I want, would it be wise to dump it in a local database? At some point, I will also want to retrieve all answers for each question and store them in a local database. The workflow that I'm thinking of is:

    1. User logs in.
    2. Web services retrieve all questions asked by the logged-in user and dump them into a local database.
    3. User wants all answers for a specific question; another web service does the retrieval and dumps them into the local database.
    4. After the user logs out, delete from the local database all questions and answers from that user.
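
    A minimal Ruby sketch of step 2's cache-on-login idea, assuming a local SQLite store via the sqlite3 gem (the table layout and user id are made up):

        require 'json'
        require 'sqlite3'

        db = SQLite3::Database.new('cache.db')
        db.execute('CREATE TABLE IF NOT EXISTS questions ' \
                   '(user_id TEXT, question TEXT, date_created TEXT, votes INTEGER)')

        # `payload` stands in for the web service response shown above
        payload = '[{"question": "What is rest?", "date_created": "20/02/2010", "votes": 1}]'

        JSON.parse(payload).each do |q|
          db.execute('INSERT INTO questions VALUES (?, ?, ?, ?)',
                     ['user_a', q['question'], q['date_created'], q['votes']])
        end

        # step 4, on logout:
        # db.execute('DELETE FROM questions WHERE user_id = ?', ['user_a'])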

  • Is selling a "website screen scraper" illegal?

    - by Yatendra Goel
    I have coded a "website screen scraper" and want to sell it commercially. I know that the webmaster of the target website has restricted its webpages from being scraped: the site's robots.txt file says that its webpages must not be scraped. So my question is whether selling that screen scraper, or using it, is a crime in legal terms. I know that this question is related to law, but I thought the software experts on SO must also have an answer to this question.

  • How would you protect a database of links from being scraped?

    - by Yegor
    I have a large database of links, which are all sorted in specific ways and are attached to other information, which is valuable (to some people). Currently my setup (which seems to work) simply calls a PHP file like link.php?id=123; it logs the request with a timestamp into the DB. Before it spits out the link, it checks how many requests were made from that IP in the last 5 minutes. If it's greater than x, it redirects you to a captcha page. That all works fine and dandy, but the site has been getting really popular (as well as getting DDoSed for about 6 weeks), so PHP has been getting floored, and I'm trying to minimize the times I have to hit up PHP to do something. I wanted to show links in plain text instead of through link.php?id=, and have an onclick function simply add 1 to the view count. I'm still hitting up PHP, but at least if it lags, it does so in the background, and the user can see the link they requested right away. Problem is, that makes the site REALLY scrapable. Is there anything I can do to prevent this, while still not relying on PHP to do the check before spitting out the link?

  • How to export scrubyt extractor?

    - by robintw
    I've written a scrubyt extractor based on the 'learning' technique - that is, specifying the current text on the page and getting it to work out the XPath expressions itself. However, I now want to export the extractor so that it can be used even when the page has changed. The documentation for scrubyt seems to be all over the place now, but from what I can find I should be able to add the line

        extractor.export(__FILE__)

    and it should work. It doesn't - I just get an error saying that export was given the wrong number of arguments: it should have 0. I've tried it without any arguments and it still fails. I would ask on the scrubyt forum, but it seems like no-one's been there for ages! Any ideas what to do here?

  • Calling UIGetScreenImage() on manually-spawned thread prints "_NSAutoreleaseNoPool():" message to log

    - by jtrim
    This is the body of the selector that is specified in NSThread +detachNewThreadSelector:(SEL)aSelector toTarget:(id)aTarget withObject:(id)anArgument:

        NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
        while (doIt) {
            if (doItForSure) {
                NSLog(@"checking");
                doItForSure = NO;
                (void)gettimeofday(&start, NULL);

                /* do some stuff */

                // the next line prints the "_NSAutoreleaseNoPool():" message to the log
                CGImageRef screenImage = UIGetScreenImage();

                /* do some other stuff */

                (void)gettimeofday(&end, NULL);
                elapsed = ((double)(end.tv_sec) + (double)(end.tv_usec) / 1000000) -
                          ((double)(start.tv_sec) + (double)(start.tv_usec) / 1000000);
                NSLog(@"Time elapsed: %e", elapsed);
                [pool drain];
            }
        }
        [pool release];

    Even with the autorelease pool present, I get this printed to the log when I call UIGetScreenImage():

        2010-05-03 11:39:04.588 ProjectName[763:5903] *** _NSAutoreleaseNoPool(): Object 0x15a2e0 of class NSCFNumber autoreleased with no pool in place - just leaking

    Has anyone else seen this with UIGetScreenImage() on a separate thread?

  • How to manually deploy a web service on Tomcat 6?

    - by opensas
    I'm learning how to develop SOAP web services with Java. So far I've been following this excellent tutorial: http://java.sun.com/developer/technicalArticles/J2SE/jax_ws_2/ It all goes well: I have my web service working from the command line with its embedded server, and then, with the help of NetBeans, I deployed it on Tomcat. I'd like to know the steps to deploy it on Tomcat manually, in order to learn how it's done, and because I don't like depending on an IDE. I mean, I'd like to know how everything could be done from the command line and a text editor. I've also found this link that explains how to manually deploy a servlet to Tomcat, http://linux-sxs.org/internet_serving/c292.html but I couldn't find any article explaining how to deploy a web service. Thanks a lot. Regards, sas
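
    This is from memory of the JAX-WS RI documentation, so treat the class names and layout as assumptions to verify: a manual deployment boils down to packaging a standard WAR and copying it into Tomcat's webapps directory.

        mywebservice.war
            WEB-INF/web.xml        declares com.sun.xml.ws.transport.http.servlet.WSServlet
                                   and its WSServletContextListener
            WEB-INF/sun-jaxws.xml  maps an <endpoint> implementation class to a url-pattern
            WEB-INF/classes/       your compiled @WebService class
            WEB-INF/lib/           the JAX-WS RI jars (unless installed in Tomcat's shared lib)

    The WAR itself can be built from the command line with jar cvf mywebservice.war -C webapp-dir . (no IDE required).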

  • I want to scrape a site using GAE and post the results into a Google Entity

    - by cozza
    I want to scrape this URL:

        https://www.xstreetsl.com/modules.php?searchSubmitImage_x=0&searchSubmitImage_y=0&SearchLocale=0&name=Marketplace&SearchKeyword=business&searchSubmitImage.x=0&searchSubmitImage.y=0&SearchLocale=0&SearchPriceMin=&SearchPriceMax=&SearchRatingMin=&SearchRatingMax=&sort=&dir=asc

    go into each of the links, extract various pieces of information (e.g. permissions, prims, etc.), and then post the results into an Entity on Google App Engine. I want to know the best way to go about it. Chris

  • Saving HttpResponse/Request to file system

    - by chrisjlong
    Here is my scenario: the user fills out this large page, which is dynamically created based on DB values. Those values can change. When the user fills out the page and hits submit, we want to save a copy of the page as HTML on the server; this way, if the text or wording changes, when they go back to view their posted information, it is historically accurate. So I basically need to do this:

        protected void buttonSave_Click(object sender, EventArgs e)
        {
            // collect information into an object to save it in the db
            bool result = BusinessLogic.Save(myBusinessObject);
            if (result)
                //!!! Here is where I need to save this page as an html file on my server's IFS !!!
            else
                //whatever
            Response.Redirect("~/SomeOtherPage.aspx");
        }

    Any help is greatly appreciated. Also, I CANNOT just request the data from the URL, because query string parameters are a big no-no in this case. The key to pull the database info up (at its highest level) is all in session, so I can't just request a URL and save it. Thanks!

  • What movie website allows people to scrape it?

    - by Sergio Tapia
    I've wanted to make a C# library to scrape movie information and return it to the application, but someone told me that it's against the TOS. RottenTomatoes seems to have no problems with it from what I've read on their licensing page, but I'm not quite sure. Where could I acquire movie information legally and without cost? It's for an open source application hosted here: LINK

  • How to open URLs in Rails?

    - by yuval
    I'm trying to read in the HTML of a certain website. Trying

        @something = open("http://www.google.com/")

    fails with the following error:

        Errno::ENOENT in testController#show
        No such file or directory - http://www.google.com/

    Going to http://www.google.com/, I obviously see the site. What am I doing wrong? Thanks!
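
    A likely fix, sketched: plain Kernel#open treats its argument as a local file path (hence the Errno::ENOENT) and only understands URLs once open-uri has been required.

        require 'open-uri'   # teaches Kernel#open to handle http:// URLs

        @something = open('http://www.google.com/').read   # the page HTML as a String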

  • How to mock test a web service in PHPUnit across multiple tests?

    - by scraton
    I am attempting to test a web service interface class using PHPUnit. Basically, this class makes calls to a SoapClient object. I am attempting to test this class in PHPUnit using the "getMockFromWsdl" method described here: http://www.phpunit.de/manual/current/en/test-doubles.html#test-doubles.stubbing-and-mocking-web-services However, since I want to test multiple methods from this same class, every time I set up the object, I also have to set up the mock WSDL SoapClient object. This causes a fatal error to be thrown:

        Fatal error: Cannot redeclare class xxxx in C:\web\php5\PEAR\PHPUnit\Framework\TestCase.php(1227) : eval()'d code on line 15

    How can I use the same mock object across multiple tests without having to regenerate it from the WSDL each time? That seems to be the problem.
