How do I scrape information off ASP.NET websites when paging and JavaScript links are being used?

Posted by Ian Roke on Stack Overflow
Published on 2010-03-15T18:01:51Z

Filed under: ASP.NET | c# | vb.net | scrape | gridview

I have been given a staff list that is supposed to be up to date, but it doesn't match an intranet People Finder application written in ASP.NET.

Because the information is sensitive, I can't access the database behind the People Finder. The only way to get at the data is to scrape the hierarchy: start with the top brass and then work down through each tier in turn.

Each person has a staff number (the SRN), which forms the URL http://intranet/peoplefinder/index.aspx?srn=ABC1234. Everyone who reports to that person is listed underneath as a link in the format <a id="gvEmployees_ctl03_lnkFullName" href="index.aspx?srn=ABC4321" target="_self">, where each URL carries the reportee's staff number and leads to their own team.
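A minimal Python sketch of the link-harvesting step, assuming the reportee link ids always follow the gvEmployees_ctlNN_lnkFullName pattern shown above and that the link text is the person's name. It uses only the standard library (html.parser and urllib.parse); the class and function names are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import parse_qs, urljoin, urlparse

# Assumption: reportee links always carry ids like gvEmployees_ctl03_lnkFullName.
LINK_ID = re.compile(r"^gvEmployees_ctl\d+_lnkFullName$")


class ReporteeLinkParser(HTMLParser):
    """Collects (srn, link text, absolute URL) for each reportee link."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.reportees = []
        self._href = None   # href of the matching <a> we are currently inside
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and LINK_ID.match(attrs.get("id") or "") and attrs.get("href"):
            self._href = attrs["href"]
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != "a" or self._href is None:
            return
        # Resolve the relative href against the page URL, then read its srn parameter.
        url = urljoin(self.base_url, self._href)
        srn = parse_qs(urlparse(url).query).get("srn", [None])[0]
        if srn:
            self.reportees.append((srn, "".join(self._text).strip(), url))
        self._href = None


def extract_reportees(html, base_url):
    parser = ReporteeLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.reportees
```

For example, feeding it the anchor from the question with a placeholder link text of "Jane Doe" and a base URL of http://intranet/peoplefinder/index.aspx?srn=ABC1234 returns [('ABC4321', 'Jane Doe', 'http://intranet/peoplefinder/index.aspx?srn=ABC4321')]. Other columns of the GridView row (job title and so on) would need their own handling, since their markup isn't shown here.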

The trouble arises when teams are large: the GridView is paged, and its pager links are JavaScript postbacks rather than plain URLs, such as <a href="javascript:__doPostBack('gvEmployees','Page$2')">2</a>.
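A postback link of this kind does not navigate anywhere by itself: ASP.NET's __doPostBack fills the hidden __EVENTTARGET and __EVENTARGUMENT fields and submits the page's form, so a scraper can imitate it by POSTing the form back itself. The Python sketch below (standard library only; the class and function names are hypothetical) collects the hidden inputs such as __VIEWSTATE and __EVENTVALIDATION plus the pager links, and builds a form body from the hidden fields only, which is an assumption that the page needs no visible inputs to page the grid.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlencode

# Matches hrefs such as javascript:__doPostBack('gvEmployees','Page$2').
POSTBACK = re.compile(r"__doPostBack\('([^']*)','([^']*)'\)")


class PostBackFormParser(HTMLParser):
    """Collects hidden form fields and GridView pager postback links."""

    def __init__(self):
        super().__init__()
        self.hidden_fields = {}   # e.g. __VIEWSTATE, __EVENTVALIDATION
        self.pager_links = []     # (event target, event argument) pairs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and (attrs.get("type") or "").lower() == "hidden":
            if attrs.get("name"):
                self.hidden_fields[attrs["name"]] = attrs.get("value") or ""
        elif tag == "a":
            # HTMLParser has already unescaped entities such as &#39; in the href.
            match = POSTBACK.search(attrs.get("href") or "")
            if match and match.group(2).startswith("Page$"):
                self.pager_links.append((match.group(1), match.group(2)))


def parse_postback_page(html):
    parser = PostBackFormParser()
    parser.feed(html)
    parser.close()
    return parser.hidden_fields, parser.pager_links


def postback_body(hidden_fields, event_target, event_argument):
    """Approximates the form body __doPostBack would submit (hidden fields only)."""
    fields = dict(hidden_fields)
    fields["__EVENTTARGET"] = event_target
    fields["__EVENTARGUMENT"] = event_argument
    return urlencode(fields)
```

To fetch page 2 you would POST postback_body(fields, 'gvEmployees', 'Page$2').encode() to the same index.aspx?srn=... URL, for instance with urllib.request.Request(url, data=body, method="POST"). Each response carries a fresh __VIEWSTATE, and with event validation enabled it is safest to post back using the hidden fields of the page whose pager actually rendered that link. If the intranet uses Windows authentication, credentials would also have to be supplied; that is not shown.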

How would I scrape this page to capture each person's SRN and other details, along with everyone who reports to them across all pages of the GridView, and then loop through each reportee and repeat the process until the whole list is complete?
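One way to structure the whole job is a breadth-first walk of the reporting tree. The Python sketch below assumes a numeric pager (Page$1, Page$2, ... arguments, as in the link above) and takes three hypothetical callables you would supply: fetch_first(srn) GETs index.aspx?srn=..., fetch_postback(srn, html, page_arg) POSTs the form from html with __EVENTTARGET and __EVENTARGUMENT set, and parse(html) returns the reportee SRNs and pager arguments found on a page. It keys everything on SRN strings; richer records for the "other details" would need a different parse.

```python
from collections import deque


def collect_team(srn, fetch_first, fetch_postback, parse):
    """Returns every reportee SRN for one person, across all GridView pages."""
    first_html = fetch_first(srn)
    team, page_args = parse(first_html)
    team = list(team)
    done = {"Page$1"}  # assumption: the initial GET shows page 1
    # Each pending page is posted back from the page whose pager rendered its link.
    pending = deque((arg, first_html) for arg in page_args)
    while pending:
        arg, source_html = pending.popleft()
        if arg in done:
            continue
        done.add(arg)
        page_html = fetch_postback(srn, source_html, arg)
        more, more_args = parse(page_html)
        team.extend(more)
        pending.extend((a, page_html) for a in more_args if a not in done)
    return team


def crawl(root_srn, fetch_first, fetch_postback, parse):
    """Breadth-first walk from root_srn; returns {srn: [direct reportee SRNs]}."""
    tree = {}
    queue = deque([root_srn])
    while queue:
        srn = queue.popleft()
        if srn in tree:
            continue  # already visited (guards against duplicate links or cycles)
        tree[srn] = collect_team(srn, fetch_first, fetch_postback, parse)
        queue.extend(r for r in tree[srn] if r not in tree)
    return tree
```

Using a queue rather than recursion keeps a deep organisation chart from hitting Python's recursion limit, and it processes one tier at a time, which matches the top-down order described above. The set of finished page arguments also covers the "..." pager link, since it is just another Page$N argument.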

