How do I scrape information off ASP.NET websites when paging and JavaScript links are being used?

Posted by Ian Roke on Stack Overflow
Published on 2010-03-15T18:01:51Z Indexed on 2010/03/16 14:26 UTC

I have been given a staff list that is supposed to be up to date, but it doesn't match an intranet People Finder, which is written in ASP.NET.

As the information is sensitive, I am not able to access the database the People Finder uses, so the only way I can get at the information is by scraping the structure, starting with the top brass and then going through each tier in turn.

Each person has a staff number, which forms the URL http://intranet/peoplefinder/index.aspx?srn=ABC1234. All the people who report to them are listed underneath as links in the format <a id="gvEmployees_ctl03_lnkFullName" href="index.aspx?srn=ABC4321" target="_self">, where each URL contains the reportee's staff number and links to their team.
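To make the link format concrete, here is a minimal sketch of pulling the staff numbers out of one page of HTML using only the Python standard library. It assumes every reportee link has an id that starts with gvEmployees_ and ends with _lnkFullName, as in the single example above. The class and function names are hypothetical, chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs


class ReportLinkParser(HTMLParser):
    """Collect staff numbers (srn values) from the GridView name links."""

    def __init__(self):
        super().__init__()
        self.srns = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        link_id = attrs.get("id") or ""
        href = attrs.get("href") or ""
        # Assumption: reportee links follow the gvEmployees_ctlNN_lnkFullName pattern.
        if link_id.startswith("gvEmployees_") and link_id.endswith("_lnkFullName"):
            srn = parse_qs(urlparse(href).query).get("srn")
            if srn:
                self.srns.append(srn[0])


def extract_report_srns(html):
    """Return the staff numbers linked from one page of the GridView."""
    parser = ReportLinkParser()
    parser.feed(html)
    return parser.srns
```

Calling extract_report_srns on the HTML of a person's page would give the staff numbers of the reportees shown on that page only; further pages of the GridView still have to be requested separately.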

The trouble arises when the teams are big, as paging is implemented in the GridView with links such as <a href="javascript:__doPostBack('gvEmployees','Page$2')">2</a>.
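In ASP.NET Web Forms, __doPostBack copies its two arguments into the hidden form fields __EVENTTARGET and __EVENTARGUMENT and then submits the page's form, so a pager link can in principle be replayed as an ordinary POST without running any JavaScript. The sketch below builds such a POST body from a page's HTML. It assumes the page's hidden inputs (for example __VIEWSTATE, and __EVENTVALIDATION where present) must be sent back unchanged. The helper names are hypothetical, and actually sending the request, along with any intranet authentication, is left out.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlencode

# Matches javascript:__doPostBack('target','argument') inside an href.
POSTBACK_RE = re.compile(r"__doPostBack\('([^']*)','([^']*)'\)")


class PostbackFormParser(HTMLParser):
    """Collect hidden form fields and __doPostBack links from a page."""

    def __init__(self):
        super().__init__()
        self.hidden = {}
        self.postbacks = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and (attrs.get("type") or "").lower() == "hidden":
            name = attrs.get("name")
            if name:
                self.hidden[name] = attrs.get("value") or ""
        elif tag == "a":
            match = POSTBACK_RE.search(attrs.get("href") or "")
            if match:
                self.postbacks.append(match.groups())


def pager_arguments(html, grid_id="gvEmployees"):
    """Return the Page$N arguments of the pager links for one grid."""
    parser = PostbackFormParser()
    parser.feed(html)
    return [arg for target, arg in parser.postbacks
            if target == grid_id and arg.startswith("Page$")]


def build_page_postback(html, target, argument):
    """Build a URL-encoded POST body that imitates clicking a pager link."""
    parser = PostbackFormParser()
    parser.feed(html)
    fields = dict(parser.hidden)           # echo back __VIEWSTATE and friends
    fields["__EVENTTARGET"] = target       # first __doPostBack argument
    fields["__EVENTARGUMENT"] = argument   # second, e.g. "Page$2"
    return urlencode(fields).encode("utf-8")
```

The resulting bytes would be posted to the same index.aspx?srn=... URL the page was fetched from. Each response carries fresh hidden fields, so the body for the next page should be built from the most recent response rather than from the first one.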

How would I scrape this page, capturing the SRN and other details along with the people who report to that person across all pages of the GridView, and then loop through each reportee, repeating the process until the whole list is complete?
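The loop being asked about amounts to a breadth-first traversal of the reporting tree. As a rough sketch of that shape only, the function below takes a hypothetical fetch_reports callable, assumed to return every reportee's staff number for a given person across all GridView pages, and walks the hierarchy from the top. The fetching and paging themselves are not shown here.

```python
from collections import deque


def crawl_hierarchy(root_srn, fetch_reports):
    """Walk the reporting tree breadth-first from root_srn.

    fetch_reports is a hypothetical callable: given a staff number, it
    returns the staff numbers of everyone reporting to that person.
    Returns a dict mapping each visited staff number to its reportees.
    """
    reports_of = {}
    queue = deque([root_srn])
    while queue:
        srn = queue.popleft()
        if srn in reports_of:
            continue  # already visited; guards against loops in the data
        reports = list(fetch_reports(srn))
        reports_of[srn] = reports
        queue.extend(r for r in reports if r not in reports_of)
    return reports_of
```

Keeping the visited check means a person who appears under two managers, or a circular reporting line, is only fetched once.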
