Select all links from an HTML table using XPath (and HtmlAgilityPack)

Posted by Adam Asham on Stack Overflow
Published on 2010-03-20T22:11:18Z

Filed under: c# | htmlagilitypack | xpath

What I am trying to achieve is to extract all links with an href attribute that starts with http://, https:// or /. These links lie within a table (tbody > tr > td etc.) with a certain class. I thought I could specify just the a element without the whole path to it, but that does not seem to work. I get a NullReferenceException at the line that selects the links:

var table = doc.DocumentNode.SelectSingleNode("//table[@class='containerTable']");
if (table != null)
{
    // The NullReferenceException is thrown on the next line.
    foreach (HtmlNode item in table.SelectNodes("a[starts-with(@href, 'https://')]"))
    {
        // not working
    }
}

I am not aware of any recommendations or best practices for XPath. Do I create overhead by querying the document twice?
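
A likely cause of the exception, sketched under stated assumptions: relative to the table node, the expression "a[...]" only matches a elements that are direct children of the table, and HtmlAgilityPack's SelectNodes returns null (not an empty collection) when nothing matches, so the foreach fails. The version below uses ".//a" to search all descendants, covers all three prefixes from the question, and guards against null. The class name TableLinks and the method ExtractLinks are hypothetical helpers introduced for illustration; they are not part of the original post or of HtmlAgilityPack.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class TableLinks
{
    // Hypothetical helper (not from the original post): returns the href of every
    // matching link inside the first table whose class is exactly 'containerTable'.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var hrefs = new List<string>();
        var table = doc.DocumentNode.SelectSingleNode("//table[@class='containerTable']");
        if (table == null)
            return hrefs;

        // ".//a" searches every descendant of the table; a bare "a" would only
        // match direct children of the context node.
        var links = table.SelectNodes(
            ".//a[starts-with(@href, 'http://') or " +
            "starts-with(@href, 'https://') or " +
            "starts-with(@href, '/')]");

        // SelectNodes returns null rather than an empty collection when nothing matches.
        if (links == null)
            return hrefs;

        foreach (HtmlNode link in links)
            hrefs.Add(link.GetAttributeValue("href", ""));

        return hrefs;
    }
}
```

On the overhead question: the second query only walks the table's subtree, so the extra cost is probably small for typical pages. If the table node itself is not needed, a single expression such as "//table[@class='containerTable']//a[...]" is an alternative, with the caveat that it returns links from every table with that class rather than only the first one.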

© Stack Overflow or respective owner
