Using Html Agility Pack to Extract Links

Posted by Soham on Stack Overflow
Published on 2010-06-05T11:15:31Z Indexed on 2010/06/05 11:22 UTC

Filed under: c# | web | programming | htmlagilitypack

Hi folks, consider this simple piece of code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using HtmlAgilityPack;

namespace WebScraper
{
    class Program
    {
        static void Main(string[] args)
        {
            HtmlDocument doc = new HtmlDocument();
            doc.LoadHtml("http://www.google.com");

            foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
            {
            }
        }
    }
}

This effectively doesn't do anything at all, and is copied/inspired from various other Stack Overflow questions like this. It compiles, but when I run it there is a runtime error which says "Object reference not set to an instance of an object.", highlighting the foreach line.

I can't understand why the environment has become irritable to this humble, innocent and useless piece of code.
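One likely explanation, offered as a sketch rather than a confirmed diagnosis: `HtmlDocument.LoadHtml` parses its argument as HTML markup and does not download anything, so the string "http://www.google.com" is treated as plain text with no `<a>` elements. `SelectNodes` returns null when nothing matches, and the foreach then fails on that null. The version below assumes the intent was to fetch the page, so it uses `HtmlWeb.Load` and checks for null before iterating. Printing each href is an addition for illustration.

```csharp
using System;
using HtmlAgilityPack;

namespace WebScraper
{
    class Program
    {
        static void Main(string[] args)
        {
            // HtmlWeb.Load downloads the page at the given URL and parses it,
            // whereas HtmlDocument.LoadHtml expects the markup itself.
            HtmlWeb web = new HtmlWeb();
            HtmlDocument doc = web.Load("http://www.google.com");

            // SelectNodes returns null (not an empty collection) when no node matches.
            HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
            if (links == null)
            {
                Console.WriteLine("No links found.");
                return;
            }

            foreach (HtmlNode link in links)
            {
                // GetAttributeValue falls back to the second argument if the attribute is absent.
                Console.WriteLine(link.GetAttributeValue("href", ""));
            }
        }
    }
}
```

The null check is worth keeping even after switching to `HtmlWeb.Load`, since a page with no matching anchors would otherwise trigger the same exception.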

I would also like to know: does HtmlAgilityPack accept HTML classes as nodes?
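If the question is about selecting elements by their `class` attribute, that can be expressed in the XPath passed to `SelectNodes`. The sketch below makes that assumption; the inline markup and the class names `result`, `main` and `other` are hypothetical. The `concat`/`normalize-space` idiom matches one class token even when the attribute lists several classes.

```csharp
using System;
using HtmlAgilityPack;

class ClassSelectionSketch
{
    static void Main()
    {
        // Hypothetical markup, passed as a string so no network access is needed.
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(
            "<div class='result main'><a href='/a'>A</a></div>" +
            "<div class='other'><a href='/b'>B</a></div>");

        // Match links inside any div whose class list contains the token "result".
        string xpath =
            "//div[contains(concat(' ', normalize-space(@class), ' '), ' result ')]//a[@href]";

        HtmlNodeCollection links = doc.DocumentNode.SelectNodes(xpath);
        if (links != null)
        {
            foreach (HtmlNode link in links)
            {
                Console.WriteLine(link.GetAttributeValue("href", ""));
            }
        }
    }
}
```

A plain `//div[@class='result']` would only match when the attribute value is exactly `result`, which is why the token-matching form is used here.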

© Stack Overflow or respective owner
