HTML Agility Pack Screen Scraping XPATH isn't returning data

Posted by Matthias Welsh on Stack Overflow
Published on 2010-03-23T13:00:03Z Indexed on 2010/03/23 13:03 UTC

I'm attempting to write a screen scraper for Digikey that will let our company keep accurate track of pricing, part availability, and product replacements when a part is discontinued. There seems to be a discrepancy between the XPath that I see in Chrome DevTools and in Firebug on Firefox and the XPath that my C# program sees.

The code I'm currently using is pretty quick and dirty...

   // Requires: using System.Collections.Generic; using HtmlAgilityPack;
   // Retrieves product data from a Digikey product page.
   private static List<string> ExtractProductInfo(HtmlDocument doc)
   {
       List<HtmlNode> m_unparsedProductInfoNodes = new List<HtmlNode>();
       List<string> m_unparsedProductInfo = new List<string>();

       // Base node for part info
       string m_baseNode = @"//html[1]/body[1]/div[2]";

       // Write part info to the list. This node should hold the Digikey part number;
       // more lines of similar form will go here for more info.
       m_unparsedProductInfoNodes.Add(doc.DocumentNode.SelectSingleNode(m_baseNode + @"/table[1]/tr[1]/td[1]/table[1]/tr[1]/td[1]"));

       foreach (HtmlNode node in m_unparsedProductInfoNodes)
       {
           // SelectSingleNode returns null when the XPath matches nothing,
           // so skip those entries rather than dereferencing them.
           if (node == null)
           {
               continue;
           }
           m_unparsedProductInfo.Add(node.InnerText);
       }

       return m_unparsedProductInfo;
   }

Although the path I'm using appears to be "correct", I keep getting null when I look at the list m_unparsedProductInfoNodes.

Any idea what's going on here? I'll also add that if I call SelectNodes on m_baseNode, it returns only a div. I'm not sure what that indicates, but it doesn't seem right.
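
One way to narrow this down, offered as a diagnostic sketch rather than a confirmed fix, is to evaluate the path one step at a time and see where it stops matching. Browser tools show the DOM after the browser has repaired the markup and run scripts, while HtmlAgilityPack parses the raw response, so a path copied from the browser may diverge partway down. The names XPathProbe and LongestMatchingPrefix below are hypothetical and not part of the original code, and the helper assumes simple location steps with no slashes inside predicates.

```csharp
using System;
using HtmlAgilityPack;

static class XPathProbe
{
    // Hypothetical helper: evaluates an XPath one location step at a time and
    // returns the longest leading portion that still selects a node, or null
    // when even the first step matches nothing.
    // Assumes simple steps separated by '/', with no '/' inside predicates.
    public static string LongestMatchingPrefix(HtmlDocument doc, string xpath)
    {
        // Preserve a leading "//" so the first step is evaluated as written.
        string prefix = xpath.StartsWith("//") ? "/" : "";
        string[] steps = xpath.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);

        string matched = null;
        foreach (string step in steps)
        {
            string candidate = (matched ?? prefix) + "/" + step;

            // SelectSingleNode returns null when nothing matches the expression.
            if (doc.DocumentNode.SelectSingleNode(candidate) == null)
            {
                break;
            }
            matched = candidate;
        }
        return matched;
    }
}
```

Calling it with the full path from the function above would look like this; the step that follows the returned prefix is the first one the parsed document does not contain:

```csharp
string path = "//html[1]/body[1]/div[2]/table[1]/tr[1]/td[1]/table[1]/tr[1]/td[1]";
Console.WriteLine(XPathProbe.LongestMatchingPrefix(doc, path) ?? "(no step matched)");
```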

Filed under: htmlagilitypack, c#, webscraping, screen-scraping
