Search Results

Search found 68 results on 3 pages for 'htmlagilitypack'.

Page 1/3 | 1 2 3  | Next Page >

  • HtmlAgilityPack giving problems with malformed html

    - by Kapil
    I want to extract meaningful text out of an html document and I was using html-agility-pack for the same. Here is my code: string convertedContent = HttpUtility.HtmlDecode(ConvertHtml(HtmlAgilityPack.HtmlEntity.DeEntitize(htmlAsString))); ConvertHtml: public string ConvertHtml(string html) { HtmlDocument doc = new HtmlDocument(); doc.LoadHtml(html); StringWriter sw = new StringWriter(); ConvertTo(doc.DocumentNode, sw); sw.Flush(); return sw.ToString(); } ConvertTo: public void ConvertTo(HtmlAgilityPack.HtmlNode node, TextWriter outText) { string html; switch (node.NodeType) { case HtmlAgilityPack.HtmlNodeType.Comment: // don't output comments break; case HtmlAgilityPack.HtmlNodeType.Document: foreach (HtmlNode subnode in node.ChildNodes) { ConvertTo(subnode, outText); } break; case HtmlAgilityPack.HtmlNodeType.Text: // script and style must not be output string parentName = node.ParentNode.Name; if ((parentName == "script") || (parentName == "style")) break; // get text html = ((HtmlTextNode)node).Text; // is it in fact a special closing node output as text? if (HtmlNode.IsOverlappedClosingElement(html)) break; // check the text is meaningful and not a bunch of whitespaces if (html.Trim().Length > 0) { outText.Write(HtmlEntity.DeEntitize(html) + " "); } break; case HtmlAgilityPack.HtmlNodeType.Element: switch (node.Name) { case "p": // treat paragraphs as crlf outText.Write("\r\n"); break; } if (node.HasChildNodes) { foreach (HtmlNode subnode in node.ChildNodes) { ConvertTo(subnode, outText); } } break; } } Now in some cases when the html pages are malformed (for example the following page - http://rareseeds.com/cart/products/Purple_of_Romagna_Artichoke-646-72.html has a malformed meta-tag like <meta content="text/html; charset=uft-8" http-equiv="Content-Type">) [Note "uft" instead of utf] my code is puking at the time I am trying to load the html document. Can someone suggest me how can I overcome these malformed html pages and still extract relevant text out of a html document? Thanks, Kapil

    Read the article

  • HTMLAgilityPack ChildNodes index works, named node does not

    - by XgenX
    I am parsing an XML API response with HTMLAgilityPack. I am able to select the result items from the API call. Then I loop through the items and want to write the ChildNodes to a table. When I select ChildNodes by saying something like: sItemId = dnItem.ChildNodes(0).innertext I get the proper itemId result. But when I try: sItemId = dnItem.ChildNodes("itemId").innertext I get "Referenced object has a value of 'Nothing'." I have tried "itemID[1]", "/itemId[1]" and a veriety of strings. I have tried SelectSingleNode and ChildNodes.Item("itemId").innertext. The only one that has worked is using the index. The problem with using the index is that sometimes child elements are omitted in the results and that throw off the index. Anybody know what I am doing wrong?

    Read the article

  • Select only items in a specific DIV using HtmlAgilityPack

    - by Adam Haile
    I'm trying to use the HtmlAgilityPack to pull all of the links from a page that are contained within a div declared as <div class='content'> However, when I use the code below I simply get ALL links on the entire page. This doesn't really make sense to me since I am calling SelectNodes from the sub-node I selected earlier (which when viewed in the debugger only shows the HTML from that specific div). So, it's like it's going back to the very root node every time I call SelectNodes. The code I use is below: HtmlWeb hw = new HtmlWeb(); HtmlDocument doc = hw.Load(@"http://example.com"); HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@class='content']"); foreach(HtmlNode link in node.SelectNodes("//a[@href]")) { Console.WriteLine(link.Value); } Is this the expected behavior? And if so, how do I get it to do what I'm expecting?

    Read the article

  • HtmlAgilityPack - Vs 2010 - c# ASP - File Not found

    - by Janosch Geigowskoskilu
    First, I've already searched the web & StackOverflow for hours, and i did find a lot about troubleshooting HtmlAgilityPack and tried most of these but nothing worked. The Situation: I'm developing a C# ASP .NET WebPart in SharePoint Foundation. Everything works fine, now I want to Parse a HTML Page to get all ImagePaths and save the Images on HD/Temp. To do that I was downloading HtmlAgilityPack, current version, add reference to Project, everything looks OK, IntelliSense works fine. The Exception: But when I want to run the section where HtmlAgilityPack should be used my Browser shows me a FileNotFoundException - The File or Assembly could not be found. What I tried: After first searches i tried to include v1.4.0 of HtmlAgilityPack cause I read that the current version in some case is not really stable. This works fine to until the point I want to use HtmlAgilityPack, the same Exception. I also tried moving the HtmlAgilityPack direct to the Solution directory, nothing changed. I tried to insert HtmlAgilityPack via using and I tried direct call e.g. HtmlAgilityPack.HtmlDocument. Conclusion : When I compile no error occurs, the reference is set correct. When I trace the HtmlAgilityPack.dll with ProcMon the Path is shown correct, but sometimes the Result is 'File Locked with only Readers' but I don't know enough about ProcMon to Know what this means or if this is critical. It couldn't have something to do with File Permissions because if I check the DLL the permissions are all given.

    Read the article

  • Select all links from a Html table using XPath (and HtmlAgilityPack)

    - by Adam Asham
    What I am trying to achieve is to extract all links with a href attribute that starts with http://, https:// or /. These links lie within a table (tbody tr td etc) with a certain class. I thought I could specify just the the a element without the whole path to it but it does not seem to work. I get a NullReferenceException at the line that selects the links: var table = doc.DocumentNode.SelectSingleNode("//table[@class='containerTable']"); if (table != null) { foreach (HtmlNode item in table.SelectNodes("a[starts-with(@href, 'https://')]")) { //not working I don't know about any recommendations or best practices when it comes to XPath. Do I create overhead when I query the document two times?

    Read the article

  • XPATH query, HtmlAgilityPack and Extracting Text

    - by Soham
    I had been trying to extract links from a class called "tim_new" . I have been given a solution as well. Both the solution, snippet and necessary information is given here The said XPATH query was "//a[@class='tim_new'], my question is, how did this query differentiate between the first line of the snippet (given in the link above and the second line of the snippet). More specifically, what is the literal translation (in English) of this XPATH query. Furthermore, I want to write a few lines of code to extract the text written against NSE: <div class="FL gL_12 PL10 PT15">BSE: 523395 &nbsp;&nbsp;|&nbsp;&nbsp; NSE: 3MINDIA &nbsp;&nbsp;|&nbsp;&nbsp; ISIN: INE470A01017</div> Would appreciate help in forming the necessary selection query. My code is written as: IEnumerable<string> NSECODE = doc.DocumentNode.SelectSingleNode("//div[@NSE:]"); But this doesnt look right. Would appreciate some help.

    Read the article

  • Extracting table rows with a particular attribute from an HTML File, using HTMLAgilityPack

    - by Soham
    Consider this html snippet: <tr> <td valign=top class="tim_new"><a href="/stocks/company_info/pricechart.php?sc_did=MI42" class="tim_new">3M India</a></td> <td class="tim_new" valign=top><a href='/stocks/marketstats/indcomp.php?optex=NSE&indcode=Diversified' class=tim>Diversified</a></td> <td class="tim_new" align=right valign=top>2,487.25</td> <td class="tim_new" align=right valign=top><font color=#16a903>187.25</font></td> <td class="tim_new" align=right valign=top><font color=#16a903>8.14</font></td> <td class="tim_new" align=right valign=top>2,801.90</td> <td class="tim_new" align=right valign=top>0.06</td> </tr> Realize these three things: The HTML file from which this snippet has been taken, contains multiple number of HTML tables. The table from which this snippet has been extracted doesnt contain only rows of the shown format, but also of other formats like this, for example: <tr><td colspan=7><img src="http://img1.moneycontrol.com/images/blank.gif"height="5"></td></tr>` This same table contains multiple rows of the format which I need to extract. So given this scenario, is it possible to run a code, which extracts, the link with the class name = "tim_new"? Help Appreciated, Soham

    Read the article

  • How would I use HTMLAgilityPack to extract the value I want

    - by Nai
    For the given HTML I want the value of id <div class="name" id="john-5745844"> <div class="name" id="james-6940673"> UPDATE This is what I have at the moment HtmlDocument htmlDoc = new HtmlDocument(); htmlDoc.Load(new StringReader(pageResponse)); HtmlNode root = htmlDoc.DocumentNode; List<string> anchorTags = new List<string>(); foreach (HtmlNode div in root.SelectNodes("//div[@class='name' and @id]")) { HtmlAttribute att = div.Attributes["id"]; Console.WriteLine(att.Value); } The error I am getting is at the foreach line stating: Object reference not set to an instance of an object.

    Read the article

  • Too nervous to install

    - by The Prop
    Yesterday I (a professional rugby prop of somewhat limited intellect) landed in http://htmlagilitypack.codeplex.com/ and found myself stranded in a town with no signposts. The locals don't need signposts - they know their way around - so who gives a hoot about visitors? Well I'm a visitor and I'm lost. Here's my plea to the good burgesses of Codeplex-sans-signs: HELP!! Let me back-track and explain what landed me at the bottom of this tangled ruck. There's a "Download" button positioned near the top-right of the Codeplex web page, right? Like the Sword of Damocles, a down-arrow to the left of the button indicates, presumably, what a download would include: CURRENT 1.4.0 Stable DATE Fri May 7 2010 at 7:00 AM STATUS Stable With a simple-minded confidence that has since deserted me (the confidence - not the simple-mindedness), I clicked "Download". This introduced 3 new files to my computer: HtmlAgilityPack.dll, HtmlAgilityPack.pdb, and HtmlAgilityPack.XML This is when the first stab of doubt penetrated that globe between my cauliflower ears that I call a head. Where's the dot cs? Somewhere in Codeplex, I'd read advice to another lost soul to "download and build the HTMLAgilityPack solution". As I've done so many times as an All Black prop, I glared at the opposition front row - ah, I mean the 3 new files. Shouldn't one of them have a ".cs" on the back of his jersey - er, on the end of its name? Or is this just how they play the game in Codeplex-sans-signs? Undaunted (props have more courage than sense) I packed into my first C# scrum. The half-back feeds the ball in, and the front rows collapse - er, the debugging stops at this line of my code: "HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();" Then the Referee blows his whistle and announces one of those verdicts that's utterly indecipherable to your average loose-head prop: Locating source for 'C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs'. Checksum: MD5 {62 bc f3 7e 9a 92 a6 32 7 d6 5b f8 76 59 7b 5b} The file 'C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs' does not exist. Looking in script documents for 'C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs'... Looking in the projects for 'C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs'. The file was not found in a project. Looking in directory 'C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\IDE\vc7\atlmfc'... Looking in directory 'C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\IDE\vc7\crt'... The debugger will ask the user to find the file: C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs. The user pressed Cancel [a brain-stemmer from the prop] in the Find Source dialog. The debug source files settings for the active solution have been modified so that the debugger will not ask the user to find the file: C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs. The debugger could not locate the source file 'C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs'. Even if it had been the first 50 stanzas of "Eskimo Nell", I couldn't have been more shocked. I'm so shocked, my jaws clamp shut around the opposition hooker's ear. He thumbs me in the iris. With a cornea-torn eye I peer at the Codeplex site. My brain stem sparks and I punch the "View all downloads" link. It sparks four more times on each download link, and.. lo! FOUR files this time: HAPExplorer.zip, HtmlAgilityPack.1.4.0.Source.zip, HtmlAgilityPack.1.4.0.zip, HtmlAgilityPack.Documentation.chm But... is this not the same place arrived at recently by my flat-mate Chaz, journalist extraordinaire? (Chaz, if you're reading this, I'm not plugging for nothing - just write kindly about me in your next report, okay?) Didn't these same four files flummox Chaz The Great? He told me about it. Chaz left a message with Codeplex and then solved the problem by just walking away. Typical journalist, huh. But I'm not like that. I don't walk away. I'm made of the sort of stubborn stuff that becomes an All Black prop. Hence this impassioned plea: GOOD TOWNSFOLK OF CODEPLEX-SANS-SIGNS, WHAT SHOULD I DO NEXT? Can somebody point me to Main Street? How does a simpleton install 'C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs'? I'm willing to prostrate myself and grovel to the first kind face that passes in front of my rapidly clouding sight. So help me, I'd even tug my forelock if I had one! Should I hold forth my rod over the wilderness, and create a folder called 'C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\' or some such? If so, what files should I move into it? ANYTHING else a dum-ass should know about? - and I mean ANYTHING - you just don't know how witless a punch-drunk prop can be.. %( Whenever I've installed other programs they've given me an ".exe" or ".msi" that I can click on and it's all done for me like magic. HEY... there's nothing of that nature here, is there? Am I missing something? Something for dummies to click? (From the waiting rooms of Dr I. Sight Phixes) (signed) The Prop

    Read the article

  • HTMLAgilitypack getting <P> and <STRONG> text

    - by StealthRT
    Hey all i am looking for a way to get this HTML code: <DIV class=channel_row><SPAN class=channel> <DIV class=logo><IMG src='/images/channel_logos/WGNAMER.png'></DIV> <P><STRONG>2</STRONG><BR>WGNAMER </P></SPAN> using the HtmlAgilityPack. I have been trying this: With channel info!Logo = .SelectSingleNode(".//img").Attributes("src").Value info!Channel = .SelectSingleNode(".//span[@class='channel']").ChildNodes(1).ChildNodes(0).InnerText info!Station = .SelectSingleNode(".//span[@class='channel']").ChildNodes(1).ChildNodes(2).InnerText End With I can get the Logo but it comes up with a blank string for the Channel and for the Station it says Index was out of range. Must be non-negative and less than the size of the collection. I've tried all types of combinations: info!Station = .SelectSingleNode(".//span[@class='channel']").ChildNodes(1).ChildNodes(1).InnerText info!Station = .SelectSingleNode(".//span[@class='channel']").ChildNodes(1).ChildNodes(3).InnerText info!Station = .SelectSingleNode(".//span[@class='channel']").ChildNodes(0).ChildNodes(1).InnerText info!Station = .SelectSingleNode(".//span[@class='channel']").ChildNodes(0).ChildNodes(2).InnerText info!Station = .SelectSingleNode(".//span[@class='channel']").ChildNodes(0).ChildNodes(3).InnerText What do i need to do in order to correct this?

    Read the article

  • Using HTMLAgility Pack to Extract Links

    - by Soham
    Hi Folks, Consider this simplest piece of code: using System; using System.Collections.Generic; using System.Linq; using System.Text; using HtmlAgilityPack; namespace WebScraper { class Program { static void Main(string[] args) { HtmlDocument doc = new HtmlDocument(); doc.LoadHtml("http://www.google.com"); foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]")) { } } } } This effectively doesnt do anything at all, and is copied/inspired from various other StackOverflow questions like this. When compiling this, there is a runtime error which says "Object reference not set to an instance of an object." highlighting the foreach line. I can't understand, why the environment has become irritable to this humble,innocent and useless piece of code. I would also like to know, does HTMLAgilityPack accept HTML classes as nodes?

    Read the article

  • HtmlAgilityPack SelectNodes expression to ignore an element with a certain attribute

    - by thaky
    I am trying to select nodes except from script nodes and a ul that has a class called 'relativeNav'. Can someone please direct me to the right path? I have been searching for this for a week and I can't find it anywhere. Currently I have this but it obviously selecting the //ul[@class='relativeNav'] as well. Is there anyway to put an NOT expression of it so that SelectNode will ignore that one? foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//body//*[not(self::script)]/text()")) { Console.WriteLine("Node: " + node); singleString += node.InnerText.Trim() + "\n"; }

    Read the article

  • Split a html string in N parts

    - by Matt Brailsford
    Hi Guys, Does anybody have an example of spliting a html string (coming from a tiny mce editor) and splitting it into N parts using C#? I need to split the string evenly without splitting words. I was thinking of just splitting the html and using the HtmlAgilityPack to try and fix the broken tags. Though I'm not sure how to find the split point, as Ideally it should be based purley on the text rather than the html aswell. Anybody got any ideas on how to go about this? Many thanks Matt

    Read the article

  • XPath query to get node after some other node

    - by czesio
    I am using "HtmlAgilityPack" to parse HTML content. My target is to get number value. <div> some content 1 <br> some <b>content</b> 2 <br> <b>NUMBER:</b> 9788492688647 <br> some content 3 <br> some content 4 </div> aim: - get "9788492688647" Anybody can tell me how to get value between /div/b[2] and <br> ?

    Read the article

  • How could I parse this HTML file?

    - by Sergio Tapia
    <div id="main"> <style type="text/css"> </style> <script language="JavaScript"> </script> <p style="margin: 0pt 0pt 0.5em;"><b>Media from&nbsp;<a onclick="(new Image()).src='/rg/find-media-title/media_strip/images/b.gif?link=/title/tt0087538/';" href="/title/tt0087538/">The Karate Kid</a> (1984)</b></p> <style type="text/css"> </style> <table style="border-collapse: collapse;"> </table> </div> I need to somehow extract the href value of the (new Image()). How exactly would I accomplish this with HtmlAgilityPack? I'm new to it, and so far I haven't found a useful tutorial on how to effectively use it for parsing. Thanks for the help!

    Read the article

  • Parsing tables, cells with Html agility in C#

    - by Kaeso
    I need to parse Html code. More specifically, parse each cell of every rows in all tables. Each row represent a single object and each cell represent different properties. I want to parse these to be able to write an XML file with every data inside (without the useless HTML code). This is the way I thought it out initially but I ran out of ideas: HTML: <tr> <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF"> 1 </td> <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="left"> <a href="/ice/player.htm?id=8471675">Sidney Crosby</a> </td> <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="center"> PIT </td> <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="center"> C </td> <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 39 </td> <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 32 </td> <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 33 </td> <td class="statBox sorted" style="border-width:0px 1px 1px 0px; background-color: #E0E0E0" align="right"> <font color="#000000"> 65 </font> </td> <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 20 </td> <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 29 </td> <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 10 </td> <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 1 </td> <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 3 </td> <td class="statBox" style="border-width:0px 0px 1px 0px; background-color: #FFFFFF" align="right"> </td> <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 0 </td> <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 154 </td> <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 20.8 </td> <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 21:54 </td> <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 22.6 </td> <td class="statBox" style="border-width:0px 0px 1px 0px; background-color: #FFFFFF" align="right"> 55.7 </td> </tr> C#: using HtmlAgilityPack; using System.Data; namespace Stats { class StatsParser { private string htmlCode; private static string fileName = "[" + DateTime.Now.ToShortDateString() + " NHL Stats].xml"; public StatsParser(string htmlCode) { this.htmlCode = htmlCode; this.ParseHtml(); } public DataTable ParseHtml() { var result = new DataTable(); HtmlDocument doc = new HtmlDocument(); doc.LoadHtml(htmlCode); HtmlNode row = doc.DocumentNode.SelectNodes("//tr"); foreach (var statBox in row.SelectNodes("//td[@class='statBox']")) { System.Windows.MessageBox.Show(statBox.InnerText); } } } }

    Read the article

  • HTML Agility Pack - ReplaceNode doesn't change the InnerHTML of the Body

    - by morsanu
    Hi there, I have this The body: <body><p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Praesent leo leo, ultrices eu venenatis et, rutrum fringilla dolor.</p></body> The code: HtmlNode body = doc.DocumentNode.SelectSingleNode("//body"); Dictionary<HtmlNode, HtmlNode> toReplace = new Dictionary<HtmlNode, HtmlNode>(); // I do some logic here adding nodes to the toReplace dictionary. foreach (HtmlNode replaceNode in toReplace.Keys) { replaceNode.ParentNod.ReplaceChild(toReplace[replaceNode], replaceNode); } After i do this, the InnerHtml of the body node remains the same as from beginning, although the OutterHtml or the InnerText are showing the good result. Is there something wrong with my code? The result: // body.InnerHtml <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Praesent leo leo, ultrices eu venenatis et, rutrum fringilla dolor.</p> // body.OutterHtml <body><p>Lorem ipsum dolor sit amet...</p></body>

    Read the article

  • HTML Agility Pack Screen Scraping XPATH isn't returning data

    - by Matthias Welsh
    I'm attempting to write a screen scraper for Digikey that will allow our company to keep accurate track of pricing, part availability and product replacements when a part is discontinued. There seems to be a discrepancy between the XPATH that I'm seeing in Chrome Devtools as well as Firebug on Firefox and what my C# program is seeing. The code I'm currently using is pretty quick and dirty... //This function retrieves data from the digikey private static List<string> ExtractProductInfo(HtmlDocument doc) { List<HtmlNode> m_unparsedProductInfoNodes = new List<HtmlNode>(); List<string> m_unparsedProductInfo = new List<string>(); //Base Node for part info string m_baseNode = @"//html[1]/body[1]/div[2]"; //Write part info to list m_unparsedProductInfoNodes.Add(doc.DocumentNode.SelectSingleNode(m_baseNode + @"/table[1]/tr[1]/td[1]/table[1]/tr[1]/td[1]")); //More lines of similar form will go here for more info //this retrieves digikey PN foreach(HtmlNode node in m_unparsedProductInfoNodes) { m_unparsedProductInfo.Add(node.InnerText); } return m_unparsedProductInfo; } Although the path I'm using appears to be "correct" I keep getting NULL when I look at the list "m_unparsedProductInfoNodes" Any idea what's going on here? I'll also add that if I do a "SelectNodes" on the baseNode it only returns a div... not sure what that indicates but it doesn't seem right.

    Read the article

  • How to extract innermost table from html file with the help of the html agility pack ?

    - by Harikrishna
    I am parsing the tabular information from the html file with the help of the html agility pack. Now I can do it and it works. But when the table what I want to extract is inner most. Or I don't know at which position it is in nested tables.And there can be any number of nested tables and from that I want to extract the information of the table which has column name name,address. Ex. <table> <tr><td>PHONE NO.</td><td>OTHER INFO.</td></tr> <tr><td> <table> <tr><td>AMOUNT</td></tr> <tr><td>50000</td></tr> <tr><td>80000</td></tr> </table> </td></tr> <tr><td> <table> <tr><td> <table> <tr><td> <table> <tr><td> NAME </td><td>ADDRESS</td> <tr><td> ABC </td><td> kfks </td> <tr><td> BCD </td><td> fdsa </td> </table> </tr></td> </table> </td></tr> </table> </td></tr> </table> There are many tables but I want to extract the table which has column name name,address. So what should I do ?

    Read the article

  • Can I use Html Agility Pack for this?

    - by chobo2
    Hi I could not find any tutorials on their site. I am wondering can I use Html Agility Pack and use it to parse a string? Like say I have string = "<b>Some code </b> could I use agility pack to get rid of the <b> tags? All the examples I seen so far have been loading like html documents.

    Read the article

1 2 3  | Next Page >