Search Results

Search found 68 results on 3 pages for 'htmlagilitypack'.

Page 2 of 3

Cannot parse table information from an HTML document

- by Harikrishna

I am parsing many HTML documents with the HTML Agility Pack, and I want to extract tabular information from each one. A document may contain any number of tables, but I want only the one whose column headers are NAME, PHONE NO and ADDRESS. That table can be anywhere in the document: for example, a document may have ten tables, one of which contains many nested tables, and the table I want may be one of those nested tables. So I want to locate it by its column header names and then extract its data.

For most documents this already works. First I find all tables in the document with foreach (var table in doc.DocumentNode.Descendants("table")), then I get the rows of each table with var rows = table.Descendants("tr");. For each row I check whether it contains the headers NAME, ADDRESS and PHONE NO; if it does, I skip that row and extract everything after it: foreach (var row in rows.Skip(rowNo)) { var data = new List<string>(); foreach (var column in row.Descendants("td")) { data.Add(properText); } }

The problem is that in some documents I cannot parse the information. For example, a document has about ten tables, and one of them contains many nested tables; the table with the headers NAME, ADDRESS and PHONE NO is one of those nested tables. Since the table can be anywhere, even inside nested tables, I want to find it by its column header names, parse only that table, and skip the outer table's data.
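A minimal sketch of one way to handle the nested case, assuming the HtmlAgilityPack NuGet package is referenced (HeaderTableFinder, ExtractRows and the helper names are hypothetical). The change from the approach described is that each table only looks at its own rows, the tr elements whose nearest ancestor table is that table, and header cells must match exactly, so an outer table whose cell merely contains the nested table's text is skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class HeaderTableFinder
{
    // Decodes entities and collapses runs of whitespace (including &nbsp;).
    static string Clean(HtmlNode cell) =>
        string.Join(" ", HtmlEntity.DeEntitize(cell.InnerText)
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries));

    // Rows that belong to this table itself, not to tables nested inside it.
    static List<HtmlNode> OwnRows(HtmlNode table) =>
        table.Descendants("tr")
             .Where(tr => tr.Ancestors("table").First() == table)
             .ToList();

    // Finds the first table (at any nesting depth) with a row whose own cells
    // contain every header, and returns the cell texts of the rows after it.
    public static List<List<string>> ExtractRows(HtmlDocument doc, params string[] headers)
    {
        foreach (var table in doc.DocumentNode.Descendants("table"))
        {
            var rows = OwnRows(table);
            int headerIndex = rows.FindIndex(tr =>
            {
                var cells = tr.Elements("td").Concat(tr.Elements("th"))
                              .Select(c => Clean(c)).ToList();
                return headers.All(h => cells.Contains(h, StringComparer.OrdinalIgnoreCase));
            });
            if (headerIndex < 0) continue;   // not this table; keep searching

            return rows.Skip(headerIndex + 1)
                       .Select(tr => tr.Elements("td").Select(c => Clean(c)).ToList())
                       .ToList();
        }
        return new List<List<string>>();     // no matching table found
    }
}
```

Usage might look like HeaderTableFinder.ExtractRows(doc, "NAME", "PHONE NO", "ADDRESS"). Matching headers case-insensitively with collapsed whitespace is an assumption about how the real pages are written.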

Cannot parse tabular information from an HTML document

- by Harikrishna

I am parsing many HTML documents with the HTML Agility Pack, and I want to extract tabular information from each one. A document may contain any number of tables, but I want only the one whose column headers are NAME, PHONE NO and ADDRESS. That table can be anywhere in the document: for example, a document may have ten tables, one of which contains many nested tables, and the table I want may be one of those nested tables. So I want to locate it by its column header names and then extract its data.

For most documents this already works. First I find all tables in the document with foreach (var table in doc.DocumentNode.Descendants("table")), then I get the rows of each table with var rows = table.Descendants("tr");. For each row I check whether it contains the headers NAME, ADDRESS and PHONE NO; if it does, I skip that row and extract everything after it: foreach (var row in rows.Skip(rowNo)) { var data = new List<string>(); foreach (var column in row.Descendants("td")) { data.Add(properText); } }

The problem is that in some documents I cannot parse the information. For example, a document has about ten tables, and one of them contains many nested tables; the table with the headers NAME, ADDRESS and PHONE NO is one of those nested tables. Since the table can be anywhere, even inside nested tables, I want to find it by its column header names, parse only that table, and skip the outer table's data.

Which HTML tidy library is best? Is there an option in the HTML Agility Pack to tidy an HTML page?

- by Harikrishna

I am using the HTML Agility Pack to parse tabular information from HTML. Some pages have missing end tags, and because of them the HTML Agility Pack does not parse the information properly. I want to insert the missing end tags so that it parses correctly. Should I write my own code for that, or use an HTML tidy library? If a tidy library, which one is best, and how do I use it (with an example, if possible)? If my own code, what might it look like? Is there an option in the HTML Agility Pack that first tidies the page and then parses it?
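A minimal sketch, assuming the HtmlAgilityPack NuGet package. As far as I know the library does not ship a full tidy step, but HtmlDocument has an OptionFixNestedTags property, documented as partially fixing LI, TR, TH and TD tags when nesting errors are detected. Turning it on before LoadHtml is a cheap first thing to try before reaching for a separate tidy tool; LenientLoader is a hypothetical name.

```csharp
using HtmlAgilityPack;

public static class LenientLoader
{
    // Loads possibly malformed HTML with the library's nesting repair enabled.
    public static HtmlDocument Load(string html)
    {
        var doc = new HtmlDocument
        {
            // Partially repairs mis-nested li, tr, th and td elements
            // (for example a <td> opened while another <td> is still open).
            OptionFixNestedTags = true
        };
        doc.LoadHtml(html);
        return doc;
    }
}
```

Whether this is enough depends on which tags are missing; for pages it does not fix, running a separate tidy tool first remains an option.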

Option in the HTML Agility Pack to parse a tag written as `&lt table &lt`

- by Harikrishna

Is there an option in the HTML Agility Pack that can parse a tag written with &lt and &gt? If the tag is a real <table>, the HTML Agility Pack parses the table properly. But if the tag is written as &lt table &lt, it does not parse the table. Is there an option in the HTML Agility Pack that parses information from such tags as well?
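If the page contains markup that was escaped as text (for example &lt;table&gt;), the parser sees text, not a tag, and no parser option changes that. A minimal sketch, assuming the HtmlAgilityPack NuGet package (EscapedMarkup and ParseEscaped are hypothetical names), decodes the entities with HtmlEntity.DeEntitize and parses the result. Entities written without the closing semicolon, as in "&lt table", may not be decoded, so those would need a text replacement first.

```csharp
using HtmlAgilityPack;

public static class EscapedMarkup
{
    // Turns entity-escaped markup back into real elements:
    // "&lt;table&gt;" becomes "<table>", which the parser then treats as a tag.
    public static HtmlDocument ParseEscaped(string escapedHtml)
    {
        string decoded = HtmlEntity.DeEntitize(escapedHtml);
        var doc = new HtmlDocument();
        doc.LoadHtml(decoded);
        return doc;
    }
}
```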

Extracting a table row with a particular attribute using the HTML Agility Pack

- by Soham

Consider this HTML snippet: <tr> <td valign=top class="tim_new"><a href="/stocks/company_info/pricechart.php?sc_did=MI42" class="tim_new">3M India</a></td> <td class="tim_new" valign=top><a href='/stocks/marketstats/indcomp.php?optex=NSE&indcode=Diversified' class=tim>Diversified</a></td> I want to write code using the HTML Agility Pack that extracts the link in the first line.
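A minimal sketch, assuming the HtmlAgilityPack NuGet package (FirstLink and FirstCellHref are hypothetical names): select the a element inside the first td of the row and read its href attribute.

```csharp
using HtmlAgilityPack;

public static class FirstLink
{
    // Returns the href of the link in the first cell of the first matching row,
    // or an empty string if there is no such link.
    public static string FirstCellHref(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var link = doc.DocumentNode.SelectSingleNode("//tr/td[1]/a");
        return link?.GetAttributeValue("href", string.Empty) ?? string.Empty;
    }
}
```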

Html Agility Pack: make code look neat

- by illdev

Can I use the Html Agility Pack to make the output nicely indented, with unnecessary white space stripped?

Selecting an element based on the text and attribute of its sibling, using XPath

- by Adam Asham

Looking at the document below, the goal is to select the second cell of the second row in the first table. I've created the following expression: //row/td[2]/text()[td[@class="identifier"]/span[text()="identifier"]] but it does not return any rows. Unfortunately I do not see what's wrong; to me, it looks all right. The expression should select the text of the second cell in any row where the text of a span equals "identifier" and that span is inside a cell with the class "identifier". I'd appreciate it if you could point out what I'm doing wrong. Sample XML document: <?xml version="1.0"?> <html> <table class="first"> <tr> <td>row 1, cell 1</td> <td>row 1, cell 2</td> </tr> <tr> <td class="identifier"> <span>identifier</span> </td> <td> foo <span>ignore</span> bar </td> </tr> <tr> <td>row 3, cell 1</td> <td>row 3, cell 2</td> </tr> </table> <table class="second"> <tr> <td>row 1, cell 1</td> <td>row 1, cell 2</td> </tr> <tr> <td class="identifier"> <span>not an identifier</span> </td> <td> not a target </td> </tr> <tr> <td>row 3, cell 1</td> <td>row 3, cell 2</td> </tr> </table> </html>
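The expression in the question looks for row elements (the document uses tr) and puts the condition on text() nodes, which have no td children, so nothing can match. Putting the condition on the tr and then stepping down to its second cell should work. A sketch using the built-in System.Xml API, assuming the sample document loads as XML (SiblingXPath is a hypothetical name):

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public static class SiblingXPath
{
    // The predicate filters rows; the path after it selects the second
    // cell's own text nodes (the text inside its <span> is excluded).
    public const string Expr =
        "//tr[td[@class='identifier']/span[.='identifier']]/td[2]/text()";

    public static List<string> Select(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.SelectNodes(Expr)
                  .Cast<XmlNode>()
                  .Select(n => n.Value.Trim())
                  .ToList();
    }
}
```

On the sample document this selects the text pieces " foo " and " bar " of the first table's second row; the second table is excluded because its span reads "not an identifier".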

How can I get all content within a <td> tag using the HTML Agility Pack?

- by Bob Dylan

So I'm writing an application that will do a little screen scraping. I'm using the HTML Agility Pack to load an entire HTML page into an HtmlDocument instance called doc. Now I want to parse that doc, looking for this: <table border="0" cellspacing="3"> <tr><td>First rows stuff</td></tr> <tr> <td> The data I want is in here <br /> and it's separated by these annoying <br /> 's. No id's, classes, or even a single <p> tag. </p> Just a bunch of <br /> tags. </td> </tr> </table> So I just need to get the data within the second row. How can I do this? Should I use a regex or something else?
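A minimal sketch, assuming the HtmlAgilityPack NuGet package and that the rows are direct tr children of table (the HTML Agility Pack does not add a tbody on its own, as far as I know); SecondRowText and Pieces are hypothetical names. Rather than a regex, it takes the direct text children of the second row's cell, which for markup like the one shown are the pieces between the br tags.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class SecondRowText
{
    // Returns the non-empty text pieces of the second row's cell, in order.
    public static List<string> Pieces(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var cell = doc.DocumentNode.SelectSingleNode("//table/tr[2]/td");
        if (cell == null) return new List<string>();

        return cell.ChildNodes
                   .Where(n => n.NodeType == HtmlNodeType.Text)   // skip <br> and other elements
                   .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim())
                   .Where(s => s.Length > 0)                      // drop whitespace-only nodes
                   .ToList();
    }
}
```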

How to parse a date from an HTML page using the HTML Agility Pack?

- by Harikrishna

I have HTML pages that I parse with the HTML Agility Pack, and I want to extract some information. Every page has a trading date (20/02/02) that I want to parse; it looks like "Trading date : 20/02/02". The label "Trading date" and the date (20/02/02) may be in the same column (td), or in different columns, with "Trading date" in the first column and 20/02/02 in the second. What should I do?
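A minimal sketch, assuming the HtmlAgilityPack NuGet package and that dates look like 20/02/02 (TradingDate, Find and the regex pattern are assumptions): look for a td whose text mentions "Trading date", take the date from that cell if present, and otherwise from the next td in the same row. It returns the raw string, because 20/02/02 alone does not say which part is the day and which is the year.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

public static class TradingDate
{
    // Hypothetical pattern for dates like 20/02/02 or 20/02/2002.
    static readonly Regex DatePattern = new Regex(@"\d{1,2}/\d{1,2}/\d{2,4}");

    public static string Find(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        foreach (var td in doc.DocumentNode.Descendants("td"))
        {
            if (td.InnerText.IndexOf("trading date", StringComparison.OrdinalIgnoreCase) < 0)
                continue;

            // Case 1: label and date in the same cell.
            var m = DatePattern.Match(td.InnerText);
            if (m.Success) return m.Value;

            // Case 2: date in the next cell of the same row.
            var next = td.ParentNode.Elements("td")
                         .SkipWhile(c => c != td).Skip(1).FirstOrDefault();
            if (next != null)
            {
                m = DatePattern.Match(next.InnerText);
                if (m.Success) return m.Value;
            }
        }
        return null;   // no trading date found
    }
}
```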

How do I get direct descendants with the HTML Agility Pack?

- by acidzombie24

I have a specific HTML node and I want to get its second (that is, last) direct descendant. So after writing .Descendants("div") I called ls.Last(), but I actually got the last div inside the second descendant, which is not what I expected. How do I get only the direct descendants? Or how do I get the descendant with a specific class name? "div.postBody" would be a suitable alternative.
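Descendants walks the whole subtree, which is why Last() returned a nested div; Elements returns only direct children. A minimal sketch, assuming the HtmlAgilityPack NuGet package (DirectChildren and its methods are hypothetical names), also shows filtering by class, since the library does not understand CSS selectors such as div.postBody out of the box.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class DirectChildren
{
    // Last <div> that is a direct child of parent (null if there is none).
    public static HtmlNode LastDirectDiv(HtmlNode parent) =>
        parent.Elements("div").LastOrDefault();

    // <div> elements anywhere below parent whose class list contains className.
    public static HtmlNode[] ByClass(HtmlNode parent, string className) =>
        parent.Descendants("div")
              .Where(d => d.GetAttributeValue("class", "")
                           .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                           .Contains(className))
              .ToArray();
}
```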

How to get the inner text within a node in the HTML Agility Pack?

- by Shashi

<a> contents <strong>strong content</strong> </a> I want only the "contents" part, i.e. the text between <a> and <strong>.
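A minimal sketch, assuming the HtmlAgilityPack NuGet package (OwnText is a hypothetical name): InnerText includes text from child elements such as strong, so keep only the node's direct text children.

```csharp
using System.Linq;
using HtmlAgilityPack;

public static class OwnText
{
    // Concatenates only the text nodes that are direct children of node,
    // so text inside child elements such as <strong> is left out.
    public static string Of(HtmlNode node) =>
        string.Concat(node.ChildNodes
                          .Where(n => n.NodeType == HtmlNodeType.Text)
                          .Select(n => n.InnerText))
              .Trim();
}
```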

Does the HTML Agility Pack contain unmanaged code? If so, will I encounter problems in my applicati

- by Harikrishna

Does the HTML Agility Pack contain unmanaged code? If so, will I see any problems when using unmanaged code in my application?

Does the HTML Agility Pack contain unmanaged code?

- by Harikrishna

Does the HTML Agility Pack contain unmanaged code?

How does this XPath query differentiate?

- by Soham

I am partly repeating this question because, mostly due to my own ignorance, I could not fully understand the innards. Given this HTML snippet <td valign=top class="tim_new"><a href="/stocks/company_info/pricechart.php?sc_did=MI42" class="tim_new">3M India</a></td> <td class="tim_new" valign=top><a href='/stocks/marketstats/indcomp.php?optex=NSE&indcode=Diversified' class=tim>Diversified</a></td> how does the XPath //a[@class='tim_new'] differentiate between line 1 and line 2?
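The predicate [@class='tim_new'] tests the class attribute of the a element itself, not of its parent td. The first link has class="tim_new" while the second has class=tim, so only the first matches; both td elements have class tim_new, so //td[@class='tim_new'] would match both. A small sketch to check this, assuming the HtmlAgilityPack NuGet package (ClassMatch is a hypothetical name):

```csharp
using HtmlAgilityPack;

public static class ClassMatch
{
    // Number of nodes matched by xpath in the given HTML.
    public static int Count(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        // By default SelectNodes returns null, not an empty collection, when nothing matches.
        return doc.DocumentNode.SelectNodes(xpath)?.Count ?? 0;
    }
}
```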

Html Agility Pack: Setting an HtmlNode's Attribute Value isn't reflected in the HtmlDocument.

- by Avi

In the Html Agility Pack, when I set an attribute of an HtmlNode, should I see the change in the HtmlDocument from which the node was selected? Let's say that htmlDocument is an HtmlDocument. The simplified code looks like this: HtmlNode documentNode = htmlDocument.DocumentNode; HtmlNodeCollection nodeCollection = documentNode.SelectNodes(someXPath); foreach(var node in nodeCollection) if(SomeCondition(node)) node.SetAttributeValue("class","something"); Now I see the class attribute of node change, but I don't see this change reflected in the htmlDocument's HTML.
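In the versions I know, SetAttributeValue changes the node inside the document's own tree, so regenerating the HTML from htmlDocument.DocumentNode.OuterHtml (or calling Save) after the loop should show the new class. If it does not, one thing to check is whether the HTML being inspected is the original input string or a different HtmlDocument instance. A sketch to check this, assuming the HtmlAgilityPack NuGet package and using //a as a stand-in for someXPath (AttributeEdit and MarkLinks are hypothetical names):

```csharp
using HtmlAgilityPack;

public static class AttributeEdit
{
    // Sets class="something" on every link, then regenerates the HTML
    // from the same document so the edits are visible.
    public static string MarkLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var nodes = doc.DocumentNode.SelectNodes("//a");   // null when nothing matches
        if (nodes != null)
            foreach (var node in nodes)
                node.SetAttributeValue("class", "something");
        return doc.DocumentNode.OuterHtml;
    }
}
```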

Parsing tabular cell data with a space between td cells

- by Harikrishna

I am parsing HTML tabular information with the help of the HTML Agility Pack. First I find the rows in the table with var rows = table.Descendants("tr"); then I get the cell data for each row with foreach(var row in rows) { string rowInnerText = row.InnerText; } That gives me the cell data, but with no spaces between the cells, like NameAddressPhone No. I want the inner text to be like Name Address Phone No, i.e. one space between the cells of different columns, wherever there is a td tag.
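A minimal sketch, assuming the HtmlAgilityPack NuGet package (RowText is a hypothetical name): instead of the row's InnerText, take each td's text and join them with a single space.

```csharp
using System.Linq;
using HtmlAgilityPack;

public static class RowText
{
    // Joins the trimmed, entity-decoded text of each cell with one space.
    public static string Of(HtmlNode row) =>
        string.Join(" ", row.Elements("td")
                            .Select(td => HtmlEntity.DeEntitize(td.InnerText).Trim()));
}
```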

Is there any built-in support or native library in .NET for parsing HTML files?

- by Harikrishna

Why is the HTML Agility Pack used to parse information from HTML files? Is there no built-in or native library in .NET to parse HTML files? If there is, what is the problem with the built-in support? What are the benefits of using the HTML Agility Pack versus built-in support for parsing HTML files?

Get links in a class with the HTML Agility Pack

- by acidzombie24

There are a bunch of tr elements with the class alt. I want to get all the links in them (or the first or last), yet I can't figure out how with the HTML Agility Pack. I tried variants of a, but I only get all the links or none. It doesn't seem to get only the ones in the node, which makes no sense since I am calling n.SelectNodes: html.LoadHtml(page); var nS = html.DocumentNode.SelectNodes("//tr[@class='alt']"); foreach (var n in nS) { var aS = n.SelectNodes("a"); ... }
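Relative to a tr node, the XPath a means direct a children only, and links usually sit inside td elements, so it matches nothing; //a, on the other hand, searches the whole document again even when called on a node. .//a searches only below the current row. A minimal sketch, assuming the HtmlAgilityPack NuGet package (AltRowLinks is a hypothetical name), collects the first link in each tr with class alt:

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class AltRowLinks
{
    // href of the first link inside each <tr class="alt">, in document order.
    public static List<string> FirstLinkPerRow(HtmlDocument doc)
    {
        var result = new List<string>();
        var rows = doc.DocumentNode.SelectNodes("//tr[@class='alt']");
        if (rows == null) return result;   // SelectNodes returns null when nothing matches

        foreach (var row in rows)
        {
            var link = row.SelectSingleNode(".//a");   // search below this row only
            if (link != null) result.Add(link.GetAttributeValue("href", ""));
        }
        return result;
    }
}
```

For all links per row instead of the first, row.SelectNodes(".//a") works the same way.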

Improving heavy work in a loop with multithreading

- by xjaphx

I have a small problem with my data processing: public void ParseDetails() { for (int i = 0; i < mListAppInfo.Count; ++i) { ParseOneDetail(i); } } For 300 records, it usually takes around 13-15 minutes. I've tried to speed it up with Parallel.For(), but it always stops at some point: public void ParseDetails() { Parallel.For(0, mListAppInfo.Count, i => ParseOneDetail(i)); } In ParseOneDetail(int index), I write a log line with the id of the record being processed. It always hangs at some point, and I don't know why: ParseOneDetail(): 89 ... ParseOneDetail(): 90 ... ParseOneDetail(): 243 ... ParseOneDetail(): 92 ... ParseOneDetail(): 244 ... ParseOneDetail(): 93 ... ParseOneDetail(): 245 ... ParseOneDetail(): 247 ... ParseOneDetail(): 94 ... ParseOneDetail(): 248 ... ParseOneDetail(): 95 ... ParseOneDetail(): 99 ... ParseOneDetail(): 249 ... ParseOneDetail(): 100 ... _ <hang at this point> I'd appreciate your help and suggestions to improve this. Thank you!

Edit 1: here is the method: private void ParseOneDetail(int index) { Console.WriteLine("ParseOneDetail(): " + index + " ... "); ApplicationInfo appInfo = mListAppInfo[index]; var htmlWeb = new HtmlWeb(); var document = htmlWeb.Load(appInfo.AppAnnieURL); // get first one only HtmlNode nodeStoreURL = document.DocumentNode.SelectSingleNode(Constants.XPATH_FIRST); appInfo.StoreURL = nodeStoreURL.Attributes[Constants.HREF].Value; }

Edit 2: this is the error output after running for a while, as Enigmativity suggested: ParseOneDetail(): 234 ... ParseOneDetail(): 87 ... ParseOneDetail(): 235 ... ParseOneDetail(): 236 ... ParseOneDetail(): 88 ... ParseOneDetail(): 238 ... ParseOneDetail(): 89 ... ParseOneDetail(): 90 ... ParseOneDetail(): 239 ... ParseOneDetail(): 92 ...

Unhandled Exception: System.AggregateException: One or more errors occurred. ---> System.Net.WebException: The operation has timed out
at System.Net.HttpWebRequest.GetResponse()
at HtmlAgilityPack.HtmlWeb.Get(Uri uri, String method, String path, HtmlDocument doc, IWebProxy proxy, ICredentials creds) in D:\Source\htmlagilitypack.new\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1355
at HtmlAgilityPack.HtmlWeb.LoadUrl(Uri uri, String method, WebProxy proxy, NetworkCredential creds) in D:\Source\htmlagilitypack.new\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1479
at HtmlAgilityPack.HtmlWeb.Load(String url, String method) in D:\Source\htmlagilitypack.new\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1103
at HtmlAgilityPack.HtmlWeb.Load(String url) in D:\Source\htmlagilitypack.new\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1061
at SimpleChartParser.AppAnnieParser.ParseOneDetail(ApplicationInfo appInfo) in c:\users\nhn60\documents\visual studio 2010\Projects\FunToolPack\SimpleChartParser\AppAnnieParser.cs:line 90
at SimpleChartParser.AppAnnieParser.<ParseDetails>b__0(ApplicationInfo ai) in c:\users\nhn60\documents\visual studio 2010\Projects\FunToolPack\SimpleChartParser\AppAnnieParser.cs:line 80
at System.Threading.Tasks.Parallel.<>c__DisplayClass21`2.<ForEachWorker>b__17(Int32 i)
at System.Threading.Tasks.Parallel.<>c__DisplayClassf`1.<ForWorker>b__c()
at System.Threading.Tasks.Task.InnerInvoke()
at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
at System.Threading.Tasks.Task.<>c__DisplayClass7.<ExecuteSelfReplicating>b__6(Object )
--- End of inner exception stack trace ---
at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
at System.Threading.Tasks.Parallel.ForWorker[TLocal](Int32 fromInclusive, Int32 toExclusive, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Func`4 bodyWithLocal, Func`1 localInit, Action`1 localFinally)
at System.Threading.Tasks.Parallel.ForEachWorker[TSource,TLocal](TSource[] array, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Action`3 bodyWithStateAndIndex, Func`4 bodyWithStateAndLocal, Func`5 bodyWithEverything, Func`1 localInit, Action`1 localFinally)
at System.Threading.Tasks.Parallel.ForEachWorker[TSource,TLocal](IEnumerable`1 source, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Action`3 bodyWithStateAndIndex, Func`4 bodyWithStateAndLocal, Func`5 bodyWithEverything, Func`1 localInit, Action`1 localFinally)
at System.Threading.Tasks.Parallel.ForEach[TSource](IEnumerable`1 source, Action`1 body)
at SimpleChartParser.AppAnnieParser.ParseDetails() in c:\users\nhn60\documents\visual studio 2010\Projects\FunToolPack\SimpleChartParser\AppAnnieParser.cs:line 80
at SimpleChartParser.Program.Main(String[] args) in c:\users\nhn60\documents\visual studio 2010\Projects\FunToolPack\SimpleChartParser\Program.cs:line 15
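The trace shows a WebException timeout inside HtmlWeb.Load, which suggests the loop is waiting on slow requests rather than deadlocking. On .NET Framework, outgoing HTTP connections to one host are limited by ServicePointManager.DefaultConnectionLimit (2 by default outside ASP.NET), so many parallel requests to the same site can queue up until they time out. A sketch of bounding the parallelism and recording failed items instead of letting one timeout end the whole run, using only the standard library (BoundedParallel and Run are hypothetical names):

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Threading.Tasks;

public static class BoundedParallel
{
    // Runs work(i) for i in [0, count) with at most maxParallel running at once.
    // WebExceptions (e.g. timeouts) are collected per index instead of
    // aborting the loop; other exceptions still propagate as AggregateException.
    public static List<(int Index, Exception Error)> Run(int count, int maxParallel, Action<int> work)
    {
        var failures = new List<(int Index, Exception Error)>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };
        Parallel.For(0, count, options, i =>
        {
            try
            {
                work(i);
            }
            catch (WebException ex)
            {
                lock (failures) failures.Add((i, ex));   // List<T> is not thread-safe
            }
        });
        return failures;
    }
}
```

With the method from the question this could be called as BoundedParallel.Run(mListAppInfo.Count, 4, ParseOneDetail), where the limit of 4 is an arbitrary assumption; the failed indexes can then be retried.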

Is it kosher for me to use HTMLAgilityPack in my free open source C# library?

- by Sergio Tapia

I'm going to make a free, open-source library for scraping movie sites. I want to use HTMLAgilityPack to easily parse information from HTML source code, but I'm not sure whether I legally can. Can I use this library in this way? Thank you.

How to remove the <br> tags from my HTML string using HtmlAgilityPack in C#?

- by Saravanan

I have an HTML string, and I am using HtmlAgilityPack to parse it. This is my HTML string: <p class="Normal-P" style="direction: ltr; unicode-bidi: normal;"><span class="Normal-H">sample<br/></span> <span class="Normal-H">texting<br></span></p> This HTML string has a <br> tag in two places, and I want to remove both. Can you help me remove all <br> tags from my HTML string?
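A minimal sketch, assuming the HtmlAgilityPack NuGet package (BrRemover is a hypothetical name): select every br node, remove it from the tree, then read the HTML back.

```csharp
using HtmlAgilityPack;

public static class BrRemover
{
    // Removes every <br> element and returns the resulting HTML.
    public static string RemoveBreaks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var breaks = doc.DocumentNode.SelectNodes("//br");   // null when there are none
        if (breaks != null)
            foreach (var br in breaks)
                br.Remove();   // detaches the node from its parent
        return doc.DocumentNode.OuterHtml;
    }
}
```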

Repeating an object that occurs only a few times and has different values, with HtmlAgilityPack in C#

- by dtd

I have a problem I can't seem to solve. Let's say I have some HTML like the snippet below that I want to parse. All of this HTML is within one list on the page, and the class names repeat as in my example: <li class = "seperator"> a date </li> <li class = "lol"> some text </li> <li class = "lol"> some text </li> <li class = "lol"> some text </li> <li class = "seperator"> a new date </li> <li class = "lol"> some text </li> <li class = "seperator"> another new date </li> <li class = "lol"> some text </li> <li class = "lol"> some text </li> I did manage to use the HTML Agility Pack to parse every li object separately and format it almost the way I want. My output currently looks something like this: "a date" "some text" "some text" "some text" "some text" "a new date" "some text" "another new date" "some text" "some text" "some text" What I want to achieve: "a date" "some text" "a date" "some text" "a date" "some text" "a date" "some text" "a new date" "some text" "another new date" "some text" "another new date" "some text" "another new date" "some text" The problem is that the number of lol objects beneath each separator may vary: one day the page may have one lol object beneath date 1, and the next day it may have 10. So I am wondering whether there is a smart or easy way to count the number of lol objects between the separators, or another way to solve this, for example within the HTML Agility Pack. And yes, I need the correct date in front of every lol object, not just in front of the first one. This would have been a piece of cake if each separator element wrapped the lol objects beneath it, but sadly that is not the case. I don't think I need to paste my code here, but basically I parse the page, extract the separator and lol objects, and add them to a list, where I split them into separators and lol objects. Then I print the result to a file, and since the separator occurs only 3 times (in the example), I only get 3 separate dates.
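Since the separators are siblings of the items rather than their parents, counting is not needed: walk the li elements in document order and remember the most recent separator. A minimal sketch, assuming the HtmlAgilityPack NuGet package and the class names from the question (SeparatorGrouping and Pair are hypothetical names):

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class SeparatorGrouping
{
    // Pairs every "lol" item with the most recent "seperator" date before it.
    public static List<(string Date, string Text)> Pair(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var result = new List<(string Date, string Text)>();
        string currentDate = null;

        foreach (var li in doc.DocumentNode.Descendants("li"))   // document order
        {
            string cls = li.GetAttributeValue("class", "").Trim();
            string text = li.InnerText.Trim();
            if (cls == "seperator")
                currentDate = text;                     // start a new group
            else if (cls == "lol" && currentDate != null)
                result.Add((currentDate, text));        // repeat the current date
        }
        return result;
    }
}
```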

TF2010 Build Definition and Access to Path is Denied error?

- by Daniel DiVita

I am new to TFS build definitions. I have a build folder set up where I have set the permissions so that EVERYONE has full control. Here is the exact error I am getting: E:\Builds\PIMSite\PIM.Site\PIM_Site.metaproj: Unable to copy file "C:\Builds\1\PIM System\PIM Site Build\Binaries\HtmlAgilityPack.xml" to "..\PIM.Site\Bin\HtmlAgilityPack.xml". Access to the path '..\PIM.Site\Bin\HtmlAgilityPack.xml' is denied. I have tried everything. I have removed everything from that folder and can delete it just fine, so it is not being used by another process. Any thoughts?

Should I use HtmlDocument even if I want to parse an HTML string using HtmlAgilityPack?

- by skhan

Hi everyone, I'm working in C#. I'm trying to extract the first img tag from an HTML string (which is actually post data). This is my code: private string GrabImage(string htmlContent) { String firstImage; HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument(); htmlDoc.LoadHtml(htmlContent); HtmlAgilityPack.HtmlNode imageNode = htmlDoc.DocumentNode.SelectSingleNode("//img"); if (imageNode != null) { return firstImage = imageNode.ToString(); } else return firstImage=" "; } But I get null in htmlDoc. Should I use the HtmlDocument type even if I'm parsing HTML from a string? P.S. Is this the correct way of grabbing the first img tag from my HTML string?
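HtmlDocument is the right type for strings as well: LoadHtml parses a string, while Load reads from a file or stream. As far as I know, HtmlNode does not override ToString(), so calling it returns the type name rather than markup; OuterHtml gives the tag, and GetAttributeValue("src", ...) would give just the URL. A minimal sketch, assuming the HtmlAgilityPack NuGet package (ImageGrabber is a hypothetical name):

```csharp
using HtmlAgilityPack;

public static class ImageGrabber
{
    // Returns the markup of the first <img> in htmlContent, or "" if there is none.
    public static string GrabImage(string htmlContent)
    {
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(htmlContent);
        HtmlNode imageNode = htmlDoc.DocumentNode.SelectSingleNode("//img");
        return imageNode?.OuterHtml ?? string.Empty;
    }
}
```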

Parsing HTML tags to find a specific Table Row

- by moutonc

Hello everyone, I was set a challenge where I must parse an HTML page to find the end dates of all the classes. I am using the HTMLAgilityPack, but this is the first time I have used it, and whoever set up the web page used no classes or ids; the end dates are stored in an h4 tag inside a tr. I am not sure how to parse through it. Any hints? My code: HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument(); doc.Load(txtURL.Text); sw.WriteLine("GET /academics/academic-calendar/ HTTP/1.1"); sw.WriteLine(); String response = sr.ReadToEnd(); txtHTML.Text = response;
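HtmlDocument.Load(string) reads a file path, not a URL, while HtmlWeb.Load(url) downloads and parses a page, so the raw request code should not be needed. A minimal sketch, assuming the HtmlAgilityPack NuGet package and that the dates are the h4 elements inside table rows (CalendarDates and EndDates are hypothetical names):

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class CalendarDates
{
    // Text of every <h4> that sits somewhere inside a table row.
    public static List<string> EndDates(HtmlDocument doc) =>
        (doc.DocumentNode.SelectNodes("//tr//h4") ?? Enumerable.Empty<HtmlNode>())
            .Select(h => HtmlEntity.DeEntitize(h.InnerText).Trim())
            .ToList();
}
```

Usage might look like var doc = new HtmlWeb().Load(txtURL.Text); followed by CalendarDates.EndDates(doc).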

