C# RegEx - find html tags (div and anchor)

Posted by czesio on Stack Overflow
Published on 2010-04-06T14:00:52Z

Hi

I need to retrieve several div sections (those with the class name "row ") together with their content, and additionally find all anchor tags (link URLs) with the class "underline red bold". In short, I want each section of the form:

<div class="row "> ... (divs, tags ...) </div>

and a collection of URLs:

string[] urls = {"/searchClickThru?pid=prod56534895&q=&rpos=109181&rpp=10&_dyncharset=UTF-8&sort=&url=/culture-and-gender-intimate-relation-ksiazka,prod56534895,p"};

The entire page looks like this:

<html>

... a lot of stuff

<div class="row ">

  <div class="photo">
    <a rel="nofollow" href="/searchClickThru?pid=prod56534895&amp;q=&amp;rpos=109181&amp;rpp=10&amp;_dyncharset=UTF-8&amp;sort=&amp;url=/culture-and-gender-intimate-relation-ksiazka,prod56534895,p">
      <img alt="alt msg" src="/b/s/b9/03/b9038292d147a582add07ee1f0607827.jpg">                 
 </a>
  </div>

  <div class="desc">
    <div class="l1">
      <div class="icons">
      </div>

      <table cellspacing="0" cellpadding="0" border="0">
        <tbody>
          <tr>
            <td>
              <div class="fleft">
                <a class="underline red bold" href="/searchClickThru?pid=prod56534895&amp;q=&amp;rpos=109181&amp;rpp=10&amp;_dyncharset=UTF-8&amp;sort=&amp;url=/culture-and-gender-intimate-relation-ksiazka,prod56534895,p">
                  Culture And Gender   <br>Intimate Relation</a>
              </div>

              <div class="fleft">

              </div>
            </td>
          </tr>
        </tbody>
      </table>
    </div>
    <div class="l2">

      <div>
      </div>
      <div>
        <div class="but">
        </div>
      </div>
    </div>
    <div class="l3">
      Long description
      <a class="underlinepix_red no_wrap" rel="nofollow" href="/searchClickThru?pid=prod56534895&amp;q=&amp;rpos=109181&amp;rpp=10&amp;_dyncharset=UTF-8&amp;sort=&amp;url=/culture-and-gender-intimate-relation-ksiazka,prod56534895,p">
        more<img alt="" src="/b/img/arr_red_sm.gif">
  </a>
    </div>
  </div>
</div>

<div class="omit"></div>

<div class="row ">

  <div class="photo">
    <a rel="nofollow" href="/searchClickThru?pid=prod56534895&amp;q=&amp;rpos=109181&amp;rpp=10&amp;_dyncharset=UTF-8&amp;sort=&amp;url=/culture-and-gender-intimate-relation-ksiazka,prod56534899,p">
      <img alt="alt msg" src="/b/s/b9/03/b9038292d147a582add07ee1f06078222.jpg">                    
 </a>
  </div>

  <div class="desc">
    <div class="l1">
      <div class="icons">
      </div>

      <table cellspacing="0" cellpadding="0" border="0">
        <tbody>
          <tr>
            <td>
              <div class="fleft">
                <a class="underline red bold" href="/searchClickThru?pid=prod56534895&amp;q=&amp;rpos=109181&amp;rpp=10&amp;_dyncharset=UTF-8&amp;sort=&amp;url=/culture-and-gender-intimate-relation-ksiazka,prod5653489225,p">
                  Culture And Gender   <br>Intimate Relation</a>
              </div>

              <div class="fleft">

              </div>
            </td>
          </tr>
        </tbody>
      </table>
    </div>
    <div class="l2">

      <div>
      </div>
      <div>
        <div class="but">
        </div>
      </div>
    </div>
    <div class="l3">
      Long description
      <a class="underlinepix_red no_wrap" rel="nofollow" href="/searchClickThru?pid=prod56534895&amp;q=&amp;rpos=109181&amp;rpp=10&amp;_dyncharset=UTF-8&amp;sort=&amp;url=/culture-and-gender-intimate-relation-ksiazka,prod56534895,p">
        more<img alt="" src="/b/img/arr_red_sm.gif">
  </a>
    </div>
  </div>
</div>

Can anybody help me create a suitable regex?
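
One possible starting point, offered as a minimal sketch rather than a definitive answer: .NET regular expressions support balancing groups, which can pair nested div tags so that a match for a "row " div ends at its own closing tag. The class name RowScraper and the method names ExtractRows and ExtractUrls are hypothetical, chosen only for illustration. The sketch assumes the markup is as regular as the sample above (double-quoted attributes, no div-like text inside comments or scripts), and it decodes &amp;amp; to &amp; with WebUtility.HtmlDecode so the URLs come out in the form shown in the urls array.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper; the class and method names are illustrative only.
public static class RowScraper
{
    // One <div class="row "> ... </div> block. The balancing group "depth"
    // tracks nested <div> tags so the match ends at the matching </div>.
    private static readonly Regex RowDiv = new Regex(
        @"<div\b[^>]*\bclass=""row\s*""[^>]*>" +
        @"(?>" +
            @"<div\b[^>]*>(?<depth>)" +   // nested opening div: push
            @"|</div\s*>(?<-depth>)" +    // nested closing div: pop
            @"|(?!</?div\b)." +           // any other character
        @")*" +
        @"(?(depth)(?!))" +               // fail if a nested div is still open
        @"</div\s*>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // An <a> tag whose class attribute is exactly "underline red bold";
    // the lookahead lets class and href appear in either order.
    private static readonly Regex RedBoldLink = new Regex(
        @"<a\b(?=[^>]*\bclass=""underline red bold"")[^>]*\bhref=""(?<url>[^""]*)""",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Returns every top-level "row " div, including its inner markup.
    public static List<string> ExtractRows(string html)
    {
        var rows = new List<string>();
        foreach (Match m in RowDiv.Matches(html))
        {
            rows.Add(m.Value);
        }
        return rows;
    }

    // Returns the decoded href of every "underline red bold" anchor.
    public static List<string> ExtractUrls(string html)
    {
        var urls = new List<string>();
        foreach (Match m in RedBoldLink.Matches(html))
        {
            urls.Add(WebUtility.HtmlDecode(m.Groups["url"].Value));
        }
        return urls;
    }
}
```

Calling RowScraper.ExtractRows(page) and then RowScraper.ExtractUrls(row) on each result keeps the URLs grouped by row; calling ExtractUrls on the whole page returns them as one flat list. Regular expressions remain fragile against real-world HTML, so if the markup is less regular than the sample, a dedicated HTML parser is likely the safer choice.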
