Getting a "summary" of a webpage

Posted by MattiasK on Stack Overflow
Published on 2010-05-31T05:10:59Z

I have a somewhat hairy problem: I'd like to generate a couple of paragraphs of "description" for a given URL, normally the start of the article. The meta description tag is one option, but it isn't always well written or even set.
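
The meta description route might look like the minimal sketch below, written in Python with only the standard-library `html.parser` for brevity. The `min_length` cutoff is a hypothetical, arbitrary threshold for "not good enough". A `None` return is meant as a signal to fall back to scanning the body text.

```python
from html.parser import HTMLParser


class MetaDescriptionParser(HTMLParser):
    """Records the content of the first <meta name="description" content="..."> tag."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        if tag != "meta" or self.description is not None:
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "description":
            content = attributes.get("content", "").strip()
            if content:  # an empty content="" counts as "not set"
                self.description = content


def meta_description(html, min_length=50):
    """Return the meta description, or None if it is missing or too short.

    min_length is an assumed heuristic threshold, not a standard value.
    """
    parser = MetaDescriptionParser()
    parser.feed(html)
    parser.close()
    desc = parser.description
    if desc is None or len(desc) < min_length:
        return None
    return desc
```

Self-closing tags such as `<meta name="description" content="..." />` are also handled, because `HTMLParser` routes them through `handle_starttag` by default.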

Doing this reliably from screen-scraped HTML is tricky. My general idea is to scan the HTML for the first "appropriate" segment, but it's hard to define what that is; perhaps the first paragraph that contains at least a certain amount of text...
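
The "first paragraph with enough text" heuristic can be sketched as follows. This is one possible interpretation, again in Python with the standard-library `html.parser`. The assumptions are these: only `<p>` elements count as paragraphs, `<script>`/`<style>` text is skipped, and `min_chars=150` and `max_paragraphs=2` are hypothetical defaults to tune per site.

```python
import re
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collects whitespace-normalized text of each <p> element."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._current = None  # text fragments while inside a <p>, else None
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "p":
            self._flush()  # an unclosed <p> ends implicitly when a new one starts
            self._current = []

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._current is not None and not self._skip_depth:
            self._current.append(data)

    def close(self):
        super().close()
        self._flush()  # keep a trailing paragraph that was never closed

    def _flush(self):
        if self._current is not None:
            text = re.sub(r"\s+", " ", "".join(self._current)).strip()
            if text:
                self.paragraphs.append(text)
            self._current = None


def summarize(html, min_chars=150, max_paragraphs=2):
    """Return the first <p> with at least min_chars characters plus the
    paragraphs that follow it (up to max_paragraphs total), or None."""
    collector = ParagraphCollector()
    collector.feed(html)
    collector.close()
    for i, para in enumerate(collector.paragraphs):
        if len(para) >= min_chars:
            return "\n\n".join(collector.paragraphs[i:i + max_paragraphs])
    return None
```

The length threshold is what filters out short navigation or caption paragraphs that usually precede the article body. Pages that lay out their text in `<div>`s instead of `<p>` elements would need the tag check widened.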

Does anyone have any good ideas? :) It doesn't have to be foolproof.