Search Results

Search found 10 results on 1 page for 'html5lib'.


Which revision of html5lib is stable?

- by Mat

html5lib notes that its latest release (0.11) is somewhat old. Using the Python portion, I have recursion problems as noted in Issue 70 and Issue 59, but I can't find a recent Mercurial revision that is stable. The latest tip is no good; I got the following error from python setup.py install:

    byte-compiling build/bdist.linux-x86_64/egg/html5lib/treewalkers/_base.py to _base.pyc
      File "build/bdist.linux-x86_64/egg/html5lib/treewalkers/_base.py", line 40
        "data": []}
        ^
    SyntaxError: invalid syntax

And I get the following errors at runtime:

    soup = parser.parse(page.read())
      File "build/bdist.linux-x86_64/egg/html5lib/html5parser.py", line 165, in parse
      File "build/bdist.linux-x86_64/egg/html5lib/html5parser.py", line 144, in _parse
      File "build/bdist.linux-x86_64/egg/html5lib/html5parser.py", line 454, in processDoctype
    TypeError: insertDoctype() takes exactly 4 arguments (2 given)

I'm using it on Python 2.5.2 with lxml and BeautifulSoup.

Skip sanitization for videos in html5lib

- by pug

I am using a wmd-editor in Django, much like this one in which I am typing. I would like to allow users to embed videos in it. For that I am using the Markdown video extension here. The problem is that I am also sanitizing user input with html5lib sanitization, and it doesn't allow the object tags that are required to embed the videos. One solution could be to check the input for URLs of well-known video sites and skip the sanitization in those cases. Is there a better solution?

Need to parse HTML document for links-- use a library like html5lib or something else?

- by Luinithil

I'm very new to building webpages, and I'm currently working on a website that needs to change link colours according to the destination page. The links will be sorted into different classes (e.g. good, bad, neutral) by certain user input criteria: e.g. links to content the user would find of interest are colored blue, stuff that the user (presumably) doesn't want to see is colored as normal text, etc. I reckon I need a way to parse the webpage for links to the content (stored in a MySQL database) and change the colors of all the links on the page (so I need to be able to change the link classes in the HTML as well) before outputting the adapted page to the user. I read that regex is not a good way to find those links, so should I use a library, and if so, is html5lib good for what I'm doing?
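To make the "parse the page for links, then assign each a class" step concrete, here is a minimal sketch in Python 3 using only the standard library's html.parser. The names LinkCollector, classify and ratings are hypothetical, and the ratings dict merely stands in for the MySQL lookup mentioned in the question. It only collects and classifies links; rewriting the class attribute in the output HTML would need a tree-building parser such as html5lib or lxml.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; anchors without href are skipped
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


def classify(href, ratings):
    """Return the class for a link, defaulting to 'neutral' (hypothetical rule)."""
    return ratings.get(href, "neutral")


# Hypothetical sample page and ratings, standing in for real content and a DB query.
page = '<p><a href="/a">A</a>, <a href="/b">B</a>, <a href="/c">C</a>, <a name="x">anchor</a></p>'
ratings = {"/a": "good", "/b": "bad"}

collector = LinkCollector()
collector.feed(page)
link_classes = {href: classify(href, ratings) for href in collector.hrefs}
print(link_classes)
```

The resulting mapping could then drive whatever step writes the class names back into the page.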

Parse html and find data in the html

- by Dan.StackOverflow

Hi all. I am trying to use html5lib to parse an HTML page into something I can query with XPath. html5lib has close to zero documentation and I've spent too much time trying to figure this problem out. The ultimate goal is to pull out the second row of a table:

    <html>
    <table>
    <tr><td>Header</td></tr>
    <tr><td>Want This</td></tr>
    </table>
    </html>

So let's try it:

    >>> doc = html5lib.parse('<html><table><tr><td>Header</td></tr><tr><td>Want This</td> </tr></table></html>', treebuilder='lxml')
    >>> doc
    <lxml.etree._ElementTree object at 0x1a1c290>

That looks good; let's see what else we have:

    >>> root = doc.getroot()
    >>> print(lxml.etree.tostring(root))
    <html:html xmlns:html="http://www.w3.org/1999/xhtml"><html:head/><html:body><html:table><html:tbody><html:tr><html:td>Header</html:td></html:tr><html:tr><html:td>Want This</html:td></html:tr></html:tbody></html:table></html:body></html:html>

LOL WUT? Seriously. I was planning on using some XPath to get at the data I want, but that doesn't seem to work. So what can I do? I am willing to try different libraries and approaches.
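The serialized output above suggests why a plain XPath like //tr finds nothing: every element sits in the XHTML namespace. As a minimal sketch (Python 3, using the standard library's xml.etree.ElementTree as a stand-in for lxml, and re-parsing the serialized string from the question rather than calling html5lib), a namespace-aware query can reach the second row. The prefix "h" and the variable names are arbitrary choices for illustration.

```python
import xml.etree.ElementTree as ET

# Map an arbitrary prefix to the XHTML namespace shown in the serialized output.
XHTML_NS = {"h": "http://www.w3.org/1999/xhtml"}

serialized = (
    '<html:html xmlns:html="http://www.w3.org/1999/xhtml"><html:head/>'
    "<html:body><html:table><html:tbody>"
    "<html:tr><html:td>Header</html:td></html:tr>"
    "<html:tr><html:td>Want This</html:td></html:tr>"
    "</html:tbody></html:table></html:body></html:html>"
)
root = ET.fromstring(serialized)

# Without a namespace the query matches nothing, which mirrors the problem described.
plain_rows = root.findall(".//tr")

# With the namespace prefix, the rows are found; index 1 is the second row.
rows = root.findall(".//h:tr", XHTML_NS)
second_row_text = rows[1].find("h:td", XHTML_NS).text
print(second_row_text)
```

With lxml itself, the equivalent would likely be an xpath() call that passes the same mapping through its namespaces argument. Newer html5lib releases also document a namespaceHTMLElements option that can be set to False to avoid namespaced elements; whether the version used in the question supports it is not something the excerpt confirms.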

Parsing with BeautifulSoup, error message TypeError: coercing to Unicode: need string or buffer, NoneType found

- by Samsun Knight

So I'm trying to scrape an Amazon page for data, and I'm getting an error when I try to parse for where the seller is located. Here's my code:

    #getting the html
    request = urllib2.Request('http://www.amazon.com/gp/offer-listing/0393934241/')
    opener = urllib2.build_opener()
    #hiding that I'm a webscraper
    request.add_header('User-Agent', 'Mozilla/5 (Solaris 10) Gecko')
    #opening it up, putting into soup form
    html = opener.open(request).read()
    soup = BeautifulSoup(html, "html5lib")
    #parsing for the seller info
    sellers = soup.findAll('div', {'class' : 'a-row a-spacing-medium olpOffer'})
    for eachseller in sellers:
        #parsing for price
        price = eachseller.find('span', {'class' : 'a-size-large a-color-price olpOfferPrice a-text-bold'})
        #parsing for shipping costs
        shippingprice = eachseller.find('span' , {'class' : 'olpShippingPrice'})
        #parsing for condition
        condition = eachseller.find('span', {'class' : 'a-size-medium'})
        #parsing for seller name
        sellername = eachseller.find('b')
        #parsing for seller location
        location = eachseller.find('div', {'class' : 'olpAvailability'})
        #printing it all out
        print "price, " + price.string + ", shipping price, " + shippingprice.string + ", condition," + condition.string + ", seller name, " + sellername.string + ", location, " + location.string

I get this error message, pertaining to the 'print' command at the end: "TypeError: coercing to Unicode: need string or buffer, NoneType found". I know that it's coming from this line - location = eachseller.find('div', {'class' : 'olpAvailability'}) - because the code works fine without that line, and I know that I'm getting NoneType because the line isn't finding anything. Here's the HTML from the section I'm looking to parse:

    <div class="olpAvailability"> In Stock. Ships from WI, United States.
    <br/><a href="/gp/aag/details/ref=olp_merch_ship_9/175-0430757-3801038?ie=UTF8&asin=0393934241&seller=A1W2IX7T37FAMZ&sshmPath=shipping-rates#aag_shipping">Domestic shipping rates</a> and <a href="/gp/aag/details/ref=olp_merch_return_9/175-0430757-3801038?ie=UTF8&asin=0393934241&seller=A1W2IX7T37FAMZ&sshmPath=returns#aag_returns">return policy</a>. </div>

I don't see what the problem is with the 'location' line of code, or why it's not pulling the data I want. Help?
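One possible reading of the error, offered as an assumption rather than a confirmed diagnosis: Beautiful Soup documents that .string is None when a tag has more than one child, and the div above contains text, a br tag and two links. If so, find() does return the div, but location.string is None and the concatenation fails. The sketch below (Python 3, standard library only) shows the general guard of substituting a default for None before building the output line. The helper text_or and the sample values are hypothetical.

```python
def text_or(value, default="n/a"):
    """Return the stripped string, or a default when the value is not a string."""
    return value.strip() if isinstance(value, str) else default


# Hypothetical stand-ins for price.string, location.string, etc.
fields = {
    "price": " $10.00 ",
    "location": None,  # what .string yields for a tag with several children
}

# Build the output without ever concatenating None to a string.
line = ", ".join(f"{name}, {text_or(value)}" for name, value in fields.items())
print(line)

# With Beautiful Soup itself, calling get_text() on the tag instead of reading
# .string should return the combined text of all children (assuming bs4 is in use).
```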

Not able to show video with html5

- by shin

I am testing the HTML5 video tag. I am using http://www.kaltura.org/project/HTML5_Video_Media_JavaScript_Library and http://camendesign.co.uk/. I downloaded the Creative Commons video. When I use an external link, it plays the video. So I uploaded the video to my server, but it does not play. Instead, the browser asks whether I want to save it or open it with an application. When I go to the external link, http://cdn.kaltura.org/apis/html5lib/kplayer-examples/media/bbb400p.ogv, it plays in the browser automatically. I also tested locally, but it does not play either. I am hoping someone can explain why and how to solve the problem.

This code works.

    <figure>
      <video id="vid1" width="500" height="300" style="position:absolute" poster="http://cdn.kaltura.org/apis/html5lib/kplayer-examples/media/bbb480.jpg" durationHint="33" controls = "true">
        <source src="http://cdn.kaltura.org/apis/html5lib/kplayer-examples/media/bbb400p.ogv" />
        <source src="http://cdn.kaltura.org/apis/html5lib/kplayer-examples/media/bbb_trailer_iphone.m4v"/>
      </video>
    </figure>

This does not.

    <figure>
      <video id="vid1" width="500" height="300" style="position:absolute" poster="http://cdn.kaltura.org/apis/html5lib/kplayer-examples/media/bbb480.jpg" durationHint="33" controls = "true">
        <source src="http://www.mywebsite.com/media/bbb400p.ogv" />
        <source src="http://www.mywebsite.com/media/bbb_trailer_iphone.m4v"/>
      </video>
    </figure>

This does not work either.

    <figure>
      <video id="vid1" width="500" height="300" style="position:absolute" poster="http://cdn.kaltura.org/apis/html5lib/kplayer-examples/media/bbb480.jpg" durationHint="33" controls = "true">
        <source src="http://127.0.0.1/html5videotest/media/bbb400p.ogv" />
        <source src="http://127.0.0.1/html5videotest/media/bbb_trailer_iphone.m4v"/>
      </video>
    </figure>

Need a host which supports OSQA

- by Josip Gòdly Zirdum

Hi, I'm looking to install OSQA and see how it goes. I have a great niche which I think may work really well, but until I get a large enough audience I'd like to use shared hosting, then move up to dedicated or VPS hosting. Almost all the hosts I've looked at don't support something OSQA needs. I need relatively cheap shared hosting with cPanel. Any recommendations? It needs to support:

- Django
- Python markdown
- html5lib
- Python OpenId
- South

BeautifulSoup HTMLParseError. What's wrong with this?

- by user1915496

This is my code:

    from bs4 import BeautifulSoup as BS
    import urllib2
    url = "http://services.runescape.com/m=news/recruit-a-friend-for-free-membership-and-xp"
    res = urllib2.urlopen(url)
    soup = BS(res.read())
    other_content = soup.find_all('div',{'class':'Content'})[0]
    print other_content

Yet an error comes up:

    /Library/Python/2.7/site-packages/bs4/builder/_htmlparser.py:149: RuntimeWarning: Python's built-in HTMLParser cannot parse the given document. This is not a bug in Beautiful Soup. The best solution is to install an external parser (lxml or html5lib), and use Beautiful Soup with that parser. See http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser for help.
      "Python's built-in HTMLParser cannot parse the given document. This is not a bug in Beautiful Soup. The best solution is to install an external parser (lxml or html5lib), and use Beautiful Soup with that parser. See http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser for help."))
    Traceback (most recent call last):
      File "web.py", line 5, in <module>
        soup = BS(res.read())
      File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 172, in __init__
        self._feed()
      File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 185, in _feed
        self.builder.feed(self.markup)
      File "/Library/Python/2.7/site-packages/bs4/builder/_htmlparser.py", line 150, in feed
        raise e

I've let two other people use this code, and it works perfectly fine for them. Why is it not working for me? I have bs4 installed...
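The warning itself points at the likely difference between machines: whether an external parser (lxml or html5lib) is installed. As a minimal sketch in Python 3, the helper below (pick_parser is a hypothetical name) checks which parser packages are importable and falls back to the built-in "html.parser". The returned name is the kind of string Beautiful Soup accepts as its second argument, as in the BeautifulSoup(html, "html5lib") call seen in another excerpt on this page.

```python
from importlib.util import find_spec


def pick_parser(preferred=("html5lib", "lxml")):
    """Return the first installed parser name, else Python's built-in parser."""
    for name in preferred:
        # find_spec returns None when a top-level package is not installed
        if find_spec(name) is not None:
            return name
    return "html.parser"


parser_name = pick_parser()
print(parser_name)

# Assuming bs4 is installed, the name would then be passed explicitly:
# soup = BeautifulSoup(markup, parser_name)
```

Passing the parser explicitly also makes the behaviour the same on every machine that has it installed, instead of depending on whichever parser Beautiful Soup happens to find.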

Best library to parse HTML with Python 3 and example?

- by TMC

I'm completely new to Python and am using Python 3.1 on Windows (pywin). I need to parse some HTML, essentially to extract values between specific HTML tags, and I am confused by my array of options; everything I find is suited to Python 2.x. I've read raves about Beautiful Soup, HTML5Lib and lxml, but I cannot figure out how to install any of these on Windows. Questions: What HTML parser do you recommend? How do I install it? Do you have a simple example of how to use the recommended library to snag HTML from a specific URL and return the value out of, say, something like this: fooLink (say we want to return "/blahblah")

BeautifulSoup can't parse a webpage?

- by JLTChiu

I am using Beautiful Soup to parse a webpage. I've heard it's very well known and good, but it doesn't seem to work properly. Here's what I did:

    import urllib2
    from bs4 import BeautifulSoup
    page = urllib2.urlopen("http://www.cnn.com/2012/10/14/us/skydiver-record-attempt/index.html?hpt=hp_t1")
    soup = BeautifulSoup(page)
    print soup.prettify()

I think this is fairly straightforward: I open the webpage and pass it to BeautifulSoup. But here's what I got:

    Warning (from warnings module):
      File "C:\Python27\lib\site-packages\bs4\builder\_htmlparser.py", line 149
        "Python's built-in HTMLParser cannot parse the given document. This is not a bug in Beautiful Soup. The best solution is to install an external parser (lxml or html5lib), and use Beautiful Soup with that parser. See http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser for help."))
    ...
    HTMLParseError: bad end tag: u'</"+"script>', at line 634, column 94

I thought the CNN website would be well designed, so I am not sure what's going on. Does anyone have an idea about this?


Developer IT

html5lib - Developer IT
