What are the common techniques to handle user-generated HTML modified differently by different browsers?
Posted by Jakie on Programmers
        Published on 2011-10-07T01:41:00Z
I am developing a website updater. The front end uses HTML, CSS and JavaScript, and the backend uses Python.
The way it works is that <p/>, <b/> and some other HTML elements can be updated by the user. To enable this, I load the webpage and, with jQuery, convert all those elements to <textarea/> elements. Once the content of a text area is changed, I apply the change to the original element and send the result to a Python script, which stores the new content.
The problem is that I'm finding that different browsers change the original HTML.
- How do you get around this issue?
- What Python libraries do you use?
- What techniques or application designs do you use to avoid or overcome this issue?
 
The problems I found are:
- IE removes the quotes around class and id attributes. For example, <img class='abc'/> becomes <img class=abc/>.
- Firefox removes the trailing slash from line breaks: <br /> becomes <br>.
- Some websites depend on very specific markup details, so the insertion of a simple "\n" (which IE does) can affect how the page is displayed. Example: changing <img class='headingpic' /><div id="maincontent"> to <img class='headingpic'/>\n <div id="maincontent"> inserts a vertical gap in IE.
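The differences listed above are all serialization differences: quoting, the self-closing slash, and whitespace between tags. As a rough illustration (not something from the original setup), the sketch below uses Python's standard html.parser to reduce two HTML strings to token streams that ignore those details, so the backend can at least tell whether the user really changed something or the browser merely re-serialized the markup. The names TokenCollector and tokens are hypothetical. Note that this only detects the situation; it does not stop the stray "\n" from affecting the display if the browser's version is stored.

```python
from html.parser import HTMLParser


class TokenCollector(HTMLParser):
    """Collect tokens while ignoring quoting style, '/>' and blank text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # Attribute order and quote style do not survive parsing,
        # so sorting the pairs makes the comparison order-insensitive.
        self.tokens.append(("start", tag, tuple(sorted(attrs))))

    def handle_startendtag(self, tag, attrs):
        # Treat <br /> exactly like <br>.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag))

    def handle_data(self, data):
        # Drop whitespace-only text and collapse runs of whitespace.
        if data.strip():
            self.tokens.append(("text", " ".join(data.split())))


def tokens(markup):
    """Return the normalized token stream for an HTML string."""
    parser = TokenCollector()
    parser.feed(markup)
    parser.close()
    return parser.tokens


# Same markup as far as this comparison is concerned:
same = tokens("<br />") == tokens("<br>")
```

If the two token streams are equal, the stored original can be kept byte for byte and the browser's version discarded.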
The things I have unsuccessfully tried to overcome these issues:
- Using either jQuery or Python to remove all >\n< occurrences, <br>, etc. This fails because I get different patterns in IE: sometimes a space followed by \n, sometimes \n followed by several spaces.
- In Python, parsing the new HTML, extracting the new text/content, and inserting it into the old HTML, so that the elements and formatting never change, just the content. This is very difficult and seems to be overkill.
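The second attempt above (keep the old HTML, replace only the content) can be sketched with the standard library alone, under some loudly stated assumptions: the client sends the position of the edited element and the plain text from the textarea, not the browser-serialized HTML; editable elements are not nested inside elements of the same tag; and the edited content is plain text. EDITABLE, SpanFinder and replace_content are hypothetical names for illustration.

```python
import html
from html.parser import HTMLParser

EDITABLE = {"p", "b"}  # assumption: the tags users may edit


class SpanFinder(HTMLParser):
    """Record (start, end) offsets of the inner content of editable elements."""

    def __init__(self, source):
        super().__init__(convert_charrefs=False)
        # HTMLParser.getpos() reports (line, column); keep the absolute
        # offset of each line start so positions can be converted.
        self.line_starts = [0] + [i + 1 for i, ch in enumerate(source) if ch == "\n"]
        self.spans = []
        self._open = None
        self.feed(source)
        self.close()

    def _offset(self):
        line, col = self.getpos()
        return self.line_starts[line - 1] + col

    def handle_starttag(self, tag, attrs):
        if tag in EDITABLE and self._open is None:
            # Content begins right after the raw text of the start tag.
            self._open = (tag, self._offset() + len(self.get_starttag_text()))

    def handle_endtag(self, tag):
        if self._open and self._open[0] == tag:
            # Inside this handler, getpos() points at the start of the end tag.
            self.spans.append((self._open[1], self._offset()))
            self._open = None


def replace_content(original, index, new_text):
    """Replace the content of the index-th editable element, leaving the rest untouched."""
    start, end = SpanFinder(original).spans[index]
    return original[:start] + html.escape(new_text, quote=False) + original[end:]


page = "<img class='headingpic' /><div id=\"maincontent\"><p>Old text</p><br /></div>"
updated = replace_content(page, 0, "New & improved")
```

Because only the characters between the start and end tags are spliced, the quotes, slashes and whitespace of the stored page stay exactly as they were, whatever the browser did to its own copy.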
 