Search Results

Search found 45245 results on 1810 pages for 'html content extraction'.

  • How to parse invalid HTML with Perl?

    - by bodacydo
    I maintain a database of articles with HTML formatting. Unfortunately the editors who wrote articles didn't know proper HTML, so they often have written stuff like: <div class="highlight"><html><head></head><body><p>Note that ...</p></html></div> I tried using HTML::TreeBuilder to parse this HTML but after parsing it and dumping the resulting tree, all the elements between <div class="highlight">...</div> are gone. I'm left with just <div class="highlight"></div>. The editors often have also done things like: <div class="article"><style>@font-face { font-family: "Cambria"; }</style>Article starts here</div> Parsing this with HTML::TreeBuilder results in empty <div class="article"></div> again. Any ideas how to approach this broken HTML and actually make sense out of it?

  • Copy HTML code but without javascript changes [closed]

    - by PaulP
    In Firebug there is a very useful "Copy HTML" option in the HTML tab. But the copied HTML also includes JavaScript changes, for example new classes added on the document.ready (jQuery) event. I would like to copy the raw HTML, as in the "View source" option (available in every browser), without any JavaScript changes. Yes, I can use "View source", but the code there is very scattered and it is very hard to copy one big HTML node without losing its closing tag, whereas in Firebug, thanks to folding, I can pick the folded HTML node, right-click, and select "Copy HTML".

  • http-equiv=content-language alternative - the way of specifying document language

    - by tugberk
    Lots of web sites use the following meta tag to specify the default language of the document: <meta http-equiv="content-language" content="es-ES"> When I go to the W3C site: http://www.w3.org/TR/2011/WD-html-markup-20110113/meta.http-equiv.content-language.html#meta.http-equiv.content-language I get this: "Using the meta element to specify the document-wide default language is obsolete. Consider specifying the language on the root element instead." What is the way of specifying the document language now?
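
    A minimal sketch of what the quoted W3C note points at, assuming "the root element" means the html element carrying a lang attribute (for example <html lang="es-ES">). The Python below is illustrative only: LangFinder and document_language are hypothetical names, and the code simply reports the root lang when present, falling back to the obsolete meta tag.

```python
from html.parser import HTMLParser


class LangFinder(HTMLParser):
    """Records the language declared on <html> and on the legacy meta tag."""

    def __init__(self):
        super().__init__()
        self.root_lang = None   # value of <html lang="...">
        self.meta_lang = None   # value of <meta http-equiv="content-language">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # HTMLParser lowercases tag and attribute names
        if tag == "html" and self.root_lang is None:
            self.root_lang = attrs.get("lang")
        elif tag == "meta":
            if (attrs.get("http-equiv") or "").lower() == "content-language":
                self.meta_lang = attrs.get("content")


def document_language(markup):
    """Prefer the root element's lang; fall back to the obsolete meta tag."""
    finder = LangFinder()
    finder.feed(markup)
    finder.close()
    return finder.root_lang or finder.meta_lang
```

    With this sketch, a page that declares the language on the root element and a page that only carries the old meta tag both report "es-ES", which makes it easy to audit which pages still rely on the obsolete form.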

  • Squid proxy not serving modified html content

    - by Matthew
    I'm trying to use squid to modify the page content of web page requests. I followed the upside-down-ternet tutorial which showed instructions for how to flip images on pages. I need to change the actual html of the page. I've been trying to do the same thing as in the tutorial, but instead of editing the image I'm trying to edit the html page. Below is a php script I'm using to try to do it. All jpg images get flipped, but the content on the page does not get edited. The edited index.html files written contain the edited content, but the pages the users receive don't contain the edited content.

        #!/usr/bin/php
        <?php
        $temp = array();
        while ( $input = fgets(STDIN) ) {
            $micro_time = microtime();
            // Split the output (space delimited) from squid into an array.
            $temp = split(' ', $input);
            //Flip jpg images, this works correctly
            if (preg_match("/.*\.jpg/i", $temp[0])) {
                system("/usr/bin/wget -q -O /var/www/cache/$micro_time.jpg ". $temp[0]);
                system("/usr/bin/mogrify -flip /var/www/cache/$micro_time.jpg");
                echo "http://127.0.0.1/cache/$micro_time.jpg\n";
            }
            //Don't edit files that are obviously not html. $temp[0] contains url of file to get
            elseif (preg_match("/(jpg|png|gif|css|js|\(|\))/i", $temp[0], $matches)) {
                echo $input;
            }
            //Otherwise, could be html (e.g. `wget http://www.google.com` downloads index.html)
            else{
                $time = time() . microtime(); //For unique directory names
                $time = preg_replace("/ /", "", $time); //Simplify things by removing the spaces
                mkdir("/var/www/cache/". $time); //Create unique folder
                system("/usr/bin/wget -q --directory-prefix=\"/var/www/cache/$time/\" ". $temp[0]);
                $filename = system("ls /var/www/cache/$time/"); //Get filename of downloaded file
                //File is html, edit the content (this does not work)
                if(preg_match("/.*\.html/", $filename)){
                    //Get the html file contents
                    $contentfh = fopen("/var/www/cache/$time/". $filename, 'r');
                    $content = fread($contentfh, filesize("/var/www/cache/$time/". $filename));
                    fclose($contentfh);
                    //Edit the html file contents
                    $content = preg_replace("/<\/body>/i", "<!-- content served by proxy --></body>", $content);
                    //Write the edited file
                    $contentfh = fopen("/var/www/cache/$time/". $filename, 'w');
                    fwrite($contentfh, $content);
                    fclose($contentfh);
                    //Return the edited page
                    echo "http://127.0.0.1/cache/$time/$filename\n";
                }
                //Otherwise file is not html, don't edit
                else{
                    echo $input;
                }
            }
        }
        ?>

  • Can the .htaccess file slow down a website to a crawl? If so, are there better ways to solve these problems with different rewrite rules and such?

    - by Parimal
    Here is my .htaccess file:

        RewriteCond %{REQUEST_URI} ^/patients/billing/FAQ_billing\.html$ [OR]
        RewriteCond %{REQUEST_URI} ^/patients/billing/getintouch\.html$
        RewriteRule ^patients/billing/(.*)\.html$ $1.php [L,NC]
        RewriteCond %{REQUEST_URI} ^/patients/findadoctor/a\.html$ [OR]
        RewriteCond %{REQUEST_URI} ^/patients/findadoctor/b\.html$ [OR]
        RewriteCond %{REQUEST_URI} ^/patients/findadoctor/c\.html$ [OR]
        RewriteCond %{REQUEST_URI} ^/patients/findadoctor/d\.html$ [OR]
        RewriteCond %{REQUEST_URI} ^/patients/findadoctor/e\.html$ [OR]
        RewriteCond %{REQUEST_URI} ^/patients/findadoctor/f\.html$ [OR]
        RewriteCond %{REQUEST_URI} ^/patients/findadoctor/g\.html$ [OR]
        RewriteCond %{REQUEST_URI} ^/patients/findadoctor/h\.html$ [OR]
        RewriteCond %{REQUEST_URI} ^/patients/findadoctor/i\.html$ [OR]
        RewriteCond %{REQUEST_URI} ^/patients/findadoctor/j\.html$ [OR]
        RewriteCond %{REQUEST_URI} ^/patients/findadoctor/k\.html$ [OR]
        RewriteCond %{REQUEST_URI} ^/patients/findadoctor/l\.html$ [OR]
        RewriteCond %{REQUEST_URI} ^/patients/findadoctor/m\.html$ [OR]
        RewriteCond %{REQUEST_URI} ^/patients/findadoctor/n\.html$ [OR]
        RewriteCond %{REQUEST_URI} ^/patients/findadoctor/o\.html$ [OR]
        RewriteCond %{REQUEST_URI} ^/patients/findadoctor/p\.html$ [OR]
        RewriteCond %{REQUEST_URI} ^/patients/findadoctor/q\.html$ [OR]
        RewriteCond %{REQUEST_URI} ^/patients/findadoctor/r\.html$ [OR]
        RewriteCond %{REQUEST_URI} ^/patients/findadoctor/s\.html$ [OR]
        RewriteCond %{REQUEST_URI} ^/patients/findadoctor/t\.html$ [OR]
        RewriteCond %{REQUEST_URI} ^/patients/findadoctor/u\.html$ [OR]
        RewriteCond %{REQUEST_URI} ^/patients/findadoctor/v\.html$ [OR]
        RewriteCond %{REQUEST_URI} ^/patients/findadoctor/w\.html$ [OR]
        RewriteCond %{REQUEST_URI} ^/patients/findadoctor/x\.html$ [OR]
        RewriteCond %{REQUEST_URI} ^/patients/findadoctor/y\.html$ [OR]
        RewriteCond %{REQUEST_URI} ^/patients/findadoctor/z\.html$
        RewriteRule ^patients/findadoctor/(.*)\.html$ findadoctor.php?id=$1 [L,NC]

    There are lots of rules like that, around 250 lines. Please help me...
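
    One observation on the excerpt above: the 26 findadoctor conditions appear to differ only in a single lowercase letter, so a single pattern with a character class such as [a-z] could plausibly stand in for all of them. The Python sketch below only checks that equivalence on the paths listed in the question; it is not an Apache configuration, the names explicit_paths, single_rule and doctor_id are made up for illustration, and whether the rewritten rules would actually speed up the site is not something this sketch can show.

```python
import re
import string

# The 26 paths that the posted RewriteCond lines enumerate one by one.
explicit_paths = [f"/patients/findadoctor/{c}.html" for c in string.ascii_lowercase]

# Hypothetical single pattern covering the same set with a character class.
single_rule = re.compile(r"^/patients/findadoctor/([a-z])\.html$")


def doctor_id(path):
    """Return the captured letter (the would-be id=$1), or None if no match."""
    match = single_rule.match(path)
    return match.group(1) if match else None


# Every path from the explicit list is matched, and yields its own letter.
matched = [doctor_id(p) for p in explicit_paths]
```

    Paths outside the enumerated set, such as /patients/findadoctor/ab.html, are rejected by the single pattern just as they would fall through the original conditions.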

  • Daily Blog Archives and Duplicate Content

    - by nemmy
    A few weeks back I realised that my blog software was creating daily post archives, which basically resulted in duplicate content, especially if I only had one post a day. The situation is something like this:

        www.sitename.com/blog/archives/2013/06/01 - daily archive for 1 June 2013
        www.sitename.com/blog/archives/2013/06/my-post-name.html

    So, here we have two pages that are basically identical except the daily archive has some meaningless title like "Daily Archive for 1 June 2013". And I have no control over which content Google decides is the primary content. It's quite possible (and likely) that the daily archive could be the "primary" content and the actual post itself the "duplicate". Once I realised it was doing this I modified the daily archive template to include <meta name="robots" content="noindex"> Here we are a few weeks later and I still see some daily archives coming up in Google search results. I realise some of those deep pages might not have been crawled yet, but I am worried that the original post (which should be the PRIMARY content) has been marked as duplicate content by Google. Now that I've noindexed the daily archives, I might end up with no indexed content AND the original articles still flagged as duplicates. And nothing will show up in search at all. Have I screwed myself here or is there a way out?

  • How do you parse an HTML in vb.net

    - by tooleb
    I would like to know if there is a simple way to parse HTML in VB.NET. I know that HTML is not a strict subset of XML, but it would be nice if it could be treated that way. Is there anything out there that would let me parse HTML in an XML-like way in VB.NET?

  • How do I extract HTML content using Regex in PHP

    - by gAMBOOKa
    I know, i know... regex is not the best way to extract HTML text. But I need to extract article text from a lot of pages, I can store regexes in the database for each website. I'm not sure how XML parsers would work with multiple websites. You'd need a separate function for each website. In any case, I don't know much about regexes, so bear with me. I've got an HTML page in a format similar to this <html> <head>...</head> <body> <div class=nav>...</div><p id="someshit" /> <div class=body>....</div> <div class=footer>...</div> </body> I need to extract the contents of the body class container. I tried this. $pattern = "/<div class=\"body\">\(.*?\)<\/div>/sui" $text = $htmlPageAsIs; if (preg_match($pattern, $text, $matches)) echo "MATCHED!"; else echo "Sorry gambooka, but your text is in another castle."; What am I doing wrong? My text ends up in another castle.
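
    Two things in the posted pattern look like likely culprits, assuming the sample markup is representative: the parentheses are escaped, so \( and \) match literal brackets instead of forming a capture group, and the pattern requires class="body" in quotes while the sample uses class=body without them. The Python sketch below reproduces both behaviours on a simplified stand-in for the page (the text "article text" and the variable names are invented here); PHP's PCRE treats these constructs the same way, but this is not the asker's code.

```python
import re

# Simplified stand-in for the page in the question; "article text" is made up.
page = (
    '<html> <head>...</head> <body> <div class=nav>...</div><p id="x" /> '
    '<div class=body>article text</div> <div class=footer>...</div> </body>'
)

# Pattern as posted: quoted attribute value and escaped (literal) parentheses.
posted = re.compile(r'<div class="body">\(.*?\)</div>', re.S | re.I)

# Adjusted: quotes made optional, unescaped parentheses form a capture group.
adjusted = re.compile(r'<div class="?body"?>(.*?)</div>', re.S | re.I)

posted_match = posted.search(page)      # finds nothing on this markup
adjusted_match = adjusted.search(page)  # captures the div's contents
body_text = adjusted_match.group(1) if adjusted_match else None
```

    Note that the non-greedy (.*?) stops at the first </div>, so this only holds while the body container has no nested div elements, which is one reason a parser is usually the safer choice.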

  • Parse html and find data in the html

    - by Dan.StackOverflow
    Hi all. I am trying to use html5lib to parse an html page in to something I can query with xpath. html5lib has close to zero documentation and I've spent too much time trying to figure this problem out. Ultimate goal is to pull out the second row of a table: <html> <table> <tr><td>Header</td></tr> <tr><td>Want This</td></tr> </table> </html> so lets try it: >>> doc = html5lib.parse('<html><table><tr><td>Header</td></tr><tr><td>Want This</td> </tr></table></html>', treebuilder='lxml') >>> doc <lxml.etree._ElementTree object at 0x1a1c290> that looks good, lets see what else we have: >>> root = doc.getroot() >>> print(lxml.etree.tostring(root)) <html:html xmlns:html="http://www.w3.org/1999/xhtml"><html:head/><html:body><html:table><html:tbody><html:tr><html:td>Header</html:td></html:tr><html:tr><html:td>Want This</html:td></html:tr></html:tbody></html:table></html:body></html:html> LOL WUT? seriously. I was planning on using some xpath to get at the data I want, but that doesn't seem to work. So what can I do? I am willing to try different libraries and approaches.
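
    The serialized output above suggests why plain paths find nothing: every element sits in the XHTML namespace (http://www.w3.org/1999/xhtml), so a query for an unqualified tr does not match html:tr. A minimal sketch using only Python's standard xml.etree.ElementTree on the serialized string from the question, rather than html5lib or lxml themselves; the prefix "h" and the variable names are arbitrary choices made here.

```python
import xml.etree.ElementTree as ET

# The tree exactly as serialized in the question.
serialized = (
    '<html:html xmlns:html="http://www.w3.org/1999/xhtml"><html:head/>'
    '<html:body><html:table><html:tbody>'
    '<html:tr><html:td>Header</html:td></html:tr>'
    '<html:tr><html:td>Want This</html:td></html:tr>'
    '</html:tbody></html:table></html:body></html:html>'
)
root = ET.fromstring(serialized)

# Without a namespace the query matches nothing.
unqualified_rows = root.findall(".//tr")

# Map a prefix of our choosing to the XHTML namespace and qualify each step.
NS = {"h": "http://www.w3.org/1999/xhtml"}
rows = root.findall(".//h:tr", NS)
second_row_text = rows[1].find("h:td", NS).text
```

    The same idea should carry over to lxml's xpath() by passing a namespaces mapping, though that call is not exercised in this sketch.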

  • What was missing from the Content Strategy Forum?

    - by Roger Hart
    In April, Paris hosted the first ever Content Strategy Forum. The event's website proudly proclaims: 170 attendees, 18 nationalities, 17 speakers, 1 volcano... Content Strategy Forum 2010 rocked the world! The volcano was in Iceland, and the closest we came to rocking the world was a cursory mention in the Huffington Post, but I'll grant the event was awesome. One thing missing from that list, however, is "94 companies" (Plus a couple of universities and freelancers, and what have you). A glance through the attendees directory reveals a fairly wide organisational turnout - 24 students from two Parisian universities, countless design and marketing agencies, a series of tech firms, small and large. Two delegates from IBM, two from ARM, an appearance from RIM, Skype, and Facebook; twelve from the various bits of eBay. Oh, and, err, nobody from Google, Microsoft, Yahoo, Amazon, Play, Twitter, LinkedIn, Craigslist, the BBC, no banks I noticed, and I didn't spot a newspaper. You get the idea. Facebook notwithstanding, you have to scroll through a few pages to Alexa rankings to find company names from the attendee list. I find this interesting, and I'm not wholly sure what to make of it. Of the large, web-centric, content-rich organizations conspicuously absent, at least one of two things is true: They didn't know about the event They didn't care about the event Maybe these guys all have content strategy completely sorted, and it's an utterly naturalised part of their business process. Maybe nobody at say, Apple or Play.com ever publishes a single piece of content that isn't neatly tailored to their (clearly defined, of course) user and business goals. Wouldn't that be lovely? The thing is, in that rosy and beatific world, there's still a case for those folks to join the community. There are bound to be other perspectives, and things to learn. You see, the other thing achingly conspicuous by its absence was case studies. 
In her keynote address, Kristina Halvorson made the point that what content strategy really needs is some big, loud success stories. A point I'd firmly second as a content strategist working within an organisation. Sarah Cancilla's presentation on content strategy at Facebook included some very neat, specific examples, and was richer for it. It didn't hurt that the example was Facebook - you're getting impressively big numbers off base. What about the other big boys? Is there anybody out there with a perspective? Do we all just look very silly to you, fretting away over text and images and users and purposes? Is content validation and maintenance so accustomed a part of your business that calling attention to it is like sniffing the air and saying "Hmm, a lot of nitrogen about today."? And if it is, do you have any wisdom to share?

  • Managing JS and CSS for a static HTML web application

    - by Josh Kelley
    I'm working on a smallish web application that uses a little bit of static HTML and relies on JavaScript to load the application data as JSON and dynamically create the web page elements from that. First question: Is this a fundamentally bad idea? I'm unclear on how many web sites and web applications completely dispense with server-side generation of HTML. (There are obvious disadvantages of JS-only web apps in the areas of graceful degradation / progressive enhancement and being search engine friendly, but I don't believe that these are an issue for this particular app.) Second question: What's the best way to manage the static HTML, JS, and CSS? For my "development build," I'd like non-minified third-party code, multiple JS and CSS files for easier organization, etc. For the "release build," everything should be minified, concatenated together, etc. If I was doing server-side generation of HTML, it'd be easy to have my web framework generate different development versus release HTML that includes multiple verbose versus concatenated minified code. But given that I'm only doing any static HTML, what's the best way to manage this? (I realize I could hack something together with ERB or Perl, but I'm wondering if there are any standard solutions.) In particular, since I'm not doing any server-side HTML generation, is there an easy, semi-standard way of setting up my static HTML so that it contains code like <script src="js/vendors/jquery.js"></script> <script src="js/class_a.js"></script> <script src="js/class_b.js"></script> <script src="js/main.js"></script> at development time and <script src="http://ajax.googleapis.com/ajax/libs/jquery/1.8.2/jquery.min.js"></script> <script src="js/entire_app.min.js"></script> for release?

  • What is 'lack of original content'?

    - by JVerstry
    It is written everywhere that a lack of original content has a negative impact on ranking. But what is lack of original content? (I am not talking about duplicate content.) I guess if you copy another site's content, this makes sense. But, assuming one develops one's own functionality, and similar functionality is already available on other sites, is this considered lack of original content? Can Google decide not to index such pages (i.e., not give them a chance at all)? Are there other definitions of 'lack of original content'?

  • Is content slowing down your business?

    - by Lance Shaw
    We are living in a digital world, yet paper is everywhere and expensive, right? We all agree content is an important part of our organization and contributes to its decision making. However, many of us see dealing with it as a challenge, and the growth of content is impacting our ability to scale and respond quickly to our customers. Business has always been content intensive. For JD Edwards customers, this is an important consideration. After all, the processes being run in JD Edwards are usually very critical to the success of your business, and if they are not running as smoothly as they should due to manual process steps involving paper or searching for content, you should look into improving them. To that end, we hope you will join this webinar and learn how Oracle and KPIT | SYSTIME have partnered to help a JD Edwards customer content-enable its enterprise with Oracle WebCenter Content and Oracle WebCenter Imaging 11g and integrate them back with JD Edwards to significantly improve processing speed and reduce operational costs.

  • Combine two content encodings sections in a single page

    - by AmirGl
    I developed a web application that allows users to modify existing web pages. When a user types the URL of an existing web page, I read the content of this page and, using an Ajax call, display the content in a div inside my web application. My problem is that the content encoding of the existing web page is often different from that of my web app (I use UTF-8). Is there a way to load content using an Ajax call with a different content encoding than the one of the main page? Thanks, Amir

  • parsing simple html for iphone

    - by sitara
    I have a very simple HTML page to parse, and it will always remain simple, as simple as this: <html> <head><title>title</title></head> <body>some data here</body> </html> I have fetched the HTML content of such a page and have it in an NSString. I want to get whatever data is in the body of the page. Please tell me how this can be done, and let me know if there is more than one possible way. I would prefer doing it using basic Obj-C if possible. Thanks

  • how to cout a vector of structs (that's a class member, using extraction operator)

    - by Julz
    hi, i'm trying to simply cout the elements of a vector using an overloaded extraction operator. the vector contains Point, which is just a struct containing two doubles. the vector is a private member of a class called Polygon, so here's my Point.h

        #ifndef POINT_H
        #define POINT_H
        #include <iostream>
        #include <string>
        #include <sstream>
        struct Point
        {
            double x;
            double y;
            //constructor
            Point() { x = 0.0; y = 0.0; }
            friend std::istream& operator >>(std::istream& stream, Point &p)
            {
                stream >> std::ws;
                stream >> p.x;
                stream >> p.y;
                return stream;
            }
            friend std::ostream& operator << (std::ostream& stream, Point &p)
            {
                stream << p.x << p.y;
                return stream;
            }
        };
        #endif

    my Polygon.h

        #ifndef POLYGON_H
        #define POLYGON_H
        #include "Segment.h"
        #include <vector>
        class Polygon
        {
            //insertion operator needs work
            friend std::istream & operator >> (std::istream &inStream, Polygon &vertStr);
            // extraction operator
            friend std::ostream & operator << (std::ostream &outStream, const Polygon &vertStr);
        public:
            //Constructor
            Polygon(const std::vector<Point> &theVerts);
            //Default Constructor
            Polygon();
            //Copy Constructor
            Polygon(const Polygon &polyCopy);
            //Accessor/Modifier methods
            inline std::vector<Point> getVector() const {return vertices;}
            //Return number of Vector elements
            inline int sizeOfVect() const {return vertices.size();}
            //add Point elements to vector
            inline void setVertices(const Point &theVerts){vertices.push_back (theVerts);}
        private:
            std::vector<Point> vertices;
        };

    and Polygon.cc

        using namespace std;
        #include "Polygon.h"
        // Constructor
        Polygon::Polygon(const vector<Point> &theVerts)
        {
            vertices = theVerts;
        }
        //Default Constructor
        Polygon::Polygon(){}
        istream & operator >> (istream &inStream, Polygon::Polygon &vertStr)
        {
            inStream >> ws;
            inStream >> vertStr;
            return inStream;
        }
        // extraction operator
        ostream & operator << (ostream &outStream, const Polygon::Polygon &vertStr)
        {
            outStream << vertStr.vertices << endl;
            return outStream;
        }

    i figure my Point insertion/extraction is right, i can insert and cout using it and i figure i should be able to just...... cout << myPoly[i] << endl; in my driver? (in a loop) or even... cout << myPoly[0] << endl; without a loop? i've tried all sorts of myPoly.at[i]; myPoly.vertices[i]; etc etc also tried all variations in my extraction function outStream << vertStr.vertices[i] << endl; within loops, etc etc. when i just create a... vector<Point> myVect; in my driver i can just... cout << myVect.at(i) << endl; no problems. tried to find an answer for days, really lost and not through lack of trying!!! thanks in advance for any help. please excuse my lack of comments and formatting also there's bits and pieces missing but i really just need an answer to this problem thanks again

  • Ajax Control Toolkit July 2011 Release and the New HTML Editor Extender

    - by Stephen Walther
    I’m happy to announce the July 2011 release of the Ajax Control Toolkit which includes important bug fixes and a completely new HTML Editor Extender control. You can download the July 2011 Release by visiting the Ajax Control Toolkit CodePlex site at: http://AjaxControlToolkit.CodePlex.com Using the New HTML Editor Extender Control You can use the new HTML Editor Extender to extend any standard ASP.NET TextBox control so that it supports rich formatting such as bold, italics, bulleted lists, numbered lists, typefaces and different foreground and background colors. The following code illustrates how you can extend a standard ASP.NET TextBox control with the HtmlEditorExtender: <%@ Page Language="C#" AutoEventWireup="true" CodeBehind="Simple.aspx.cs" Inherits="WebApplication1.Simple" %> <%@ Register TagPrefix="asp" Namespace="AjaxControlToolkit" Assembly="AjaxControlToolkit" %> <html xmlns="http://www.w3.org/1999/xhtml"> <head runat="server"> <title>Simple</title> </head> <body> <form id="form1" runat="server"> <asp:ToolkitScriptManager runat="Server" /> <asp:TextBox ID="txtComments" TextMode="MultiLine" Columns="60" Rows="8" runat="server" /> <asp:HtmlEditorExtender TargetControlID="txtComments" runat="server" /> </form> </body> </html> This page has the following three controls: ToolkitScriptManager – The ToolkitScriptManager renders all of the scripts required by the Ajax Control Toolkit. TextBox – The TextBox control is a standard ASP.NET TextBox which is set to display multiple lines (a TextArea instead of an Input element). HtmlEditorExtender – The HtmlEditorExtender is set to extend the TextBox control. You can use the standard TextBox Text property to read the rich text entered into the TextBox control on the server. Lightweight and HTML5 The HTML Editor Extender works on all modern browsers including the most recent versions of Mozilla Firefox (Firefox 5), Google Chrome (Chrome 12), and Apple Safari (Safari 5). 
Furthermore, the HTML Editor Extender is compatible with Microsoft Internet Explorer 6 and newer. The HTML Editor Extender is very lightweight. It takes advantage of the HTML5 ContentEditable attribute so it does not require an iframe or complex browser workarounds. If you select View Source in your browser while using the HTML Editor Extender, we hope that you will be pleasantly surprised by how little markup and script is generated by the HTML Editor Extender. Customizable Toolbar Buttons Depending on the web application that you are building, you will want to display different toolbar buttons with the HTML Editor Extender. One of the design goals of the HTML Editor Extender was to make it very easy for you to customize the toolbar buttons. Imagine, for example, that you want to use the HTML Editor Extender when accepting comments on blog posts. In that case, you might want to restrict the type of formatting that a user can display. You might want to enable a user to format text as bold or italic but you do not want the user to make any other formatting changes. The following page illustrates how you can customize the HTML Editor Extender toolbar: <%@ Page Language="C#" AutoEventWireup="true" CodeBehind="CustomToolbar.aspx.cs" Inherits="WebApplication1.CustomToolbar" %> <%@ Register TagPrefix="asp" Namespace="AjaxControlToolkit" Assembly="AjaxControlToolkit" %> <html> <head runat="server"> <title>Custom Toolbar</title> </head> <body> <form id="form1" runat="server"> <asp:ToolkitScriptManager Runat="server" /> <asp:TextBox ID="txtComments" TextMode="MultiLine" Columns="50" Rows="10" Text="Hello <b>world!</b>" Runat="server" /> <asp:HtmlEditorExtender TargetControlID="txtComments" runat="server"> <Toolbar> <asp:Bold /> <asp:Italic /> </Toolbar> </asp:HtmlEditorExtender> </form> </body> </html> Notice that the HTML Editor Extender in the page above has a Toolbar subtag. You can list the toolbar buttons which you want to appear within the subtag. 
In the case above, only Bold and Italic buttons are displayed. Here is a complete list of the Toolbar buttons currently supported by the HTML Editor Extender: Undo Redo Bold Italic Underline StrikeThrough Subscript Superscript JustifyLeft JustifyCenter JustifyRight JustifyFull InsertOrderedList InsertUnorderedList CreateLink UnLink RemoveFormat SelectAll UnSelect Delete Cut Copy Paste BackgroundColorSelector ForeColorSelector FontNameSelector FontSizeSelector Indent Outdent InsertHorizontalRule HorizontalSeparator Of course the HTML Editor Extender was designed to be extensible. You can create your own buttons and add them to the control. Compatible with the AntiXSS Library When using the HTML Editor Extender on a public facing website, we strongly recommend that you use the HTML Editor Extender with the AntiXSS Library. If you allow users to submit arbitrary HTML, and you don’t take any action to strip out malicious markup, then you are opening your website to Cross-Site Scripting Attacks (XSS attacks). The HTML Editor Extender uses the Provider Model to support different Sanitizer Providers. The July 2011 release of the Ajax Control Toolkit ships with a single Sanitizer Provider which uses the AntiXSS library (see http://AntiXss.CodePlex.com ). A Sanitizer Provider is responsible for sanitizing HTML markup by removing any malicious elements, attributes, and attribute values. For example, the AntiXss Sanitizer Provider will take the following block of HTML: <b><a href=""javascript:doEvil()"">Visit Grandma</a></b> <script>doEvil()</script> And return the following sanitized block of HTML: <b><a href="">Visit Grandma</a></b> Notice that the JavaScript href and <SCRIPT> tag are both stripped out. Be aware that there are a depressingly large number of ways to sneak evil markup into your HTML. You definitely want a Sanitizer as a safety net. 
Before you can use the AntiXSS Sanitizer Provider, you must add three assemblies to your web application: AntiXSSLibrary.dll, HtmlSanitizationLibrary.dll, and SanitizerProviders.dll. All three assemblies are included with the CodePlex download of the Ajax Control Toolkit in the SanitizerProviders folder. Here’s how you modify your web.config file to use the AntiXSS Sanitizer Provider: <configuration> <configSections> <sectionGroup name="system.web"> <section name="sanitizer" requirePermission="false" type="AjaxControlToolkit.Sanitizer.ProviderSanitizerSection, AjaxControlToolkit"/> </sectionGroup> </configSections> <system.web> <compilation targetFramework="4.0" debug="true"/> <sanitizer defaultProvider="AntiXssSanitizerProvider"> <providers> <add name="AntiXssSanitizerProvider" type="AjaxControlToolkit.Sanitizer.AntiXssSanitizerProvider"></add> </providers> </sanitizer> </system.web> </configuration> You can detect whether the HTML Editor Extender is using the AntiXSS Sanitizer Provider by checking the HtmlEditorExtender SanitizerProvider property like this: if (MyHtmlEditorExtender.SanitizerProvider == null) { throw new Exception("Please enable the AntiXss Sanitizer!"); } When the SanitizerProvider property has the value null, you know that a Sanitizer Provider has not been configured in the web.config file. Because the AntiXSS library requires Full Trust, you cannot use the AntiXSS Sanitizer Provider with most shared website hosting providers. Because most shared hosting providers only support Medium Trust and not Full Trust, we do not recommend using the HTML Editor Extender with a public website hosted with a shared hosting provider. Why a New HTML Editor Control? The Ajax Control Toolkit now includes two HTML Editor controls. Why did we introduce a new HTML Editor control when there was already an existing HTML Editor? We think you will like the new HTML Editor much more than the previous one. 
We had several goals with the new HTML Editor Extender: Lightweight – We wanted to leverage HTML5 to create a lightweight HTML Editor. The new HTML Editor generates much less markup and script than the previous HTML Editor. Secure – We wanted to make it easy to integrate the AntiXSS library with the HTML Editor. If you are creating a public facing website, we strongly recommend that you use the AntiXSS Provider. Customizable – We wanted to make it easy for users to customize the toolbar buttons displayed by the HTML Editor. Compatibility – We wanted to ensure that the HTML Editor will work with the latest versions of the most popular browsers (including Internet Explorer 6 and higher). The old HTML Editor control is still included in the Ajax Control Toolkit and continues to live in the AjaxControlToolkit.HTMLEditor namespace. We have not modified the control and you can continue to use the control in the same way as you have used it in the past. However, we hope that you will consider migrating to the new HTML Editor Extender for the reasons listed above. Summary We’ve introduced a new Ajax Control Toolkit control with this release. I want to thank the developers and testers on the Superexpert team for the huge amount of work which they put into this control. It was a non-trivial task to build an entirely new control which has the complexity of the HTML Editor in less than 6 weeks. Please let us know what you think! We want to hear your feedback. If you discover issues with the new HTML Editor Extender control, or you have questions about the control, or you have ideas for how it can be improved, then please post them to this blog. Tomorrow starts a new sprint

  • Make your CHM Help Files show HTML5 and CSS3 content

    - by Rick Strahl
    The HTML Help 1.0 specification aka CHM files, is pretty old. In fact, it's practically ancient as it was introduced in 1997 when Internet Explorer 4 was introduced. Html Help 1.0 is basically a completely HTML based Help system that uses a Help Viewer that internally uses Internet Explorer to render the HTML Help content. Because of its use of the Internet Explorer shell for rendering there were many security issues in the past, which resulted in locking down of the Web Browser control in Windows and also the Help Engine which caused some unfortunate side effects. Even so, CHM continues to be a popular help format because it is very easy to produce content for it, using plain HTML and because it works with many Windows application platforms out of the box. While there have been various attempts to replace CHM help files CHM files still seem to be a popular choice for many applications to display their help systems. The biggest alternative these days is no system based help at all, but links to online documentation. For Windows apps though it's still very common to see CHM help files and there are still a ton of CHM help out there and lots of tools (including our own West Wind Html Help Builder) that produce output for CHM files as well as Web output. Image is Everything and you ain't got it! One problem with the CHM engine is that it's stuck with an ancient Internet Explorer version for rendering. For example if you have help content that uses HTML5 or CSS3 content you might have an HTML Help topic like the following shown here in a full Web Browser instance of Internet Explorer: The page clearly uses some CSS3 features like rounded corners and box shadows that are rendered using plain CSS 3 features. 
Note that I used Internet Explorer on purpose here to demonstrate that IE9 on Windows 7 can properly render this content using some of the new features of CSS, but the same is true for recent versions of the other major browsers (FireFox 3.1+, Safari 4.5+, WebKit 9+ etc.). Unfortunately, if you take this nice and simple CSS3 content and run it through the HTML Help compiler to produce a CHM file, the resulting output on the same machine looks a bit less flashy: the page display and functionality still work, but all the CSS3 styling is gone. This happens even though I am running on a Windows 7 machine with IE9, which should be able to render these CSS features. Bummer.
Web Browser Control - perpetually stuck in IE 7 Mode
The problem is the Web Browser/Shell component in Windows. This component has been part of Windows for as long as Internet Explorer has been around, but the Web Browser control hasn't kept up with the latest versions of IE. In a nutshell, by default the control is stuck in IE7 rendering mode for engine compatibility reasons. However, there is at least one way to fix this explicitly, using Registry keys on a per-application basis. The key point from that blog article is that you can override the IE rendering engine for a particular executable by setting one (or more) registry flags that tell the Windows Shell which version of the Internet Explorer rendering engine to load. An application that wishes to use a more recent version of Internet Explorer can register itself during installation for the specific IE version desired, and from then on the application will use that version of the Web Browser component. If the installed version of Internet Explorer is older than the specified version, rendering falls back to the default version (IE 7 rendering).
Forcing CHM files to display with IE9 (or later) Rendering
Knowing that we can force the IE version for a given process, it's also possible to affect CHM rendering by setting the same keys for the executable that hosts the CHM file. Which executable that is depends on the type of application, as there are a number of ways to launch the help engine:
hh.exe: The standalone Windows CHM Help Viewer that launches when you open a CHM from Windows Explorer. You can manually add hh.exe to the registry keys.
YourApplication.exe: If you're using .NET or any tool that internally uses the hhControl ActiveX control to launch help content, your application is the host. You should add your application's exe to the registry during application startup.
foxhhelp9.exe: If you're building a FoxPro application that uses the built-in help features, foxhhelp9.exe actually hosts the help controls. Make sure to add this executable to the registry.
What to set
You can configure the Internet Explorer version used for an application in the registry by specifying the executable file name and a value that specifies the IE version desired. There are two different sets of keys for 32 bit and 64 bit applications.
32 bit only or 64 bit:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Internet Explorer\MAIN\FeatureControl\FEATURE_BROWSER_EMULATION
Value Key: hh.exe
32 bit on 64 bit machine:
HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Internet Explorer\MAIN\FeatureControl\FEATURE_BROWSER_EMULATION
Value Key: hh.exe
Note that it's best to always set both values, ideally when you install your application, so it works regardless of which platform you run on. The value specified is a DWORD, and the interesting values are decimal 9000 for IE9 rendering mode depending on !DOCTYPE settings, or 9999 for IE 9 standards mode always.
You can use the same logic with 8000 and 8888 for IE8, and the final value of 7000 for IE7 (one has to wonder what they're going to do for version 10 to perpetuate that pattern). I think 9000 is the value you'd most likely want to use. 9000 means that IE9 will be used for rendering, but unless the right doctypes are used (XHTML and HTML5 specifically) IE will still fall back into quirks mode as needed. This should allow existing pages to continue to use the fallback engine while new pages that have the proper HTML doctype set can take advantage of the newest features. Here's an example of how I set the registry keys in my Tarma Installmate registry configuration: Note that I set all three values both under the Software and Wow6432Node keys so that this works regardless of where these EXEs are launched from. Even though all apps are 32 bit apps, the 64 bit (the default one shown selected) key is often used. So, once I've set the registry key for hh.exe, I can launch my CHM help file from Explorer and see the following CSS3 IE9 rendered display:
Summary
It sucks that we have to jump through all these hoops to get what should be natural behavior for an application: supporting the latest features available on a system. But it shouldn't be a surprise - the Windows Help team (if there even is such a thing) has not been known for forward looking technologies. It's a pretty big hassle that we have to resort to setting registry keys in order to get the Web Browser control and the internal CHM engine to render properly, but at least it's possible to make it work after all. Using this technique, it's possible to ship an application with a help file and allow your CHM help to display with richer CSS markup and correct rendering using the stricter and more consistent XHTML or HTML5 doctypes.
If you provide both Web help and in-application help (and why not, if you're building from a single source), you can now sidestep the issue of your customers asking: Why does my help file look so much shittier than the online help… No more!
© Rick Strahl, West Wind Technologies, 2005-2012
Posted in HTML5  Help  Html Help Builder  Internet Explorer  Windows
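The registry settings described in the article can be sketched as a small Python helper that builds the text of a .reg file for both key locations. This is only an illustration under stated assumptions: the function name build_emulation_reg and the constant names are hypothetical, and the key paths, executable names and decimal values (9000, 9999 and so on) are taken as quoted in the article.

```python
# Sketch only: builds .reg file text for the FEATURE_BROWSER_EMULATION keys
# quoted in the article. Function and constant names are illustrative.

FEATURE_PATH = r"Microsoft\Internet Explorer\MAIN\FeatureControl\FEATURE_BROWSER_EMULATION"

# The two registry locations named in the article.
KEYS = [
    "HKEY_LOCAL_MACHINE\\SOFTWARE\\" + FEATURE_PATH,
    "HKEY_LOCAL_MACHINE\\SOFTWARE\\Wow6432Node\\" + FEATURE_PATH,
]


def build_emulation_reg(exe_names, mode=9000):
    """Return .reg text that sets `mode` (decimal) for each executable under both keys."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key in KEYS:
        lines.append("[" + key + "]")
        for exe in exe_names:
            # .reg files store DWORD values as 8 hex digits, so 9000 becomes 00002328.
            lines.append('"{}"=dword:{:08x}'.format(exe, mode))
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_emulation_reg(["hh.exe", "YourApplication.exe", "foxhhelp9.exe"]))
```

Generating the text rather than writing to the registry directly keeps the sketch side-effect free. Importing the resulting file into HKEY_LOCAL_MACHINE requires administrator rights, which is one reason the article suggests setting the values at install time.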

    Read the article

  • Value of the HTML5 lang attribute

    - by user359650
    I'm working on a website which will offer localized content following the language+region approach as described on this W3.org page (e.g. fr-CA for Canadian French content, and fr-FR for "French French" content). As we consider content for each language+region to be unique, it is crucial to us that search engines properly identify and serve the content accordingly. By looking around on the Internet (e.g. this question), it appears that most people recommend the use of an ISO 639 language code in the HTML lang attribute to describe the content language. Following this recommendation, we would end up using <html lang="fr">, which wouldn't allow us to differentiate between the aforementioned language+region combinations. When reviewing the HTML4 specification, it seems that using language+region as a language code would be perfectly OK, as the en-US example is given as one possible value. However, I couldn't find any confirmation of this in the HTML5 specification, which doesn't seem to provide any example of the allowed values. From there I tried to get a de facto answer by looking at what the web giants are doing. I looked at Facebook: they offer Canadian French and French French versions of their website with (slightly) different content, whilst the HTML lang value remains the same:
fr-CA
URL: http://fr-ca.facebook.com
HTML lang attribute: <html lang="fr">
translation of the word 'email': courriel
fr-FR
URL: http://fr-fr.facebook.com/
HTML lang attribute: <html lang="fr">
translation of the word 'email': Adresse électronique
Q: What is the recommended/standard way of describing content that was localized using the language+region approach in HTML5?
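The distinction the question relies on can be illustrated with a short Python sketch. It assumes tags of only the simple "language" or "language-region" shape used in the question (fr, fr-CA, fr-FR) and does not attempt the full language tag grammar; both function names are hypothetical.

```python
# Sketch only: handles just the simple "language" and "language-region" tag
# shapes discussed in the question, not the full language tag grammar.

def split_lang_tag(tag):
    """Split a tag such as 'fr-CA' into (language, region); region is None if absent."""
    parts = tag.split("-")
    language = parts[0].lower()
    region = parts[1].upper() if len(parts) > 1 else None
    return language, region


def same_language_different_region(tag_a, tag_b):
    """True when two tags share a language but name different regions."""
    lang_a, region_a = split_lang_tag(tag_a)
    lang_b, region_b = split_lang_tag(tag_b)
    return lang_a == lang_b and region_a != region_b
```

With this helper, fr-CA and fr-FR compare as the same language with different regions, while a bare fr carries no region at all, which is the information the question wants to preserve.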

    Read the article

  • 301 redirect from "/index.html" to root if index.html not exist

    - by Andrij Muzychka
    Can I create a 301 redirect from "index.html" to the root directory if the file "index.html" does not exist? For example, the link "http://example.com/index.html" shows a "404 Error" page, and I need a 301 redirect to the root directory, "http://example.com/". I added this rule to .htaccess:
Options +FollowSymLinks
RewriteCond %{THE_REQUEST} ^.*/index.html
RewriteRule ^(.*)index.html$ http://example.com/$1 [R=301,L]
but it doesn't work. Can you help me solve this problem?
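The mapping the question is after can be sketched in Python, independent of any server configuration. This is only an illustration of the intended behaviour, not a fix for the rule above: the function name redirect_target is hypothetical, the pattern mirrors the question's RewriteRule with the dot escaped, and paths are assumed to arrive without a leading slash, as they do for rules in a .htaccess file.

```python
import re

# Sketch only: mirrors the pattern from the question's RewriteRule, with the dot escaped.
INDEX_PATTERN = re.compile(r"^(.*)index\.html$")


def redirect_target(path, base="http://example.com/"):
    """Return the 301 target for a per-directory path, or None if no redirect applies."""
    match = INDEX_PATTERN.match(path)
    if match is None:
        return None
    # Whatever precedes "index.html" is kept, so sub-directories map to themselves.
    return base + match.group(1)
```

Note that the pattern is not anchored to a path segment, so a file such as myindex.html would also match; a stricter pattern would require index.html to follow a slash or start the path.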

    Read the article

  • parsing HTML on the iPhone

    - by Ben Alpert
    Can anyone recommend a C or Objective-C library for HTML parsing? It needs to handle messy HTML code that won't quite validate. Does such a library exist, or am I better off just trying to use regular expressions?

    Read the article

  • How can I inform search engines that the usefulness of some content on my site has a limited shelf life?

    - by Tim Post
    Let's say that I run a forum dedicated to computer hardware. Naturally, people are going to ask questions like:
What is the best laptop for running [os]
Or
What is the best video card for under [amount]
These may be perfectly fine discussions, but the content loses usefulness over time. An answer to either question asked in 2007 might still be relevant in 2008, but definitely not in 2012. Is there a way that I can tell search engines that certain pages might not give visitors what they're looking for after a certain date, and perhaps hint to a page on my site that would provide good information? Perhaps something I could set in HTTP response headers, meta tags or even a site map?

    Read the article
