Search Results

Search found 19554 results on 783 pages for 'xml pull parser'.

  • Setting up a Mercurial server on IIS 6

    - by TheCodeJunkie
    Hi, I've set up a Mercurial server on a Windows 2003 / IIS 6 machine, and when I try to pull the repository I get the following sequence: requesting all changes adding changesets adding manifests adding file changes transaction abort! rollback completed abort: premature EOF reading chunk (got 91303 bytes, expected 1542634). I've tried pretty much everything I can think of, but with no success. I followed the steps of Jeremy Skinner's guide on doing it for IIS 7, but on an IIS 6 server. I found a post where the author was experiencing the same issue but was unable to find a solution. So far it looks like the solution is to migrate to Apache or upgrade to Windows 2008 / IIS 7, but if someone knows how to solve this, please let me know.

  • WCF SSL secure transfer or large payloads without changing firewall.

    - by Sir Mix
    I need to transfer small amounts of data intermittently from clients to our server in a secure fashion, and occasionally pull down large binary files from the server. It's important for all this to be reliable. I'm anticipating 100,000 clients. I control both ends, but I want to deliver a solution that doesn't require changing the firewall for the majority of customers. A lag of one or two minutes before the information migrates to the server or comes down seems to be acceptable at this time. We need to make the connection secure, so I was thinking about SSL, but I'm open to suggestions. Basically, what is the best binding to use in this situation so that we have a secure transmission and the system handles the stress and load in a way that works for 95% of clients out of the box (that is, it will not be blocked by the majority of firewall configurations)?

  • Get the selected drop down list value from a FormCollection in MVC

    - by James Santiago
    I have a form posting to an action with MVC. I want to pull the selected drop down list item from the FormCollection in the action. How do I do it? My Html form: <% using (Html.BeginForm()) {%> <select name="Content List"> <% foreach (String name in (ViewData["names"] as IQueryable<String>)) { %> <option value="<%= name %>"><%= name%></option> <% } %> </select> <p><input type="submit" value="Save" /></p> <% } %> My Action: [HttpPost] public ActionResult Index(FormCollection collection) { //how do I get the selected drop down list value? String name = collection.AllKeys.Single(); return RedirectToAction("Details", name); }

  • Dynamically evaluating simple boolean logic in Python

    - by a paid nerd
    I've got some dynamically-generated boolean logic expressions, like: (A or B) and (C or D) A or (A and B) A empty - evaluates to True The placeholders get replaced with booleans. Should I, Convert this information to a Python expression like True or (True or False) and eval it? Create a binary tree where a node is either a bool or Conjunction/Disjunction object and recursively evaluate it? Convert it into nested S-expressions and use a Lisp parser? Something else? Suggestions welcome.
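
One way to avoid a bare eval is to parse the expression with Python's standard ast module and walk only the node types the expressions need. The sketch below is a minimal illustration, not the asker's code: the names evaluate and _eval_node are made up for this example, and it assumes placeholders are plain identifiers combined with and, or, not and parentheses. An empty expression returns True, as the question specifies.

```python
import ast


def evaluate(expression, values):
    """Evaluate a boolean expression built from names, and/or/not and parentheses."""
    if not expression.strip():
        return True  # the question states that an empty expression is True
    tree = ast.parse(expression, mode="eval")
    return _eval_node(tree.body, values)


def _eval_node(node, values):
    # "A and B and C" / "A or B": a BoolOp holding a list of operands
    if isinstance(node, ast.BoolOp):
        results = [_eval_node(operand, values) for operand in node.values]
        return all(results) if isinstance(node.op, ast.And) else any(results)
    # "not A"
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        return not _eval_node(node.operand, values)
    # a placeholder such as A, looked up in the supplied mapping
    if isinstance(node, ast.Name):
        return bool(values[node.id])
    # anything else (calls, attributes, arithmetic, ...) is rejected
    raise ValueError("unsupported syntax: " + ast.dump(node))


print(evaluate("(A or B) and (C or D)", {"A": True, "B": False, "C": False, "D": True}))
print(evaluate("A or (A and B)", {"A": False, "B": True}))
```

Because only three node types are accepted, input such as a function call raises ValueError instead of being executed, which is the main reason to prefer this over eval on generated strings.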

  • display image in a grid using extjs

    - by Abisha
    I am new to ExtJS. I want to display an icon image for each grid element. Can anybody please help me? I am getting the image path from an XML file. My code is below. At the moment it displays the image path; I want to display the image itself instead. Ext.onReady(function(){ var store = new Ext.data.Store({ url: 'new_frm.xml', reader: new Ext.data.XmlReader({ record: 'message', fields: [{name: 'first'},{name: 'last'},{name: 'company'},{name: 'email'},{name: 'gender'},{name: 'form-file'},{name: 'state'},{name: 'Live'},{name: 'content'}] }) }); var grid = new Ext.grid.GridPanel({ store: store, columns: [ {header: "First Name", width: 120, dataIndex: 'first', sortable: true}, {header: "Last Name", width: 180, dataIndex: 'last', sortable: true}, {header: "Company", width: 115, dataIndex: 'company', sortable: true}, {header: "Email", width: 100, dataIndex: 'email', sortable: true}, {header: "Gender", width: 100, dataIndex: 'gender', sortable: true}, {header: "Photo", width: 100, dataIndex: 'form-file', sortable: true}, {header: "State", width: 100, dataIndex: 'state', sortable: true}, {header: "Living with", width: 100, dataIndex: 'Live', sortable: true}, {header: "Commands", width: 100, dataIndex: 'content', sortable: true} ], renderTo:'example-grid', height:200 }); store.load(); });

  • Library issue: How do I set up QtWebKit to parse HTML?

    - by user560106
    Nick Presta showed that you can parse HTML with Qt here: Library Recommendation: C++ HTML Parser. However, when I attempt to build this, I get an access violation on the "QWebFrame* frame = page.mainFrame();" line. What am I doing wrong? #include <QtWebKit\QWebElement> #include <QtWebKit\QWebView> #include <QtWebKit\QWebFrame> #include <QtWebKit\QWebPage> #include <iostream> int main() { QWebPage page; QWebFrame* frame = page.mainFrame(); frame->setHtml( "<html><head></head><body></body></html>" ); QWebElement document = frame->documentElement(); return 0; }

  • perl regex groups

    - by Aaron Moodie
    I'm currently trying to pull out dates from a file and feed them directly into an array. My regex is working, but I have 6 groups in it, all of which are being added to the array, when I only want the first one. @dates = (@dates, ($line =~ /((0[1-9]|[12][0-9]|3[01])(\/|\-)(0[1-9]|1[0-2])(\/|\-)([0-9][0-9][0-9][0-9]|[0-9][0-9]))/g )); Is there a simple way to grab the $1 group of a Perl regex? My output looks like this: 13/04/2009, 13, /, 04, /, 2009, 14-12-09, 14, -, 12, -, 09
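
The extra values come from the inner capturing groups: a global match in list context returns every group of every match. Python's re.findall behaves the same way, so the effect of switching the inner groups to non-capturing (?:...) groups can be sketched there. This is a Python illustration rather than Perl, and DATE and extract_dates are names invented for it; Perl accepts the same (?:...) syntax.

```python
import re

# Same date pattern as in the question, but every inner group is written as
# (?:...) so it does not capture. With no capturing groups left, findall
# returns the whole match for each date instead of a tuple of sub-groups.
DATE = re.compile(
    r"(?:0[1-9]|[12][0-9]|3[01])[/-](?:0[1-9]|1[0-2])[/-](?:[0-9]{4}|[0-9]{2})"
)


def extract_dates(line):
    """Return every date found in the line as a plain string."""
    return DATE.findall(line)


print(extract_dates("start 13/04/2009 end 14-12-09"))
```

The separators are written as the character class [/-], which matches the same two characters as the original (\/|\-) alternation without adding a group.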

  • GWT: uiBinder-based widget can't be instantiated a second time

    - by Konoplianko
    Hi. I created a widget using GWT uiBinder. It works fine until I try to instantiate it a second time. After I call the constructor a second time, it returns only the raw description from the XML, and the statements in the constructor (rootElement.add( new HTML( "panel1" ), leftId );) just don't work. It throws no error or warning. Please help. Java class: public class DashboardLayout extends Composite { final String leftId = "boxLeft"; final String rightId = "boxRight"; interface DashboardLayoutUiBinder extends UiBinder<HTMLPanel, DashboardLayout> { } private static DashboardLayoutUiBinder ourUiBinder = GWT.create( DashboardLayoutUiBinder.class ); @UiField HTMLPanel htmlPanel; public DashboardLayout() { HTMLPanel rootElement = ourUiBinder.createAndBindUi( this ); this.initWidget( rootElement ); rootElement.add( new HTML( "panel1" ), leftId ); rootElement.add( new HTML( "panel2" ), rightId ); } } XML description: <ui:UiBinder xmlns:ui='urn:ui:com.google.gwt.uibinder' xmlns:g='urn:import:com.google.gwt.user.client.ui' > <g:HTMLPanel ui:field="htmlPanel"> <table width="100%" border="0" cellspacing="0" cellpadding="0"> <tr> <td width="40%" id="boxLeft" class="boxContextLeft"> </td> <td width="60%" id="boxRight" class="boxContextRight"> </td> </tr> </table> </g:HTMLPanel> </ui:UiBinder>

  • Internet Explorer blocked this website from displaying content with security certificate errors

    - by Tabrez
    I have a security certificate linked to a CDN's server. The main website is https:www.connect4fitness.com When I pull the site up in Firefox or Chrome, everything works fine. But in IE I get the following error: "Internet Explorer blocked this website from displaying content with security certificate errors." On IE 9 it shows the button "Display Content" and you can get past the error by clicking on the button. On older versions of IE the error message is much more cryptic and is confusing users. Please note that I don't have the option of asking end users to add the site to Trusted Sources, as some folks use the site from their work computers and do not have that access. Also, some people don't bother to call once they hit the error. I have looked at the content and all my links are "https" only. I had one namespace link and I got rid of it. Any idea how I can find what is triggering this message?

  • iPhone: Fastest way to create a binary Plist with simple key/value strings

    - by randombits
    What's the best way to create a binary plist on the iPhone with simple string based key/value pairs? I need to create a plist with a list of recipe and ingredients. I then want to be able to read this into an NSDictionary so I can do something like NSString *ingredients = [recipes objectForKey:@"apple pie"]; I'm reading in an XML data file through an HTTP request and want to parse all of the key value pairs into the plist. The XML might look something like: <recipes> <recipe> <name>apple pie</name> <ingredients>apples and pie</ingredients> </recipe> <recipe> <name>cereal</name> <ingredients>milk and some other ingredients</ingredients> </recipe> </recipes> Ideally, I'll be able to write this to a plist at runtime, and then be able to read it and turn it into an NSDictionary later at runtime as well.
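
The data flow described above (parse the recipe XML into key/value pairs, write them as a binary plist, read them back as a dictionary) can be tried off-device. The sketch below uses Python's standard plistlib and xml.etree.ElementTree purely to illustrate the round trip; it is not iPhone code, and recipes_to_dict is a name made up for the example.

```python
import plistlib
import xml.etree.ElementTree as ET

# The sample document from the question.
XML = """<recipes>
  <recipe><name>apple pie</name><ingredients>apples and pie</ingredients></recipe>
  <recipe><name>cereal</name><ingredients>milk and some other ingredients</ingredients></recipe>
</recipes>"""


def recipes_to_dict(xml_text):
    """Map each recipe name to its ingredients string."""
    root = ET.fromstring(xml_text)
    return {r.findtext("name"): r.findtext("ingredients") for r in root.iter("recipe")}


recipes = recipes_to_dict(XML)

# Serialize as a binary plist, then load it back into a dictionary.
blob = plistlib.dumps(recipes, fmt=plistlib.FMT_BINARY)
restored = plistlib.loads(blob)

print(restored["apple pie"])
```

A flat dictionary of string keys and string values is the simplest shape that supports the objectForKey: lookup shown in the question.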

  • MSBuild Validating Properties

    - by Brian Gillespie
    I'm working on a reusable MSBuild Target that will be consumed by several other tasks. This target requires that several properties be defined. What's the best way to validate that properties are defined, throwing an Error if they are not? Two attempts that I almost like: <?xml version="1.0" encoding="utf-8" ?> <Project ToolsVersion="3.5" DefaultTarget="Release" xmlns="http://schemas.microsoft.com/developer/msbuild/2003"> <Target Name="Release"> <Error Text="Property PropA required" Condition="'$(PropA)' == ''"/> <Error Text="Property PropB required" Condition="'$(PropB)' == ''"/> <!-- The body of the task --> </Target> </Project> Here's an attempt at batching. It's ugly because of the extra "Name" parameter. Is it possible to use the Include attribute instead? <?xml version="1.0" encoding="utf-8" ?> <Project ToolsVersion="3.5" DefaultTarget="Release" xmlns="http://schemas.microsoft.com/developer/msbuild/2003"> <Target Name="Release"> <!-- MSBuild BuildInParallel="true" Projects="@(ProjectsToBuild)"/ --> <ItemGroup> <RequiredProperty Include="PropA"><Name>PropA</Name></RequiredProperty> <RequiredProperty Include="PropB"><Name>PropB</Name></RequiredProperty> <RequiredProperty Include="PropC"><Name>PropC</Name></RequiredProperty> </ItemGroup> <Error Text="Property %(RequiredProperty.Name) required" Condition="'$(%(RequiredProperty.Name))' == ''" /> </Target> </Project>

  • Is there a workaround for JDBC w/liquibase and MySQL session variables & client side SQL instructions

    - by David
    I'm slowly building a starter changeSet XML file for one of my employer's three primary schemas. The only show stopper has been incorporating the sizable library of MySQL stored procedures to be managed by Liquibase. One sproc has been somewhat of a pain to deal with. The first few statements go like: use TargetSchema; select "-- explanatory inline comment thats actually useful --" into vDummy; set @@session.sql_mode='TRADITIONAL' ; drop procedure if exists adm_delete_stats ; delimiter $$ create procedure adm_delete_stats( ...rest of sproc I cut out the use statement as it's counter-productive, but the real issue is the set @@session.sql_mode statement, which causes an exception like liquibase.exception.MigrationFailedException: Migration failed for change set ./foobarSchema/sprocs/adm_delete_stats.xml::1293560556-151::dward_autogen dward: Reason: liquibase.exception.DatabaseException: Error executing SQL ... And then the delimiter statement is another stumbling block. Doing due-diligence research, I found this rejected MySQL bug report here and this MySQL forum thread that goes a little bit more in depth into the problem here. Is there any way I can use the sproc scripts that currently exist with Liquibase, or would I have to rewrite several hundred stored procedures? I've tried the createProcedure, sqlFile, and sql Liquibase tags without much luck, as I think the core issue is that set, delimiter, and similar SQL commands are meant to be interpreted and acted upon by the client-side interpreter before being delivered to the server.

  • Why does this JSON fail only in iPhone?

    - by 4thSpace
    I'm using the JSON framework from http://code.google.com/p/json-framework. The JSON below fails with this error: -JSONValue failed. Error trace is: ( Error Domain=org.brautaset.JSON.ErrorDomain Code=5 UserInfo=0x124a20 "Unescaped control character '0xd'", Error Domain=org.brautaset.JSON.ErrorDomain Code=3 UserInfo=0x11bc20 "Object value expected for key: Phone", Error Domain=org.brautaset.JSON.ErrorDomain Code=3 UserInfo=0x1ac6e0 "Expected value while parsing array" ) JSON being parsed: [{"id" :"2422","name" :"BusinessA","address" :"7100 U.S. 50","lat" :"38.342945","lng" :"-90.390701","CityId" :"11","StateId" :"38","CategoryId" :"1","Phone" :"(200) 200-2000","zip" :"00010"}] I think 0xd represents a carriage return. When I put the above JSON in TextWrangler, I don't see any carriage returns. I got the JSON by doing "po myjson" in the debugger. It passes this validator: http://json.parser.online.fr/. Can anyone see what the problem may be?
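
An unescaped carriage return inside a string value is invisible in most editors and can be lost when text is copied out of a debugger, which would explain a payload that looks clean yet fails a strict parser. The sketch below reproduces that situation with Python's json module, not the iPhone framework from the question. The payload is hypothetical: a shortened record with a carriage return placed inside the Phone value, the key named in the error trace. find_control_chars is a helper invented for the example.

```python
import json


def find_control_chars(text):
    """Return (index, hex code) for every ASCII control character in text."""
    return [(i, hex(ord(ch))) for i, ch in enumerate(text) if ord(ch) < 0x20]


# Hypothetical payload: a carriage return hidden inside the "Phone" value.
raw = '[{"Phone": "(200) 200-2000\r", "zip": "00010"}]'

# Locating the offending byte makes the problem visible.
print(find_control_chars(raw))

# A strict parser rejects a raw control character inside a string.
try:
    json.loads(raw)
except json.JSONDecodeError as err:
    print("strict parse failed:", err.msg)

# Removing the carriage return before parsing lets the same payload through.
cleaned = raw.replace("\r", "")
print(json.loads(cleaned))
```

Dumping the raw bytes of the HTTP response, rather than the debugger's printed description, is the equivalent check on the device.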

  • My C# and DLL Data Woes

    - by Lynn
    Hey guys, I'm a very beginner C# coder. So, if I get some of the terms incorrect, please be easy on me. I'm trying to see if it is possible to pull data from a DLL. I did some research and found that you can store application resources within a DLL. What I couldn't find, was the information to tell me how to do that. There is a MS article that explains how to access resources within a satellite DLL, but I honestly don't know if that is what I'm looking for. http://msdn.microsoft.com/en-us/library/ms165653.aspx I did try some of the codes involved, but there are some "FileNotFoundExceptions" going on. The rest of the DLL information is showing up: classes, objects, etc. I just added the DLL as a resource in my Visual Studio Project and added it with "using". I just don't know how to get at the meat of it, if it is possible. Thanks, Lynn

  • Firefox Extension Socket Transport

    - by Nathan
    Hey, I'm making a Firefox extension and I'm currently trying to get it to send XML data over a local socket to another application that's listening on that socket. Does anyone know what I'm doing wrong in this? It's probably something simple and I'm just having a Monday. Thanks. socketConn: function() { var httpLoc = window.top.getBrowser(). selectedBrowser.contentWindow.location.href; var outputData = '<?xml version="1.0"?>' + '<site_data>' + '<session_id></session_id>' + 'site_url>' + httpLoc + '</site_url>' + '<mime_type></mime_type>' + '<data_file>' + filePath + '</data_file>' + '<capture_mode></capture_mode>' + '</site_data>\n'; var transportService = Cc["@mozilla.org/network/socket-transport-service;1"] .getService(Ci.nsISocketTransportService); var transport = transportService.createTransport(["starttls"], 1,"localhost",currentPort, null); var outstream = transport.openOutputStream(0, 0, 0); outstream.write(outputData, outputData.length); var stream = transport.openInputStream(0, 0, 0); var instream = Cc["@mozilla.org/scriptableinputstream;1"] .createInstance(Ci.nsIScriptableInputStream); instream.init(stream); var dataListener = { data : "", onStartRequest: function(request, context){}, onStopRequest: function(request, context, status){ instream.close(); outstream.close(); }, onDataAvailable: function(request, context, inputStream, offset, count){ this.data += instream.read(count); }, };//end dataListener var pump = Cc["@mozilla.org/network/input-stream-pump;1"] .createInstance(Ci.nsIInputStreamPump); pump.init(stream, -1, -1, 0, 0, false); pump.asyncRead(dataListener, null); }//end socketConn Please ask questions about this if you don't understand what I'm trying to do with this.

  • Unable to import Eclipse project to Android studio

    - by Binoy Babu
    Whenever I try to import my Eclipse project to Android Studio I get the following error: You are using an old, unsupported version of Gradle. Please use version 1.8 or greater. Please point to a supported Gradle version in the project's Gradle settings or in the project's Gradle wrapper (if applicable.) Consult IDE log for more details (Help | Show Log) I'm using Android Studio 0.3 and Ubuntu. I also tried it on a Windows 8 box with a fresh install but got the same error. I'm using the default Gradle wrapper, and I tried checking and unchecking the auto-import option. Is this a bug? How can I get around it? How do I update Gradle to 1.8 or check the current Gradle version? My build.gradle is given below. buildscript { repositories { mavenCentral() } dependencies { classpath 'com.android.tools.build:gradle:0.6.3' // I also tried using 0.6.1 and 0.5.+ } } apply plugin: 'android' dependencies { compile fileTree(dir: 'libs', include: '*.jar') } android { compileSdkVersion 18 buildToolsVersion "18.0.1" sourceSets { main { manifest.srcFile 'AndroidManifest.xml' java.srcDirs = ['src'] resources.srcDirs = ['src'] aidl.srcDirs = ['src'] renderscript.srcDirs = ['src'] res.srcDirs = ['res'] assets.srcDirs = ['assets'] } // Move the tests to tests/java, tests/res, etc... instrumentTest.setRoot('tests') // Move the build types to build-types/<type> // For instance, build-types/debug/java, build-types/debug/AndroidManifest.xml, ... // This moves them out of them default location under src/<type>/... which would // conflict with src/ being used by the main source set. // Adding new build types or product flavors should be accompanied // by a similar customization. debug.setRoot('build-types/debug') release.setRoot('build-types/release') } }

  • Parsing C#, finding methods and putting try/catch to all methods

    - by erdogany
    I know it sounds weird but I am required to put a wrapping try catch block to every method to catch all exceptions. We have thousands of methods and I need to do it in an automated way. What do you suggest? I am planning to parse all cs files and detect methods and insert a try catch block with an application. Can you suggest me any parser that I can easily use? or anything that will help me... every method has its unique number like 5006 public static LogEntry Authenticate(....) { LogEntry logEntry = null; try { .... return logEntry; } catch (CompanyException) { throw; } catch (Exception ex) { logEntry = new LogEntry( "5006", RC.GetString("5006"), EventLogEntryType.Error, LogEntryCategory.Foo); throw new CompanyException(logEntry, ex); } } I created this for this; http://thinkoutofthenet.com/index.php/2009/01/12/batch-code-method-manipulation/

  • Websphere 7 EntityManagerFactory creation problem

    - by mihaela
    Hello, I'm working on a Maven project which uses Seam 2.2.0, Hibernate 3.5.0-CR-2 as the JPA provider, DB2 as the database server and WebSphere 7 as the application server. Now I'm facing the following problem: in my EJBs, which are also Seam components, I want to use the EntityManager from the EJB container (@PersistenceContext private EntityManager em), not Seam's EntityManager (@In private EntityManager em). But I cannot obtain an EntityManager using @PersistenceContext. The server log says that it cannot create an EntityManagerFactory and reports a ClassCastException: java.lang.ClassCastException: org.hibernate.ejb.HibernatePersistence incompatible with javax.persistence.spi.PersistenceProvider After a lot of debugging and searching on forums, I'm assuming that the problem is that WebSphere doesn't use the Hibernate JPA provider. Has anyone faced this problem and found a solution? I have already configured the WAS class loader order for my application to load classes with the application class loader first, and I've packed all necessary jars in the application ear as written in: WAS InfoCenter: Features for EJB 3.0 development. If necessary I'll post my persistence.xml, components.xml files and stack trace. I've found this problem discussed also here: Websphere EntityManagerFactory creation problem Hibernate 3.3 fail to create entity manager factory in Websphere 7.0. Please help. Any hint will be useful. Thanks in advance! Mihaela

  • how to parse jquery ajax xhtml response?

    - by steve
    Sorry if this has been posted many times, but I've tried many variations and it still doesn't work. The HTML comes back from the jQuery AJAX call fine, and I am trying to remove the header and footer from the response using: // none of these work for me $("#content", data); $("#content", $(data)); $(data).find("#content").html() I've set a breakpoint on the response to verify that #content exists, by inspecting $(data) and using alert to print out the data's text. I've also tried using "body" or "a" as selectors, but the result always comes back as undefined. I've read in this post that you can't pull in the full XHTML document: http://stackoverflow.com/questions/1050333/jquery-ajax-parse-response-text. But I can't find the answer's quote anymore; maybe it's outdated? Has anyone run into this problem? Many thanks, Steve

  • How to recall search pattern when writing replace regex pattern in Vim?

    - by Tom Morris
    Here's the scenario: I've got a big file filled with all sorts of eclectic rubbish that I want to regex. I fiddle around and come up with a perfect search pattern by using the / command and seeing what it highlights. Now I want to use that pattern to replace with. So, I start typing :%s/ and I cannot recall what the pattern was. Is there some magical keyboard command that will pull in my last search pattern here? If I'm writing a particularly complex regex, I have even opened up a new MacVim window, typed the regex from the first window into a buffer there, then typed it back into the Vim window when writing the replace pattern. There has got to be a better way of doing so.

  • java.lang.ClassCastException: $Proxy99 cannot be cast

    - by svaret
    Hi, I am using JBoss 4.2.2 and Java 6. The deployed ear's name is apa.ear. In a servlet I have the following line of code: placeBid = (PlaceBid) context.lookup("apa/" + PlaceBid.class.getSimpleName() + "/remote"); I have a generated jboss-app.xml like this: <jboss-app> <loader-repository>apa:app=ejb3</loader-repository> </jboss-app> When trying to get the PlaceBid via the context I get this exception: java.lang.ClassCastException: $Proxy99 cannot be cast to se.nextit.actionbazaar.buslogic.PlaceBid The PlaceBid interface looks like this: @Remote public interface PlaceBid { Long addBid(String userId, Long itemId, Double bidPrice); } When I run the example that comes with EJB3 in Action, it works. The EJB3 in Action sample code is built with Ant. I want to use Maven, so I have rearranged the code somewhat. However, I don't understand what I am doing wrong here. I have some doubts about the jboss-app.xml file; I am not sure what its content should look like. Grateful for any help. Best wishes, Lasse

  • Strange rare out-of-order data received using Indy

    - by Jim
    We're having a bizarre problem with Indy10 where two large strings (a few hundred characters each) that we send out one after the other are appearing at the other end intertwined oddly. This happens extremely infrequently. Each string is a complete XML message terminated with a LF and in general the READ process reads an entire XML message, returning when it sees the LF. The call to actually send the message is protected by a critical section around the call to the IOHandler's writeln method and so it is not possible for two threads to send at the same time. (We're certain the critical section is implemented/working properly). This problem happens very rarely. The symptoms are odd...when we send string A followed by string B what we received at the other end (on the rare occasions where we have failure) is the trailing section of string A by itself (i.e., there's a LF at the end of it) followed by the leading section of string A and then the entire string B followed by a single LF. We've verified that the "timed out" property is not true after the partial read - we log that property after every read that returns content. Also, we know there are no embedded LF characters in the string, as we explicitly replace all non-alphanumeric characters in the string with spaces before appending the LF and sending it. We have log mechanisms inside the critical sections on both the transmission and receiving ends and so we can see this behavior at the "wire". We're completely baffled and wondering (although always the lowest possibility) whether there could be some low-level Indy issues that might cause this issue, e.g., buffers being sent in the wrong order....very hard to believe this could be the issue but we're grasping at straws. Does anyone have any bright ideas?

  • Is this a bug? : I get " The type ... is not a complex type or an entity type" in my WCF data servic

    - by veertien
    When invoking a query on the data service I get this error message inside the XML feed: <m:error> <m:code></m:code> <m:message xml:lang="nl-NL">Internal Server Error. The type 'MyType' is not a complex type or an entity type.</m:message> </m:error> When I use the example described here in the article "How to: Create a Data Service Using the Reflection Provider (WCF Data Services)" http://msdn.microsoft.com/en-us/library/dd728281(v=VS.100).aspx it works as expected. I have created the service in a .NET 4.0 web project. My data context class returns a query object that is derived from the LINQExtender (http://linqextender.codeplex.com/). When I execute the query object in a unit test, it works as expected. My entity type is defined as: [DataServiceKey("Id")] public class Accommodation { [UniqueIdentifier] [OriginalFieldName("EntityId")] public string Id { get; set; } [OriginalFieldName("AccoName")] public string Name { get; set; } } (the UniqueIdentifier and OriginalFieldName attributes are used by LINQExtender) Does anybody know if this is a bug in WCF data services or am I doing something wrong?

  • Java Classpath Issues with Webservices(CXF) and Jboss

    - by JohnC
    I am using CXF (which autogenerates my web services from my WSDL via my pom.xml) with JBoss (Eclipse IDE), and I am having some trouble accessing the web service from my web application. I found this resource: http://blog.progs.be/?p=92 but I am having a really hard time getting WSDL_LOCATION = cl.getResource( "my/progam/pack/wsdl/myService.wsdl" ); to work properly in my code. I have my WSDLs located in src/main/wsdl and have added the following line to the .classpath file: classpathentry kind="src" path="src/main/wsdl" I also created the folders my, program, pack, wsdl and dropped my WSDLs into that location, so they are accessible. However, the classloader.getResource call always returns null no matter what. When I specify getResource( "/wsdl/myService.wsdl" ) it does not return null, but I believe it looks at the full file path and not what I need (considering that part of the URL contains the path to the WSDL file all the way through the JBoss server directory and includes the WEB-INF dir). Is my .classpath file set up incorrectly or am I missing something else? If the WSDL location is not correct it always throws a ClassCastException like so: java.lang.ClassCastException: org.apache.cxf.jaxws.ServiceImpl at javax.xml.ws.Service.(Service.java:81)

  • Extracting pure content / text from HTML Pages by excluding navigation and chrome content

    - by Ankur Gupta
    Hi, I am crawling news websites and want to extract the News Title, News Abstract (first paragraph), etc. I plugged into the WebKit parser code to easily navigate a webpage as a tree. To eliminate navigation and other non-news content, I take the text version of the article (minus the HTML tags; WebKit provides an API for this). Then I run a diff algorithm comparing the text of various articles from the same website, which eliminates the text they have in common. This gives me the content minus the common navigation content, etc. Despite the above approach I am still getting quite some junk in my final text, which results in an incorrect News Abstract being extracted. The error rate is 5 in 10 articles, i.e. 50% (error as in an incorrect abstract being extracted). Can you suggest an alternative strategy for extracting pure content? Would learning Natural Language Processing help in extracting the correct abstract from these articles? How would you approach the above problem? Are there any research papers on the same? Regards, Ankur Gupta
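
The idea of removing text that several articles from one site share can be approximated without a full diff. The sketch below swaps in a simpler stand-in technique, line-frequency filtering: any line that occurs in more than a chosen share of a site's pages is treated as navigation or chrome and dropped. It is not the asker's implementation. strip_shared_lines, the max_share threshold, and the sample pages are all assumptions made for illustration.

```python
from collections import Counter


def strip_shared_lines(pages, max_share=0.5):
    """Drop lines that occur in more than max_share of the given pages."""
    counts = Counter()
    for page in pages:
        # A set per page, so a line repeated within one page counts once.
        counts.update({line.strip() for line in page.splitlines() if line.strip()})
    limit = max_share * len(pages)
    cleaned = []
    for page in pages:
        kept = [
            line.strip()
            for line in page.splitlines()
            if line.strip() and counts[line.strip()] <= limit
        ]
        cleaned.append("\n".join(kept))
    return cleaned


# Made-up pages from one hypothetical site: shared menu and footer, unique body.
pages = [
    "Home | World | Sport\nQuake hits coast\nFirst paragraph A.\n(c) Example News",
    "Home | World | Sport\nElection results\nFirst paragraph B.\n(c) Example News",
    "Home | World | Sport\nMarket rally\nFirst paragraph C.\n(c) Example News",
]

for text in strip_shared_lines(pages):
    print(text)
    print("---")
```

Junk that differs slightly from page to page, such as related-article lists or timestamps, survives an exact-match filter like this one, which may be part of why a purely diff-based approach still leaves noise.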
