Problem getting the page content of a URL in Java

Posted by tiendv on Stack Overflow
Published on 2010-05-11T01:46:29Z Indexed on 2010/05/11 1:54 UTC

Hi all!

I have some code that gets the page content from a URL. Here is the code:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;


public class GetPageFromURLAction extends Thread {

    public String stringPageContent;
    public String targerURL;

    public String getPageContent(String targetURL) throws IOException {
        String returnString = "";
        URL urlString = new URL(targetURL);
        URLConnection openConnection = urlString.openConnection();
        String temp;
        BufferedReader in = new BufferedReader(new InputStreamReader(openConnection.getInputStream()));
        while ((temp = in.readLine()) != null) {
            returnString += temp + "\n";
        }
        in.close();
        // String nohtml = sb.toString().replaceAll("\\<.*?>","");
        return returnString;
    }

    public String getStringPageContent() {
        return stringPageContent;
    }

    public void setStringPageContent(String stringPageContent) {
        this.stringPageContent = stringPageContent;
    }

    public String getTargerURL() {
        return targerURL;
    }

    public void setTargerURL(String targerURL) {
        this.targerURL = targerURL;
    }

    @Override
    public void run() {
        try {
            this.stringPageContent=this.getPageContent(targerURL);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

}

The problems are:

  1. Sometimes I get an HTTP error such as 405 or 403, and the result string is null. To fix this I tried checking the permission of the URL connection, but it usually returns null:

URLConnection openConnection = urlString.openConnection();
openConnection.getPermission();

Does that mean I don't have permission to access the link?
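
On the first point, a hedged sketch rather than a definitive fix: `URLConnection.getPermission()` describes the local permission needed to open the connection (it returns null when none is required), so it says nothing about whether the server will accept the request. The server's answer is the HTTP status code, which `HttpURLConnection.getResponseCode()` exposes. The class name `StatusAwareFetcher`, the helper `isSuccess`, the User-Agent value, and the timeouts below are all assumptions for illustration. The sketch also assumes the URL is http or https (otherwise the cast fails) and that the page is UTF-8.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class StatusAwareFetcher {

    // Hypothetical helper: treats any 2xx status code as success.
    public static boolean isSuccess(int statusCode) {
        return statusCode >= 200 && statusCode < 300;
    }

    // Returns the page body, or throws an IOException that names the HTTP status.
    public static String fetch(String targetURL) throws IOException {
        // Assumption: targetURL is an http/https URL, so the cast is valid.
        HttpURLConnection conn = (HttpURLConnection) new URL(targetURL).openConnection();
        // Assumption: some servers reject the default Java User-Agent with 403.
        // The value below is made up for this example.
        conn.setRequestProperty("User-Agent", "Mozilla/5.0 (compatible; ExampleFetcher/1.0)");
        conn.setConnectTimeout(10_000); // milliseconds
        conn.setReadTimeout(10_000);
        try {
            int status = conn.getResponseCode();
            if (!isSuccess(status)) {
                // Report the status instead of silently ending up with a null result.
                throw new IOException("HTTP " + status + " for " + targetURL);
            }
            StringBuilder sb = new StringBuilder();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    sb.append(line).append('\n');
                }
            }
            return sb.toString();
        } finally {
            conn.disconnect();
        }
    }
}
```

A 405 usually means the server does not allow the request method for that resource, and a 403 means the server refused the request. Changing the User-Agent may or may not help, depending on the site.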

  2. How can I get the result string without HTML tags? I tried this:

    String nohtml = sb.toString().replaceAll("\\<.*?>","");

  Here sb is a StringBuilder, but this does not remove all the HTML tags from the returned string.
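
On the second point, one likely reason the pattern misses tags is that `.` does not match line breaks by default, so a tag that spans two lines is left in place. The contents of `<script>` and `<style>` elements are also plain text to a regex and remain. Below is a minimal sketch that handles those cases. The class name `TagStripper` and the method `stripTags` are hypothetical. Regular expressions cannot parse arbitrary HTML reliably, so treat this as a rough cleanup, not a full solution (entities such as `&amp;` are left untouched).

```java
import java.util.regex.Pattern;

public class TagStripper {

    // (?is): case-insensitive, and '.' also matches line breaks.
    // Removes whole <script>...</script> and <style>...</style> elements.
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("(?is)<(script|style)\\b.*?</\\1\\s*>");

    // Removes HTML comments, which may contain '<' and '>' themselves.
    private static final Pattern COMMENT = Pattern.compile("(?s)<!--.*?-->");

    // [^>] matches line breaks too, so tags split across lines are removed.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    public static String stripTags(String html) {
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        text = COMMENT.matcher(text).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        // Collapse the whitespace left behind by the removed markup.
        return text.replaceAll("\\s+", " ").trim();
    }
}
```

For real-world pages, a dedicated HTML parser is generally more robust than any regex-based approach.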

  3. I use a thread here because I have to fetch pages from a lot of URLs. How can I create multiple threads to improve the speed of the program?
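
On the third point, a common alternative to extending `Thread` for every URL is a fixed-size thread pool from `java.util.concurrent`. The sketch below assumes Java 8 or later. The names `ParallelFetcher`, `PageLoader`, and `fetchAll` are hypothetical, and the pool size is a parameter to tune, not a recommended value. The download step is passed in as a `PageLoader` so the pooling logic stays separate from the HTTP code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelFetcher {

    // Hypothetical abstraction over "download one page".
    public interface PageLoader {
        String load(String url) throws Exception;
    }

    // Loads all URLs using at most 'threads' worker threads.
    // A URL whose download failed is mapped to null.
    public static Map<String, String> fetchAll(List<String> urls, PageLoader loader, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String url : urls) {
                tasks.add(() -> loader.load(url));
            }
            // invokeAll blocks until every task has finished and
            // returns the futures in the same order as the tasks.
            List<Future<String>> futures = pool.invokeAll(tasks);
            Map<String, String> results = new LinkedHashMap<>();
            for (int i = 0; i < urls.size(); i++) {
                try {
                    results.put(urls.get(i), futures.get(i).get());
                } catch (ExecutionException e) {
                    results.put(urls.get(i), null); // this download threw an exception
                }
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("Interrupted while fetching pages", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With the class from the question, a call might look like `ParallelFetcher.fetchAll(urls, url -> new GetPageFromURLAction().getPageContent(url), 8)`. Because the work is mostly waiting on the network, a pool larger than the number of CPU cores can still help, but very large pools may get the client throttled or blocked by the target servers.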

Thanks
