Unable to print a substring longer than 4679 characters

Posted by Newcoder on Stack Overflow
Published on 2013-11-10T14:55:44Z Indexed on 2013/11/10 15:54 UTC

I have a program that does string manipulation on very large strings (around 100K characters). The first step in my program is to clean up the input string so that it contains only certain characters. Here is my method for this cleanup:

    public static String analyzeString(String input) {
        String output = null;

        // Strip punctuation characters
        output = input.replaceAll("[-+.^:,]", "");
        // Strip line breaks
        output = output.replaceAll("(\\r|\\n)", "");
        // Normalize to upper case
        output = output.toUpperCase();
        // Keep only the characters X, Y and Z
        output = output.replaceAll("[^XYZ]", "");
        return output;
    }
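
To make the effect of the cleanup concrete, here is a minimal sketch that runs the same four steps on a tiny input. The class name `CleanupDemo` and the sample string are assumptions made up for illustration; they are not part of the original program.

```java
public class CleanupDemo {
    // Same steps as analyzeString above, repeated so the sketch is self-contained.
    public static String analyzeString(String input) {
        String output = input.replaceAll("[-+.^:,]", ""); // strip punctuation
        output = output.replaceAll("(\\r|\\n)", "");      // strip line breaks
        output = output.toUpperCase();                     // normalize case
        output = output.replaceAll("[^XYZ]", "");         // keep only X, Y, Z
        return output;
    }

    public static void main(String[] args) {
        // Hypothetical two-line sample containing punctuation and other letters.
        String sample = "x-y.z\nab:XY,z\r\n";
        String cleaned = analyzeString(sample);
        System.out.println(cleaned);          // prints XYZXYZ
        System.out.println(cleaned.length()); // prints 6
    }
}
```

Note that whatever line breaks the input had, the cleaned-up result is always one single line, because every character other than X, Y and Z is removed.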

When I print my 'input' string of length 97498, it prints successfully. My output string after cleanup has length 94788. I can print its size using output.length(), but when I try to print the string itself in Eclipse, the output is empty and all I can see is the Eclipse output console header. Since this is not my final program, I ignored this and proceeded to the next method, which does pattern matching on this 'cleaned-up' string. Here is the code for the pattern matching:

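    // Required imports:
    // import java.util.ArrayList;
    // import java.util.List;
    // import java.util.regex.Matcher;
    // import java.util.regex.Pattern;

    public static List<Integer> getIntervals(String input, String regex) {
        List<Integer> output = new ArrayList<Integer>();
        // Do pattern matching
        Pattern p1 = Pattern.compile(regex);
        Matcher m1 = p1.matcher(input);

        // For every match, record its start and end offsets as consecutive entries
        while (m1.find()) {
            output.add(m1.start());
            output.add(m1.end());
        }

        return output;
    }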
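
For reference, a small sketch of how the flat list returned by this method can be read back as (start, end) pairs. The class name `IntervalsDemo`, the sample input, and the regex `X+` are assumptions chosen for illustration, not values from the original program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IntervalsDemo {
    // Same logic as getIntervals above: start and end offsets are appended alternately.
    public static List<Integer> getIntervals(String input, String regex) {
        List<Integer> output = new ArrayList<Integer>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            output.add(m.start());
            output.add(m.end());
        }
        return output;
    }

    public static void main(String[] args) {
        String input = "XXYZZXXX"; // hypothetical sample input
        List<Integer> intervals = getIntervals(input, "X+");
        // Entries come in pairs: index i is a start offset, index i + 1 the matching end offset.
        for (int i = 0; i < intervals.size(); i += 2) {
            int start = intervals.get(i);
            int end = intervals.get(i + 1);
            System.out.println(start + "-" + end + ": " + input.substring(start, end));
        }
    }
}
```

With this sample the list is [0, 2, 5, 8], so the loop reports the matches "XX" and "XXX". The end offset is exclusive, which is why it can be passed directly to substring.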

Based on this program, I identified the start and end of my pattern match as 12351 and 87314. I tried to print this match as output.substring(12351, 87314) and got only blank output. Repeated trial and error led to the conclusion that the biggest substring I can print has length 4679. If I try 4680, I again get blank output. My confusion is this: if I was able to print the original string (length 97498), why can't I print the cleaned-up string (length 94788) or a substring longer than 4679 characters? Is it due to the regular expression implementation, which may be causing memory issues that my system cannot handle? I have 4 GB of installed memory.
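
One way to check whether the string itself is intact, independent of how the console renders a single very long line, is to print it in fixed-width chunks. This is only a diagnostic sketch. It assumes the blank output might come from displaying one long line (the cleanup removes every line break, so the result is a single line of 94788 characters) rather than from the string or the regex, and that assumption is not confirmed here. The class name `ChunkedPrint`, the helper `splitIntoChunks`, the width of 80, and the generated stand-in string are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedPrint {
    // Splits s into consecutive pieces of at most `width` characters (hypothetical helper).
    public static List<String> splitIntoChunks(String s, int width) {
        if (width <= 0) {
            throw new IllegalArgumentException("width must be positive");
        }
        List<String> chunks = new ArrayList<String>();
        for (int i = 0; i < s.length(); i += width) {
            chunks.add(s.substring(i, Math.min(s.length(), i + width)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Stand-in for the cleaned-up string: a long single line made of X, Y and Z.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10000; i++) {
            sb.append("XYZ".charAt(i % 3));
        }
        String cleaned = sb.toString();

        System.out.println("length = " + cleaned.length());
        // Print each chunk on its own line so that no printed line exceeds 80 characters.
        for (String chunk : splitIntoChunks(cleaned, 80)) {
            System.out.println(chunk);
        }
    }
}
```

If the chunks print while the unbroken string does not, the data is fine and the limit lies in how the output is displayed. If the chunks are blank as well, the cause is somewhere else.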

© Stack Overflow or respective owner