bobturbo - Developer IT

BufferedReader no longer buffering after a while?

- by BobTurbo

Sorry I can't post code but I have a bufferedreader with 50000000 bytes set as the buffer size. It works as you would expect for half an hour, the HDD light flashing every two minutes or so, reading in the big chunk of data, and then going quiet again as the CPU processes it. But after about half an hour (this is a very big file), the HDD starts thrashing as if it is reading one byte at a time. It is still in the same loop and I think I checked free ram to rule out swapping (heap size is default). Probably won't get any helpful answers, but worth a try. OK I have changed heap size to 768mb and still nothing. There is plenty of free memory and java.exe is only using about 300mb. Now I have profiled it and heap stays at about 200MB, well below what is available. CPU stays at 50%. Yet the HDD starts thrashing like crazy. I have.. no idea. I am going to rewrite the whole thing in c#, that is my solution. Here is the code (it is just a throw-away script, not pretty): BufferedReader s = null; HashMap<String, Integer> allWords = new HashMap<String, Integer>(); HashSet<String> pageWords = new HashSet<String>(); long[] pageCount = new long[78592]; long pages = 0; Scanner wordFile = new Scanner(new BufferedReader(new FileReader("allWords.txt"))); while (wordFile.hasNext()) { allWords.put(wordFile.next(), Integer.parseInt(wordFile.next())); } s = new BufferedReader(new FileReader("wikipedia/enwiki-latest-pages-articles.xml"), 50000000); StringBuilder words = new StringBuilder(); String nextLine = null; while ((nextLine = s.readLine()) != null) { if (a.matcher(nextLine).matches()) { continue; } else if (b.matcher(nextLine).matches()) { continue; } else if (c.matcher(nextLine).matches()) { continue; } else if (d.matcher(nextLine).matches()) { nextLine = s.readLine(); if (e.matcher(nextLine).matches()) { if (f.matcher(s.readLine()).matches()) { pageWords.addAll(Arrays.asList(words.toString().toLowerCase().split("[^a-zA-Z]"))); words.setLength(0); pages++; for (String word : pageWords) { if (allWords.containsKey(word)) { pageCount[allWords.get(word)]++; } else if (!word.isEmpty() && allWords.containsKey(word.substring(0, word.length() - 1))) { pageCount[allWords.get(word.substring(0, word.length() - 1))]++; } } pageWords.clear(); } } } else if (g.matcher(nextLine).matches()) { continue; } words.append(nextLine); words.append(" "); }

Read the article

BufferedReader ready method in a while loop to determine EOF?

- by BobTurbo

I have a large file (wikipedia english arcticles only database as xml file) I am using to read one character at a time using BufferedReader. The psuedo code is: file = BufferedReader... while (file.ready()) character = file.read() is this actually valid? Or will ready just return false when it is waiting for the HDD to return data and not actually when the EOF has been reached? I tried to use if (file.read() == -1) but seemed to run into an infinite loop that I literally could not find. I am just wondering if it is reading the whole file as my statistics say 444 380 wikipedia pages have been read but I thought there were many more articles..

Read the article

efficiently compare two BitArrays of the same length

- by BobTurbo

How would I do this? I am trying to count when both arrays have the same value of TRUE/1 at the same index. As you can see, my code has multiple bitarrays and is looping through each one and comparing them with a comparisonArray with another loop. It doesn't seem to be very efficient and I need it to be. foreach (bitArrayTuple in bitarryList) { for (int i = 0; i < arrayLength; i++) if (bArrayTuple.Item2[i] && comparisonArray[i]) bitArrayTuple.Item1++; } where Item1 is the count and Item2 is a bitarray.

Read the article

can't read from stream until child exits?

- by BobTurbo

OK I have a program that creates two pipes - forks - the child's stdin and stdout are redirected to one end of each pipe - the parent is connected to the other ends of the pipes and tries to read the stream associated with the child's output and print it to the screen (and I will also make it write to the input of the child eventually). The problem is, when the parent tries to fgets the child's output stream, it just stalls and waits until the child dies to fgets and then print the output. If the child doesn't exit, it just waits forever. What is going on? I thought that maybe fgets would block until SOMETHING was in the stream, but not block all the way until the child gives up its file descriptors. Here is the code: #include <stdlib.h> #include <stdio.h> #include <string.h> #include <sys/types.h> #include <unistd.h> int main(int argc, char *argv[]) { FILE* fpin; FILE* fpout; int input_fd[2]; int output_fd[2]; pid_t pid; int status; char input[100]; char output[100]; char *args[] = {"/somepath/someprogram", NULL}; fgets(input, 100, stdin); // the user inputs the program name to exec pipe(input_fd); pipe(output_fd); pid = fork(); if (pid == 0) { close(input_fd[1]); close(output_fd[0]); dup2(input_fd[0], 0); dup2(output_fd[1], 1); input[strlen(input)-1] = '\0'; execvp(input, args); } else { close(input_fd[0]); close(output_fd[1]); fpin = fdopen(input_fd[1], "w"); fpout = fdopen(output_fd[0], "r"); while(!feof(fpout)) { fgets(output, 100, fpout); printf("output: %s\n", output); } } return 0; }

Read the article

after dup2, stream still contains old contents?

- by BobTurbo

so if I do: dup2(0, backup); // backup stdin dup2(somefile, 0); // somefile has four lines of content fgets(...stdin); // consume one line fgets(....stdin); // consume two lines dup2(backup, 0); // switch stdin back to keyboard I am finding at this point.. stdin still contains the two lines I haven't consumed. Why is that? Because there is just one buffer no matter how many times you redirect? How do I get rid of the two lines left but still remember where I was in the somefile stream when I want to go back to it?

Read the article

create backup file descriptor?

- by BobTurbo

stdinBackup = 4; dup2(0, stdinBackup); Currently I am doing the above to 'backup' stdin so that it can be restored from backup later after it has been redirected somewhere else. I have a feeling that I am doing a lot wrong? (eg arbitrarily assigning 4 is surely not right). Anyone point me in the right direction?

Developer IT