Search Results

Search found 2 results on 1 pages for 'gnomed'.

Page 1/1 | 1 

  • Avoid an "out of memory error" in Java(eclipse), when using large data structure?

    - by gnomed
    OK, so I am writing a program that unfortunately needs to use a huge data structure to complete its work, but it is failing with a "out of memory error" during its initialization. While I understand entirely what that means and why it is a problem, I am having trouble overcoming it, since my program needs to use this large structure and I don't know any other way to store it. The program first indexes a large corpus of text files that I provide. This works fine. Then it uses this index to initialize a large 2D array. This array will have nXn entries, where "n" is the number of unique words in the corpus of text. For the relatively small chunk I am testing it on(about 60 files) it needs to make approximately 30,000x30,000 entries. this will probably be bigger once I run it on my full intended corpus too. It consistently fails every time, after it indexes, while it is initializing the data structure(to be worked on later). Things I have done include: revamp my code to use a primitive "int[]" instead of a "TreeMap" eliminate redundant structures, etc... Also, I have run eclipse with "eclipse -vmargs -Xmx2g" to max out my allocated memory I am fairly confident this is not going to be a simple line of code solution, but is most likely going to require a very new approach. I am looking for what that approach is, any ideas? Thanks, B.

    Read the article

  • How would I sort files to directories based on filenames?

    - by gnomed
    I have a huge number of files to sort all named in some terrible convention. Here are some examples: (4)_mr__mcloughlin____.txt 12__sir_john_farr____.txt (b)mr__chope____.txt dame_elaine_kellett-bowman____.txt dr__blackburn______.txt These names are supposed to be a different person (speaker) each. Someone in another IT department produced these from a ton of XML files using some script but the naming is unfathomably stupid as you can see. I need to sort literally tens of thousands of these files with multiple files of text for each person; each with something stupid making the filename different, be it more underscores or some random number. They need to be sorted by speaker. This would be easier with a script to do most of the work then I could just go back and merge folders that should be under the same name or whatever. There are a number of ways I was thinking about doing this. parse the names from each file and sort them into folders for each unique name. get a list of all the unique names from the filenames, then look through this simplified list of unique names for similar ones and ask me whether they are the same, and once it has determined this it will sort them all accordingly. I plan on using Perl, but I can try a new language if it's worth it. I'm not sure how to go about reading in each filename in a directory one at a time into a string for parsing into an actual name. I'm not completely sure how to parse with regex in perl either, but that might be googleable. For the sorting, I was just gonna use the shell command: `cp filename.txt /example/destination/filename.txt` but just cause that's all I know so it's easiest. I dont even have a pseudocode idea of what im going to do either so if someone knows the best sequence of actions, im all ears. I guess I am looking for a lot of help, I am open to any suggestions. Many many many thanks to anyone who can help. B.

    Read the article

1