Search Results

Search found 1 results on 1 pages for 'trzewiczek'.

Page 1/1 | 1 

  • Huge file in Clojure and Java heap space error

    - by trzewiczek
    I posted before on a huge XML file - it's a 287GB XML with Wikipedia dump I want ot put into CSV file (revisions authors and timestamps). I managed to do that till some point. Before I got the StackOverflow Error, but now after solving the first problem I get: java.lang.OutOfMemoryError: Java heap space error. My code (partly taken from Justin Kramer answer) looks like that: (defn process-pages [page] (let [title (article-title page) revisions (filter #(= :revision (:tag %)) (:content page))] (for [revision revisions] (let [user (revision-user revision) time (revision-timestamp revision)] (spit "files/data.csv" (str "\"" time "\";\"" user "\";\"" title "\"\n" ) :append true))))) (defn open-file [file-name] (let [rdr (BufferedReader. (FileReader. file-name))] (->> (:content (data.xml/parse rdr :coalescing false)) (filter #(= :page (:tag %))) (map process-pages)))) I don't show article-title, revision-user and revision-title functions, because they just simply take data from a specific place in the page or revision hash. Anyone could help me with this - I'm really new in Clojure and don't get the problem.

    Read the article

1