scikit-learn ExtraTreesClassifier hanging
Posted by denson on Stack Overflow
Published on 2014-05-27T09:22:32Z
scikit-learn
I'm running scikit-learn's ExtraTreesClassifier on some rather large training datasets: ~1,600,000,000 rows with ~500 features. The platform is Ubuntu Server 14.04, and the machine has 100 GB of RAM and 20 CPU cores.
The test datasets have about half as many rows.
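A quick way to relate these dataset sizes to the available RAM is to estimate the footprint of a dense copy of the data. The sketch below assumes the data is held as a dense NumPy-style array and uses 4 bytes per value, since scikit-learn's tree estimators convert X to float32 before fitting; dense_bytes and to_gb are hypothetical helper names, and the figures are the ones stated in the question.

```python
def dense_bytes(n_rows, n_features, itemsize=4):
    """Bytes for a dense n_rows x n_features array.

    itemsize=4 is float32 (the dtype scikit-learn's trees convert X to);
    pass itemsize=8 for a float64 copy.
    """
    return n_rows * n_features * itemsize


def to_gb(n_bytes):
    """Convert bytes to decimal gigabytes."""
    return n_bytes / 1e9


# Figures taken from the question.
train_rows, n_features = 1_600_000_000, 500
print(to_gb(dense_bytes(train_rows, n_features)))       # training set → 3200.0
print(to_gb(dense_bytes(train_rows // 2, n_features)))  # test set, ~half the rows → 1600.0
print(to_gb(dense_bytes(400_000, 1000)))                # smaller run that finished → 1.6
```

Comparing these numbers against the machine's 100 GB of RAM is a cheap sanity check on the dataset dimensions before looking for a bug in the training code itself.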
I set n_jobs=10 and forest_size = 3 * number_of_features, so about 1,700 trees.
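The configuration described above can be sketched roughly as follows, assuming forest_size is what gets passed to ExtraTreesClassifier as n_estimators. build_forest and trees_per_feature are hypothetical names, and the small make_classification dataset only stands in for the real data so the sketch runs quickly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier


def build_forest(n_features, n_jobs=10, trees_per_feature=3, verbose=0):
    """Hypothetical helper: forest_size = 3 * number_of_features trees."""
    forest_size = trees_per_feature * n_features
    # verbose > 0 makes the forest report progress while fitting.
    return ExtraTreesClassifier(
        n_estimators=forest_size, n_jobs=n_jobs, verbose=verbose
    )


# Small synthetic stand-in for the real training data.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
clf = build_forest(X.shape[1], n_jobs=2)
clf.fit(X, y)
print(len(clf.estimators_))  # 3 * 20 → 60
```

With ~567 features this rule gives about 1,700 trees, which matches the forest size quoted above. Setting verbose=1 on the real run is one low-cost way to see how far fitting gets before it stalls.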
If I reduce the number of features to about 350, it works fine, but with the full feature set of 500+ it never completes the training phase. The process is still running and holding about 20 GB of RAM, but it uses 0% CPU. I have also successfully trained on datasets with ~400,000 rows and twice as many features; those finish in only about 1 hour.
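One way to tell whether a long fit is still making progress, rather than waiting on a single call, is to grow the forest in batches with warm_start=True: each fit() call keeps the trees already built and only adds n_estimators - len(estimators_) new ones. This is a diagnostic sketch, not a fix for the hang; fit_in_batches, the batch size, and the synthetic data are all hypothetical.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier


def fit_in_batches(X, y, total_trees, batch_size, n_jobs=10):
    """Hypothetical helper: grow the forest batch_size trees at a time.

    With warm_start=True, each fit() reuses the existing trees and only
    builds the extra ones, so per-batch timings show whether work continues.
    """
    clf = ExtraTreesClassifier(warm_start=True, n_jobs=n_jobs, random_state=0)
    built = 0
    while built < total_trees:
        built = min(built + batch_size, total_trees)
        clf.set_params(n_estimators=built)
        start = time.perf_counter()
        clf.fit(X, y)
        elapsed = time.perf_counter() - start
        print(f"{len(clf.estimators_)} trees built, batch took {elapsed:.2f}s")
    return clf


# Small synthetic stand-in for the real data.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
clf = fit_in_batches(X, y, total_trees=25, batch_size=10, n_jobs=2)
```

If a batch never returns while CPU usage drops to 0%, the stall is at least narrowed down to a specific batch instead of the whole 1,700-tree fit.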
I am being careful to delete any arrays/objects that are not in use.
Does anyone have any ideas I might try?
© Stack Overflow or respective owner