Cannot Train PCBA Random Forest Model via sklearn
I've tried many times to train the sklearn RandomForest via examples/pcba/pcba_sklearn.py (even setting n_jobs=-1 to use all 16 CPU cores), but fitting consistently stalls for many hours at this point:
Processing shard 53
Task PCBA-1030
Task PCBA-1379
Task PCBA-1452
Task PCBA-1454
Task PCBA-1457
Task PCBA-1458
Task PCBA-1460
Task PCBA-1461
Task PCBA-1468
Task PCBA-1469
Task PCBA-1471
Task PCBA-1479
Task PCBA-1631
Task PCBA-1634
Task PCBA-1688
Task PCBA-1721
Task PCBA-2100
Task PCBA-2101
Task PCBA-2147
Task PCBA-2242
Task PCBA-2326
Task PCBA-2451
Task PCBA-2517
Task PCBA-2528
Task PCBA-2546
Task PCBA-2549
Task PCBA-2551
Task PCBA-2662
Task PCBA-2675
Task PCBA-2676
Task PCBA-411
Task PCBA-463254
Task PCBA-485281
Task PCBA-485290
Task PCBA-485294
Task PCBA-485297
Task PCBA-485313
Task PCBA-485314
Task PCBA-485341
Task PCBA-485349
Task PCBA-485353
Task PCBA-485360
Task PCBA-485364
Task PCBA-485367
Task PCBA-492947
Task PCBA-493208
Task PCBA-504327
Task PCBA-504332
Task PCBA-504333
Task PCBA-504339
Task PCBA-504444
Task PCBA-504466
Task PCBA-504467
Task PCBA-504706
Task PCBA-504842
Task PCBA-504845
Task PCBA-504847
Task PCBA-504891
Task PCBA-540276
Task PCBA-540317
Task PCBA-588342
Task PCBA-588453
Task PCBA-588456
Task PCBA-588579
Task PCBA-588590
Task PCBA-588591
Task PCBA-588795
Task PCBA-588855
Task PCBA-602179
Task PCBA-602233
Task PCBA-602310
Task PCBA-602313
Task PCBA-602332
Task PCBA-624170
Task PCBA-624171
Task PCBA-624173
Task PCBA-624202
Task PCBA-624246
Task PCBA-624287
Task PCBA-624288
Task PCBA-624291
Task PCBA-624296
Task PCBA-624297
Task PCBA-624417
Task PCBA-651635
Task PCBA-651644
Task PCBA-651768
Task PCBA-651965
Task PCBA-652025
Task PCBA-652104
Task PCBA-652105
Task PCBA-652106
Task PCBA-686970
Task PCBA-686978
Task PCBA-686979
Task PCBA-720504
Task PCBA-720532
Task PCBA-720542
Task PCBA-720551
Task PCBA-720553
Task PCBA-720579
Task PCBA-720580
Task PCBA-720707
Task PCBA-720708
Task PCBA-720709
Task PCBA-720711
Task PCBA-743255
Task PCBA-743266
Task PCBA-875
Task PCBA-881
Task PCBA-883
Task PCBA-884
Task PCBA-885
Task PCBA-887
Task PCBA-891
Task PCBA-899
Task PCBA-902
Task PCBA-903
Task PCBA-904
Task PCBA-912
Task PCBA-914
Task PCBA-915
Task PCBA-924
Task PCBA-925
Task PCBA-926
Task PCBA-927
Task PCBA-938
Task PCBA-995
Dataset for task PCBA-1030 has shape ((129403, 1024), (129403, 1), (129403, 1), (129403,))
Dataset for task PCBA-1379 has shape ((158421, 1024), (158421, 1), (158421, 1), (158421,))
Dataset for task PCBA-1452 has shape ((119538, 1024), (119538, 1), (119538, 1), (119538,))
Dataset for task PCBA-1454 has shape ((100530, 1024), (100530, 1), (100530, 1), (100530,))
Dataset for task PCBA-1457 has shape ((162192, 1024), (162192, 1), (162192, 1), (162192,))
Dataset for task PCBA-1458 has shape ((156868, 1024), (156868, 1), (156868, 1), (156868,))
Dataset for task PCBA-1460 has shape ((179641, 1024), (179641, 1), (179641, 1), (179641,))
Dataset for task PCBA-1461 has shape ((166527, 1024), (166527, 1), (166527, 1), (166527,))
Dataset for task PCBA-1468 has shape ((201973, 1024), (201973, 1), (201973, 1), (201973,))
Dataset for task PCBA-1469 has shape ((220337, 1024), (220337, 1), (220337, 1), (220337,))
Dataset for task PCBA-1471 has shape ((174974, 1024), (174974, 1), (174974, 1), (174974,))
Dataset for task PCBA-1479 has shape ((218501, 1024), (218501, 1), (218501, 1), (218501,))
Dataset for task PCBA-1631 has shape ((207821, 1024), (207821, 1), (207821, 1), (207821,))
Dataset for task PCBA-1634 has shape ((209591, 1024), (209591, 1), (209591, 1), (209591,))
Dataset for task PCBA-1688 has shape ((163434, 1024), (163434, 1), (163434, 1), (163434,))
Dataset for task PCBA-1721 has shape ((232374, 1024), (232374, 1), (232374, 1), (232374,))
Dataset for task PCBA-2100 has shape ((234300, 1024), (234300, 1), (234300, 1), (234300,))
Dataset for task PCBA-2101 has shape ((248072, 1024), (248072, 1), (248072, 1), (248072,))
Dataset for task PCBA-2147 has shape ((153957, 1024), (153957, 1), (153957, 1), (153957,))
Dataset for task PCBA-2242 has shape ((147212, 1024), (147212, 1), (147212, 1), (147212,))
Dataset for task PCBA-2326 has shape ((208615, 1024), (208615, 1), (208615, 1), (208615,))
Dataset for task PCBA-2451 has shape ((220004, 1024), (220004, 1), (220004, 1), (220004,))
Dataset for task PCBA-2517 has shape ((268428, 1024), (268428, 1), (268428, 1), (268428,))
Dataset for task PCBA-2528 has shape ((277117, 1024), (277117, 1), (277117, 1), (277117,))
Dataset for task PCBA-2546 has shape ((222847, 1024), (222847, 1), (222847, 1), (222847,))
Dataset for task PCBA-2549 has shape ((185383, 1024), (185383, 1), (185383, 1), (185383,))
Dataset for task PCBA-2551 has shape ((216225, 1024), (216225, 1), (216225, 1), (216225,))
Dataset for task PCBA-2662 has shape ((228089, 1024), (228089, 1), (228089, 1), (228089,))
Dataset for task PCBA-2675 has shape ((198938, 1024), (198938, 1), (198938, 1), (198938,))
Dataset for task PCBA-2676 has shape ((286697, 1024), (286697, 1), (286697, 1), (286697,))
Dataset for task PCBA-411 has shape ((56298, 1024), (56298, 1), (56298, 1), (56298,))
Dataset for task PCBA-463254 has shape ((263235, 1024), (263235, 1), (263235, 1), (263235,))
Dataset for task PCBA-485281 has shape ((251790, 1024), (251790, 1), (251790, 1), (251790,))
Dataset for task PCBA-485290 has shape ((271331, 1024), (271331, 1), (271331, 1), (271331,))
Dataset for task PCBA-485294 has shape ((247938, 1024), (247938, 1), (247938, 1), (247938,))
Dataset for task PCBA-485297 has shape ((248189, 1024), (248189, 1), (248189, 1), (248189,))
Dataset for task PCBA-485313 has shape ((249181, 1024), (249181, 1), (249181, 1), (249181,))
Dataset for task PCBA-485314 has shape ((253673, 1024), (253673, 1), (253673, 1), (253673,))
Dataset for task PCBA-485341 has shape ((261812, 1024), (261812, 1), (261812, 1), (261812,))
Dataset for task PCBA-485349 has shape ((255829, 1024), (255829, 1), (255829, 1), (255829,))
Dataset for task PCBA-485353 has shape ((258367, 1024), (258367, 1), (258367, 1), (258367,))
Dataset for task PCBA-485360 has shape ((174782, 1024), (174782, 1), (174782, 1), (174782,))
Dataset for task PCBA-485364 has shape ((273760, 1024), (273760, 1), (273760, 1), (273760,))
Dataset for task PCBA-485367 has shape ((260772, 1024), (260772, 1), (260772, 1), (260772,))
Dataset for task PCBA-492947 has shape ((263367, 1024), (263367, 1), (263367, 1), (263367,))
Dataset for task PCBA-493208 has shape ((33425, 1024), (33425, 1), (33425, 1), (33425,))
Dataset for task PCBA-504327 has shape ((297583, 1024), (297583, 1), (297583, 1), (297583,))
Dataset for task PCBA-504332 has shape ((238122, 1024), (238122, 1), (238122, 1), (238122,))
Dataset for task PCBA-504333 has shape ((260795, 1024), (260795, 1), (260795, 1), (260795,))
Dataset for task PCBA-504339 has shape ((284468, 1024), (284468, 1), (284468, 1), (284468,))
Dataset for task PCBA-504444 has shape ((232519, 1024), (232519, 1), (232519, 1), (232519,))
Dataset for task PCBA-504466 has shape ((248531, 1024), (248531, 1), (248531, 1), (248531,))
Dataset for task PCBA-504467 has shape ((194337, 1024), (194337, 1), (194337, 1), (194337,))
Dataset for task PCBA-504706 has shape ((242036, 1024), (242036, 1), (242036, 1), (242036,))
Dataset for task PCBA-504842 has shape ((259609, 1024), (259609, 1), (259609, 1), (259609,))
Dataset for task PCBA-504845 has shape ((297973, 1024), (297973, 1), (297973, 1), (297973,))
Dataset for task PCBA-504847 has shape ((305121, 1024), (305121, 1), (305121, 1), (305121,))
Dataset for task PCBA-504891 has shape ((288936, 1024), (288936, 1), (288936, 1), (288936,))
Dataset for task PCBA-540276 has shape ((164637, 1024), (164637, 1), (164637, 1), (164637,))
Dataset for task PCBA-540317 has shape ((295941, 1024), (295941, 1), (295941, 1), (295941,))
Dataset for task PCBA-588342 has shape ((261529, 1024), (261529, 1), (261529, 1), (261529,))
Dataset for task PCBA-588453 has shape ((297852, 1024), (297852, 1), (297852, 1), (297852,))
Dataset for task PCBA-588456 has shape ((308484, 1024), (308484, 1), (308484, 1), (308484,))
Dataset for task PCBA-588579 has shape ((312377, 1024), (312377, 1), (312377, 1), (312377,))
Dataset for task PCBA-588590 has shape ((286567, 1024), (286567, 1), (286567, 1), (286567,))
Dataset for task PCBA-588591 has shape ((299990, 1024), (299990, 1), (299990, 1), (299990,))
Dataset for task PCBA-588795 has shape ((303008, 1024), (303008, 1), (303008, 1), (303008,))
Dataset for task PCBA-588855 has shape ((282459, 1024), (282459, 1), (282459, 1), (282459,))
Dataset for task PCBA-602179 has shape ((308590, 1024), (308590, 1), (308590, 1), (308590,))
Dataset for task PCBA-602233 has shape ((303344, 1024), (303344, 1), (303344, 1), (303344,))
Dataset for task PCBA-602310 has shape ((315329, 1024), (315329, 1), (315329, 1), (315329,))
Dataset for task PCBA-602313 has shape ((298395, 1024), (298395, 1), (298395, 1), (298395,))
Dataset for task PCBA-602332 has shape ((330650, 1024), (330650, 1), (330650, 1), (330650,))
Dataset for task PCBA-624170 has shape ((318901, 1024), (318901, 1), (318901, 1), (318901,))
Dataset for task PCBA-624171 has shape ((316716, 1024), (316716, 1), (316716, 1), (316716,))
Dataset for task PCBA-624173 has shape ((320869, 1024), (320869, 1), (320869, 1), (320869,))
Dataset for task PCBA-624202 has shape ((293056, 1024), (293056, 1), (293056, 1), (293056,))
Dataset for task PCBA-624246 has shape ((291944, 1024), (291944, 1), (291944, 1), (291944,))
Dataset for task PCBA-624287 has shape ((241976, 1024), (241976, 1), (241976, 1), (241976,))
Dataset for task PCBA-624288 has shape ((259354, 1024), (259354, 1), (259354, 1), (259354,))
Dataset for task PCBA-624291 has shape ((265487, 1024), (265487, 1), (265487, 1), (265487,))
Dataset for task PCBA-624296 has shape ((233680, 1024), (233680, 1), (233680, 1), (233680,))
Dataset for task PCBA-624297 has shape ((246483, 1024), (246483, 1), (246483, 1), (246483,))
Dataset for task PCBA-624417 has shape ((260604, 1024), (260604, 1), (260604, 1), (260604,))
Dataset for task PCBA-651635 has shape ((277599, 1024), (277599, 1), (277599, 1), (277599,))
Dataset for task PCBA-651644 has shape ((283625, 1024), (283625, 1), (283625, 1), (283625,))
Dataset for task PCBA-651768 has shape ((285976, 1024), (285976, 1), (285976, 1), (285976,))
Dataset for task PCBA-651965 has shape ((261055, 1024), (261055, 1), (261055, 1), (261055,))
Dataset for task PCBA-652025 has shape ((291353, 1024), (291353, 1), (291353, 1), (291353,))
Dataset for task PCBA-652104 has shape ((300670, 1024), (300670, 1), (300670, 1), (300670,))
Dataset for task PCBA-652105 has shape ((257786, 1024), (257786, 1), (257786, 1), (257786,))
Dataset for task PCBA-652106 has shape ((290271, 1024), (290271, 1), (290271, 1), (290271,))
Dataset for task PCBA-686970 has shape ((269468, 1024), (269468, 1), (269468, 1), (269468,))
Dataset for task PCBA-686978 has shape ((242472, 1024), (242472, 1), (242472, 1), (242472,))
Dataset for task PCBA-686979 has shape ((247811, 1024), (247811, 1), (247811, 1), (247811,))
Dataset for task PCBA-720504 has shape ((280265, 1024), (280265, 1), (280265, 1), (280265,))
Dataset for task PCBA-720532 has shape ((10455, 1024), (10455, 1), (10455, 1), (10455,))
Dataset for task PCBA-720542 has shape ((285374, 1024), (285374, 1), (285374, 1), (285374,))
Dataset for task PCBA-720551 has shape ((274115, 1024), (274115, 1), (274115, 1), (274115,))
Dataset for task PCBA-720553 has shape ((271268, 1024), (271268, 1), (271268, 1), (271268,))
Dataset for task PCBA-720579 has shape ((227277, 1024), (227277, 1), (227277, 1), (227277,))
Dataset for task PCBA-720580 has shape ((245441, 1024), (245441, 1), (245441, 1), (245441,))
Dataset for task PCBA-720707 has shape ((290634, 1024), (290634, 1), (290634, 1), (290634,))
Dataset for task PCBA-720708 has shape ((285782, 1024), (285782, 1), (285782, 1), (285782,))
Dataset for task PCBA-720709 has shape ((282572, 1024), (282572, 1), (282572, 1), (282572,))
Dataset for task PCBA-720711 has shape ((290638, 1024), (290638, 1), (290638, 1), (290638,))
Dataset for task PCBA-743255 has shape ((296529, 1024), (296529, 1), (296529, 1), (296529,))
Dataset for task PCBA-743266 has shape ((319272, 1024), (319272, 1), (319272, 1), (319272,))
Dataset for task PCBA-875 has shape ((58939, 1024), (58939, 1), (58939, 1), (58939,))
Dataset for task PCBA-881 has shape ((83379, 1024), (83379, 1), (83379, 1), (83379,))
Dataset for task PCBA-883 has shape ((6492, 1024), (6492, 1), (6492, 1), (6492,))
Dataset for task PCBA-884 has shape ((8378, 1024), (8378, 1), (8378, 1), (8378,))
Dataset for task PCBA-885 has shape ((10294, 1024), (10294, 1), (10294, 1), (10294,))
Dataset for task PCBA-887 has shape ((55450, 1024), (55450, 1), (55450, 1), (55450,))
Dataset for task PCBA-891 has shape ((6268, 1024), (6268, 1), (6268, 1), (6268,))
Dataset for task PCBA-899 has shape ((6582, 1024), (6582, 1), (6582, 1), (6582,))
Dataset for task PCBA-902 has shape ((95206, 1024), (95206, 1), (95206, 1), (95206,))
Dataset for task PCBA-903 has shape ((42426, 1024), (42426, 1), (42426, 1), (42426,))
Dataset for task PCBA-904 has shape ((40960, 1024), (40960, 1), (40960, 1), (40960,))
Dataset for task PCBA-912 has shape ((45325, 1024), (45325, 1), (45325, 1), (45325,))
Dataset for task PCBA-914 has shape ((6224, 1024), (6224, 1), (6224, 1), (6224,))
Dataset for task PCBA-915 has shape ((6403, 1024), (6403, 1), (6403, 1), (6403,))
Dataset for task PCBA-924 has shape ((95967, 1024), (95967, 1), (95967, 1), (95967,))
Dataset for task PCBA-925 has shape ((51211, 1024), (51211, 1), (51211, 1), (51211,))
Dataset for task PCBA-926 has shape ((45156, 1024), (45156, 1), (45156, 1), (45156,))
Dataset for task PCBA-927 has shape ((46883, 1024), (46883, 1), (46883, 1), (46883,))
Dataset for task PCBA-938 has shape ((49893, 1024), (49893, 1), (49893, 1), (49893,))
Dataset for task PCBA-995 has shape ((52456, 1024), (52456, 1), (52456, 1), (52456,))
Fitting model for task PCBA-1030
Fitting model for task PCBA-1379
Fitting model for task PCBA-1452
Fitting model for task PCBA-1454
Fitting model for task PCBA-1457
Fitting model for task PCBA-1458
Fitting model for task PCBA-1460
Fitting model for task PCBA-1461
Fitting model for task PCBA-1468
Fitting model for task PCBA-1469
Fitting model for task PCBA-1471
Fitting model for task PCBA-1479
Fitting model for task PCBA-1631
Fitting model for task PCBA-1634
Fitting model for task PCBA-1688
Fitting model for task PCBA-1721
Fitting model for task PCBA-2100
Fitting model for task PCBA-2101
Fitting model for task PCBA-2147
Fitting model for task PCBA-2242
Fitting model for task PCBA-2326
Fitting model for task PCBA-2451
Fitting model for task PCBA-2517
Fitting model for task PCBA-2528
Fitting model for task PCBA-2546
Fitting model for task PCBA-2549
Fitting model for task PCBA-2551
Fitting model for task PCBA-2662
Fitting model for task PCBA-2675
Fitting model for task PCBA-2676
Fitting model for task PCBA-411
Fitting model for task PCBA-463254
Fitting model for task PCBA-485281
Fitting model for task PCBA-485290
Fitting model for task PCBA-485294
Fitting model for task PCBA-485297
Fitting model for task PCBA-485313
Fitting model for task PCBA-485314
Fitting model for task PCBA-485341
Fitting model for task PCBA-485349
Fitting model for task PCBA-485353
Fitting model for task PCBA-485360
Fitting model for task PCBA-485364
Fitting model for task PCBA-485367
Fitting model for task PCBA-492947
Fitting model for task PCBA-493208
Fitting model for task PCBA-504327
Fitting model for task PCBA-504332
Fitting model for task PCBA-504333
Fitting model for task PCBA-504339
Fitting model for task PCBA-504444
Fitting model for task PCBA-504466
Fitting model for task PCBA-504467
Fitting model for task PCBA-504706
Fitting model for task PCBA-504842
Fitting model for task PCBA-504845
Fitting model for task PCBA-504847
Fitting model for task PCBA-504891
Fitting model for task PCBA-540276
Fitting model for task PCBA-540317
Fitting model for task PCBA-588342
Fitting model for task PCBA-588453
Fitting model for task PCBA-588456
Fitting model for task PCBA-588579
Fitting model for task PCBA-588590
Fitting model for task PCBA-588591
Fitting model for task PCBA-588795
Fitting model for task PCBA-588855
Fitting model for task PCBA-602179
Fitting model for task PCBA-602233
Fitting model for task PCBA-602310
Fitting model for task PCBA-602313
Fitting model for task PCBA-602332
Fitting model for task PCBA-624170
Fitting model for task PCBA-624171
Fitting model for task PCBA-624173
Fitting model for task PCBA-624202
Fitting model for task PCBA-624246
Fitting model for task PCBA-624287
Fitting model for task PCBA-624288
Fitting model for task PCBA-624291
Fitting model for task PCBA-624296
Fitting model for task PCBA-624297
Fitting model for task PCBA-624417
Fitting model for task PCBA-651635
Fitting model for task PCBA-651644
Fitting model for task PCBA-651768
Fitting model for task PCBA-651965
Fitting model for task PCBA-652025
Fitting model for task PCBA-652104
After the "Fitting model for task PCBA-652104" message the script appears to hang indefinitely: it sat at about 50% CPU usage on the system for over 12 hours. Since many more datasets remained, I killed the script at that point.
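To make "many more datasets remaining" concrete, the log can be compared against itself: every task printed as "Task PCBA-..." should eventually get a "Fitting model for task PCBA-..." line. The sketch below is only an illustration, not part of the example script; the function name `fit_progress` and the short `sample_log` are made up, and it assumes the log lines have exactly the format shown above.

```python
import re

# Patterns for the two kinds of log lines shown in the issue.
TASK_RE = re.compile(r"^Task (PCBA-\d+)$")
FIT_RE = re.compile(r"^Fitting model for task (PCBA-\d+)$")


def fit_progress(log_text):
    """Return (listed, fitted, pending) task names parsed from the log text.

    `fitted` holds tasks whose fit was *started*; the last entry is the
    fit that was in progress when the log ends, not a finished one.
    """
    listed, fitted = [], []
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        match = TASK_RE.match(line)
        if match:
            listed.append(match.group(1))
            continue
        match = FIT_RE.match(line)
        if match:
            fitted.append(match.group(1))
    started = set(fitted)
    pending = [task for task in listed if task not in started]
    return listed, fitted, pending


# Hypothetical, shortened log in the same format as the one above.
sample_log = """\
Processing shard 53
Task PCBA-1030
Task PCBA-1379
Task PCBA-1452
Fitting model for task PCBA-1030
Fitting model for task PCBA-1379
"""

listed, fitted, pending = fit_progress(sample_log)
print(f"{len(fitted)} of {len(listed)} fits started; last: {fitted[-1]}")
print("not yet started:", pending)
```

Counted by hand on the log above, 128 tasks are listed and 90 "Fitting model" lines appear, so 38 tasks had not even started fitting when the script was killed.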
I believe we should change this example script to set n_estimators to 10, which is what the molnet code seems to use. Alternatively, we could just delete it to prevent confusion! I lean towards the second recommendation.
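For anyone who wants to gauge what the n_estimators change would buy, here is a rough sketch using plain scikit-learn on synthetic data. It does not use the real PCBA data or the example script's loading code: the array sizes, the 5% positive rate, and the comparison value of 100 trees are all arbitrary assumptions, and `timed_fit` is a hypothetical helper. Each tree in a forest is built independently, so fit time should grow roughly in proportion to the number of trees.

```python
import time

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for a single task: 1024-bit fingerprints, binary labels.
# Sizes and label balance are assumptions, far smaller than the real datasets.
n_samples, n_features = 2000, 1024
X = rng.integers(0, 2, size=(n_samples, n_features)).astype(np.uint8)
y = (rng.random(n_samples) < 0.05).astype(int)


def timed_fit(n_estimators):
    """Fit a forest with the given number of trees; return (model, seconds)."""
    model = RandomForestClassifier(
        n_estimators=n_estimators,
        n_jobs=-1,  # use all available cores, as in the issue
        random_state=0,
    )
    start = time.perf_counter()
    model.fit(X, y)
    return model, time.perf_counter() - start


small_model, small_secs = timed_fit(10)    # the value suggested above
large_model, large_secs = timed_fit(100)   # arbitrary larger value for contrast

print(f"10 trees:  {small_secs:.2f}s")
print(f"100 trees: {large_secs:.2f}s")
```

Timings depend on the machine, so none are quoted here. With 128 tasks and up to roughly 330,000 rows each, even a modest per-tree cost adds up quickly, which is why the tree count matters for this script.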
Issue Analytics
- State:
- Created 6 years ago
- Comments: 7 (7 by maintainers)
Top GitHub Comments
I’m on it, thanks
Closing this old issue. Feel free to re-open if there’s new things to discuss.