Cannot Train PCBA Random Forest Model via sklearn

I’ve tried many times to train the sklearn RandomForest via examples/pcba/pcba_sklearn.py, even setting n_jobs=-1 to use all 16 CPU cores, but fitting consistently stalls for many hours at this point:

Processing shard 53
	Task PCBA-1030
	Task PCBA-1379
	Task PCBA-1452
	Task PCBA-1454
	Task PCBA-1457
	Task PCBA-1458
	Task PCBA-1460
	Task PCBA-1461
	Task PCBA-1468
	Task PCBA-1469
	Task PCBA-1471
	Task PCBA-1479
	Task PCBA-1631
	Task PCBA-1634
	Task PCBA-1688
	Task PCBA-1721
	Task PCBA-2100
	Task PCBA-2101
	Task PCBA-2147
	Task PCBA-2242
	Task PCBA-2326
	Task PCBA-2451
	Task PCBA-2517
	Task PCBA-2528
	Task PCBA-2546
	Task PCBA-2549
	Task PCBA-2551
	Task PCBA-2662
	Task PCBA-2675
	Task PCBA-2676
	Task PCBA-411
	Task PCBA-463254
	Task PCBA-485281
	Task PCBA-485290
	Task PCBA-485294
	Task PCBA-485297
	Task PCBA-485313
	Task PCBA-485314
	Task PCBA-485341
	Task PCBA-485349
	Task PCBA-485353
	Task PCBA-485360
	Task PCBA-485364
	Task PCBA-485367
	Task PCBA-492947
	Task PCBA-493208
	Task PCBA-504327
	Task PCBA-504332
	Task PCBA-504333
	Task PCBA-504339
	Task PCBA-504444
	Task PCBA-504466
	Task PCBA-504467
	Task PCBA-504706
	Task PCBA-504842
	Task PCBA-504845
	Task PCBA-504847
	Task PCBA-504891
	Task PCBA-540276
	Task PCBA-540317
	Task PCBA-588342
	Task PCBA-588453
	Task PCBA-588456
	Task PCBA-588579
	Task PCBA-588590
	Task PCBA-588591
	Task PCBA-588795
	Task PCBA-588855
	Task PCBA-602179
	Task PCBA-602233
	Task PCBA-602310
	Task PCBA-602313
	Task PCBA-602332
	Task PCBA-624170
	Task PCBA-624171
	Task PCBA-624173
	Task PCBA-624202
	Task PCBA-624246
	Task PCBA-624287
	Task PCBA-624288
	Task PCBA-624291
	Task PCBA-624296
	Task PCBA-624297
	Task PCBA-624417
	Task PCBA-651635
	Task PCBA-651644
	Task PCBA-651768
	Task PCBA-651965
	Task PCBA-652025
	Task PCBA-652104
	Task PCBA-652105
	Task PCBA-652106
	Task PCBA-686970
	Task PCBA-686978
	Task PCBA-686979
	Task PCBA-720504
	Task PCBA-720532
	Task PCBA-720542
	Task PCBA-720551
	Task PCBA-720553
	Task PCBA-720579
	Task PCBA-720580
	Task PCBA-720707
	Task PCBA-720708
	Task PCBA-720709
	Task PCBA-720711
	Task PCBA-743255
	Task PCBA-743266
	Task PCBA-875
	Task PCBA-881
	Task PCBA-883
	Task PCBA-884
	Task PCBA-885
	Task PCBA-887
	Task PCBA-891
	Task PCBA-899
	Task PCBA-902
	Task PCBA-903
	Task PCBA-904
	Task PCBA-912
	Task PCBA-914
	Task PCBA-915
	Task PCBA-924
	Task PCBA-925
	Task PCBA-926
	Task PCBA-927
	Task PCBA-938
	Task PCBA-995
Dataset for task PCBA-1030 has shape ((129403, 1024), (129403, 1), (129403, 1), (129403,))
Dataset for task PCBA-1379 has shape ((158421, 1024), (158421, 1), (158421, 1), (158421,))
Dataset for task PCBA-1452 has shape ((119538, 1024), (119538, 1), (119538, 1), (119538,))
Dataset for task PCBA-1454 has shape ((100530, 1024), (100530, 1), (100530, 1), (100530,))
Dataset for task PCBA-1457 has shape ((162192, 1024), (162192, 1), (162192, 1), (162192,))
Dataset for task PCBA-1458 has shape ((156868, 1024), (156868, 1), (156868, 1), (156868,))
Dataset for task PCBA-1460 has shape ((179641, 1024), (179641, 1), (179641, 1), (179641,))
Dataset for task PCBA-1461 has shape ((166527, 1024), (166527, 1), (166527, 1), (166527,))
Dataset for task PCBA-1468 has shape ((201973, 1024), (201973, 1), (201973, 1), (201973,))
Dataset for task PCBA-1469 has shape ((220337, 1024), (220337, 1), (220337, 1), (220337,))
Dataset for task PCBA-1471 has shape ((174974, 1024), (174974, 1), (174974, 1), (174974,))
Dataset for task PCBA-1479 has shape ((218501, 1024), (218501, 1), (218501, 1), (218501,))
Dataset for task PCBA-1631 has shape ((207821, 1024), (207821, 1), (207821, 1), (207821,))
Dataset for task PCBA-1634 has shape ((209591, 1024), (209591, 1), (209591, 1), (209591,))
Dataset for task PCBA-1688 has shape ((163434, 1024), (163434, 1), (163434, 1), (163434,))
Dataset for task PCBA-1721 has shape ((232374, 1024), (232374, 1), (232374, 1), (232374,))
Dataset for task PCBA-2100 has shape ((234300, 1024), (234300, 1), (234300, 1), (234300,))
Dataset for task PCBA-2101 has shape ((248072, 1024), (248072, 1), (248072, 1), (248072,))
Dataset for task PCBA-2147 has shape ((153957, 1024), (153957, 1), (153957, 1), (153957,))
Dataset for task PCBA-2242 has shape ((147212, 1024), (147212, 1), (147212, 1), (147212,))
Dataset for task PCBA-2326 has shape ((208615, 1024), (208615, 1), (208615, 1), (208615,))
Dataset for task PCBA-2451 has shape ((220004, 1024), (220004, 1), (220004, 1), (220004,))
Dataset for task PCBA-2517 has shape ((268428, 1024), (268428, 1), (268428, 1), (268428,))
Dataset for task PCBA-2528 has shape ((277117, 1024), (277117, 1), (277117, 1), (277117,))
Dataset for task PCBA-2546 has shape ((222847, 1024), (222847, 1), (222847, 1), (222847,))
Dataset for task PCBA-2549 has shape ((185383, 1024), (185383, 1), (185383, 1), (185383,))
Dataset for task PCBA-2551 has shape ((216225, 1024), (216225, 1), (216225, 1), (216225,))
Dataset for task PCBA-2662 has shape ((228089, 1024), (228089, 1), (228089, 1), (228089,))
Dataset for task PCBA-2675 has shape ((198938, 1024), (198938, 1), (198938, 1), (198938,))
Dataset for task PCBA-2676 has shape ((286697, 1024), (286697, 1), (286697, 1), (286697,))
Dataset for task PCBA-411 has shape ((56298, 1024), (56298, 1), (56298, 1), (56298,))
Dataset for task PCBA-463254 has shape ((263235, 1024), (263235, 1), (263235, 1), (263235,))
Dataset for task PCBA-485281 has shape ((251790, 1024), (251790, 1), (251790, 1), (251790,))
Dataset for task PCBA-485290 has shape ((271331, 1024), (271331, 1), (271331, 1), (271331,))
Dataset for task PCBA-485294 has shape ((247938, 1024), (247938, 1), (247938, 1), (247938,))
Dataset for task PCBA-485297 has shape ((248189, 1024), (248189, 1), (248189, 1), (248189,))
Dataset for task PCBA-485313 has shape ((249181, 1024), (249181, 1), (249181, 1), (249181,))
Dataset for task PCBA-485314 has shape ((253673, 1024), (253673, 1), (253673, 1), (253673,))
Dataset for task PCBA-485341 has shape ((261812, 1024), (261812, 1), (261812, 1), (261812,))
Dataset for task PCBA-485349 has shape ((255829, 1024), (255829, 1), (255829, 1), (255829,))
Dataset for task PCBA-485353 has shape ((258367, 1024), (258367, 1), (258367, 1), (258367,))
Dataset for task PCBA-485360 has shape ((174782, 1024), (174782, 1), (174782, 1), (174782,))
Dataset for task PCBA-485364 has shape ((273760, 1024), (273760, 1), (273760, 1), (273760,))
Dataset for task PCBA-485367 has shape ((260772, 1024), (260772, 1), (260772, 1), (260772,))
Dataset for task PCBA-492947 has shape ((263367, 1024), (263367, 1), (263367, 1), (263367,))
Dataset for task PCBA-493208 has shape ((33425, 1024), (33425, 1), (33425, 1), (33425,))
Dataset for task PCBA-504327 has shape ((297583, 1024), (297583, 1), (297583, 1), (297583,))
Dataset for task PCBA-504332 has shape ((238122, 1024), (238122, 1), (238122, 1), (238122,))
Dataset for task PCBA-504333 has shape ((260795, 1024), (260795, 1), (260795, 1), (260795,))
Dataset for task PCBA-504339 has shape ((284468, 1024), (284468, 1), (284468, 1), (284468,))
Dataset for task PCBA-504444 has shape ((232519, 1024), (232519, 1), (232519, 1), (232519,))
Dataset for task PCBA-504466 has shape ((248531, 1024), (248531, 1), (248531, 1), (248531,))
Dataset for task PCBA-504467 has shape ((194337, 1024), (194337, 1), (194337, 1), (194337,))
Dataset for task PCBA-504706 has shape ((242036, 1024), (242036, 1), (242036, 1), (242036,))
Dataset for task PCBA-504842 has shape ((259609, 1024), (259609, 1), (259609, 1), (259609,))
Dataset for task PCBA-504845 has shape ((297973, 1024), (297973, 1), (297973, 1), (297973,))
Dataset for task PCBA-504847 has shape ((305121, 1024), (305121, 1), (305121, 1), (305121,))
Dataset for task PCBA-504891 has shape ((288936, 1024), (288936, 1), (288936, 1), (288936,))
Dataset for task PCBA-540276 has shape ((164637, 1024), (164637, 1), (164637, 1), (164637,))
Dataset for task PCBA-540317 has shape ((295941, 1024), (295941, 1), (295941, 1), (295941,))
Dataset for task PCBA-588342 has shape ((261529, 1024), (261529, 1), (261529, 1), (261529,))
Dataset for task PCBA-588453 has shape ((297852, 1024), (297852, 1), (297852, 1), (297852,))
Dataset for task PCBA-588456 has shape ((308484, 1024), (308484, 1), (308484, 1), (308484,))
Dataset for task PCBA-588579 has shape ((312377, 1024), (312377, 1), (312377, 1), (312377,))
Dataset for task PCBA-588590 has shape ((286567, 1024), (286567, 1), (286567, 1), (286567,))
Dataset for task PCBA-588591 has shape ((299990, 1024), (299990, 1), (299990, 1), (299990,))
Dataset for task PCBA-588795 has shape ((303008, 1024), (303008, 1), (303008, 1), (303008,))
Dataset for task PCBA-588855 has shape ((282459, 1024), (282459, 1), (282459, 1), (282459,))
Dataset for task PCBA-602179 has shape ((308590, 1024), (308590, 1), (308590, 1), (308590,))
Dataset for task PCBA-602233 has shape ((303344, 1024), (303344, 1), (303344, 1), (303344,))
Dataset for task PCBA-602310 has shape ((315329, 1024), (315329, 1), (315329, 1), (315329,))
Dataset for task PCBA-602313 has shape ((298395, 1024), (298395, 1), (298395, 1), (298395,))
Dataset for task PCBA-602332 has shape ((330650, 1024), (330650, 1), (330650, 1), (330650,))
Dataset for task PCBA-624170 has shape ((318901, 1024), (318901, 1), (318901, 1), (318901,))
Dataset for task PCBA-624171 has shape ((316716, 1024), (316716, 1), (316716, 1), (316716,))
Dataset for task PCBA-624173 has shape ((320869, 1024), (320869, 1), (320869, 1), (320869,))
Dataset for task PCBA-624202 has shape ((293056, 1024), (293056, 1), (293056, 1), (293056,))
Dataset for task PCBA-624246 has shape ((291944, 1024), (291944, 1), (291944, 1), (291944,))
Dataset for task PCBA-624287 has shape ((241976, 1024), (241976, 1), (241976, 1), (241976,))
Dataset for task PCBA-624288 has shape ((259354, 1024), (259354, 1), (259354, 1), (259354,))
Dataset for task PCBA-624291 has shape ((265487, 1024), (265487, 1), (265487, 1), (265487,))
Dataset for task PCBA-624296 has shape ((233680, 1024), (233680, 1), (233680, 1), (233680,))
Dataset for task PCBA-624297 has shape ((246483, 1024), (246483, 1), (246483, 1), (246483,))
Dataset for task PCBA-624417 has shape ((260604, 1024), (260604, 1), (260604, 1), (260604,))
Dataset for task PCBA-651635 has shape ((277599, 1024), (277599, 1), (277599, 1), (277599,))
Dataset for task PCBA-651644 has shape ((283625, 1024), (283625, 1), (283625, 1), (283625,))
Dataset for task PCBA-651768 has shape ((285976, 1024), (285976, 1), (285976, 1), (285976,))
Dataset for task PCBA-651965 has shape ((261055, 1024), (261055, 1), (261055, 1), (261055,))
Dataset for task PCBA-652025 has shape ((291353, 1024), (291353, 1), (291353, 1), (291353,))
Dataset for task PCBA-652104 has shape ((300670, 1024), (300670, 1), (300670, 1), (300670,))
Dataset for task PCBA-652105 has shape ((257786, 1024), (257786, 1), (257786, 1), (257786,))
Dataset for task PCBA-652106 has shape ((290271, 1024), (290271, 1), (290271, 1), (290271,))
Dataset for task PCBA-686970 has shape ((269468, 1024), (269468, 1), (269468, 1), (269468,))
Dataset for task PCBA-686978 has shape ((242472, 1024), (242472, 1), (242472, 1), (242472,))
Dataset for task PCBA-686979 has shape ((247811, 1024), (247811, 1), (247811, 1), (247811,))
Dataset for task PCBA-720504 has shape ((280265, 1024), (280265, 1), (280265, 1), (280265,))
Dataset for task PCBA-720532 has shape ((10455, 1024), (10455, 1), (10455, 1), (10455,))
Dataset for task PCBA-720542 has shape ((285374, 1024), (285374, 1), (285374, 1), (285374,))
Dataset for task PCBA-720551 has shape ((274115, 1024), (274115, 1), (274115, 1), (274115,))
Dataset for task PCBA-720553 has shape ((271268, 1024), (271268, 1), (271268, 1), (271268,))
Dataset for task PCBA-720579 has shape ((227277, 1024), (227277, 1), (227277, 1), (227277,))
Dataset for task PCBA-720580 has shape ((245441, 1024), (245441, 1), (245441, 1), (245441,))
Dataset for task PCBA-720707 has shape ((290634, 1024), (290634, 1), (290634, 1), (290634,))
Dataset for task PCBA-720708 has shape ((285782, 1024), (285782, 1), (285782, 1), (285782,))
Dataset for task PCBA-720709 has shape ((282572, 1024), (282572, 1), (282572, 1), (282572,))
Dataset for task PCBA-720711 has shape ((290638, 1024), (290638, 1), (290638, 1), (290638,))
Dataset for task PCBA-743255 has shape ((296529, 1024), (296529, 1), (296529, 1), (296529,))
Dataset for task PCBA-743266 has shape ((319272, 1024), (319272, 1), (319272, 1), (319272,))
Dataset for task PCBA-875 has shape ((58939, 1024), (58939, 1), (58939, 1), (58939,))
Dataset for task PCBA-881 has shape ((83379, 1024), (83379, 1), (83379, 1), (83379,))
Dataset for task PCBA-883 has shape ((6492, 1024), (6492, 1), (6492, 1), (6492,))
Dataset for task PCBA-884 has shape ((8378, 1024), (8378, 1), (8378, 1), (8378,))
Dataset for task PCBA-885 has shape ((10294, 1024), (10294, 1), (10294, 1), (10294,))
Dataset for task PCBA-887 has shape ((55450, 1024), (55450, 1), (55450, 1), (55450,))
Dataset for task PCBA-891 has shape ((6268, 1024), (6268, 1), (6268, 1), (6268,))
Dataset for task PCBA-899 has shape ((6582, 1024), (6582, 1), (6582, 1), (6582,))
Dataset for task PCBA-902 has shape ((95206, 1024), (95206, 1), (95206, 1), (95206,))
Dataset for task PCBA-903 has shape ((42426, 1024), (42426, 1), (42426, 1), (42426,))
Dataset for task PCBA-904 has shape ((40960, 1024), (40960, 1), (40960, 1), (40960,))
Dataset for task PCBA-912 has shape ((45325, 1024), (45325, 1), (45325, 1), (45325,))
Dataset for task PCBA-914 has shape ((6224, 1024), (6224, 1), (6224, 1), (6224,))
Dataset for task PCBA-915 has shape ((6403, 1024), (6403, 1), (6403, 1), (6403,))
Dataset for task PCBA-924 has shape ((95967, 1024), (95967, 1), (95967, 1), (95967,))
Dataset for task PCBA-925 has shape ((51211, 1024), (51211, 1), (51211, 1), (51211,))
Dataset for task PCBA-926 has shape ((45156, 1024), (45156, 1), (45156, 1), (45156,))
Dataset for task PCBA-927 has shape ((46883, 1024), (46883, 1), (46883, 1), (46883,))
Dataset for task PCBA-938 has shape ((49893, 1024), (49893, 1), (49893, 1), (49893,))
Dataset for task PCBA-995 has shape ((52456, 1024), (52456, 1), (52456, 1), (52456,))
Fitting model for task PCBA-1030
Fitting model for task PCBA-1379
Fitting model for task PCBA-1452
Fitting model for task PCBA-1454
Fitting model for task PCBA-1457
Fitting model for task PCBA-1458
Fitting model for task PCBA-1460
Fitting model for task PCBA-1461
Fitting model for task PCBA-1468
Fitting model for task PCBA-1469
Fitting model for task PCBA-1471
Fitting model for task PCBA-1479
Fitting model for task PCBA-1631
Fitting model for task PCBA-1634
Fitting model for task PCBA-1688
Fitting model for task PCBA-1721
Fitting model for task PCBA-2100
Fitting model for task PCBA-2101
Fitting model for task PCBA-2147
Fitting model for task PCBA-2242
Fitting model for task PCBA-2326
Fitting model for task PCBA-2451
Fitting model for task PCBA-2517
Fitting model for task PCBA-2528
Fitting model for task PCBA-2546
Fitting model for task PCBA-2549
Fitting model for task PCBA-2551
Fitting model for task PCBA-2662
Fitting model for task PCBA-2675
Fitting model for task PCBA-2676
Fitting model for task PCBA-411
Fitting model for task PCBA-463254
Fitting model for task PCBA-485281
Fitting model for task PCBA-485290
Fitting model for task PCBA-485294
Fitting model for task PCBA-485297
Fitting model for task PCBA-485313
Fitting model for task PCBA-485314
Fitting model for task PCBA-485341
Fitting model for task PCBA-485349
Fitting model for task PCBA-485353
Fitting model for task PCBA-485360
Fitting model for task PCBA-485364
Fitting model for task PCBA-485367
Fitting model for task PCBA-492947
Fitting model for task PCBA-493208
Fitting model for task PCBA-504327
Fitting model for task PCBA-504332
Fitting model for task PCBA-504333
Fitting model for task PCBA-504339
Fitting model for task PCBA-504444
Fitting model for task PCBA-504466
Fitting model for task PCBA-504467
Fitting model for task PCBA-504706
Fitting model for task PCBA-504842
Fitting model for task PCBA-504845
Fitting model for task PCBA-504847
Fitting model for task PCBA-504891
Fitting model for task PCBA-540276
Fitting model for task PCBA-540317
Fitting model for task PCBA-588342
Fitting model for task PCBA-588453
Fitting model for task PCBA-588456
Fitting model for task PCBA-588579
Fitting model for task PCBA-588590
Fitting model for task PCBA-588591
Fitting model for task PCBA-588795
Fitting model for task PCBA-588855
Fitting model for task PCBA-602179
Fitting model for task PCBA-602233
Fitting model for task PCBA-602310
Fitting model for task PCBA-602313
Fitting model for task PCBA-602332
Fitting model for task PCBA-624170
Fitting model for task PCBA-624171
Fitting model for task PCBA-624173
Fitting model for task PCBA-624202
Fitting model for task PCBA-624246
Fitting model for task PCBA-624287
Fitting model for task PCBA-624288
Fitting model for task PCBA-624291
Fitting model for task PCBA-624296
Fitting model for task PCBA-624297
Fitting model for task PCBA-624417
Fitting model for task PCBA-651635
Fitting model for task PCBA-651644
Fitting model for task PCBA-651768
Fitting model for task PCBA-651965
Fitting model for task PCBA-652025
Fitting model for task PCBA-652104

After the model for task PCBA-652104, the script seems to hang indefinitely: it sits at 50% CPU usage on the system for over 12 hours. Since many more datasets remain at that point, I kill the script.
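One way to tell a genuinely hung fit from a very slow one is to time each per-task fit and flush the result as soon as it finishes. The sketch below is only an illustration under assumptions: `fit_tasks_with_timing` and the `datasets` mapping of task name to `(X, y)` are hypothetical and are not part of pcba_sklearn.py, and the data is a small synthetic stand-in for the 1024-bit fingerprints shown in the log.

```python
import time

import numpy as np
from sklearn.ensemble import RandomForestClassifier


def fit_tasks_with_timing(datasets, n_estimators=10):
    """Fit one forest per task and record how long each fit takes.

    `datasets` maps a task name to an (X, y) pair. This layout is an
    assumption for illustration, not the structure the example script uses.
    """
    timings = {}
    for task, (X, y) in datasets.items():
        start = time.perf_counter()
        model = RandomForestClassifier(n_estimators=n_estimators, n_jobs=-1)
        model.fit(X, y)
        timings[task] = time.perf_counter() - start
        # Flush so the line appears immediately, even if a later fit stalls.
        print(f"Fitted {task}: {X.shape[0]} rows in {timings[task]:.2f}s", flush=True)
    return timings


rng = np.random.default_rng(0)
# Small synthetic stand-ins: binary 1024-column features, binary labels.
datasets = {
    name: (rng.integers(0, 2, size=(n, 1024)), rng.integers(0, 2, size=n))
    for name, n in [("task-a", 200), ("task-b", 400)]
}
timings = fit_tasks_with_timing(datasets)
```

If the per-task times grow steadily with the row count, the run is probably slow rather than deadlocked, and reducing the forest size is the more promising fix.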

I believe we should change this example script to set n_estimators to 10, which is what the molnet code seems to use. Alternatively, we could just delete the script to prevent confusion. I lean towards the second option.
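For reference, the first suggestion amounts to a one-argument change when the forest is constructed. This is a minimal sketch on synthetic data, assuming the script builds a plain scikit-learn `RandomForestClassifier`; the array shapes and the random labels are placeholders, not PCBA data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Placeholder data shaped like the fingerprints in the log (1024 columns).
X = rng.integers(0, 2, size=(500, 1024))
y = rng.integers(0, 2, size=500)

# n_estimators=10 follows the value proposed in the issue;
# n_jobs=-1 asks scikit-learn to use all available CPU cores.
model = RandomForestClassifier(n_estimators=10, n_jobs=-1)
model.fit(X, y)

print(len(model.estimators_))  # number of trees actually built
```

Fit time for a random forest grows roughly in proportion to the number of trees, so a smaller n_estimators trades some accuracy for a run that finishes in a reasonable time across all tasks.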

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

1 reaction
LRParser commented, Sep 9, 2017

I’m on it, thanks

0 reactions
rbharath commented, Jan 18, 2020

Closing this old issue. Feel free to re-open if there are new things to discuss.
