
clarification in reproducing ESOL results?

See original GitHub issue

Hello, I have been playing around with chemprop, and my particular goal is to reproduce the 0.555 ± 0.047 result that you report for ESOL. Currently I’m seeing numbers in the range of 0.72.

(as an aside, I’ve also tried playing around with main2.sh from https://github.com/yangkevin2/lsc_experiments. It’s quite slow due to running so many experiments, and it produces a whole pile of results, of which I’m unsure which, if any, directly correspond to the reported 0.555 ± 0.047 result.)

Based on the general instructions in the “Results” section, I believe that I should first optimize hyperparameters, with a command like

 python hyperparameter_optimization.py --data_path data/esol.csv --dataset_type regression --num_iters 2 --config_save_path esol-hyper-config.json --quiet --split_type random

However I immediately have multiple questions

  • what is the correct value for num_iters?
  • what is the correct split_type for this dataset? Presumably either predetermined or index_predetermined
  • depending on the split type, what are the correct folds_file and test_fold_index (for predetermined), or the correct crossval_index_file (for index_predetermined)?
  • should I be adding --features_generator rdkit_2d_normalized --no_features_scaling for the ESOL dataset?

I believe the second and final step is to then run a command along the lines of

python train.py --data_path data/esol.csv --dataset_type regression --save_dir saves/esol --split_type scaffold_balanced --config_path esol-hyper-config.json
  • all the same questions about splits and folds apply. Should the settings be the same or different?
  • additionally, should I add --num_folds 10 and --seed 3? These seem to be the standard in the lsc_experiments repo

And as a final question, is this all otherwise correct?

Thank you!

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
swansonk14 commented, Apr 28, 2020

Hi @medchemistry,

Thank you for the questions! And I apologize for the mess in https://github.com/yangkevin2/lsc_experiments – we tried a lot of different experiments and never fully cleaned up that repo once we got the results we needed.

Just as a note, due to inherent randomness in the initial chemprop model weights as well as some recent changes to RDKit that affect the features we use, our results won’t be entirely reproducible. However, you should be able to get something pretty similar.

Also, in the paper we were very strict about doing nested cross-validation for hyperparameter optimization, but that required some hacking around in chemprop. Below I’ll describe a simpler setup that allows some overlap between the molecules used for hyperparameter optimization and the molecules used for the final evaluation. If you want to try the strict nested cross-validation version, let me know and I can dig up the scripts that will create those splits.
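For intuition, here is a minimal, library-free sketch of how nested cross-validation keeps the data used for hyperparameter tuning separate from the data used for final evaluation. The fold counts and index logic are illustrative only, not chemprop's actual splitting code:

```python
def k_folds(indices, k):
    """Split a list of indices into k roughly equal folds by striding."""
    return [indices[i::k] for i in range(k)]

def nested_cv(n_samples, outer_k=3, inner_k=3):
    """Yield (outer_test, inner_train, inner_val) index triples.

    Hyperparameters are tuned only on the inner loop, so the outer
    test fold is never seen during hyperparameter optimization.
    """
    indices = list(range(n_samples))
    outer = k_folds(indices, outer_k)
    for i, outer_test in enumerate(outer):
        # Everything outside the held-out outer fold feeds the inner loop.
        remaining = [x for j, f in enumerate(outer) if j != i for x in f]
        inner = k_folds(remaining, inner_k)
        for m, inner_val in enumerate(inner):
            inner_train = [x for n, f in enumerate(inner) if n != m for x in f]
            yield outer_test, inner_train, inner_val

splits = list(nested_cv(12, outer_k=3, inner_k=3))
assert len(splits) == 9  # 3 outer folds x 3 inner folds
for test, train, val in splits:
    # The outer test fold never leaks into inner train/val.
    assert not set(test) & (set(train) | set(val))
```

The simpler setup described below skips the outer loop, which is why some overlap between tuning and evaluation molecules can occur.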

Here are the steps:

  1. Pre-compute the RDKit features to save time later:
python scripts/save_features.py \
    --data_path data/esol.csv \
    --features_generator rdkit_2d_normalized \
    --save_path features/esol.npz \
    --sequential
  2. Run hyperparameter optimization on the chemprop model with RDKit features:
python hyperparameter_optimization.py \
    --data_path data/esol.csv \
    --dataset_type regression \
    --config_save_path configs/esol.json \
    --quiet \
    --split_type scaffold_balanced \
    --features_path features/esol.npz \
    --no_features_scaling \
    --num_iters 20 \
    --num_folds 3 \
    --seed 0

Note: The command above will do 3-fold cross-validation for each of 20 hyperparameter configurations, thus training 60 models total. If this is too slow, feel free to tune either --num_iters or --num_folds.
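As a quick sanity check on the training budget those flags imply (simple arithmetic, no chemprop involved):

```python
# Hyperparameter search budget: each sampled configuration is
# evaluated with k-fold cross-validation, so the total number of
# models trained is num_iters * num_folds.
num_iters = 20  # --num_iters: hyperparameter configurations tried
num_folds = 3   # --num_folds: cross-validation folds per configuration
total_models = num_iters * num_folds
assert total_models == 60
```

Lowering either flag shrinks the budget linearly, at the cost of a noisier hyperparameter search.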

  3. Train an ensemble of models with RDKit features and optimized hyperparameters to determine the final performance:
python train.py \
    --data_path data/esol.csv \
    --dataset_type regression \
    --save_dir ckpt/esol \
    --quiet \
    --split_type scaffold_balanced \
    --config_path configs/esol.json \
    --features_path features/esol.npz \
    --no_features_scaling \
    --ensemble_size 5 \
    --num_folds 10 \
    --seed 3

Note: This step is also going to be slow since it’s training 5 models for each of 10 cross-validation folds, so 50 models total. Feel free to tune these parameters depending on how long you’re willing to wait.

Also note that --seed 3 is used because hyperparameter optimization will run on seeds 0, 1, and 2 (when using --seed 0 and --num_folds 3). There may still be some overlap between hyperparameter optimization test molecules and evaluation test molecules, since this isn’t strict nested cross-validation, but it shouldn’t affect the final results too much.
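The seed bookkeeping above can be sketched as follows, assuming (per the note) that chemprop assigns one data-split seed per fold, counting up from --seed:

```python
# Each cross-validation fold gets its own data-split seed:
# seed, seed + 1, ..., seed + num_folds - 1.
def fold_seeds(seed, num_folds):
    return list(range(seed, seed + num_folds))

hyperopt_seeds = fold_seeds(seed=0, num_folds=3)     # seeds 0..2
evaluation_seeds = fold_seeds(seed=3, num_folds=10)  # seeds 3..12

# Starting evaluation at --seed 3 keeps the two ranges disjoint, so no
# evaluation fold reuses a hyperparameter-optimization split seed.
assert set(hyperopt_seeds).isdisjoint(evaluation_seeds)
```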

I hope this helps!

Kyle

0 reactions
swansonk14 commented, Jul 17, 2020

@medchemistry I realized that the 0.555 numbers you’re quoting are from Table S3 in our supplementary materials, where we tested Chemprop on a very specific test set from the MoleculeNet paper. Our results using our best (ensemble) method with the data splits I described above are reported in tables S38 and S39, where we get 0.578 on random split and 0.968 on scaffold split.

@nanopoop We originally did the nested cross-validation procedure using a somewhat messy bash script (https://github.com/yangkevin2/lsc_experiments/blob/master/scripts/main2.sh) to run Chemprop on each of the folds that we manually extracted splits for. We hope to clean this up and rewrite it in pure Python code in Chemprop eventually.

