Understanding multiclass on Chemprop

I have a CSV file that looks like the example below:

CAN_SMILES class1 class2 class3
SMILES1 0 0 1
SMILES2 0 0 1
SMILES3 0 0 1

I then trained the NN with:

chemprop_train --data_path data.csv --dataset_type multiclass \
--save_dir 00_all_datapoints_dedup_random_split.train.accuracy \
--metric accuracy --split_type random --split_sizes 0.8 0.2 0.0 \
--gpu 1 --dropout 0 --ensemble_size 1 --num_folds 1 --hidden_size 300 \
--ffn_hidden_size 300 --smiles_column CAN_SMILES \
--target_columns class1 class2 class3 \
--multiclass_num_classes 3

What I was expecting from the predictions was something like [P(class1), P(class2), P(class3)], with the three probabilities summing to 1. However, I am getting the following:

CAN_SMILES class1 class2 class3 class1_class_0 class1_class_1 class1_class_2
SMILES1 0 0 1 [0.4089986979961395, 0.3245898485183716, 0.2664114236831665] [0.4382680356502533, 0.18726319074630737, 0.3744688332080841] [0.37828463315963745, 0.417496919631958, 0.20421843230724335]
SMILES2 0 0 1 [0.4023689031600952, 0.3176715075969696, 0.27995961904525757] [0.42649418115615845, 0.20301619172096252, 0.3704896569252014] [0.365766704082489, 0.4299478232860565, 0.20428545773029327]
SMILES3 0 0 1 [0.4001476764678955, 0.3139444887638092, 0.28590789437294006] [0.4191807508468628, 0.2087540626525879, 0.3720651865005493] [0.3638008236885071, 0.41967928409576416, 0.21651984751224518]

I am completely lost with these results. As I mentioned above, I was expecting a single output vector with the probabilities for each class. Also, I don’t understand the label class1_class_0. What does it refer to? I inspected the code and found:

https://github.com/chemprop/chemprop/blob/2ae05928f386fcf3306ce2491a8fc6a1f03655ec/chemprop/train/make_predictions.py#L129-L133

But it is still not clear to me. I hope someone can help me understand this. As you can see, I have three columns for three different classes, and I want Chemprop to predict the probability of each one in a vector [P(class1), P(class2), P(class3)].

Note: I also tried saving the SMILES and [class1, class2, class3] to the CSV file, but that does not seem to be parsed by Chemprop.
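
One quick way to read the prediction table above is to check whether each bracketed vector is a probability distribution on its own. The sketch below uses only the values copied from the SMILES1 row. It suggests that each of the three vectors sums to roughly 1, which would be consistent with every target column being treated as a separate 3-class task. That is an interpretation of the numbers, not something confirmed by the linked code. The helper name `sums_to_one` is made up for illustration.

```python
import math

# Prediction vectors copied from the SMILES1 row of the output above.
row = {
    "class1_class_0": [0.4089986979961395, 0.3245898485183716, 0.2664114236831665],
    "class1_class_1": [0.4382680356502533, 0.18726319074630737, 0.3744688332080841],
    "class1_class_2": [0.37828463315963745, 0.417496919631958, 0.20421843230724335],
}


def sums_to_one(vector, tol=1e-5):
    """Return True if the entries of `vector` add up to 1 within `tol`."""
    return math.isclose(sum(vector), 1.0, abs_tol=tol)


# Report the sum of each vector and whether it looks like a distribution.
for name, vector in row.items():
    print(name, round(sum(vector), 6), sums_to_one(vector))
```

If every vector passes this check, the output holds one distribution per target column rather than one probability per column.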

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction
hesther commented, May 20, 2021

Dear @muammar, this looks great. Happy I could help!

1 reaction
hesther commented, May 19, 2021

Hey @muammar, if the classes are mutually exclusive (which they seem to be), you need to reformat your CSV into something like:

CAN_SMILES class
SMILES1 2
SMILES2 2
SMILES3 2

So class1 = 0, class2 = 1, and class3 = 2.
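
The reformatting described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, and it assumes the original file is comma-separated with the column names from the question. The function name `one_hot_to_class_index` and the output column name `class` are hypothetical choices, not part of Chemprop.

```python
import csv
import io


def one_hot_to_class_index(rows, smiles_column, class_columns):
    """Collapse one-hot class columns into a single zero-based integer label."""
    converted = []
    for row in rows:
        flags = [int(row[col]) for col in class_columns]
        # Mutually exclusive classes mean exactly one column is set per row.
        if sum(flags) != 1:
            raise ValueError(
                f"{row[smiles_column]}: expected exactly one class set, got {flags}"
            )
        converted.append({smiles_column: row[smiles_column], "class": flags.index(1)})
    return converted


# Stand-in for the original data.csv (assumed to be comma-separated).
original = io.StringIO(
    "CAN_SMILES,class1,class2,class3\n"
    "SMILES1,0,0,1\n"
    "SMILES2,0,0,1\n"
    "SMILES3,0,0,1\n"
)
rows = list(csv.DictReader(original))
converted = one_hot_to_class_index(rows, "CAN_SMILES", ["class1", "class2", "class3"])

# Write the single-label table; a real script would write to a file instead.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["CAN_SMILES", "class"], lineterminator="\n")
writer.writeheader()
writer.writerows(converted)
print(out.getvalue())
```

With a file in this shape, the training command from the question would presumably pass the single column via `--target_columns class` and keep `--multiclass_num_classes 3`. The prediction output should then hold one probability vector per molecule.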

