
Unexpected randomness in graphconv prediction

See original GitHub issue

VERSION: 2.1.0

After training a GraphConvModel for a fair number of hours, I am getting some unexpected results when testing the performance of the model. In particular, the training, validation, and test errors come out different every time I run the code below. This is surprising since I was careful to set the same seed for the train/valid/test split and have not changed the CSV file containing the data between training and testing.

If it helps, my original model was constructed using the following code:

TRAINING CODE:

model = GraphConvModel(n_tasks=1, mode='regression',
                       tensorboard=True, model_dir='models/',
                       dropout=0.5, graph_conv_layers=[64,64,64])

Is it possible that dropout is being applied during prediction, leading to the stochastic behaviour? Otherwise, I cannot see where else randomness could have been introduced. This is quite frustrating because I did not save my datasets to disk, thinking that setting the seed would be enough to reproduce the split. I still have all of the model data, but I doubt that helps if I cannot tell whether I am snooping on my test data.
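
One quick way to narrow this down is to check whether prediction itself is deterministic: if dropout were active at inference time, two predictions on the same data would disagree. A minimal sketch, assuming the model and test_dataset objects from the testing code below are already in memory:

import numpy as np

# Predict the same dataset twice; if dropout (or any other source of
# randomness) were applied at inference time, these arrays would differ.
preds_a = model.predict(test_dataset)
preds_b = model.predict(test_dataset)
print('predictions deterministic:', np.allclose(preds_a, preds_b))

If the two calls agree, the randomness is more likely to come from the data pipeline (for example, the split) than from the model itself.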

TESTING CODE:

import deepchem as dc
from deepchem.models.tensorgraph.models.graph_models import GraphConvModel

graph_featurizer = dc.feat.graph_features.ConvMolFeaturizer()
loader = dc.data.data_loader.CSVLoader(tasks=['unique_pka'],
                                       smiles_field="CANONICAL_SMILES",
                                       id_field="CMPD_CHEMBLID",
                                       featurizer=graph_featurizer)

dataset = loader.featurize('./unique_pkas.csv')

splitter = dc.splits.RandomSplitter()
train_dataset, valid_dataset, test_dataset = splitter.train_valid_test_split(dataset, seed=42)

transformers = [
    dc.trans.NormalizationTransformer(transform_y=True, dataset=train_dataset)]

# Reassign the datasets so the transformed versions are actually used below;
# rebinding a loop variable alone would discard the transformed copies.
for transformer in transformers:
    train_dataset = transformer.transform(train_dataset)
    valid_dataset = transformer.transform(valid_dataset)
    test_dataset = transformer.transform(test_dataset)

model = GraphConvModel.load_from_dir('models')
# model.restore()

metrics = [dc.metrics.Metric(dc.metrics.rms_score),
           dc.metrics.Metric(dc.metrics.r2_score)]

train_scores = model.evaluate(train_dataset, metrics)
print('train scores')
print(train_scores)

valid_scores = model.evaluate(valid_dataset, metrics)
print('valid scores')
print(valid_scores)

test_scores = model.evaluate(test_dataset, metrics)
print('test scores')
print(test_scores)
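
Since the featurized splits were not saved to disk, one option going forward is to write them out when the split is created so the exact same DiskDatasets can be reloaded later. A rough sketch, assuming the DeepChem 2.1.0 splitter accepts explicit output directories (the paths below are placeholders):

train_dataset, valid_dataset, test_dataset = splitter.train_valid_test_split(
    dataset,
    seed=42,
    train_dir='datasets/train',
    valid_dir='datasets/valid',
    test_dir='datasets/test')

# Later, each split can be reloaded directly, with no re-featurizing or re-splitting.
train_dataset = dc.data.DiskDataset('datasets/train')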

Thanks for your help!

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
peastman commented, Feb 11, 2019

You’re completely right. train_valid_test_split() was ignoring the seed instead of passing it on to the splitter. I just posted a pull request with the fix. Thanks for spotting this!
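
Until the fix is released, one possible workaround (a sketch, assuming RandomSplitter in this version draws its permutation from NumPy's global RNG) is to seed NumPy directly just before splitting, then verify the split is reproducible by repeating it and comparing compound IDs:

import numpy as np

np.random.seed(42)
train_dataset, valid_dataset, test_dataset = splitter.train_valid_test_split(dataset)

# Repeat the split with the same seed and confirm the IDs match.
np.random.seed(42)
train_check, _, _ = splitter.train_valid_test_split(dataset)
print('split reproducible:', list(train_dataset.ids) == list(train_check.ids))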

1 reaction
peastman commented, Feb 10, 2019

Ok, let me look into it and see if I can tell what’s going on.

Read more comments on GitHub >

