reproducibility issue of DGL
🐛 Bug
I used DGL to build a GAT-like network, and I fixed the seeds of Python, NumPy, PyTorch, and DGL for reproducibility. However, the results are still not deterministic, and they vary over a very large range. Specifically, I used the following code to fix the seeds:
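import random
import dgl
import numpy as np
import torch

def set_seeds(seed):
    # Seed every random number generator the training pipeline touches.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True  # ask cuDNN for deterministic kernels
    dgl.seed(seed)  # DGL's own random number generator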
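Seeding only fixes the random draws. A well-known additional source of run-to-run variation is that floating-point addition is not associative. A parallel reduction whose accumulation order is not fixed (for example, GPU kernels that accumulate with atomic additions) can therefore produce slightly different sums from identical inputs. The pure-Python sketch below illustrates only the arithmetic effect. It does not model any specific DGL or PyTorch kernel, and `ordered_sum` is a hypothetical helper.

```python
def ordered_sum(values, order):
    # Accumulate values in the given order, mimicking a reduction whose
    # execution order may change from one run to the next.
    total = 0.0
    for i in order:
        total += values[i]
    return total


values = [0.1, 0.2, 0.3]
forward = ordered_sum(values, [0, 1, 2])   # (0.1 + 0.2) + 0.3
backward = ordered_sum(values, [2, 1, 0])  # (0.3 + 0.2) + 0.1
print(forward, backward, forward == backward)
```

The two totals differ in the last bits. Inside a network, such tiny differences can compound over many training steps.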
To Reproduce
My GAT-like network looks like this:
import torch
import torch.nn as nn


class GATLayer(nn.Module):
    def __init__(self, hidden_size, alpha, beta, gamma=0.2, dropout=0.6):
        super().__init__()
        self.gamma = gamma  # negative slope of the LeakyReLU
        self.alpha = alpha  # weight of the input embedding in the output mix
        self.beta = beta    # weight of the attention output in the output mix
        self.hidden_size = hidden_size
        self.W_fc = nn.Linear(self.hidden_size, self.hidden_size, bias=False)
        self.attn_fc = nn.Linear(2 * hidden_size, 1, bias=False)
        self.leakyrelu = nn.LeakyReLU(self.gamma)

    def edge_attention(self, edges):
        # Unnormalized attention score for every edge (E = number of edges).
        z2 = torch.cat([edges.src['emb_attn'], edges.dst['emb_attn']], dim=1)  # E x 2h
        a = self.attn_fc(z2)  # E x 1
        return {'e': self.leakyrelu(a)}  # E x 1

    def message_func(self, edges):
        # message UDF for equation (3) & (4)
        return {'z': edges.src['emb_crf'], 'e': edges.data['e']}

    def reduce_func(self, nodes):
        # Softmax over each node's incoming edges; dim=1 is the neighbor axis.
        alpha = torch.softmax(nodes.mailbox['e'], dim=1)  # N x D x 1 (D = in-degree)
        # equation (4): attention-weighted sum of neighbor features
        h = torch.sum(alpha * nodes.mailbox['z'], dim=1)  # N x D x h -> N x h
        return {'h': h}

    def forward(self, embedding_input, h_input, graph):
        z = self.W_fc(h_input)
        graph.ndata['emb_crf'] = h_input
        graph.ndata['emb_attn'] = z
        graph.apply_edges(self.edge_attention)
        graph.update_all(self.message_func, self.reduce_func)
        gat_output = graph.ndata.pop('h')
        # Weighted average of the input embedding and the attention output.
        output = (self.alpha * embedding_input + self.beta * gat_output) / (self.alpha + self.beta)
        return output
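The sketch below is a dependency-free version of what `apply_edges(self.edge_attention)` followed by `update_all(self.message_func, self.reduce_func)` computes in the layer above. It may help when checking outputs by hand. It is not DGL's implementation: plain lists stand in for tensors, and `edge_scores`, `aggregate`, and `attn_vec` are hypothetical names. As in the layer, attention scores use the projected features (`emb_attn`) while aggregation uses the unprojected ones (`emb_crf`). Nodes with no incoming edges are simply omitted here.

```python
import math
from collections import defaultdict


def edge_scores(edges, emb_attn, attn_vec, slope=0.2):
    """Role of edge_attention: e_uv = LeakyReLU(a . [z_u || z_v])."""
    scores = []
    for src, dst in edges:
        z2 = emb_attn[src] + emb_attn[dst]  # list concatenation, length 2h
        a = sum(w * x for w, x in zip(attn_vec, z2))
        scores.append(a if a >= 0 else slope * a)
    return scores


def aggregate(edges, scores, emb_crf):
    """Role of message_func + reduce_func: softmax over in-edges, weighted sum."""
    # Group (score, source feature) messages by destination, like a mailbox.
    mailbox = defaultdict(list)
    for (src, dst), e in zip(edges, scores):
        mailbox[dst].append((e, emb_crf[src]))
    out = {}
    for dst, msgs in mailbox.items():
        m = max(e for e, _ in msgs)  # subtract the max for numerical stability
        weights = [math.exp(e - m) for e, _ in msgs]
        total = sum(weights)
        h = [0.0] * len(msgs[0][1])
        for w, (_, z) in zip(weights, msgs):
            for k, zk in enumerate(z):
                h[k] += (w / total) * zk
        out[dst] = h
    return out


edges = [(0, 2), (1, 2), (0, 1)]
emb_attn = {0: [1.0], 1: [2.0], 2: [0.0]}
emb_crf = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.5, 0.5]}
scores = edge_scores(edges, emb_attn, attn_vec=[1.0, 1.0])
h = aggregate(edges, scores, emb_crf)
print(scores)
print(h)
```

Here the sums run sequentially in edge order. A parallel implementation is free to use a different order, which is the floating-point effect that seeding cannot control.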
Expected behavior
Environment
- DGL Version (e.g., 1.0): 0.6.x
- Backend Library & Version (e.g., PyTorch 0.4.1, MXNet/Gluon 1.3): PyTorch 1.9.0
- OS (e.g., Linux): Linux
- How you installed DGL (conda, pip, source): pip
- Build command you used (if compiling from source):
- Python version: 3.7.9
- CUDA/cuDNN version (if applicable): 10.2
- GPU models and configuration (e.g. V100): P40
- Any other relevant information:
Additional context
Issue Analytics
- Created 2 years ago
- Comments: 31 (4 by maintainers)
If you absolutely want to remove the non-determinism in neighbor sampling, you could try setting num_workers=1 (which disables OpenMP in neighbor sampling because the sampling then happens in subprocesses; this only applies to DGL 0.8+), or setting the environment variable OMP_NUM_THREADS=1.

@BarclayII Thanks! It's done! Setting num_workers=1 works, while OMP_NUM_THREADS=1 does not seem to. Anyway, my problem is finally solved, and I learned a lot from you all!
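The explanation above is that multithreaded (OpenMP) sampling makes the draws depend on thread scheduling. The toy model below illustrates this under the assumption that worker threads pull from shared random state in whatever order they happen to run. It is not DGL code: Python threads stand in for OpenMP threads, `num_threads=1` plays the role of `num_workers=1`, and all function names are hypothetical. The per-node generator variant is an alternative design shown for comparison, not something DGL is claimed to do.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def sample_neighbors(neighbors, k, rng):
    # Draw k distinct neighbors with the given generator.
    return sorted(rng.sample(neighbors, k))


def sample_shared_rng(adj, nodes, k, seed, num_threads):
    # All threads draw from ONE generator. With num_threads > 1, which node
    # receives which draws depends on scheduling, so a fixed seed does not
    # guarantee identical samples across runs.
    rng = random.Random(seed)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(lambda n: sample_neighbors(adj[n], k, rng), nodes))


def sample_per_node_rng(adj, nodes, k, seed, num_threads):
    # Each node gets its own generator derived from (seed, node id), so the
    # result does not depend on how work is split across threads.
    def task(n):
        return sample_neighbors(adj[n], k, random.Random(seed * 1_000_003 + n))

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(task, nodes))


adj = {n: list(range(20)) for n in range(16)}
nodes = list(range(16))
a = sample_shared_rng(adj, nodes, 3, seed=0, num_threads=1)
b = sample_shared_rng(adj, nodes, 3, seed=0, num_threads=1)
c = sample_per_node_rng(adj, nodes, 3, seed=0, num_threads=4)
d = sample_per_node_rng(adj, nodes, 3, seed=0, num_threads=1)
print(a == b, c == d)
```

With one thread, the shared generator is consumed in a fixed order, which mirrors why forcing a single sampling thread restores reproducibility. With tiny tasks and CPython's GIL, the multithreaded shared-generator run may often happen to match anyway, so treat any difference there as possible rather than guaranteed.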