Allow HeteroGraphConv to be applied on a subset of the etypes in a graph.
🐛 Bug
Right now, applying HeteroGraphConv to a graph when modules exist for only a subset of the graph's etypes raises a KeyError. For the PyTorch implementation, the corresponding lines are here: https://github.com/dmlc/dgl/blob/5be937a7fbfca0db08a1507744906d61e47a340b/python/dgl/nn/pytorch/hetero.py#L176 and line 189 below. To me this seems like an unnecessary restriction. It makes experimenting with heterogeneous graphs needlessly complicated if you want to ignore certain edge types for whatever reason.
You can work around it by calling:
graph.remove_edges(graph.edge_ids(*graph.edges(etype="ignore_me"), etype="ignore_me"), etype="ignore_me")
~If you attempt this workaround, please note that remove_edges is a bit of a troublemaker on batched graphs as of this writing. See: https://github.com/dmlc/dgl/issues/2310#issuecomment-858456712~
To Reproduce
Steps to reproduce the behavior:
- Create a heterograph with at least two etypes.
- Create a HeteroGraphConv module with modules for only a subset of the etypes.
- Try to apply the module -> 💥
Expected behavior
As far as I understand, HeteroGraphConv should simply ignore the edges for which no module exists. In terms of code, adding:
if etype not in self.mods:
    continue
below each of the two lines linked above (line 176 and line 189 of hetero.py) should fix it in the PyTorch implementation, and the code for the other backend implementations looks reasonably similar.
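The proposed guard can be illustrated with a small pure-Python sketch. ToyHeteroConv, sum_neighbors and the skip_missing flag are hypothetical names made up for this illustration; they only mimic the per-etype self.mods[etype] lookup with a single node type and "sum" aggregation, and do not reproduce DGL's actual implementation. Setting skip_missing=False reproduces the KeyError from the issue, while the default applies the `if etype not in self.mods: continue` check.

```python
from collections import defaultdict


class ToyHeteroConv:
    """Hypothetical stand-in for HeteroGraphConv's per-etype dispatch (not DGL code).

    `mods` maps an edge-type name to a callable that turns a list of
    (src, dst) edges plus source features into {dst: message} updates.
    """

    def __init__(self, mods, skip_missing=True):
        self.mods = mods
        self.skip_missing = skip_missing

    def __call__(self, graph, feats):
        # graph: canonical etype (stype, etype, dtype) -> list of (src, dst);
        # a single node type is assumed, so outputs are keyed by node id only.
        out = defaultdict(float)
        for (stype, etype, dtype), pairs in graph.items():
            if not pairs:
                continue  # relation without edges
            if self.skip_missing and etype not in self.mods:
                continue  # proposed fix: ignore etypes that have no module
            for dst, msg in self.mods[etype](pairs, feats).items():
                out[dst] += msg  # "sum" aggregation across edge types
        return dict(out)


def sum_neighbors(pairs, feats):
    """Toy module: each dst node receives the sum of its sources' features."""
    agg = defaultdict(float)
    for src, dst in pairs:
        agg[dst] += feats[src]
    return agg


graph = {
    ("user", "follows", "user"): [(0, 1), (1, 2)],
    ("user", "ignore_me", "user"): [(2, 0)],
}
feats = {0: 1.0, 1: 2.0, 2: 4.0}
mods = {"follows": sum_neighbors}  # deliberately no module for "ignore_me"

print(ToyHeteroConv(mods)(graph, feats))  # {1: 1.0, 2: 2.0}

try:
    ToyHeteroConv(mods, skip_missing=False)(graph, feats)
except KeyError as err:
    print("KeyError:", err)  # the behaviour reported in this issue
```

With the guard, edges of type "ignore_me" simply contribute nothing, which is the "just ignore them" behaviour requested above.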
Environment
- DGL Version (e.g., 1.0): 0.6.1
- Backend Library & Version (e.g., PyTorch 0.4.1, MXNet/Gluon 1.3): Any
- OS (e.g., Linux): Linux
- How you installed DGL (conda, pip, source): pip
- Python version: 3.8
Issue Analytics
- Created 2 years ago
- Comments: 8 (8 by maintainers)
@NiklasBeierl Thank you for your detailed response! That clarifies my question about the problems with your fix. Regarding my troubles with edge_type_subgraph(), the solution you proposed works if I am feeding the entire graph into the GNN layers. However, if I want to train the GNN with mini-batches using EdgeDataLoader (https://docs.dgl.ai/en/0.6.x/api/python/dgl.dataloading.html#dgl.dataloading.pytorch.EdgeDataLoader), your solution would not work. Note that EdgeDataLoader returns an iterator of input_nodes, pos_pair_graph, neg_pair_graph, blocks. pos_pair_graph should include type A edges because they’re the links I want to predict. But blocks (the message flow graphs, MFGs) should NOT include type A edges, because I want to exclude them from message passing. Since it’s not possible to apply edge_type_subgraph() on MFGs, there’s no way I can get rid of type A edges from blocks when feeding them into the GNN.

Hey @yunshiuan,
I think the edge case that @jermainewang is describing there is this: imagine a node that is connected to the rest of the graph only by edges of one specific type. Here, node 3 has only one incoming edge, which is of type B. Now, if we ignore the B edges, node 3 becomes a node with in-degree 0 (it has no neighbors left). If you look at many of the graph convolution modules in dgl.nn, you will see that this is a problem for a lot of them; a search for zero_in_degree on this page will give you a good impression. It usually ends up being a divide-by-zero situation: c_ij in the GraphConv formula is a good example.

Now to your troubles with edge_type_subgraph(): I am not sure what you mean by the edge you want to predict being “incorrectly” removed. Do you mean that you need to keep it to calculate your loss later? I haven’t done any link prediction yet, but shouldn’t you be able to feed a copy of your graph without type A edges (obtained from edge_type_subgraph()) into the module, while keeping your original graph stored in another variable for comparison?

You may still need to deal with 0-degree nodes (edge_type_subgraph does not take care of that). You can allow them if your GConv module has an allow_zero_in_degree argument, or remove them like so.
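To make the zero-in-degree problem concrete, here is a small pure-Python sketch with no DGL involved. The toy edge lists and the helpers in_degrees, zero_in_degree_nodes and inv_sqrt_degree are hypothetical, modeled on the node-3 situation described above: once type B edges are ignored, node 3's in-degree drops to 0, and a per-node factor 1/sqrt(deg) like the one inside GraphConv's c_ij can no longer be computed. The nodes that still have in-degree > 0 are the ones you would keep if you chose to remove the others.

```python
import math
from collections import Counter

# Hypothetical toy graph: etype -> list of (src, dst) edges.
# Node 3's only incoming edge has type "B".
edges = {
    "A": [(0, 1), (1, 2), (2, 0)],
    "B": [(2, 3)],
}
num_nodes = 4


def in_degrees(edges, num_nodes, ignored=()):
    """Count incoming edges per node, skipping edge types listed in `ignored`."""
    counts = Counter(
        dst
        for etype, pairs in edges.items()
        if etype not in ignored
        for _, dst in pairs
    )
    return [counts[n] for n in range(num_nodes)]


def zero_in_degree_nodes(edges, num_nodes, ignored=()):
    """Nodes that receive no messages once the `ignored` edge types are dropped."""
    degs = in_degrees(edges, num_nodes, ignored)
    return [n for n, d in enumerate(degs) if d == 0]


def inv_sqrt_degree(deg):
    """GraphConv-style per-node factor 1/sqrt(deg); fails for deg == 0."""
    return 1.0 / math.sqrt(deg)  # raises ZeroDivisionError when deg == 0


degs_all = in_degrees(edges, num_nodes)                  # [1, 1, 1, 1]
degs_no_b = in_degrees(edges, num_nodes, ignored={"B"})  # [1, 1, 1, 0]
print(zero_in_degree_nodes(edges, num_nodes, ignored={"B"}))  # [3]

try:
    inv_sqrt_degree(degs_no_b[3])
except ZeroDivisionError:
    print("node 3: 1/sqrt(0) is undefined")

# Nodes you would keep if you remove the zero-in-degree ones.
keep = [n for n in range(num_nodes) if degs_no_b[n] > 0]  # [0, 1, 2]
```

In DGL itself you would either pass allow_zero_in_degree=True to modules that support it or drop such nodes from the graph; the sketch only shows why the degree-0 case needs one of the two.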