# Some thoughts about pyg
## Feature
0. More comments to encourage us to DIY
1. `torch_geometric.datasets.TUDataset`'s "once and for all"
2. Still about `torch_geometric.datasets`: arrangement
3. `torch_geometric.contrib` (or, `pyg_contrib`)
4. `torch_geometric.io` (I have mentioned it)
5. `functional` support
6. `torch_geometric.visualization`
## Motivation
I have some thoughts about PyTorch Geometric, and I write down all of them here. Perhaps some of the features are not needed, but I wrote them down anyway. I like (love) the library, and that is the only reason I write this long feature request. Perhaps it can be a roadmap for pyg.
## 1. `torch_geometric.datasets.TUDataset`'s "once and for all"
First, many thanks for sharing the datasets! I marked All Data Sets. Downloading them one by one really takes a long time. With enough hard-disk capacity, why not do it once and for all?
A one-click update of TUDatasets could:
- check the datasets already downloaded locally,
- compare them with the datasets listed on the site,
- download and extract the rest.
## 2. Still about `torch_geometric.datasets`: arrangement
Geometric is really a big concept: any graph can fit: citation graphs (`Cora`), molecules (`QM9`), point clouds (`ModelNet`), even knowledge graphs (`DBP15K`)… Now, seeing only `torch_geometric.datasets.DBP15K`, a greenhorn (just like me) cannot know what it is. So, IN MY OPINION, it might be better to distinguish the datasets by their usage.
For example, `ModelNet` could be exposed as `torch_geometric.datasets.pointcloud.ModelNet`, and so on.
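A minimal sketch of what such a split could look like; the category names and dataset assignments below are only my suggestion, not an existing PyG layout:

```python
# Suggested task-oriented grouping; both the category names and the
# membership of each category are hypothetical.
DATASET_CATEGORIES = {
    "citation": ["Cora", "CiteSeer", "PubMed"],   # exposed via Planetoid
    "molecule": ["QM9", "ZINC"],
    "pointcloud": ["ModelNet", "ShapeNet"],
    "knowledge_graph": ["DBP15K"],
}

def category_of(dataset_name):
    """Return the task category a dataset would live under, or 'misc'."""
    for category, names in DATASET_CATEGORIES.items():
        if dataset_name in names:
            return category
    return "misc"
```

With a mapping like this, a sub-module such as `torch_geometric.datasets.pointcloud` could simply re-export the existing classes, so the old flat imports would keep working.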
### Appendix: a comparison with `torchvision.datasets`

As the official extension of pytorch, torchvision can be a reference for our repo. Since torchvision focuses on image problems, and its datasets are well known to nearly everyone involved in deep learning, `torchvision.datasets` does not need to distinguish the datasets by category (for example, even though `MNIST` is `[1, 28, 28]` and `CIFAR10` is `[3, 32, 32]`, with different numbers of channels; here, I use `[C, H, W]` to represent the shape).
## 3. `torch_geometric.contrib` (or, `pyg_contrib`)
As we can see, a feature request is really a hard thing. Sometimes the requesters do have the ability to add the feature themselves; however (perhaps most of the time, I think), we just mention it. What's more, new ideas can be infinite, and we cannot push all of them and their implementations into the `master` branch.
So… why not have a `contrib`, like `TensorFlow`?
### What I think about contrib

For example, the graph DenseNet mentioned in *DeepGCNs: Can GCNs Go as Deep as CNNs?* is really a good idea for point cloud segmentation, and the author has opened the code (a PyTorch Geometric implementation) on GitHub.
Here, I think the general steps of using pyg_contrib would be (taking his repo (code), the graph DenseNet, as an example):

- his GitHub repo (code) -> `pyg_contrib` (or, for a feature request: prototype code -> `pyg_contrib`), where `->` denotes a push;
- discussed and modified (to make it much better) in `pyg_contrib`, by EVERYONE WHO WANTS TO BE INVOLVED WITH IT. Of course, a roadmap, or a kanban, is really needed here (a kanban is provided by GitHub);
- if it is really good, or really needs to be maintained, add it to pyg; if not, remove (deprecate) it from `pyg_contrib`.
(Added on 2019.09.25.)

### pyg_contrib.datasets

The wiki dataset, and the LINQS datasets (datasets provided by the LINQS group, https://linqs.soe.ucsc.edu/data); there are some datasets about social relationships among them. I think these can be good examples for contrib.
### Conclusion of pyg_contrib

As mentioned before, new thoughts can be infinite, and `contrib` can never include all datasets. What `PyG` can do is set a standard, give some examples, and implement some of the frequently used algorithms (for example, GCN). The datasets part of the tutorial only has the base class's code, without an implementation, or an example of "how to DIY".
The external resources provided by Steeve Huang are a good PyG tutorial, but… I just feel that only two Jupyter notebooks of just "using" PyG (as mentioned in his readme.md) are perhaps not enough. (And of course, hardware also counts: DL on graphs can be a little easier, compared with DL on images. A 2-layer GCN can run relatively fast on node classification on the Cora dataset with only an Intel Core i7-3540M; with an Intel Core i7-8700 or Core i7-8750M, and with a GPU, it can be much, much faster. (Point cloud tasks do need a GPU…) I think most of the code in the tutorial can run fast on a CPU.)
## 4. `torch_geometric.io` (I have mentioned it)
I have mentioned that: reading and writing files (especially point cloud files, i.e. `.ply` and `.off` files).
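PyG already ships helpers such as `read_off` in `torch_geometric.io`; to illustrate what documenting that format could look like, here is a standalone sketch of a minimal ASCII OFF parser (my own simplification: it assumes a plain `OFF` header line, no comments, no vertex colors):

```python
def parse_off(lines):
    """Parse ASCII OFF lines into (vertices, faces).

    Minimal sketch: expects a plain 'OFF' header, then a count line
    'n_vertices n_faces n_edges', then vertex rows, then face rows
    whose first number is the vertex count of that face.
    """
    assert lines[0].strip() == "OFF", "not an ASCII OFF file"
    n_vert, n_face, _ = (int(v) for v in lines[1].split())
    vertices = [tuple(float(v) for v in lines[2 + i].split())
                for i in range(n_vert)]
    faces = []
    for i in range(n_face):
        row = [int(v) for v in lines[2 + n_vert + i].split()]
        faces.append(tuple(row[1:1 + row[0]]))  # drop the leading count
    return vertices, faces
```

Writing is symmetric (emit the header, counts, vertices, then faces), which is why a short documented pair of helpers in `torch_geometric.io` would go a long way.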
## 5. `functional` support, i.e. `torch_geometric.nn.functional`

This was mentioned in a previous issue.
We can use functional to create (or to test) nearly all kinds of structures (most of the time, for fun). For example, initialization can be tested (although, as we all know, `kaiming_uniform` can be a good choice when the input is an image, but…). I know that `reset_parameters` can be a solution when the parameters need to be modified, but I do not think it is that convenient. If a `weight` could be assigned directly, and we just used `x`, `edge_index`, and `weight` to compute, like in `torch.nn.functional.conv2d`, it would be a really nice thing.
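To make the idea concrete, here is a NumPy stand-in for what a hypothetical `torch_geometric.nn.functional.gcn_conv(x, edge_index, weight)` could compute; the name and signature are assumptions, and the propagation is the symmetric normalization from Kipf & Welling, written densely for clarity:

```python
import numpy as np

def gcn_conv(x, edge_index, weight):
    """Functional GCN layer sketch.

    x: [N, F_in] node features; edge_index: [2, E] directed (src, dst)
    pairs; weight: [F_in, F_out], owned by the caller.
    Computes D^(-1/2) (A + I) D^(-1/2) @ X @ W with a dense adjacency.
    """
    n = x.shape[0]
    adj = np.zeros((n, n))
    adj[edge_index[0], edge_index[1]] = 1.0  # build dense adjacency
    adj += np.eye(n)                         # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    adj_norm = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return adj_norm @ x @ weight
```

The point is exactly the `conv2d` analogy: the caller creates and initializes `weight` however it likes (any initializer can be tried), and the function is a pure computation.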
## 6. `torch_geometric.visualization`
Visualization is really a big job. NOT ONLY the curves, t-SNE, …: the GRAPH itself should be considered. A colormap can show us the importance of each node (color each node with a colormap, just like a heatmap over an image feature map). Why not do it in `visdom`? I know that `matplotlib`'s plots can be viewed in visdom, and we can use `networkx.draw()` to plot a graph, so… it might be possible to use visdom. (I have not done deep research or testing, just showing the possibility of using visdom.)
Example code:
```python
import matplotlib.pyplot as plt
import networkx as nx
import visdom as vis

g = nx.karate_club_graph()
fig = plt.figure()
nx.draw_circular(g, with_labels=True, node_color='#66CCFF')  # NOTE(wmf): you can use any color you like.
vis_env = vis.Visdom()
vis_env.matplot(fig)  # sorry, only this works...
```
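For the "color the node with a colormap" part, mapping importance scores to colors does not even need matplotlib; here is a heatmap-style blue-to-red ramp sketched with only the standard library (the particular scaling and hue range are my own choice):

```python
import colorsys

def importance_colors(scores):
    """Map each score to an RGB tuple on a blue (low) -> red (high) ramp,
    like a heatmap over an image feature map."""
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # avoid division by zero for constant scores
    colors = []
    for s in scores:
        t = (s - lo) / span           # normalize to [0, 1]
        hue = (1.0 - t) * 2.0 / 3.0   # hue 2/3 = blue, hue 0 = red
        colors.append(colorsys.hsv_to_rgb(hue, 1.0, 1.0))
    return colors
```

The result can be passed straight to `networkx.draw(g, node_color=importance_colors(scores))`, so the graph itself becomes the heatmap.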
What about `TensorBoard`? I think TensorBoard is not that suitable for visualizing GRAPHs, although visualizing curves and t-SNE is really, really cool in TensorBoard.
## Additional context

No. (If I think of something more, I will go on with the issue.)
Yours Sincerely, MingFei Wang. (@wmf1997) 2019.09.16 22:11 (UTC+8) Tianjin, China
Added in 2019.09.17 11:30 (UTC+8):
## 0. More comments to encourage us to DIY
First, thank you for your work again~! (PyG is a good architecture for Graph Representation Learning~!)
Reading source code can also be a good way of studying~ I mean, reading the implementations of Graph Neural Networks: for example, reading the `MessagePassing` abstract base class lets me know what message passing is in a GNN, and `GCNConv` lets me know a derived class (an implementation in detail) of a GNN.
However, IN MY OPINION, code without enough comments might make people confused (after they have read the article). For example, `GCNConv`: the authors' (Kipf & Welling) original PyTorch implementation uses sparse matrix multiplication (as in the formula written in the article); however, in `pyg`, your implementation uses `MessagePassing`, and I only know the reason from rrl_gnn.pdf. The reason, i.e. how to change the sparse matmul into message passing, should be written down. With this method, I think more methods could be implemented or re-implemented with pyg.
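The missing comment could be as short as this: the normalized (sparse) matmul `A_hat @ X` and message passing compute the same thing, one as a matrix product, the other as a scatter-add over edges. A NumPy sketch of the equivalence (not PyG's actual code):

```python
import numpy as np

def spmm(a_hat, x):
    """GCN propagation as Kipf & Welling wrote it: one (sparse) matmul."""
    return a_hat @ x

def message_passing(edge_index, edge_weight, x):
    """The same propagation as message passing: for each edge (dst, src),
    the message is edge_weight * x[src], and aggregation is a sum per
    destination node."""
    out = np.zeros_like(x)
    for (dst, src), w in zip(edge_index.T, edge_weight):
        out[dst] += w * x[src]
    return out
```

Here the edge list is just the nonzero entries of `A_hat` and the edge weights are the corresponding normalized values, so the two functions agree entry by entry.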
## Top GitHub Comments
Thank you, this is an awesome list. We can discuss this in more detail after the ICLR deadline!
@rusty1s @WMF1997 Regarding pyg.io, can we add some more documentation (examples? tests / sample files?) to it?

There are a lot of different file formats out there; I don't think it's reasonable to support all of them. I understand that Data() objects are the way to go, but perhaps we can define a file format for "pyg graphs" (it needs to be general enough, yet flexible and compressed)? If we have a unified file format interface, it will simplify the reading, writing, and parsing (it just moves the "pain" to creating those files in the first place). And since every dataset needs to be saved somehow, somewhere, it means only one person needs to do the dataset conversion and upload it online.
As an example: for my current dataset, I define 3 CSV files (for node features, edge index, and edge features), and I also collect some metadata for each new graph. I think this is general enough to capture all types of graphs. I don't know if it is "compressed" enough; maybe it needs to allow only numbers (removing string features with some encoding before saving).
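A sketch of that three-file scheme with the standard library; the file names and column headers are my own placeholders, not the commenter's actual format:

```python
import csv
import io

def _dump(rows, header):
    """Serialize a header plus numeric rows to one CSV string."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

def save_graph(x, edge_index, edge_attr):
    """One graph -> three CSV strings: node features, edge index, edge features."""
    return {
        "node_features.csv": _dump(x, [f"x{i}" for i in range(len(x[0]))]),
        "edge_index.csv": _dump(zip(*edge_index), ["src", "dst"]),
        "edge_features.csv": _dump(edge_attr, [f"e{i}" for i in range(len(edge_attr[0]))]),
    }

def load_csv(text):
    """CSV string -> (header, numeric rows); assumes number-only features."""
    rows = list(csv.reader(io.StringIO(text)))
    return rows[0], [[float(v) for v in row] for row in rows[1:]]
```

The three loaded arrays map directly onto a `Data(x=..., edge_index=..., edge_attr=...)` object, and keeping everything numeric is exactly the "encode strings before saving" constraint mentioned above.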