some thoughts about pyg

🚀 Feature

0. More comments to encourage us DIY.

1. torch_geometric.datasets.TUDataset’s “once and for all”

2. Still about torch_geometric.datasets: arrangement.

3. torch_geometric.contrib (or, pyg_contrib)

4. torch_geometric.io (I have mentioned it)

5. functional support

6. torch_geometric.visualization

Motivation

I have some thoughts about PyTorch Geometric, and I have written them all down here. Perhaps some of these features are not needed. I like (love) the library, and that is the only reason I wrote this long feature request.
Perhaps it can serve as a roadmap for pyg.

1. torch_geometric.datasets.TUDataset’s “once and for all”

First, many thanks for sharing the datasets! (image)

I marked All Data Sets. Downloading them one by one really takes a long time. With enough hard-disk capacity, why not do it once and for all?

one-click update TUDatasets

  1. check the datasets downloaded locally.
  2. compare with the site’s datasets
  3. download and extract the rest.
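
The three steps above can be sketched roughly as follows. This is only an illustration, not existing PyG functionality: `missing_datasets`, `update_all`, the `download` callback, and the list of names published on the site are all hypothetical and would have to be supplied by the caller.

```python
from pathlib import Path


def missing_datasets(root, available_names):
    """Steps 1 and 2: compare local dataset folders with the names on the site."""
    root = Path(root)
    # Assumption: each downloaded dataset lives in its own sub-folder of root.
    local = {p.name for p in root.iterdir() if p.is_dir()} if root.exists() else set()
    return sorted(set(available_names) - local)


def update_all(root, available_names, download):
    """Step 3: download (and extract) only what is not there yet."""
    todo = missing_datasets(root, available_names)
    for name in todo:
        download(root, name)  # hypothetical callback supplied by the caller
    return todo
```

With PyG installed, `download` could probably be as simple as `lambda root, name: TUDataset(root, name)`, since constructing a `TUDataset` downloads and processes the data into its own folder under `root`; how to obtain the full list of names from the site is left open here.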

2. Still about torch_geometric.datasets: arrangement.

Geometric is really a big concept, and any kind of graph fits: Citation Graphs (Cora), Molecules (QM9), Point Clouds (ModelNet), even Knowledge Graphs (DBP15K)…

Now, with only torch_geometric.datasets.DBP15K, a greenhorn (just like me) cannot tell what it is. So, IN MY OPINION, it might be better to group the datasets by usage. For example, ModelNet could be exposed as torch_geometric.datasets.pointcloud.ModelNet, and so on.
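
To make the proposed arrangement concrete, here is a tiny sketch that maps the four datasets named above to a category and builds the suggested import path. The group names other than `pointcloud`, and both helper functions, are made up for illustration; none of these submodules exist in PyG.

```python
# Hypothetical grouping; only "pointcloud" is taken from the proposal above.
DATASET_GROUPS = {
    "citation": ["Cora"],
    "molecule": ["QM9"],
    "pointcloud": ["ModelNet"],
    "knowledge_graph": ["DBP15K"],
}


def group_of(name):
    """Return the category a dataset belongs to."""
    for group, names in DATASET_GROUPS.items():
        if name in names:
            return group
    raise KeyError(f"unknown dataset: {name}")


def qualified_name(name):
    """Build the import path the proposal suggests for a dataset."""
    return f"torch_geometric.datasets.{group_of(name)}.{name}"
```

With such a layout, the path itself already tells a newcomer that DBP15K is a knowledge graph dataset.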

Appendix: comparison about torchvision.datasets

As the official extension of pytorch, torchvision can be a reference for our repo. Since torchvision focuses on image problems, and its datasets are well known to nearly everyone involved in Deep Learning, torchvision.datasets does not distinguish between the datasets (even though, for example, MNIST is [1, 28, 28] and CIFAR10 is [3, 32, 32], with different numbers of channels). Here, I use [C, H, W] to represent the shape.

3. torch_geometric.contrib (or, pyg_contrib)

As we can see, handling feature requests is really hard. Sometimes the requesters do have the ability to add the feature themselves. However (perhaps most of the time, I think), we just mention it.
What’s more, new ideas are endless, and we cannot push all the ideas and their implementations into the master branch. So… why not have a contrib, like TensorFlow?

What I think about contrib

For example, the graph densenet mentioned in DeepGCNs: Can GCNs Go as Deep as CNNs? is a really good idea for pointnet segmentation. And the author has opened the code (a PyTorch Geometric implementation) on GitHub.

Here are the general steps I have in mind for using pyg_contrib (taking his repo (code) as an example):

graph densenet
  1. his github repo(code) -> pyg_contrib (or, feature request: prototype code -> pyg_contrib), -> denotes push
  2. discussed and modified (to make it much better) in pyg_contrib, by EVERYONE WHO WANTS TO BE INVOLVED WITH IT. Of course, a roadmap, or a kanban, is really needed here. (kanban is provided by github)
  3. if it is really good, or really needs to be maintained, add it to pyg; if not, remove (deprecate) it from pyg_contrib.

(added in 2019.09.25)

pyg_contrib.datasets wiki dataset, and linqs dataset

wiki dataset; linqs dataset (datasets provided by the LINQS group): https://linqs.soe.ucsc.edu/data. There are also some datasets about social relationships. I think these would be good examples for contrib.

conclusion of pyg_contrib

As mentioned before, new ideas are endless. And contrib can never include all datasets. What PyG can do is set a standard, give some examples, and implement some of the frequently used algorithms (for example, GCN).

The datasets section of the tutorial only has the base class’s code, without an implementation or an example of “how to DIY”.

The external resources provided by Steeve Huang are a good PyG tutorial, but… I just feel that with only 2 Jupyter notebooks that just “use” PyG (as mentioned in his readme.md), perhaps… (And of course, the device also counts: DL on graphs can be a little easier than DL on images. A 2-layer GCN runs relatively fast for node classification on the Cora dataset, even with only an Intel Core i7-3540M. With an Intel Core i7-8700 or Core i7-8750M, or with a GPU, it can be much, much faster. Point cloud tasks do need a GPU…) I think that most of the code in the tutorial can run (fast) on a CPU.

4. torch_geometric.io (I have mentioned it)

I have mentioned this before: reading and writing files (especially point cloud files such as .ply and .off).
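
As a rough idea of what such an io helper involves, here is a plain-Python sketch for the .off format. It is not PyG's own reader; `read_off` and `write_off` are illustrative names, and the sketch assumes the simple layout (an `OFF` header, a `vertices faces edges` count line, then vertex coordinates and face index lists) with no per-face colour values.

```python
def read_off(text):
    """Parse OFF text into (vertices, faces)."""
    # Drop '#' comments, then treat the file as one whitespace-separated token stream.
    lines = [line.split("#")[0] for line in text.splitlines()]
    tokens = " ".join(lines).split()
    if not tokens or tokens[0] != "OFF":
        raise ValueError("not an OFF file")
    num_vertices, num_faces = int(tokens[1]), int(tokens[2])
    cursor = 4  # skip "OFF" and the three counts (the edge count is unused)
    vertices = []
    for _ in range(num_vertices):
        vertices.append(tuple(float(t) for t in tokens[cursor:cursor + 3]))
        cursor += 3
    faces = []
    for _ in range(num_faces):
        size = int(tokens[cursor])  # number of vertex indices in this face
        faces.append(tuple(int(t) for t in tokens[cursor + 1:cursor + 1 + size]))
        cursor += 1 + size
    return vertices, faces


def write_off(vertices, faces):
    """Serialise (vertices, faces) back to OFF text."""
    out = ["OFF", f"{len(vertices)} {len(faces)} 0"]
    out += [" ".join(repr(float(c)) for c in v) for v in vertices]
    out += [" ".join(str(i) for i in (len(f), *f)) for f in faces]
    return "\n".join(out) + "\n"
```

The vertex list maps naturally onto node positions and the faces onto a face index tensor, which is why a small, well-documented reader/writer pair would already cover many point cloud and mesh use cases.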

5. functional support, i.e. torch_geometric.nn.functional

This was mentioned in a previous issue. With a functional API, we can create (or test) nearly all kinds of structures (most of the time, for fun).
For example, initialization can be tested (although, as we all know, kaiming_uniform can be a good choice when the input is an image, but…). I know that reset_parameters can be a solution when the parameters need to be modified, but I do not think it is that convenient. If a weight could be passed in directly, and the output computed from just x, edge_index and weight, like in torch.nn.functional.conv2d, it would be really nice.
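
To illustrate the kind of call this is asking for, here is a dependency-free sketch of a functional GCN-style layer. It is not part of PyG; `gcn_conv` is a hypothetical name, plain Python lists stand in for tensors, and the normalisation (self-loops plus symmetric degree scaling) is assumed to follow the usual GCN formula.

```python
import math


def gcn_conv(x, edge_index, weight):
    """Functional GCN-style layer: out = D^-1/2 (A + I) D^-1/2 (x @ weight).

    x:          N rows of F_in node features
    edge_index: (sources, targets), one entry per directed edge
    weight:     F_in x F_out matrix supplied by the caller
    """
    n, f_out = len(x), len(weight[0])
    # Linear transform of every node: h = x @ weight.
    h = [[sum(row[k] * weight[k][o] for k in range(len(weight)))
          for o in range(f_out)] for row in x]
    # Add self-loops, then count the degree of every target node.
    src = list(edge_index[0]) + list(range(n))
    dst = list(edge_index[1]) + list(range(n))
    deg = [0] * n
    for j in dst:
        deg[j] += 1
    # Each edge i -> j contributes the normalised message h[i] to node j.
    out = [[0.0] * f_out for _ in range(n)]
    for i, j in zip(src, dst):
        norm = 1.0 / math.sqrt(deg[i] * deg[j])
        for o in range(f_out):
            out[j][o] += norm * h[i][o]
    return out
```

Because `weight` is an argument rather than hidden state, trying a different initialization is just a matter of passing a different matrix, with no module or reset_parameters involved.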

6. torch_geometric.visualization

Visualization is really a big job. NOT ONLY the curves, t-SNE, … the GRAPH itself should be considered. A colormap can show us the importance of each node (color the nodes with a colormap, just like a heatmap of an image feature map). Why not in visdom? I know that matplotlib’s plots can be viewed in visdom, and we can use networkx.draw() to plot a graph, so… it might be possible to use visdom. (I have not done deep research or testing; I am just showing the possibility of using visdom.)

Example and code. Example: (image)

code:

import matplotlib.pyplot as plt
import networkx as nx
import visdom as vis

# Build a small example graph and draw it onto a matplotlib figure.
g = nx.karate_club_graph()
fig = plt.figure()
nx.draw_circular(g, with_labels=True, node_color='#66CCFF')  # NOTE(wmf): you can write anything you like.

# Send the figure to visdom (a visdom server must already be running).
vis_env = vis.Visdom()
vis_env.matplot(fig)  # sorry, only this works...
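
The node-importance colormap idea can be sketched without any plotting library: map one score per node to a hex colour by linear interpolation between two colours. `scores_to_hex` and its default colours are made up for this illustration.

```python
def scores_to_hex(scores, low=(255, 255, 204), high=(189, 0, 38)):
    """Map each score to a hex colour between `low` (min score) and `high` (max score)."""
    smallest, largest = min(scores), max(scores)
    span = (largest - smallest) or 1.0  # avoid dividing by zero when all scores are equal
    colors = []
    for score in scores:
        t = (score - smallest) / span
        rgb = [round(a + t * (b - a)) for a, b in zip(low, high)]
        colors.append("#{:02X}{:02X}{:02X}".format(*rgb))
    return colors
```

The returned list has one colour per node, so it could be passed as `node_color` to `nx.draw_circular` in the snippet above, in place of the single `'#66CCFF'`.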

What? TensorBoard? I think that TensorBoard is not that suitable for visualizing GRAPHs, although visualizing curves and t-SNE is really, really cool in TensorBoard.

Additional context

No. (If I think of something more, I will go on with the issue)

Yours Sincerely, MingFei Wang. (@wmf1997) 2019.09.16 22:11 (UTC+8) Tianjin, China

Added in 2019.09.17 11:30 (UTC+8):

0. More comments to encourage us DIY.

First, Thank you for your work again~! (PyG is a good architecture of Graph Representation Learning~!)

Reading source code can also be a good way of studying~ I mean, reading the implementations of Graph Neural Networks. For example, reading the MessagePassing (abstract?) base class lets me know what message passing is in a GNN, and GCNConv lets me know what a derived class (a detailed implementation) of a GNN looks like.
However, IN MY OPINION, code without enough comments might confuse people (after they have read the article). For example, GCNConv in the authors’ (Kipf & Welling) original pytorch implementation uses sparse matrix multiplication, as in the formula written in the article. In pyg, however, your implementation uses MessagePassing, and I know the reason from rrl_gnn.pdf. That reason, i.e. how to change sparse matmul into message passing, should be written down. With this method, I think more methods can be implemented or re-implemented in pyg.
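
The kind of comment being asked for could come with a tiny worked example. The sketch below uses plain Python and made-up function names (it is not PyG's MessagePassing class) to show that multiplying by the adjacency matrix and summing per-edge messages give the same result.

```python
def aggregate_matmul(adj, x):
    """Matrix form: out = A @ x, where adj[i][j] is the weight of edge j -> i."""
    n, f = len(x), len(x[0])
    return [[sum(adj[i][j] * x[j][k] for j in range(n)) for k in range(f)]
            for i in range(n)]


def edges_from_adj(adj):
    """Keep only the non-zero entries, as a sparse matrix would: (source, target, weight)."""
    return [(j, i, w) for i, row in enumerate(adj) for j, w in enumerate(row) if w != 0]


def aggregate_message_passing(edges, x):
    """Edge form: every edge j -> i sends the message w * x[j]; node i sums what it receives."""
    out = [[0.0] * len(x[0]) for _ in x]
    for j, i, w in edges:
        for k, value in enumerate(x[j]):
            out[i][k] += w * value
    return out
```

Row i of A @ x is the sum over j of A[i][j] * x[j], and the only non-zero terms are those where an edge j -> i exists, which is exactly the set of messages node i receives.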

Issue Analytics

  • State:open
  • Created 4 years ago
  • Reactions:3
  • Comments:14 (10 by maintainers)

Top GitHub Comments

1 reaction
rusty1s commented, Sep 16, 2019

Thank you, this is an awesome list. We can discuss this in more detail after ICLR deadline 😃

0 reactions
Hafplo commented, Oct 28, 2020

@rusty1s @WMF1997 Regarding pyg.io:

  1. Can we add some more documentation (examples? tests / sample files?) to it?

  2. There are a lot of different file formats out there, and I don’t think it’s reasonable to support all of them. I understand that Data() objects are the way to go, but perhaps we can define a file format for “pyg graphs” (it needs to be general enough, yet flexible and compressed)? A unified file format interface would simplify reading, writing and parsing (it just moves the ‘pain’ to creating those files in the first place). But since every dataset needs to be saved somehow, somewhere, it means only 1 person needs to do the dataset conversion and upload it online.

As an example: For my current dataset, I define 3 CSV files (for node features, edge index and edge features), as well as collect some metadata for each new graph. I think it is general enough to capture all types of graphs. I don’t know if it is ‘compressed’ enough. Maybe it needs to allow using only numbers (remove string features using some encoding before saving it).
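
A minimal sketch of the three-CSV layout described above, using only the standard library. The file names, the JSON metadata file, and the `save_graph` / `load_graph` helpers are assumptions made for illustration; the comment does not specify them.

```python
import csv
import json
from pathlib import Path


def save_graph(folder, node_features, edge_index, edge_features, metadata):
    """Write one graph as three CSV files plus a small metadata file."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    tables = {
        "node_features.csv": node_features,  # one row per node
        "edge_index.csv": edge_index,        # one (source, target) row per edge
        "edge_features.csv": edge_features,  # one row per edge, same order as edge_index
    }
    for name, rows in tables.items():
        with open(folder / name, "w", newline="") as handle:
            csv.writer(handle).writerows(rows)
    (folder / "meta.json").write_text(json.dumps(metadata))


def load_graph(folder):
    """Read the files back, assuming purely numeric features."""
    folder = Path(folder)

    def read(name, cast):
        with open(folder / name, newline="") as handle:
            return [[cast(value) for value in row] for row in csv.reader(handle)]

    return (
        read("node_features.csv", float),
        read("edge_index.csv", int),
        read("edge_features.csv", float),
        json.loads((folder / "meta.json").read_text()),
    )
```

Restricting the CSV files to numbers, as suggested above, keeps loading trivial; string features would need an encoding step before saving.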

Original issue: some thoughts about pyg · Issue #684 (GitHub)