[Roadmap] GraphGym via PyTorch Lightning and Hydra
The feature, motivation and pitch
The overall goal of this roadmap is to ensure a tighter connection between PyG core and the GraphGym configuration manager. A second goal is to avoid re-inventing the wheel in GraphGym and instead make use of popular open-source frameworks wherever applicable, e.g., for configuration management, training, logging, and AutoML.
As such, this roadmap is structured into several components: general improvements (e.g., a tighter connection between PyG and GraphGym), PyTorch Lightning integration, and Hydra integration as our configuration tool.
General Roadmap
- Add `register` functionality to models in PyG core
- Remove any layer/model definition from GraphGym and move it into PyG core
- Expose a `graphgym` bash script in a `bin/` folder
- GraphGym usage should not require manual cloning of PyG
- Better and more user-friendly documentation
- Add `HeteroData` support
- Add pooling layers
- …
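To illustrate the `register` idea, here is a minimal sketch of a decorator-based registry in plain Python. The registry dictionary and helper names are hypothetical, not the actual PyG/GraphGym API:

```python
# Minimal sketch of a decorator-based registry, assuming a GraphGym-style
# "register" mechanism; names here are illustrative, not the actual PyG API.
model_registry = {}

def register_model(name):
    """Register a model class under `name` so configs can refer to it."""
    def wrapper(cls):
        if name in model_registry:
            raise ValueError(f"Model '{name}' is already registered")
        model_registry[name] = cls
        return cls
    return wrapper

@register_model("GCN")
class GCN:
    def __init__(self, in_channels, out_channels):
        self.in_channels = in_channels
        self.out_channels = out_channels

# A config entry like `model: GCN` can then be resolved via the registry:
model_cls = model_registry["GCN"]
model = model_cls(in_channels=34, out_channels=4)
```

The point of moving such registration into PyG core is that GraphGym configs can then reference any PyG layer/model by name, without GraphGym maintaining its own copies.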
PyTorch Lightning Integration
The GraphGym training experience can be improved with respect to scalability, mixed-precision support, logging, and checkpointing via PyTorch Lightning integration.
- Integrate a `LightningModule` into GraphGym
- Update the train method with the PL `Trainer` and `LightningModule` implementations
- Refactor `load_ckpt` and `save_ckpt` with the PL checkpoint save and load methods
- Integrate the `LightningDataset`, `LightningNodeData` and `LightningLinkData` modules
- …
Hydra Integration
Users of PyG should be able to write GraphGym configurations that make full use of PyG functionality. In particular, we want to allow access to any dataset, any data transformation pipeline, and any GNN layer/model. For this, we need to follow a structured/composable configuration, e.g., as introduced here:
```yaml
defaults:
  - dataset: KarateClub
  - transform@dataset.transform:
      - NormalizeFeatures
      - AddSelfLoops
  - model: GCN
  - optimizer: Adam
  - lr_scheduler: ReduceLROnPlateau
  - _self_

model:
  in_channels: 34
  out_channels: 4
  hidden_channels: 16
  num_layers: 2
```
- Use variable interpolation, e.g., `model.in_channels = ${dataset.num_features}` and `model.out_channels = ${dataset.num_classes}`
- …
Weights & Biases Integration (TBD)
- …
AutoML (TBD)
- …
Issue Analytics
- State:
- Created a year ago
- Comments: 5 (4 by maintainers)
Top GitHub Comments
I'll spend some time going through the links you shared and start a draft PR regarding this. Hope to get your guidance on it as well.
This is amazing. We should collect some information about how we want to integrate Hydra into GraphGym, as I believe we need a new config layout. I started on something a long time ago but did not finish it; see here, here and here. I would very much appreciate some advice and insights from you!