[Roadmap] Remote Backend Support and Integration 🚀

Motivation

PyG currently requires users to store graphs (and their associated node and edge features) in Data and HeteroData objects, which loaders consume to produce mini-batches for forward/backward passes on an accelerator of choice. This abstraction, however, does not scale to large graphs (or large feature tensors): holding everything in a single in-memory object can quickly oversubscribe CPU DRAM, even though the GPU VRAM requirement is only the memory consumed by each sampled subgraph and its associated node and edge features. Instead, one can imagine storing the graph and its features in “remote backends” that expose a fixed set of operators, which downstream PyG samplers and loaders can integrate with cleanly.

The goal of this roadmap is to track the integration of native remote backend support into PyG. At a high level, this will be accomplished by introducing feature store, graph store, and sampler abstractions into PyG. For more freeform discussion, please visit the #scalability channel in the PyG Slack community.

Implementation

Abstractions: FeatureStore, GraphStore, Sampler

  • Let Data and HeteroData implement the FeatureStore abstraction (#4807)
  • Define a GraphStore abstraction that is intended to hold an edge_index in memory (#4816)
  • Let Data and HeteroData implement the GraphStore abstraction (#4816)
  • Modify NeighborLoader to call FeatureStore and GraphStore methods instead of their Data/HeteroData counterparts. Note that this requires moving data filtering into the feature store. The new interface will accept data: Union[Data, HeteroData, Tuple[FeatureStore, GraphStore]] (#4817, #4883)
  • Implement BaseSampler and refactor existing samplers behind a common interface (#5312, #5365, #5402)
  • Introduce NodeLoader and LinkLoader, refactor existing loaders behind loader + sampler interface (#5404, #5418)
  • Support (optional) methods that obtain a TensorAttr or EdgeAttr from a FeatureStore/GraphStore by their first dataclass attribute, and refactor existing computations that fetch all tensors (or edges) and then filter them to use these methods instead.
  • Support variable samplers in LightningNodeData and LightningLinkData
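The separation these items describe can be illustrated with a minimal, PyG-independent sketch. Everything below (`FeatureKey`, `DictFeatureStore`, `DictGraphStore`, `sample_one_hop`) is hypothetical and only loosely borrows method names from PyG's FeatureStore/GraphStore; it assumes a homogeneous graph stored as a COO edge list and uses plain Python lists in place of tensors. The point it shows is that the sampler asks the graph store for edges and the feature store only for the rows it needs, so either store could live in a remote backend.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

EdgeType = Tuple[str, str, str]


@dataclass(frozen=True)
class FeatureKey:
    """Hypothetical key, loosely analogous to a TensorAttr."""
    group_name: str  # e.g. a node type such as "paper"
    attr_name: str   # e.g. "x"


class DictFeatureStore:
    """Hypothetical in-memory feature store. Rows are fetched by index, so a
    remote implementation would only need to ship the requested rows."""

    def __init__(self) -> None:
        self._data: Dict[FeatureKey, List[List[float]]] = {}

    def put_tensor(self, rows: Sequence[List[float]], group_name: str, attr_name: str) -> None:
        self._data[FeatureKey(group_name, attr_name)] = list(rows)

    def get_tensor(self, group_name: str, attr_name: str,
                   index: Optional[Sequence[int]] = None) -> List[List[float]]:
        rows = self._data[FeatureKey(group_name, attr_name)]
        if index is None:
            return list(rows)
        return [rows[i] for i in index]


class DictGraphStore:
    """Hypothetical graph store holding one COO edge list per edge type."""

    def __init__(self) -> None:
        self._edges: Dict[EdgeType, Tuple[List[int], List[int]]] = {}

    def put_edge_index(self, row: Sequence[int], col: Sequence[int], edge_type: EdgeType) -> None:
        self._edges[edge_type] = (list(row), list(col))

    def get_edge_index(self, edge_type: EdgeType) -> Tuple[List[int], List[int]]:
        return self._edges[edge_type]


def sample_one_hop(feature_store: DictFeatureStore, graph_store: DictGraphStore,
                   seeds: Sequence[int], node_type: str = "node",
                   edge_type: EdgeType = ("node", "to", "node")):
    """Toy one-hop sampler that only talks to the two stores, never to a
    Data object. Assumes a homogeneous graph (a single node type)."""
    row, col = graph_store.get_edge_index(edge_type)
    seed_set = set(seeds)
    # Keep the sources of edges that point into a seed node (src -> dst).
    neighbors = [r for r, c in zip(row, col) if c in seed_set]
    # Seeds first, then newly discovered neighbors, without duplicates.
    nodes = list(dict.fromkeys(list(seeds) + neighbors))
    # Fetch only the feature rows of the sampled nodes.
    x = feature_store.get_tensor(node_type, "x", index=nodes)
    return nodes, x


# Usage: edges 0->1, 1->2, 3->2; sampling around seed node 2.
fs = DictFeatureStore()
fs.put_tensor([[0.0], [1.0], [2.0], [3.0]], "node", "x")
gs = DictGraphStore()
gs.put_edge_index([0, 1, 3], [1, 2, 2], ("node", "to", "node"))
nodes, x = sample_one_hop(fs, gs, seeds=[2])  # nodes == [2, 1, 3]
```

In PyG itself, the corresponding abstractions are the FeatureStore and GraphStore classes tracked above; their exact method signatures differ from this toy version and should be checked against the PyG documentation.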

Implementations

  • Implement a concrete FeatureStore, GraphStore, and Sampler with a popular backend to provide example usage. Candidates include a Ray RandomAccessDataset as a feature store and a Neo4j graph as a graph store.
  • Implement a validation class that operates on Tuple[FeatureStore, GraphStore] to perform basic sanity checks (similar to the checks Data and HeteroData perform today).
  • Implement sampling from edges in the HGTSampler
  • Implement (to the extent possible) the samplers in torch_geometric/loader (e.g., GraphSAINT, ShaDow) behind the sampler interface, enabling (a) easy extension to sampling from edges and (b) easy extension to remote backends in the future.
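A validation pass over a store pair could look roughly like the following. This is a hypothetical sketch, not PyG code: `StorePairValidator` is an invented name, and both stores are modeled as plain dicts (`{(node_type, attr_name): rows}` for features and `{(src_type, relation, dst_type): (row, col)}` for edges) rather than real FeatureStore/GraphStore objects. In the spirit of Data/HeteroData validation, it checks that all attributes of a node type agree on the node count and that edge indices stay within that count.

```python
from typing import Dict, List, Sequence, Set, Tuple

EdgeType = Tuple[str, str, str]
Features = Dict[Tuple[str, str], Sequence[Sequence[float]]]
Edges = Dict[EdgeType, Tuple[Sequence[int], Sequence[int]]]


class StorePairValidator:
    """Hypothetical sanity checker for a (feature store, graph store) pair,
    with both stores modeled as plain dicts."""

    def __init__(self, features: Features, edges: Edges) -> None:
        self.features = features
        self.edges = edges

    def _node_counts(self, errors: List[str]) -> Dict[str, int]:
        # All attributes of one node type must have the same number of rows.
        sizes: Dict[str, Set[int]] = {}
        for (node_type, _attr), rows in self.features.items():
            sizes.setdefault(node_type, set()).add(len(rows))
        counts: Dict[str, int] = {}
        for node_type, found in sizes.items():
            if len(found) > 1:
                errors.append(f"{node_type}: inconsistent row counts {sorted(found)}")
            # Use the smallest count so later range checks stay conservative.
            counts[node_type] = min(found)
        return counts

    def validate(self) -> List[str]:
        """Return a list of human-readable problems (empty if none found)."""
        errors: List[str] = []
        counts = self._node_counts(errors)
        for edge_type, (row, col) in self.edges.items():
            src_type, _rel, dst_type = edge_type
            if len(row) != len(col):
                errors.append(f"{edge_type}: row/col length mismatch")
            for ids, node_type in ((row, src_type), (col, dst_type)):
                if node_type not in counts:
                    errors.append(f"{edge_type}: no features for node type {node_type!r}")
                    continue
                n = counts[node_type]
                if any(not 0 <= i < n for i in ids):
                    errors.append(f"{edge_type}: index out of range for {node_type!r} (n={n})")
        return errors
```

Returning a list of problems instead of raising on the first one is a design choice for this sketch; a real validator might raise, warn, or do either depending on a flag.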

Code Health

  • Implement a remote backend utility class to consolidate common methods across feature and graph stores (#5307)
  • Consolidate conditionals for Data, HeteroData, and Tuple[FeatureStore, GraphStore] throughout the PyG codebase into a single conditional. This should be possible since both Data and HeteroData implement the FeatureStore and GraphStore interfaces.
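The consolidation in the last item amounts to normalizing every accepted input into a (feature store, graph store) pair once, at the API boundary. A hypothetical sketch follows: `FeatureStoreBase`, `GraphStoreBase`, `ToyData`, `to_stores`, and `num_edges` are invented stand-ins rather than PyG classes, and they only illustrate the single-conditional pattern. It relies on the premise stated above that a Data-like object implements both interfaces and can therefore serve as both stores.

```python
from typing import Any, List, Tuple


class FeatureStoreBase:
    """Hypothetical stand-in for an abstract feature store interface."""

    def get_tensor(self, attr_name: str) -> Any:
        raise NotImplementedError


class GraphStoreBase:
    """Hypothetical stand-in for an abstract graph store interface."""

    def get_edge_index(self) -> Tuple[List[int], List[int]]:
        raise NotImplementedError


class ToyData(FeatureStoreBase, GraphStoreBase):
    """Stand-in for Data/HeteroData: one object implementing both interfaces."""

    def __init__(self, x: List[List[float]], edge_index: Tuple[List[int], List[int]]) -> None:
        self.x = x
        self.edge_index = edge_index

    def get_tensor(self, attr_name: str) -> Any:
        return getattr(self, attr_name)

    def get_edge_index(self) -> Tuple[List[int], List[int]]:
        return self.edge_index


def to_stores(data: Any) -> Tuple[FeatureStoreBase, GraphStoreBase]:
    """The single conditional: downstream code only ever sees an (fs, gs) pair."""
    if isinstance(data, tuple):
        feature_store, graph_store = data
    else:
        # A Data-like object is both stores at once.
        feature_store, graph_store = data, data
    if not isinstance(feature_store, FeatureStoreBase) or not isinstance(graph_store, GraphStoreBase):
        raise TypeError("expected a Data-like object or a (FeatureStore, GraphStore) tuple")
    return feature_store, graph_store


def num_edges(data: Any) -> int:
    # Example consumer: no Data-vs-tuple branching is needed here.
    _, graph_store = to_stores(data)
    row, _ = graph_store.get_edge_index()
    return len(row)
```

With this shape, loaders and samplers call `to_stores` once and never branch on the input type again.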

Issue Analytics

  • State: open
  • Created: a year ago
  • Reactions: 4
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

2 reactions
mananshah99 commented, Sep 20, 2022

Hi folks, this roadmap has been updated a bit to describe latest changes and a few potential further directions (cc @Padarn, I hope this helps address some of your questions as well). Feel free to add on, or let me know if you have any questions/comments/concerns!

2 reactions
rusty1s commented, Jun 19, 2022

Yes, @wsad1. I think these are good points. One way to showcase this would be a short example/tutorial on how to connect to a Neo4j graph database or similar.
