Loading custom dataset and building the graph in batches


Hello!

I have my own dataset and want to use this library to train GNNs. The dataset is stored as TFRecords and read through tf.data.Dataset as_numpy_iterator(), where each batch from the iterator represents one graph. How can I feed this data to the library? I want to construct one large training graph (made up of multiple graphs from the numpy iterator) and one large test graph, following the PPI dataset format. I assume this means the train and test graph objects have to be built in batches. Is that possible? And if the resulting graph objects or matrices are too large, how can I train in minibatches? Thanks

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

megstanley commented, Oct 6, 2021

The GraphDataset object assembles batches, each of which is a single large graph (which may be disconnected, i.e. composed of many smaller graphs). Batching assembles the largest graph possible from the component graphs, subject to the dataset parameter “max_nodes_per_batch”, which can be set according to need and memory limits. To read from a numpy iterator, one solution may be to write a graph sample iterator that consumes your data and reassembles it into graph samples (as in the JsonLGraphDataset example). The graph batching method would then consume that iterator, so that the get_tensorflow_dataset method of the parent GraphDataset yields correctly batched data. Is this what you intended?
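The batching idea described above can be sketched independently of the library. This is a minimal illustration and not tf2-gnn's actual implementation: the Graph class and the batch_graphs function are hypothetical names, and only the idea of a node budget (the "max_nodes_per_batch" parameter) comes from the comment. Small graphs are merged greedily into one disconnected graph, with node ids shifted so they stay unique.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Tuple


@dataclass
class Graph:
    """Hypothetical minimal graph: a node count plus (source, target) edges."""
    num_nodes: int
    edges: List[Tuple[int, int]] = field(default_factory=list)


def batch_graphs(graphs: Iterable[Graph], max_nodes_per_batch: int) -> Iterator[Graph]:
    """Greedily merge small graphs into large, disconnected batch graphs."""
    num_nodes: int = 0
    edges: List[Tuple[int, int]] = []
    for graph in graphs:
        # Emit the current batch if adding this graph would exceed the budget.
        # A single graph larger than the budget still becomes its own batch.
        if num_nodes > 0 and num_nodes + graph.num_nodes > max_nodes_per_batch:
            yield Graph(num_nodes, edges)
            num_nodes, edges = 0, []
        # Shift node ids so this graph's nodes do not collide with earlier ones.
        offset = num_nodes
        edges.extend((src + offset, dst + offset) for src, dst in graph.edges)
        num_nodes += graph.num_nodes
    if num_nodes > 0:
        yield Graph(num_nodes, edges)


small_graphs = [Graph(3, [(0, 1), (1, 2)]), Graph(2, [(0, 1)]), Graph(4, [(0, 3)])]
batches = list(batch_graphs(small_graphs, max_nodes_per_batch=5))
for batch in batches:
    print(batch.num_nodes, batch.edges)
```

With a budget of 5 nodes, the first two graphs share one batch and the third starts a new one. Because batches are produced lazily, the full set of graphs never has to be held in memory at once.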

megstanley commented, Oct 7, 2021

Following how a dataset and model are instantiated in the training script may show how to use this: https://github.com/microsoft/tf2-gnn/blob/master/tf2_gnn/cli/train.py

JsonLGraphDataset is one specific example of use. Using a different input format requires custom load_data and _graph_iterator methods (see the base class GraphDataset). Another example is https://github.com/microsoft/tf2-gnn/blob/master/tf2_gnn/data/jsonl_graph_property_dataset.py
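As a rough sketch of what the body of a custom graph iterator might do, the function below turns one record per graph into one graph sample per graph. Everything about the record layout is an assumption: the keys "node_features", "edge_src", "edge_dst" and "edge_type" are hypothetical and need to be replaced by whatever your TFRecords actually contain. The output is a plain dict rather than the library's own sample type, so wiring it into a GraphDataset subclass is left open here.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple


def graph_sample_iterator(
    records: Iterable[Dict[str, Any]], num_edge_types: int
) -> Iterator[Dict[str, Any]]:
    """Yield one graph sample (a plain dict) per input record.

    Each record is assumed to hold one graph, using hypothetical keys:
    "node_features" (one row per node) and parallel sequences
    "edge_src", "edge_dst" and "edge_type" (one entry per edge).
    """
    for record in records:
        # One adjacency list per edge type, each holding (source, target) pairs.
        adjacency_lists: List[List[Tuple[int, int]]] = [
            [] for _ in range(num_edge_types)
        ]
        for src, dst, edge_type in zip(
            record["edge_src"], record["edge_dst"], record["edge_type"]
        ):
            adjacency_lists[int(edge_type)].append((int(src), int(dst)))
        yield {
            "node_features": [[float(x) for x in row] for row in record["node_features"]],
            "adjacency_lists": adjacency_lists,
        }


# Plain lists stand in for the arrays a numpy iterator would produce.
records = [
    {
        "node_features": [[1, 0], [0, 1], [1, 1]],
        "edge_src": [0, 1],
        "edge_dst": [1, 2],
        "edge_type": [0, 1],
    }
]
samples = list(graph_sample_iterator(records, num_edge_types=2))
print(samples[0]["adjacency_lists"])
```

The function only iterates and converts element by element, so it should accept numpy arrays as well as lists. Since it is a generator, graphs are converted one at a time as the batching step asks for them.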

As an alternative to writing a custom data loader, one could convert the data to the JsonLines format specified in the repo README and use JsonLGraphDataset or JsonLGraphPropertyDataset directly. Both loaders implement batching, with the maximum number of graph nodes per batch chosen by the user.
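For the conversion route, a generic JSON Lines writer could look like the sketch below. It does not reproduce the schema from the README: each dict passed in is assumed to already follow the field names the README specifies, and the file name, the write_jsonl helper and the compress flag are hypothetical. Whether the loaders expect gzip-compressed files, and which file names they look for, should be checked against the README.

```python
import gzip
import json
from typing import Any, Dict, Iterable


def write_jsonl(samples: Iterable[Dict[str, Any]], path: str, compress: bool = True) -> int:
    """Write one JSON object per line and return the number of lines written."""
    # gzip.open in text mode behaves like a regular text file handle.
    opener = gzip.open if compress else open
    count = 0
    with opener(path, "wt", encoding="utf-8") as handle:
        for sample in samples:
            # ASSUMPTION: `sample` already uses the README's field names.
            handle.write(json.dumps(sample) + "\n")
            count += 1
    return count


def read_jsonl(path: str, compress: bool = True) -> Iterable[Dict[str, Any]]:
    """Read the file back lazily, one graph per line, to check the output."""
    opener = gzip.open if compress else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            yield json.loads(line)
```

Because both functions work on iterators, graphs coming from a numpy iterator can be written out one at a time. Note that numpy arrays are not JSON-serializable as they are, so they need converting first (for example with their tolist() method).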

