Dataset split
❓ Questions & Help
Hi,
I’m currently trying to combine two very similar datasets.
I have them at two distinct locations, and my goal is to load both into the same dataset, using one as the training data and the other as the test and validation data.
When using an InMemoryDataset, I thought about using masks for that.
But as far as I can tell from the datasets provided, masks are used to select certain parts of a graph for training and other parts for testing (am I right, or am I misunderstanding them?).
Is there a way to declare certain graphs as part of the training data and others as part of the test data?
Sorry if it’s a simple question, but I haven’t worked with masks before.
Issue Analytics
- State:
- Created 4 years ago
- Comments:5 (2 by maintainers)
Top GitHub Comments
Just split them into train and test sets and load them separately, e.g., like here.
Yes, for whatever reason it crashes when I try to load it this way. But I made a workaround: I initialize the encoder that I’m using on both datasets beforehand and then pass it to both datasets as an argument when creating them, so that both use the same encoder instance.
It’s possible that I made a mistake somewhere when coding the above section, but I can’t find it, so I’ll just move on with this solution.
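For anyone landing here later, the workaround described above could look roughly like the sketch below. It is plain Python and only illustrates the idea: LabelEncoder, GraphDataset, and the raw data are hypothetical stand-ins, not code from this thread or from PyTorch Geometric. In a real project GraphDataset would presumably be an InMemoryDataset subclass that receives the encoder as a constructor argument.

```python
import random


class LabelEncoder:
    """Hypothetical minimal encoder: maps each distinct raw value to an integer."""

    def __init__(self):
        self.index = {}

    def fit(self, values):
        # Assign the next free integer to every value not seen before.
        for value in values:
            if value not in self.index:
                self.index[value] = len(self.index)
        return self

    def transform(self, values):
        return [self.index[value] for value in values]


class GraphDataset:
    """Hypothetical stand-in for a dataset class.

    Each raw graph is reduced here to a list of node labels.
    """

    def __init__(self, raw_graphs, encoder):
        self.encoder = encoder
        self.graphs = [encoder.transform(graph) for graph in raw_graphs]

    def __len__(self):
        return len(self.graphs)

    def __getitem__(self, i):
        return self.graphs[i]


# Made-up raw data standing in for the two dataset locations.
raw_a = [["C", "O"], ["C", "N", "C"]]
raw_b = [["N", "S"], ["O", "C"], ["S", "S", "C"], ["N"]]

# Fit ONE encoder on both sources before building either dataset,
# so identical raw values get identical codes in train and test.
encoder = LabelEncoder()
for graph in raw_a + raw_b:
    encoder.fit(graph)

# Two separate dataset objects: whole graphs belong to one or the other.
train_dataset = GraphDataset(raw_a, encoder)
other_dataset = GraphDataset(raw_b, encoder)

# Split the second dataset into validation and test graphs by index.
indices = list(range(len(other_dataset)))
random.Random(0).shuffle(indices)
half = len(indices) // 2
val_graphs = [other_dataset[i] for i in indices[:half]]
test_graphs = [other_dataset[i] for i in indices[half:]]
```

No masks are involved here: each graph as a whole ends up in exactly one of the three sets, and the shared encoder keeps the encoded features consistent across them.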