slices from collate_fn returns unexpected result

I am trying to create a custom dataset. My dataset.data is Data(edge_index=[2, 93003], edge_to_id=[1], test_mask=[93003], train_mask=[93003], val_mask=[93003], vertice_to_id=[1], y=[93003]). The numbers in square brackets are shapes.

However, dataset.slices is {'edge_index': tensor([0, 93003]), 'y': tensor([0, 93003]), 'train_mask': tensor([0, 93003]), 'val_mask': tensor([0, 93003]), 'test_mask': tensor([0, 93003])}. The numbers in tensor([...]) are actual values. Therefore, I cannot get an item with an index larger than 2, even though I have 93003 data points.

I have seen that slices is typically longer. What did I do wrong? My code for creating the dataset is:

edge_index = torch.LongTensor(edge_index)
data = Data(edge_index=edge_index.t().contiguous())
data.y = torch.tensor(edge_type)
data.train_mask = torch.cat((torch.ones(n_train, dtype=torch.bool), torch.zeros(n_val+n_test, dtype=torch.bool)))
data.val_mask = torch.cat((torch.zeros(n_train, dtype=torch.bool), torch.ones(n_val, dtype=torch.bool), torch.zeros(n_test, dtype=torch.bool)))
data.test_mask = torch.cat((torch.zeros(n_train+n_val, dtype=torch.bool), torch.ones(n_test, dtype=torch.bool)))
collated_data, slices = self.collate([data])
torch.save((collated_data, slices), *self.processed_paths)
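
For readers hitting the same confusion: as far as I can tell, slices holds one boundary per graph plus a leading 0, so collating a list with a single Data object gives exactly two entries per attribute, and the dataset has length 1 no matter how many edges that graph contains. Below is a minimal sketch of that bookkeeping in plain Python. It is a simplified model, not PyTorch Geometric's actual collate implementation, and collate_sketch and get_item are hypothetical helper names made up for illustration.

```python
from itertools import accumulate


def collate_sketch(graphs):
    """Concatenate per-graph label lists and record each graph's boundaries.

    Hypothetical stand-in for the idea behind collate: `graphs` is a list of
    lists, one inner list of edge labels per graph.
    """
    flat = [label for graph in graphs for label in graph]
    # Cumulative lengths, with a leading 0: graph i lives in flat[slices[i]:slices[i + 1]].
    slices = [0, *accumulate(len(graph) for graph in graphs)]
    return flat, slices


def get_item(flat, slices, idx):
    """Recover the labels of graph `idx` from the concatenated storage."""
    return flat[slices[idx]:slices[idx + 1]]


# One graph with five edges: two slice entries, so only one item in the dataset.
flat_one, slices_one = collate_sketch([[0, 1, 2, 3, 4]])
num_items_one = len(slices_one) - 1

# Three graphs: four slice entries, so three items.
flat_three, slices_three = collate_sketch([[0, 1], [2, 3], [4]])
num_items_three = len(slices_three) - 1
```

Under this model, a slices tensor of [0, 93003] describes one graph with 93003 edges. Indexing the dataset by edge would then need either one Data object per sample passed to collate, or direct indexing into the single graph's tensors with the masks.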

Issue Analytics

Created 3 years ago
Comments: 9 (9 by maintainers)

Top GitHub Comments

minhtriet commented, Dec 9, 2020

It is done

minhtriet commented, Dec 9, 2020

In addition, it has been noted that the WN18 and FB15k datasets suffer from test set leakage, due to inverse relations from the training set being present in the test set – however, the extent of this issue has so far not been quantified.

… but enough for them to create a new dataset.
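
To get a rough feel for the kind of leakage the quote describes, one could count test triples whose head and tail appear reversed somewhere in the training set. The sketch below is only a crude heuristic on made-up toy triples: inverse_leak_fraction is a hypothetical name, it ignores which relation links the pair, and it is not the method used by the quoted paper.

```python
def inverse_leak_fraction(train, test):
    """Fraction of test triples (h, r, t) whose reversed pair (t, h) occurs in train.

    Assumption: any training triple linking t to h counts as a possible
    inverse, regardless of the relation name.
    """
    train_pairs = {(head, tail) for head, _, tail in train}
    if not test:
        return 0.0
    leaked = sum(1 for head, _, tail in test if (tail, head) in train_pairs)
    return leaked / len(test)


# Toy triples, invented for illustration only.
train = [("a", "hypernym", "b"), ("c", "part_of", "d")]
test = [("b", "hyponym", "a"), ("e", "part_of", "f")]

fraction = inverse_leak_fraction(train, test)
```

Here the first test triple is the reverse of a training triple and the second is not, so half of the toy test set would be flagged.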