to_tensor and pil_to_tensor inconsistency
See original GitHub issue.
`to_tensor` uses `from_numpy`, which never copies and always returns a CPU tensor. But `pil_to_tensor` uses `as_tensor`, which would return a CUDA tensor if the default tensor type is changed by the user (e.g., https://github.com/pytorch/pytorch/issues/39088). I think we need a discussion on what the correct behavior is, followed by a patch to make the two functions consistent.
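A minimal sketch of the reported inconsistency (this assumes a CUDA-capable machine; whether `pil_to_tensor` actually lands on CUDA here is exactly the behavior under discussion, not a guarantee):

```python
import numpy as np
import torch
from PIL import Image
from torchvision.transforms import functional as F

img = Image.fromarray(np.zeros((4, 4, 3), dtype=np.uint8))

# User changes the default tensor type to a CUDA type.
torch.set_default_tensor_type(torch.cuda.FloatTensor)

# to_tensor goes through from_numpy, so it stays on the CPU.
print(F.to_tensor(img).device)       # cpu

# pil_to_tensor goes through as_tensor; per the report it may follow
# the default tensor type and end up on CUDA instead.
print(F.pil_to_tensor(img).device)
```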
It’s been in PyTorch for a long time, probably before 1.0 if I remember correctly. As long as you return a CUDA tensor from your dataset code and use the proper start method, the main process will receive a CUDA tensor, regardless of `num_workers`.
It has never been advertised or recommended, because it makes CUDA memory management difficult, and moving to the GPU is really fast and usually not the bottleneck as long as you use `pin_memory`.
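A rough sketch of that pattern, assuming a CUDA-capable machine (`CudaDataset` is only an illustrative name):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class CudaDataset(Dataset):
    """Toy dataset whose items are already CUDA tensors."""
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        # The sample is moved to the GPU inside the dataset itself.
        return torch.full((3, 4, 4), float(idx)).cuda()

if __name__ == "__main__":
    # With CUDA tensors produced in worker processes, the 'spawn'
    # start method is needed instead of the default 'fork'.
    loader = DataLoader(
        CudaDataset(),
        batch_size=2,
        num_workers=2,
        multiprocessing_context="spawn",
    )
    for batch in loader:
        print(batch.device)  # cuda:0
```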
This issue is not about whether PyTorch supports loading CUDA tensors. It is about whether torchvision, a library built on top of PyTorch, should return a CUDA tensor from `ToTensor`, which is called in the dataset code. Dataset code that yields CUDA tensors has always worked fine.
I’m not quite sure about that? Is there any way to use `DataLoader` to load data to the GPU? Currently, when I set `num_workers=0` or the `spawn` start method, I still get CPU tensors based on what you have illustrated. If I use `DataLoader`, I need to move them to `cuda` manually. If I want to load data directly to the GPU, I need to code the pipeline in `dali` myself. I might not be familiar with new PyTorch features. Is there any update on that? It would be great news to know it’s supported. I see that the ImageNet example still needs to use `cuda()` to move data to the GPU after it is loaded from `DataLoader`.
~~https://github.com/pytorch/examples/blob/master/imagenet/main.py#L283-L284~~
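For reference, a minimal sketch of the manual transfer pattern being referred to (the dataset and shapes here are just placeholders):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,)))

# pin_memory speeds up the subsequent host-to-device copy.
loader = DataLoader(dataset, batch_size=16, num_workers=2, pin_memory=True)

for images, targets in loader:
    # Batches come out on the CPU; move them to the GPU manually.
    images = images.cuda(non_blocking=True)
    targets = targets.cuda(non_blocking=True)
```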
Update: Ah, it’s always doable if the `torch.utils.data.Dataset` returns a CUDA tensor, which could be implemented by setting `transform` in `torchvision.datasets.XXX`. Sorry for the misleading information above, and it’s certainly not recommended. I only considered direct-to-GPU strategies like `dali` in my previous statement.
The purpose of this issue should be the behavior of `ToTensor()`: whether it always puts data on the CPU, or follows the default tensor type. My opinion is to force `cpu` as a quick fix for consistency, and discuss the feasibility of the latter approach later.
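A hedged illustration of what "setting `transform` in `torchvision.datasets.XXX`" could look like, using CIFAR10 purely as an example stand-in for XXX:

```python
import torch
from torchvision import datasets, transforms

# A transform pipeline that converts to a tensor and then moves it to CUDA,
# so the dataset itself yields CUDA tensors.
to_cuda_tensor = transforms.Compose([
    transforms.ToTensor(),
    transforms.Lambda(lambda t: t.cuda()),
])

dataset = datasets.CIFAR10(root="./data", train=True, download=True,
                           transform=to_cuda_tensor)

img, label = dataset[0]
print(img.device)  # cuda:0
```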