Create PyTorch Dataset(s)
Here I will track progress on creating the PyTorch dataset. The .bc files I have so far come from the main PyTorch build as well as its third-party projects, so we can actually create multiple datasets out of it.
You can look at the .bc files here [UPDATE: this directory has more files]. The .bc files are found in:
- PyTorch’s main files:
caffe2
- third party:
- gloo:
third_party/gloo/
- onnx:
third_party/onnx/
- tensorpipe:
third_party/tensorpipe/tensorpipe
- oneDNN and mkl-dnn:
third_party/ideep/mkl-dnn/third_party/oneDNN/
and third_party/ideep/mkl-dnn/src/
- this is a bit confusing: mkl-dnn has been renamed to oneDNN, and PyTorch includes a third party named ideep that contains directories for both, so we may use only oneDNN.
- kineto:
third_party/kineto/
- protobuf:
third_party/protobuf/
- benchmark:
third_party/benchmark
- fbgemm:
third_party/fbgemm/
- NNPACK:
./confu-deps/NNPACK/
- XNNPACK:
confu-deps/XNNPACK/
- QNNPACK:
confu-deps/pytorch_qnnpack/
- cpuinfo:
confu-deps/cpuinfo/
- sleef:
./sleef/
- you can find a lot of other third party .bc files in a similar path to the one above
The build I made wasn’t complete (my Linux machine hung because it is so old), so I can give it one more try to create more .bc files.
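Since the bitcodes come from several distinct third-party projects, per-project datasets could be split out by grouping file paths on their project directory. This is only a sketch: the directory names come from the list above, but the grouping rule (second path component under `third_party/` or `confu-deps/`, top-level directory otherwise) is my assumption.

```python
from collections import defaultdict
from pathlib import PurePosixPath

def group_by_project(bc_paths):
    """Group .bc file paths into per-project buckets.

    Files under third_party/<name>/ or confu-deps/<name>/ are keyed
    by <name>; everything else (e.g. caffe2/, sleef/) is keyed by
    its top-level directory.
    """
    groups = defaultdict(list)
    for p in bc_paths:
        parts = PurePosixPath(p).parts
        if parts[0] in ("third_party", "confu-deps") and len(parts) > 1:
            key = parts[1]
        else:
            key = parts[0]
        groups[key].append(p)
    return dict(groups)
```

Each bucket (gloo, onnx, NNPACK, ...) could then back its own dataset, matching the "multiple datasets" idea above.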
- Upload .bc files to Google Drive [Updated link]
- Document steps to build .bc files
- Run random_walk.py on those .bc files
- Create script to evaluate them for others to use
- Add as an official dataset(s)
- Update documentation with the dataset(s)
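To make the task list concrete, here is a minimal sketch of a map-style dataset over the collected bitcode files. The class name, the returned dict fields, and the idea of indexing every `.bc` under the build root are my assumptions, not something specified in the issue.

```python
from pathlib import Path

class BitcodeDataset:
    """Minimal map-style dataset over LLVM bitcode (.bc) files.

    `root` is the build tree containing caffe2/, third_party/,
    confu-deps/, etc., as listed above.
    """

    def __init__(self, root):
        # Recursively index every .bc file under the build tree.
        self.files = sorted(Path(root).rglob("*.bc"))

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        path = self.files[idx]
        # Return raw bitcode bytes plus provenance, so per-project
        # sub-datasets (gloo, onnx, oneDNN, ...) can be split later.
        return {"path": str(path), "bitcode": path.read_bytes()}
```

Because a map-style PyTorch dataset only needs `__len__` and `__getitem__`, this class can be made a `torch.utils.data.Dataset` subclass unchanged and fed to a `DataLoader` (with a custom `collate_fn`, since the samples are variable-length bytes).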
Issue Analytics
- Created: a year ago
- Comments: 5 (5 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
The previous build was not complete, so I re-ran it and created another Google Drive folder that has around 1,000 more bitcode files: https://drive.google.com/drive/folders/1z6DueocWlfxG2ckftGrcgLoyZLOELkuQ?usp=sharing
I have just updated the description with an extensive list of third-party libraries that were included in the generated bitcodes, so we can create a separate dataset for each benchmark. Also, hopefully some of those benchmarks can run on both Mac and Linux and won’t get the error we have observed.