Create PyTorch Dataset(s)

I will track progress on creating the PyTorch dataset here. The .bc files I have so far come from PyTorch itself as well as from its third-party projects, so we can actually create multiple datasets out of them.

You can look at the .bc files here [UPDATE: this directory has more files]. The .bc files are found in:

PyTorch’s main files: caffe2
third party:
- gloo: third_party/gloo/
- onnx: third_party/onnx/
- tensorpipe: third_party/tensorpipe/tensorpipe
- oneDNN and mkl-dnn: third_party/ideep/mkl-dnn/third_party/oneDNN/ and third_party/ideep/mkl-dnn/src/
  - this is a bit confusing: mkl-dnn has been renamed to oneDNN, and PyTorch includes a third-party project named ideep that contains directories for both, so we may use only oneDNN.
- kineto: third_party/kineto/
- protobuf: third_party/protobuf/
- benchmark: third_party/benchmark
- fbgemm: third_party/fbgemm/
- NNPACK: ./confu-deps/NNPACK/
- XNNPACK: confu-deps/XNNPACK/
- QNNPACK: confu-deps/pytorch_qnnpack/
- cpuinfo: confu-deps/cpuinfo/
- sleef: ./sleef/
- many other third-party .bc files can be found in paths similar to the ones above
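
To make the "one dataset per library" idea concrete, here is a minimal sketch that groups the .bc files by the directories listed above. It assumes the paths are relative to a single build root (the function name `group_bitcodes` and the `build_root` argument are hypothetical, not part of any existing script), and it takes only the oneDNN directory for the oneDNN/mkl-dnn entry.

```python
from pathlib import Path

# Library name -> directory relative to the build root, copied from the list
# above. Assumption: all of these paths share the same build root.
LIBRARY_DIRS = {
    "pytorch": "caffe2",
    "gloo": "third_party/gloo",
    "onnx": "third_party/onnx",
    "tensorpipe": "third_party/tensorpipe/tensorpipe",
    "oneDNN": "third_party/ideep/mkl-dnn/third_party/oneDNN",
    "kineto": "third_party/kineto",
    "protobuf": "third_party/protobuf",
    "benchmark": "third_party/benchmark",
    "fbgemm": "third_party/fbgemm",
    "NNPACK": "confu-deps/NNPACK",
    "XNNPACK": "confu-deps/XNNPACK",
    "QNNPACK": "confu-deps/pytorch_qnnpack",
    "cpuinfo": "confu-deps/cpuinfo",
    "sleef": "sleef",
}


def group_bitcodes(build_root):
    """Return {library name: sorted list of .bc paths} for directories that exist."""
    root = Path(build_root)
    groups = {}
    for name, rel_dir in LIBRARY_DIRS.items():
        lib_dir = root / rel_dir
        if lib_dir.is_dir():
            # rglob searches the directory recursively for bitcode files.
            groups[name] = sorted(lib_dir.rglob("*.bc"))
    return groups
```

Calling `group_bitcodes("build")` and printing `len(paths)` for each library gives a quick view of how large each candidate dataset would be.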

The build I made wasn’t complete (my Linux machine hung because it is so old), so I can give it one more try to create more .bc files.
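
Since the build was interrupted, it may be worth filtering out files that are empty or are not bitcode at all before uploading them. The sketch below checks only the 4-byte magic number at the start of each file (raw LLVM bitcode starts with `BC` followed by `0xC0 0xDE`; the wrapper format starts with `0x0B17C0DE` stored little-endian). The helper name `looks_like_bitcode` is hypothetical, and a file truncated later in the stream would still pass this check.

```python
# Magic numbers for the two LLVM bitcode container formats.
RAW_MAGIC = b"BC\xc0\xde"            # raw bitcode stream
WRAPPER_MAGIC = b"\xde\xc0\x17\x0b"  # bitcode wrapper header, 0x0B17C0DE little-endian


def looks_like_bitcode(path):
    """Return True if the file starts with an LLVM bitcode magic number."""
    with open(path, "rb") as f:
        head = f.read(4)
    return head in (RAW_MAGIC, WRAPPER_MAGIC)
```

This is only a cheap header check; fully validating a file would require actually parsing it with LLVM tooling.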

Upload .bc files to Google Drive [Updated link]
Document steps to build .bc files
Run random_walk.py on those .bc files
Create script to evaluate them for others to use
Add as an official dataset(s)
Update documentation with the dataset(s)

Cc @ChrisCummins

Issue Analytics

State:
Created a year ago
Comments:5 (5 by maintainers)

Top GitHub Comments

mostafaelhoushi commented, May 14, 2022

The previous build was not complete, so I re-ran it and created another Google Drive folder that has around 1000 more bitcode files: https://drive.google.com/drive/folders/1z6DueocWlfxG2ckftGrcgLoyZLOELkuQ?usp=sharing

mostafaelhoushi commented, May 18, 2022

I have just updated the description with an extensive list of the third-party libraries included in the generated bitcode files, so we can create a separate dataset for each benchmark. Also, hopefully some of those benchmarks can run on both Mac and Linux and won’t hit the error we have observed.