Create PyTorch Dataset(s)

Here I will track progress on creating a PyTorch dataset. The .bc files I have so far come from the main PyTorch sources as well as its third-party projects, so we can actually create multiple datasets out of them.

You can look at the .bc files here [UPDATE: this directory has more files]. The .bc files are found in:

  • PyTorch’s main files: caffe2
  • third party:
    • gloo: third_party/gloo/
    • onnx: third_party/onnx/
    • tensorpipe: third_party/tensorpipe/tensorpipe
    • oneDNN and mkl-dnn: third_party/ideep/mkl-dnn/third_party/oneDNN/ and third_party/ideep/mkl-dnn/src/
      • this is a bit confusing: mkl-dnn has been renamed to oneDNN, and PyTorch includes a third-party project named ideep that contains directories for both, so we may use only oneDNN.
    • kineto: third_party/kineto/
    • protobuf: third_party/protobuf/
    • benchmark: third_party/benchmark
    • fbgemm: third_party/fbgemm/
    • NNPACK: ./confu-deps/NNPACK/
    • XNNPACK: confu-deps/XNNPACK/
    • QNNPACK: confu-deps/pytorch_qnnpack/
    • cpuinfo: confu-deps/cpuinfo/
    • sleef: ./sleef/
    • many other third-party .bc files can be found in paths similar to the ones above
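To split the bitcode files into one dataset per project, the directory list above can be treated as a table of path prefixes. The following is only a minimal sketch: the dataset names, `classify`, and `group_bitcode` are hypothetical, and it assumes the prefixes are relative to the root of the build directory, which the issue does not state explicitly.

```python
from collections import defaultdict
from pathlib import Path, PurePosixPath
from typing import Dict, List, Optional

# Dataset names are hypothetical labels; the prefixes are copied from the list above.
DATASET_PREFIXES = {
    "pytorch": "caffe2",
    "gloo": "third_party/gloo/",
    "onnx": "third_party/onnx/",
    "tensorpipe": "third_party/tensorpipe/tensorpipe",
    "onednn": "third_party/ideep/mkl-dnn/third_party/oneDNN/",
    "mkl-dnn": "third_party/ideep/mkl-dnn/src/",
    "kineto": "third_party/kineto/",
    "protobuf": "third_party/protobuf/",
    "benchmark": "third_party/benchmark",
    "fbgemm": "third_party/fbgemm/",
    "nnpack": "./confu-deps/NNPACK/",
    "xnnpack": "confu-deps/XNNPACK/",
    "qnnpack": "confu-deps/pytorch_qnnpack/",
    "cpuinfo": "confu-deps/cpuinfo/",
    "sleef": "./sleef/",
}


def classify(rel_path: str) -> Optional[str]:
    """Return the dataset whose prefix matches rel_path, preferring the longest prefix."""
    parts = PurePosixPath(rel_path).parts  # leading "./" is dropped by pathlib
    best_name, best_len = None, 0
    for name, prefix in DATASET_PREFIXES.items():
        prefix_parts = PurePosixPath(prefix).parts
        if len(prefix_parts) > best_len and parts[: len(prefix_parts)] == prefix_parts:
            best_name, best_len = name, len(prefix_parts)
    return best_name


def group_bitcode(build_dir: str) -> Dict[str, List[Path]]:
    """Group every .bc file under build_dir by dataset name."""
    root = Path(build_dir)
    groups = defaultdict(list)
    for bc in sorted(root.rglob("*.bc")):
        name = classify(bc.relative_to(root).as_posix())
        groups[name or "unclassified"].append(bc)
    return dict(groups)
```

Files that match none of the listed prefixes land in an "unclassified" group, which makes it easy to spot the other third-party directories that still need a prefix entry.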

The build I made wasn’t complete (my Linux machine hung because it is so old), so I can give it one more try to create more .bc files.
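Because the build was interrupted, it may be worth a quick sanity check that each .bc file at least starts with an LLVM bitcode header. The sketch below checks the first four bytes against the raw bitcode magic (`BC` followed by `0xC0DE`) and the bitcode wrapper magic (`0x0B17C0DE`, stored little-endian). The function names are hypothetical, and the check is an assumption about what would be useful here: it only catches empty or non-bitcode files, not files truncated after the header.

```python
from pathlib import Path
from typing import List

RAW_MAGIC = b"BC\xc0\xde"          # raw LLVM bitcode: 'B' 'C' 0xC0 0xDE
WRAPPER_MAGIC = b"\xde\xc0\x17\x0b"  # wrapper header 0x0B17C0DE, little-endian


def looks_like_bitcode(path) -> bool:
    """Return True if the file begins with an LLVM bitcode magic number."""
    with open(path, "rb") as f:
        head = f.read(4)
    return head in (RAW_MAGIC, WRAPPER_MAGIC)


def find_suspect_files(build_dir: str) -> List[Path]:
    """List .bc files under build_dir that do not start with a bitcode header."""
    return [p for p in sorted(Path(build_dir).rglob("*.bc")) if not looks_like_bitcode(p)]
```

Any file this reports would need to be regenerated or excluded before uploading; a stricter check would have to actually parse the file with LLVM tooling.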

  • Upload .bc files to Google Drive [Updated link]
  • Document steps to build .bc files
  • Run random_walk.py on those .bc files
  • Create script to evaluate them for others to use
  • Add as an official dataset(s)
  • Update documentation with the dataset(s)

Cc @ChrisCummins

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

mostafaelhoushi commented, May 14, 2022

The previous build was not complete, so I re-ran it and created another Google Drive folder that has around 1000 more bitcode files: https://drive.google.com/drive/folders/1z6DueocWlfxG2ckftGrcgLoyZLOELkuQ?usp=sharing

mostafaelhoushi commented, May 18, 2022

I have just updated the description with an extensive list of third-party libraries that were included in the generated bitcode files, so we can create a separate dataset for each benchmark. Also, hopefully some of those benchmarks can run on both Mac and Linux without hitting the error we have observed.
