Training the model for the VarMisuse task
Hey! I tried to run training of the VarMisuse model to explore how it works on data from unseen projects. I have a few questions about it:
- The dataset format seems to have changed compared to the published version of the data. I found the following issue in another repository. Unfortunately, I had already reorganized the data before finding that issue: I converted the JSON files into jsonlines and changed the structure from `project/{train|test|valid}/files` to `{train|test|valid}/files`. It would be nice to either duplicate the reorganizing script in this repo or add a link to the issue in the README.
- After reorganizing the data, I tried to run training with the default settings (minibatch size = 300) on an instance with 94 GB RAM and 48 CPUs. The instance doesn't have a GPU because I wanted to measure memory usage first, so that I could allocate a properly sized GPU instance afterward. Unfortunately, training fails with an OOM error: it quickly uses all 94 GB and asks for more. I also tried to create a smaller version of the dataset by picking only one project each for train/validation/test, and it didn't really help: with a minibatch size of 100 and a single project in the train split I still got OOM. Is this expected behavior?
- Which instance do you recommend for training the model? In particular, how much RAM do I need, and how long does training take on, say, a V100?
- Do you have a pre-trained model that you can share? Maybe I can avoid training altogether and just run the already-trained model on different data.
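For reference, the reorganization described in the first bullet can be sketched roughly as follows. This is a minimal sketch under assumptions: it assumes each source `.json` file holds a list of samples, and the output naming scheme (`project__file.jsonl`) is hypothetical; the actual converter from the linked issue may differ.

```python
import json
from pathlib import Path


def reorganize(src_root: str, dst_root: str) -> None:
    """Flatten project/{train|valid|test}/*.json into
    {train|valid|test}/<project>__<file>.jsonl, one sample per line."""
    for project_dir in Path(src_root).iterdir():
        if not project_dir.is_dir():
            continue
        for split in ("train", "valid", "test"):
            split_dir = project_dir / split
            if not split_dir.is_dir():
                continue
            out_dir = Path(dst_root) / split
            out_dir.mkdir(parents=True, exist_ok=True)
            for json_file in split_dir.glob("*.json"):
                # Assumption: each .json file contains a JSON list of samples.
                samples = json.loads(json_file.read_text())
                out_path = out_dir / f"{project_dir.name}__{json_file.stem}.jsonl"
                with out_path.open("w") as f:
                    for sample in samples:
                        f.write(json.dumps(sample) + "\n")
```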
Thanks a lot in advance, and thanks for the great projects and papers!
Issue Analytics
- Created 3 years ago
- Comments: 6 (4 by maintainers)
@mallamanis thanks a lot for the lightning-fast reply!
The convert script is very similar to mine. I will try the subtoken model and report the results.
I just ran this on the CPU and I can replicate the issue. I assume the problem is that PyTorch fuses some operations on the GPU but not on the CPU for the character CNN. If you change the model to use subtokens (change `"char"` to `"subtoken"` here), the problem goes away. The performance of the subtoken and char models is fairly similar, so this might be good enough for now. I'll try to investigate why the char CNN performs so badly on CPU, hopefully next week.