Implementation of VarNaming task from ICLR '18

Hi,

My name is Alex Haigh, and I’m a master’s student at Stanford. For a project, I’m working to reproduce (and hopefully extend) some of your results on VarNaming from your '18 ICLR paper. My understanding of your model for that task is that you:

  1. Replace each instance of your target variable with a <SLOT> token
  2. Represent every other variable as the concatenation of (a) the average of the (learnable) embeddings of the subtokens in its name and (b) an embedding of the variable's type (as described at the top of p. 5 of the paper)
  3. Run message passing with a GGNN for 8 timesteps, using the program graph
  4. Average the final representation of every <SLOT> token
  5. Use this as input to a GRU decoder that outputs the variable name as a sequence of subtokens.
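
To make steps 2 and 4 concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the code from the repository: the function names, the subtoken-splitting regex, and the toy embedding tables are all hypothetical, embeddings are fixed lists rather than learned parameters, and the GGNN (step 3) and GRU decoder (step 5) are left out entirely.

```python
import re
from statistics import fmean

SLOT = "<SLOT>"  # placeholder label for every occurrence of the target variable


def split_subtokens(name):
    """Split a camelCase or snake_case identifier into lowercase subtokens.

    The regex is an assumption for illustration; the real pipeline may split differently.
    """
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [part.lower() for part in parts]


def mean_vector(vectors):
    """Element-wise average of a non-empty list of equally sized vectors."""
    return [fmean(column) for column in zip(*vectors)]


def initial_node_state(name, type_name, subtoken_emb, type_emb):
    """Step 2: average the subtoken embeddings, then concatenate the type embedding."""
    name_vec = mean_vector([subtoken_emb[s] for s in split_subtokens(name)])
    return name_vec + type_emb[type_name]  # list concatenation


def slot_summary(node_labels, final_states):
    """Step 4: average the final node states of all <SLOT> nodes."""
    slot_states = [
        state for label, state in zip(node_labels, final_states) if label == SLOT
    ]
    return mean_vector(slot_states)


# Toy embedding tables (hypothetical values, 2-d names and 1-d types).
subtoken_emb = {"max": [1.0, 0.0], "count": [0.0, 1.0]}
type_emb = {"int": [2.0]}

state = initial_node_state("maxCount", "int", subtoken_emb, type_emb)
summary = slot_summary(
    ["<SLOT>", "x", "<SLOT>"],
    [[1.0, 3.0], [9.0, 9.0], [3.0, 5.0]],
)
```

In this sketch, `summary` would play the role of the decoder's input in step 5; in the actual model the per-node states would come out of the message-passing rounds rather than being supplied by hand.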

I found the dataset here, and it looks like it’s in the format digestible by utils/tensorise.py. Similarly, the model you use for VarNaming seems to be the Graph2SeqModel.

So, is this all you need to do to reproduce the results?

  • run utils/tensorise.py --model graph2seq on the dataset published in ICLR '18
  • train a graph2seq model on the dataset using utils/train.py PATH_TO_TENSORISED_GRAPHS --model graph2seq

Just wanted to make sure I’m looking in the right place, and I would also appreciate any other tips you have. Also, what modifications did you make to the model based on Cvitkovic et al.? And is there a way to compare results with and without those modifications?

Thanks!

haighal commented, Jun 4, 2019

Hey Marc,

Just wanted to say thanks again for your very helpful comments here (that was indeed the issue I was having at test time). We got an implementation working on Python ASTs using only the Syntax edges and got results pretty similar to what you reported in your ablation from the ICLR '18 paper!

Accuracy@1: 34.5671%
Accuracy@3: 42.4864%
Accuracy@5: 45.6902%
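
For reference, a minimal sketch of how an Accuracy@k figure like the ones above can be computed. It assumes each example comes with a ranked list of full predicted names and that a hit means an exact match with the true name; the evaluation code in the repository may define a hit differently. The function name and the toy data are hypothetical.

```python
def accuracy_at_k(ranked_predictions, targets, k):
    """Fraction of examples whose true name appears in the top-k predictions."""
    if len(ranked_predictions) != len(targets):
        raise ValueError("need exactly one ranked prediction list per target")
    if not targets:
        raise ValueError("need at least one example")
    hits = sum(
        target in ranked[:k]
        for ranked, target in zip(ranked_predictions, targets)
    )
    return hits / len(targets)


# Toy data: three variables, each with three ranked guesses (best first).
predictions = [
    ["count", "total", "n"],
    ["idx", "i", "index"],
    ["name", "label", "text"],
]
targets = ["count", "index", "value"]

acc_at_1 = accuracy_at_k(predictions, targets, 1)  # only the first example hits
acc_at_3 = accuracy_at_k(predictions, targets, 3)  # first and second examples hit
```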

The code lives at https://github.com/haighal/graph-based-code-modelling/ and has a thorough summary of both (1) our modifications to your code base and (2) the steps to generate our results. Let me know if you have any questions - my email is haighal [at] cs.stanford.edu

Cheers, Alex

mmjb commented, May 15, 2019

Getting the NextToken edges right is indeed a bit harder to do with the Python parser. In Roslyn, we can rely on the fact that the source code visitor visits things in the right order (see https://github.com/microsoft/graph-based-code-modelling/blob/a6a984b7d0b5965ff93477e502a8395dd036adf0/DataExtraction/SourceGraphExtractionUtils/Utils/SourceGraph.cs#L273).

However, Patrick, whom Miltos and I worked with on a recent ICLR paper, also released Python2Graph code, so you might want to look into simply reusing that: https://github.com/CoderPat/structured-neural-summarization/blob/master/parsers/sourcecode/barone/ast_graph_generator.py
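
For anyone who wants to build the edges by hand instead, one possible workaround is sketched below. It is not what the Roslyn extractor or the linked Python2Graph code does: it sidesteps the AST visit order by taking the token order directly from Python's standard `tokenize` module and chaining consecutive tokens. The function name and the choice of which token types to skip are assumptions, and mapping tokens back to AST leaves (for example via their positions) is not shown.

```python
import io
import tokenize

# Layout-only tokens that carry no program content (an assumed choice).
SKIPPED = {
    tokenize.COMMENT,
    tokenize.NL,
    tokenize.NEWLINE,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENCODING,
    tokenize.ENDMARKER,
}


def next_token_edges(source):
    """Return (nodes, edges): tokens in source order, chained by NextToken edges.

    Each node is (token_string, (row, col)); each edge is a pair of node
    indices (i, i + 1) linking a token to its successor.
    """
    reader = io.StringIO(source).readline
    tokens = [
        tok for tok in tokenize.generate_tokens(reader) if tok.type not in SKIPPED
    ]
    nodes = [(tok.string, tok.start) for tok in tokens]
    edges = [(i, i + 1) for i in range(len(nodes) - 1)]
    return nodes, edges


nodes, edges = next_token_edges("x = a + 1\n")
```

The `(row, col)` start position stored with each node is what one would use to match tokens against `lineno` and `col_offset` on `ast` nodes when attaching these edges to a syntax graph.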

Re type embeddings: Setting the size to 0 should just work, but it may require making a few things more robust in the tensorisation pipeline (i.e., access to the types would need to become optional). However, this data is just passed through, so the changes required should be fairly minor.

Marc
