
What's the difference from DARTS?


Thanks for sharing the code.

I have a question about how the implementation differs from DARTS. The training code looks very similar to DARTS (https://github.com/quark0/darts).

As you mentioned in the paper, “2. Instead of using the whole DAG, GDAS samples one sub-graph at one training iteration, accelerating the searching procedure. Besides, the sampling in GDAS is learnable and contributes to finding a better cell.”

But in the forward function of MixedOp, the output is just the weighted sum of all ops, same as DARTS.

```python
def forward(self, x, weights):
    return sum(w * op(x) for w, op in zip(weights, self._ops))
```

So, can you point out the code that “samples one sub-graph at one training iteration”? Thanks.

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 1
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

2 reactions
D-X-Y commented, May 28, 2019

Sorry for the confusion. That file is DARTS, not our algorithm; we have not released the search code of GDAS. The main difference between GDAS and DARTS is that we use Gumbel-softmax with an acceleration trick, so that only one candidate CNN is used during the forward pass while gradients can still back-propagate to the architecture parameters.
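For context, here is a minimal, hypothetical sketch of the idea described above; it is not the authors' unreleased search code. It uses PyTorch's built-in torch.nn.functional.gumbel_softmax with hard=True, and the class and argument names are illustrative only: a single sampled candidate op runs per iteration, yet the architecture logits stay trainable.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GumbelMixedOp(nn.Module):
    """Hypothetical GDAS-style mixed op: only the sampled candidate runs."""

    def __init__(self, ops):
        super().__init__()
        self._ops = nn.ModuleList(ops)

    def forward(self, x, arch_logits, tau=1.0):
        # Straight-through Gumbel-softmax: `weights` is one-hot in the
        # forward pass, but its backward pass uses the soft probabilities.
        weights = F.gumbel_softmax(arch_logits, tau=tau, hard=True)
        index = int(weights.argmax())
        # Only the sampled op is computed (the "one sub-graph" speed-up);
        # multiplying by weights[index] (value 1.0) keeps arch_logits in
        # the computational graph.
        return weights[index] * self._ops[index](x)

# Quick check that gradients reach the architecture parameters.
ops = [nn.Conv2d(3, 3, 3, padding=1), nn.MaxPool2d(3, stride=1, padding=1), nn.Identity()]
mixed = GumbelMixedOp(ops)
arch_logits = torch.zeros(3, requires_grad=True)
out = mixed(torch.randn(1, 3, 8, 8), arch_logits)
out.mean().backward()
print(arch_logits.grad)  # non-zero, even though only one op executed
```

Contrast this with the DARTS forward above, which always evaluates every candidate op and sums them, so each training iteration costs roughly num_ops times as much compute.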

0 reactions
buttercutter commented, Sep 10, 2021

in the forward procedure, we only need to calculate the function F_{argmax(h̃_{i,j})}. During the backward procedure, we only back-propagate the gradient generated at argmax(h̃_{i,j}).

@D-X-Y

Regarding the text quoted above from the GDAS paper, I have a few questions (the first two are illustrated in the sketch after this list):

  1. I suppose the argmax operation is not differentiable in PyTorch?
  2. If the other gradients are not back-propagated, is the computational graph broken or detached in some way?
  3. Does this Gumbel-softmax trick need to be applied for both the training dataset (for training W) and the validation dataset (for training A)?
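On questions 1 and 2, a common resolution (and what hard Gumbel-softmax implementations such as torch.nn.functional.gumbel_softmax(..., hard=True) do internally) is the straight-through estimator: argmax itself has no gradient, but the one-hot result is rewritten so that autograd follows the soft softmax path, so the graph never detaches. Below is a minimal, self-contained illustration; the variable names and dummy loss are mine, not from the paper or the repository:

```python
import torch

logits = torch.randn(5, requires_grad=True)  # e.g. architecture parameters A

soft = torch.softmax(logits, dim=-1)         # differentiable probabilities

# Hard one-hot selection of the argmax candidate; on its own this
# operation has no gradient.
hard = torch.zeros_like(soft).scatter_(-1, soft.argmax(dim=-1, keepdim=True), 1.0)

# Straight-through trick: the forward value equals `hard`, but autograd
# only sees the `soft` term, so the graph stays connected to `logits`.
weights = hard - soft.detach() + soft

loss = (weights * torch.arange(5.0)).sum()   # dummy loss on the selection
loss.backward()
print(logits.grad)                           # well-defined despite the argmax
```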

