Striding loop performance

System.Numerics.Vectors exposes SIMD-enhanced Vector classes. Using VS2015 Update 1 with the latest versions of the .NET Framework, F#, and System.Numerics.Vectors, code that uses System.Numerics performs worse than code that does not use it at all. For instance:

     // vecArray is an array of Vector<int>; COUNT/8 assumes 8 ints per vector
     let sumVectorLoop =
         let mutable total = Vector<int>.Zero
         for i in 0 .. COUNT / 8 - 1 do
             total <- total + vecArray.[i]
         total

This loop is slower than the same operation on an array of integers:

     // numsArray is a plain int[] of length COUNT
     let sumsLoop =
         let mutable total = 0
         for i in 0 .. COUNT - 1 do
             total <- total + numsArray.[i]
         total

I have confirmed that Vector.IsHardwareAccelerated reports true, and that equivalent C# code runs ~2x faster with the Vector approach. Interestingly, using Array.reduce on the vector array is faster than the imperative loop, which is the opposite of what happens with an array of ints, suggesting something may be amiss:

     let sumVectorReduce =
         Array.reduce (fun a e -> a + e) vecArray
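
For readers who want to reproduce the scenario in the title, here is a minimal sketch of a striding loop that loads vectors directly from an int array. The names `nums`, `sumScalar` and `sumStrided` and the test data are illustrative assumptions; the issue does not show how `vecArray` and `numsArray` were built. It only demonstrates the loop shape under discussion, not the original benchmark.

```fsharp
open System.Numerics

// Hypothetical test data; the issue does not show how its arrays were built.
let nums = Array.init 1024 (fun i -> i % 7)

// Scalar baseline: one int per iteration.
let sumScalar (xs: int[]) =
    let mutable total = 0
    for i in 0 .. xs.Length - 1 do
        total <- total + xs.[i]
    total

// Striding loop: step by the lane count and load one vector per iteration.
let sumStrided (xs: int[]) =
    let lanes = Vector<int>.Count
    let lastFull = xs.Length - lanes   // last index where a full vector still fits
    let mutable acc = Vector<int>.Zero
    for i in 0 .. lanes .. lastFull do
        let v = Vector<int>(xs, i)     // loads xs.[i .. i + lanes - 1]
        acc <- acc + v
    // Horizontal sum of the accumulator lanes.
    let mutable total = 0
    for lane in 0 .. lanes - 1 do
        total <- total + acc.[lane]
    // Scalar tail for lengths that are not a multiple of the lane count.
    for i in xs.Length - xs.Length % lanes .. xs.Length - 1 do
        total <- total + xs.[i]
    total
```

Both functions should return the same total for any input. The `for i in 0 .. lanes .. lastFull` line is the striding loop form that the comments in this issue are concerned with.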

Issue Analytics

  • State: open
  • Created: 8 years ago
  • Reactions: 1
  • Comments: 13 (12 by maintainers)

Top GitHub Comments

2 reactions
dsyme commented, Jul 7, 2016

I started to take a look at this, and it’s not easy.

One problem is that the F# “FastIntegerLoop” TAST construct can’t represent striding loops. It could be extended, but this has to be done with care since the construct can (and does) occur in optimization information and the representations of inlined functions. Ideally care should be taken that DLLs that generate this new construct be consumable by down-level F# compilers, but that’s hard to arrange.

Another problem is that “F#-style loops” of the form for x in n .. step .. m are currently generated using a “bne” (branch-not-equal) instruction for the end condition. This is done because m might be MaxInt. But this won’t work for striding loops, where a less-than comparison is needed. And a less-than comparison doesn’t work when m is above MaxInt - step, since a wrap-around occurs.
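
The wrap-around described above can be seen in a small sketch. The function names and the iteration cap `limit` are assumptions added for illustration; this is not the compiler's generated code, only a hand-written loop with the same kind of end test.

```fsharp
open System

// Naive striding loop with a less-than-or-equal end test, as a C-style `for` would use.
// Returns the number of iterations, capped at `limit` so the demo terminates.
let naiveStrideCount (start: int) (step: int) (stop: int) (limit: int) =
    let mutable i = start
    let mutable count = 0
    while i <= stop && count < limit do
        count <- count + 1
        i <- i + step   // unchecked addition: wraps to a negative value past Int32.MaxValue
    count

// The same range written as an F# range expression, which handles the boundary correctly.
let rangeCount (start: int) (step: int) (stop: int) =
    let mutable count = 0
    for x in start .. step .. stop do
        count <- count + 1
    count

let stop = Int32.MaxValue - 1               // above MaxInt - step for step = 4
let start = stop - 10
let expected = rangeCount start 4 stop        // 3: start, start + 4, start + 8
let naive = naiveStrideCount start 4 stop 100 // 100: i wrapped around, so only the cap stopped it
```

Without the cap, the naive loop would keep running after `i` wraps to a negative value, which is why a plain less-than test cannot simply replace the not-equal test.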

Perhaps we could just sacrifice semantics for striding loops near the MaxInt boundary, though whatever we do, parity with C# is really needed. Perhaps I need to look more closely at C# code generation for these cases.
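
As an aside, one way to keep the exact range semantics near MaxInt is to compute the trip count up front instead of testing the loop variable. This is a hand-written sketch of that idea, with the hypothetical name `iterStride` and support for positive steps only. It is not what the F# or C# compilers are known to emit, and it is not proposed in the issue.

```fsharp
// Overflow-safe striding loop for a positive step: the trip count is computed in
// 64-bit arithmetic, so the loop test never depends on i wrapping around.
let iterStride (start: int) (step: int) (stop: int) (body: int -> unit) =
    if step <= 0 then invalidArg "step" "this sketch handles positive steps only"
    if start <= stop then
        let trips = (int64 stop - int64 start) / int64 step + 1L
        let mutable i = start
        let mutable remaining = trips
        while remaining > 0L do
            body i
            remaining <- remaining - 1L
            // Advance only if another element follows, so i + step cannot overflow.
            if remaining > 0L then i <- i + step
```

For example, `iterStride (System.Int32.MaxValue - 10) 4 System.Int32.MaxValue (printfn "%d")` visits the same three values as the range `System.Int32.MaxValue - 10 .. 4 .. System.Int32.MaxValue`. The cost is an extra counter and a division before the loop.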

1 reaction
dsyme commented, Mar 23, 2020

“@dsyme given the renewed focus on slicing (and its syntax) for .NET 5, what do you think of revisiting this? How common do you feel this scenario is for numeric programming in general?”

Yes, we should fix this, definitely.
