
Striding loop performance


System.Numerics.Vectors exposes SIMD-enhanced Vector classes. Using VS2015 Update 1 and the latest versions of the .NET Framework, F#, and System.Numerics.Vectors, the performance of the Vector types is worse than not using them at all. For instance:

    let sumVectorLoop =
        let mutable total = Vector<int>.Zero
        for i in 0 .. COUNT / 8 - 1 do
            total <- total + vecArray.[i]
        total

Is slower than the same operation on an array of integers:

    let sumsLoop =
        let mutable total = 0
        for i in 0 .. COUNT - 1 do
            total <- total + numsArray.[i]
        total

I have confirmed that Vector.IsHardwareAccelerated reports true, and that equivalent code in C# runs ~2x faster with the Vector approach. Interestingly, using Array.reduce on the vector array is faster than the imperative loop, which is the opposite of what happens with an array of ints, suggesting something may be amiss:

    let sumVectorReduce =
        Array.reduce (fun a e -> a + e) vecArray
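
The snippets above rely on COUNT, numsArray and vecArray, which the issue does not define. The following is a minimal, self-contained sketch of how the comparison might be set up. The value of COUNT, the array contents, and the helper names (lanes, horizontalSum, time) are assumptions for illustration, and the loops are written as functions so they can be timed. It uses Vector<int>.Count instead of the hard-coded 8, since the lane count depends on the hardware.

```fsharp
open System.Diagnostics
open System.Numerics

// Assumed values: the issue does not state COUNT or the array contents.
let COUNT = 1 <<< 20
let lanes = Vector<int>.Count   // number of ints per vector; hardware dependent
let numsArray = Array.init COUNT (fun i -> i % 7)

// Pack each consecutive run of `lanes` ints into one Vector<int>.
let vecArray =
    Array.init (COUNT / lanes) (fun i -> Vector<int>(numsArray, i * lanes))

let sumsLoop () =
    let mutable total = 0
    for i in 0 .. COUNT - 1 do
        total <- total + numsArray.[i]
    total

let sumVectorLoop () =
    let mutable total = Vector<int>.Zero
    for i in 0 .. vecArray.Length - 1 do
        total <- total + vecArray.[i]
    total

let sumVectorReduce () =
    Array.reduce (fun (a: Vector<int>) e -> a + e) vecArray

// Collapse the per-lane partial sums into one int so results are comparable.
let horizontalSum (v: Vector<int>) =
    let mutable s = 0
    for lane in 0 .. lanes - 1 do
        s <- s + v.[lane]
    s

// Hypothetical timing helper; a single Stopwatch run is only a rough indication.
let time (label: string) (f: unit -> int) =
    let sw = Stopwatch.StartNew()
    let result = f ()
    sw.Stop()
    printfn "%s: result %d, %d ms" label result sw.ElapsedMilliseconds
    result

let scalarResult = time "scalar loop" sumsLoop
let vectorResult = time "vector loop" (fun () -> horizontalSum (sumVectorLoop ()))
let reduceResult = time "vector reduce" (fun () -> horizontalSum (sumVectorReduce ()))
```

Timings from a sketch like this are only meaningful in a Release build on a 64-bit runtime with warm-up runs; a benchmarking harness such as BenchmarkDotNet would give more trustworthy numbers.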

Issue Analytics

Created: 8 years ago
Reactions: 1
Comments: 13 (12 by maintainers)

Top GitHub Comments

dsyme commented, Jul 7, 2016 (2 reactions)

I started to take a look at this, and it’s not easy.

One problem is that the F# “FastIntegerLoop” TAST construct can’t represent striding loops. It could be extended, but this has to be done with care since the construct can (and does) occur in optimization information and the representations of inlined functions. Ideally care should be taken that DLLs that generate this new construct be consumable by down-level F# compilers, but that’s hard to arrange.
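
For context, a "striding loop" here means a loop whose index advances by a step other than 1. The sketch below shows the F# range form next to a hand-written while loop that does the same work, summing an int array one vector at a time. The array contents and the names (stride, laneSum, sumStridingFor, sumStridingWhile) are assumptions for illustration. The sketch makes no claim about how either form is compiled, and the while loop is a common manual alternative, not something the comment recommends.

```fsharp
open System.Numerics

let stride = Vector<int>.Count   // ints per vector; hardware dependent

// Hypothetical input whose length is a multiple of the usual lane counts.
let nums = Array.init 1024 (fun i -> i % 5)

// Collapse the per-lane partial sums into a single int.
let laneSum (v: Vector<int>) =
    let mutable s = 0
    for lane in 0 .. stride - 1 do
        s <- s + v.[lane]
    s

// Striding loop written with the F# range syntax `n .. step .. m`.
let sumStridingFor () =
    let mutable total = Vector<int>.Zero
    for i in 0 .. stride .. nums.Length - stride do
        total <- total + Vector<int>(nums, i)
    laneSum total

// The same loop written by hand with an explicit index and a less-than-or-equal test.
let sumStridingWhile () =
    let mutable total = Vector<int>.Zero
    let mutable i = 0
    while i <= nums.Length - stride do
        total <- total + Vector<int>(nums, i)
        i <- i + stride
    laneSum total
```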

Another problem is that “F#-style loops” for x in n .. step .. m are currently generated using a “bne” (branch-not-equal) instruction for the end condition. This is done because m might be MaxInt. But this won’t work for striding loops: a less-than comparison is needed. And a less-than comparison doesn’t work when m is above MaxInt - step, since the loop variable wraps around.
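
The wrap-around can be reproduced with ordinary F# integers, whose arithmetic is unchecked by default. In the sketch below, countNaive advances the index and then compares it against m, so once the index passes MaxInt it becomes negative and the loop keeps running; the cap parameter exists only to stop the demonstration. countGuarded checks in 64-bit arithmetic whether another step still fits. Both functions and the 64-bit guard are illustrative assumptions, not what the F# or C# compilers emit.

```fsharp
open System

/// Naive striding loop: add `step`, then test `i <= m`.
/// Returns the number of iterations, stopping at `cap` to avoid a runaway loop.
let countNaive (n: int) (step: int) (m: int) (cap: int) =
    let mutable i = n
    let mutable iterations = 0
    while i <= m && iterations < cap do
        iterations <- iterations + 1
        i <- i + step   // unchecked: wraps to a negative value past Int32.MaxValue
    iterations

/// Guarded version: only advance when the next index still fits below `m`.
/// The comparison is done in int64 so that it cannot overflow itself.
let countGuarded (n: int) (step: int) (m: int) =
    let mutable i = n
    let mutable iterations = 0
    let mutable running = n <= m
    while running do
        iterations <- iterations + 1
        if int64 i + int64 step <= int64 m then i <- i + step
        else running <- false
    iterations

let start = Int32.MaxValue - 5

// The intended loop visits MaxValue - 5 and MaxValue - 1, i.e. two iterations.
printfn "guarded: %d iterations" (countGuarded start 4 Int32.MaxValue)
// The naive loop wraps around and only stops because of the cap.
printfn "naive:   %d iterations (capped at 10)" (countNaive start 4 Int32.MaxValue 10)
```

When m is well below MaxInt - step, both versions agree, which is why the problem only shows up near the top of the integer range.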

Perhaps we could just sacrifice semantics for striding loops near the MaxInt condition - though whatever we do, parity with C# is really needed. Perhaps I need to look more closely at C# code generation for these cases.

dsyme commented, Mar 23, 2020 (1 reaction)

> @dsyme given the renewed focus on slicing (and its syntax) for .NET 5, what do you think of revisiting this? How common do you feel this scenario is for numeric programming in general?

Yes, we should fix this, definitely.