IDE0057 “Slice can be simplified” does not produce the same code

Currently, for Span<T>, the IDE suggests replacing Slice with the range indexer (IDE0057: Slice can be simplified). So the original code:

    new Vector<T>(B.Slice(Idx))

is rewritten to

    new Vector<T>(B[Idx..])

The generated C# is different (which is OK). Given these two methods:

    public void WithSlice() {
        ReadOnlySpan<float> A = new float[100];

        for (int i = 0; i < A.Length; i += Vector<float>.Count)
            new Vector<float>(A.Slice(i));
    }

    public void WithRange() {
        ReadOnlySpan<float> A = new float[100];

        for (int i = 0; i < A.Length; i += Vector<float>.Count)
            new Vector<float>(A[i..]);
    }

SharpLab shows the following decompiled C#:

    public void WithSlice()
    {
        ReadOnlySpan<float> readOnlySpan = new float[100];
        for (int i = 0; i < readOnlySpan.Length; i += Vector<float>.Count)
        {
            new Vector<float>(readOnlySpan.Slice(i));
        }
    }

    public void WithRange()
    {
        ReadOnlySpan<float> readOnlySpan = new float[100];
        for (int i = 0; i < readOnlySpan.Length; i += Vector<float>.Count)
        {
            ReadOnlySpan<float> readOnlySpan2 = readOnlySpan;
            int length = readOnlySpan2.Length;
            int num = i;
            int length2 = length - num;
            new Vector<float>(readOnlySpan2.Slice(num, length2));
        }
    }

However, the JIT asm also seems to be different. I haven’t made any performance comparisons, but in my environment it is crucial that the suggested rewrite does not alter the performance of the code.
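To make the distinction concrete: the two forms select the same window of memory, so the concern here is the generated code, not the result. The following is a minimal sketch that checks this; the class and method names (`SliceVsRange`, `SameWindow`) are made up for illustration and are not part of the original report.

```csharp
using System;

public static class SliceVsRange
{
    // Returns true when span.Slice(start) and span[start..] describe the same window.
    public static bool SameWindow(ReadOnlySpan<float> span, int start)
    {
        ReadOnlySpan<float> viaSlice = span.Slice(start);
        ReadOnlySpan<float> viaRange = span[start..];

        // == on spans compares the starting reference and the length, not the contents.
        return viaSlice == viaRange;
    }

    public static void Main()
    {
        float[] data = new float[100];
        Console.WriteLine(SameWindow(data, 8)); // prints "True"
    }
}
```

This only demonstrates semantic equivalence. It says nothing about whether the JIT emits the same machine code for both paths, which is the actual question in this issue.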


Top GitHub Comments

stephentoub commented, Sep 11, 2020

@svick The C# compiler, and the types it’s generating uses of, are resulting in significantly more complex code for the JIT to then need to unwind. Such problems are covered by existing issues like https://github.com/dotnet/runtime/issues/11848 and https://github.com/dotnet/runtime/issues/11870. If this issue is about the JIT, it should just be closed.

That said, the C# compiler is doing a disservice here by using Slice(start, count) rather than just Slice(start). The latter has less code in it, fewer checks to perform, and less code required to invoke it. I don’t know that the JIT would ever be able to make them identical; in theory it could, but in practice that’s a whole lot of analysis. This was all raised when the indexing feature was introduced, and the language team decided that this level of optimization wasn’t important.
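As a rough illustration of why the two-argument overload involves more work, here is a simplified model of the argument validation. It is a sketch under stated assumptions: `WindowModel` is a hypothetical stand-in invented for this example, it tracks only an offset and a length, and it is not the actual Span<T> source.

```csharp
using System;

// Hypothetical stand-in for a span; models only the bounds checks.
public readonly struct WindowModel
{
    public int Offset { get; }
    public int Length { get; }

    public WindowModel(int offset, int length)
    {
        Offset = offset;
        Length = length;
    }

    // One-argument form: a single comparison, and the new length is implied.
    public WindowModel Slice(int start)
    {
        if ((uint)start > (uint)Length)
            throw new ArgumentOutOfRangeException(nameof(start));
        return new WindowModel(Offset + start, Length - start);
    }

    // Two-argument form: start and length have to be validated together.
    public WindowModel Slice(int start, int length)
    {
        if ((ulong)(uint)start + (uint)length > (uint)Length)
            throw new ArgumentOutOfRangeException(nameof(length));
        return new WindowModel(Offset + start, length);
    }
}

public static class WindowModelDemo
{
    public static void Main()
    {
        var w = new WindowModel(0, 100);
        WindowModel a = w.Slice(8);
        // The lowered range form first computes the remaining length itself.
        WindowModel b = w.Slice(8, w.Length - 8);
        Console.WriteLine(a.Offset == b.Offset && a.Length == b.Length); // prints "True"
    }
}
```

In this model the caller of the two-argument form also has to read the length and subtract the start before the call, which matches the extra locals in the decompiled WithRange method above.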

If the analyzer from dotnet/roslyn is encouraging this transformation and folks expect the resulting code to be identical, then the C# compiler in dotnet/roslyn should be updated accordingly. Otherwise, there’s no guarantee the JIT will produce the same asm.
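Until the compiler output changes, one option in a hot path is to keep the explicit Slice(start) call and silence the suggestion locally. The sketch below assumes that approach; `SliceKept` and `Sum` are illustrative names only and do not come from this thread.

```csharp
using System;
using System.Numerics;

public static class SliceKept
{
    // Sums data in Vector<float>-sized chunks, keeping the explicit Slice(start) call.
    public static float Sum(ReadOnlySpan<float> data)
    {
        int step = Vector<float>.Count;
        int vectorEnd = data.Length - (data.Length % step);
        Vector<float> acc = Vector<float>.Zero;
        int i = 0;

#pragma warning disable IDE0057 // keep Slice(start); do not rewrite to data[i..]
        for (; i < vectorEnd; i += step)
            acc += new Vector<float>(data.Slice(i));
#pragma warning restore IDE0057

        // Horizontal add of the accumulator lanes, then the scalar tail.
        float total = Vector.Dot(acc, Vector<float>.One);
        for (; i < data.Length; i++)
            total += data[i];
        return total;
    }

    public static void Main()
    {
        float[] ones = new float[20];
        Array.Fill(ones, 1f);
        Console.WriteLine(Sum(ones)); // prints "20"
    }
}
```

The pragma only affects the analyzer suggestion inside the marked region. Whether keeping Slice(start) is actually faster on a given machine still has to be measured.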

msedi commented, Sep 14, 2020

I just wanted to add another benchmark from a different system. I also wanted to point out that it has been a long way from the “old” pre-Span<T> world, where we did everything in regular C#, to unsafe code with pointers, then to Span<T>, and finally to Vector<T>. Honestly, thanks for the improvement in speed. The last method was a suggestion from @benaadams (thanks for that). One interesting finding, though, is that AddVectorizedSpanWithRange is definitely slower on this machine than on the first.

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.1082 (1909/November2018Update/19H2)
Intel Xeon CPU E5-2670 0 2.60GHz, 2 CPU, 32 logical and 16 physical cores
.NET Core SDK=5.0.100-preview.6.20318.15
  [Host]     : .NET Core 5.0.0 (CoreCLR 5.0.20.30506, CoreFX 5.0.20.30506), X64 RyuJIT
  DefaultJob : .NET Core 5.0.0 (CoreCLR 5.0.20.30506, CoreFX 5.0.20.30506), X64 RyuJIT


|                     Method |     Mean |     Error |    StdDev |
|--------------------------- |---------:|----------:|----------:|
| AddVectorizedSpanWithSlice | 2.953 ms | 0.0143 ms | 0.0111 ms |
| AddVectorizedSpanWithRange | 4.143 ms | 0.0163 ms | 0.0128 ms |
|         AddVectorizedArray | 4.043 ms | 0.0166 ms | 0.0139 ms |
|            AddArrayClassic | 8.396 ms | 0.0050 ms | 0.0045 ms |
|             AddArrayUnsafe | 4.180 ms | 0.0036 ms | 0.0030 ms |
|   AddVectorizedArrayUnsafe | 2.512 ms | 0.0149 ms | 0.0132 ms |

Just for reference, here’s the code:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

using System;
using System.Numerics;
using System.Runtime.CompilerServices;

namespace ConsoleApp8
{
    class Program
    {
        static void Main(string[] args)
        {
            BenchmarkRunner.Run<VectorizedBenchmark>();
        }
    }

    public class VectorizedBenchmark
    {
        int N = 2048 * 2048;
        float[] DataA;
        float[] DataB;
        float[] Result;

        [GlobalSetup]
        public void GlobalSetup()
        {
            DataA = new float[N];
            DataB = new float[N];
            Result = new float[N];
            for (int i = 0; i < N; i++)
            {
                DataA[i] = i;
                DataB[i] = i;
            }
        }

        [Benchmark]
        public float[] AddVectorizedSpanWithSlice()
        {
            int StepSize = Vector<float>.Count;
            int L = N - (N % StepSize);
            int i = 0;

            ReadOnlySpan<float> A = DataA;
            ReadOnlySpan<float> B = DataB;
            Span<float> C = Result;

            for (; i < L; i += StepSize)
            {
                (new Vector<float>(A.Slice(i)) + new Vector<float>(B.Slice(i))).CopyTo(C.Slice(i));
            }

            for (; i < N; i++)
            {
                Result[i] = DataA[i] + DataB[i];
            }

            return Result;
        }

        [Benchmark]
        public float[] AddVectorizedSpanWithRange()
        {
            int StepSize = Vector<float>.Count;
            int L = N - (N % StepSize);
            int i = 0;

            ReadOnlySpan<float> A = DataA;
            ReadOnlySpan<float> B = DataB;
            Span<float> C = Result;

            for (; i < L; i += StepSize)
            {
                (new Vector<float>(A[i..]) + new Vector<float>(B[i..])).CopyTo(C[i..]);
            }

            for (; i < N; i++)
            {
                Result[i] = DataA[i] + DataB[i];
            }

            return Result;
        }

        [Benchmark]
        public float[] AddVectorizedArray()
        {
            int StepSize = Vector<float>.Count;
            int L = N - (N % StepSize);
            int i = 0;

            for (; i < L; i += StepSize)
            {
                (new Vector<float>(DataA, i) + new Vector<float>(DataB, i)).CopyTo(Result, i);
            }

            for (; i < N; i++)
            {
                Result[i] = DataA[i] + DataB[i];
            }

            return Result;
        }

        [Benchmark]
        public float[] AddArrayClassic()
        {
            for (int i = 0; i < N; i++)
            {
                Result[i] = DataA[i] + DataB[i];
            }

            return Result;
        }

        [Benchmark]
        public float[] AddArrayUnsafe()
        {
            unsafe
            {
                fixed (float* dataA = DataA)
                fixed (float* dataB = DataB)
                fixed (float* result = Result)
                {
                    float* dataAptr = dataA, dataBptr = dataB, resultptr = result;
                    for (int i = 0; i < N; i++, dataAptr++, dataBptr++, resultptr++)
                    {
                        *resultptr = *dataAptr + *dataBptr;
                    }
                }
            }

            return Result;
        }

        [Benchmark]
        public float[] AddVectorizedArrayUnsafe()
        {
            int StepSize = Vector<float>.Count;
            int L = N - (N % StepSize);
            int i = 0;

            ref float A = ref DataA[0];
            ref float B = ref DataB[0];
            ref float C = ref Result[0];

            for (; i < L; i += StepSize)
            {
                var vectorA = Unsafe.ReadUnaligned<Vector<float>>(ref Unsafe.As<float, byte>(ref Unsafe.Add(ref A, i)));
                var vectorB = Unsafe.ReadUnaligned<Vector<float>>(ref Unsafe.As<float, byte>(ref Unsafe.Add(ref B, i)));
                var vectorC = vectorA + vectorB;

                Unsafe.WriteUnaligned<Vector<float>>(ref Unsafe.As<float, byte>(ref Unsafe.Add(ref C, i)), vectorC);
            }

            for (; i < N; i++)
            {
                Result[i] = DataA[i] + DataB[i];
            }
            return Result;
        }
    }
}