Unit testing C# code that is reliant on DataFrame and DataFrameReader

I am using Microsoft.Spark.Sql in my C# Spark project and I am trying to develop unit tests for my classes that rely on manipulating DataFrames.

I am running into issues trying to mock or fake either SparkSession or DataFrame, since they are sealed classes. Ideally, I would like to create a SparkSession object that doesn’t rely on being connected to an external source, to avoid test instability. So far I haven’t found any resources that show how to accomplish this. I have looked at the test classes in this repo, which do accomplish this, but the objects those tests mock are not available to consumers of the public NuGet package.

I would like to eventually be able to write code in my tests like the following code snippet from this post

var data = new List<GenericRow>();
data.Add(new GenericRow(new object[] { "Alice", new Date(2020, 1, 1) }));
data.Add(new GenericRow(new object[] { "Bob", new Date(2020, 1, 2) }));

var schema = new StructType(new List<StructField>()
{
    new StructField("name", new StringType()),
    new StructField("date", new DateType())
});

DataFrame df = spark.CreateDataFrame(data, schema);

I would appreciate some help with either creating a SparkSession that doesn’t rely on an external connection or mocking the SparkSession/DataFrame objects. I want to create unit tests that don’t rely on external connections if possible.

Does anyone know whether this is possible, or has anyone done it before?

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 3
  • Comments: 11 (4 by maintainers)

Top GitHub Comments

3 reactions
suhsteve commented, Aug 20, 2020

@ccard I don’t know where you’re running into issues, but please see this example project and let me know if this is what you are looking for. I’m able to create a mock of IJvmBridge and also access the SparkSession constructor.

Program.cs:

using Microsoft.Spark.Interop;
using Microsoft.Spark.Interop.Ipc;
using Microsoft.Spark.Sql;
using Moq;

namespace example
{
    class Program
    {
        static void Main(string[] args)
        {
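            // IJvmBridge is internal to Microsoft.Spark; it is visible here only
            // because of IgnoresAccessChecksToGenerator (see example.csproj).
            var mockJvm = new Mock<IJvmBridge>();

            // Return a dummy object reference from any static JVM call
            // that takes a single argument.
            mockJvm
                .Setup(m => m.CallStaticJavaMethod(
                    It.IsAny<string>(),
                    It.IsAny<string>(),
                    It.IsAny<object>()))
                .Returns(
                    new JvmObjectReference("result", mockJvm.Object));

            // Factory that hands out the mocked bridge.
            var mockJvmBridgeFactory = new Mock<IJvmBridgeFactory>();
            mockJvmBridgeFactory
                .Setup(m => m.Create(It.IsAny<int>()))
                .Returns(mockJvm.Object);

            // Install the factory so that Microsoft.Spark uses the mock
            // instead of a real JVM bridge.
            SparkEnvironment.JvmBridgeFactory = mockJvmBridgeFactory.Object;

            // SparkSession is accessible.
            SparkSession s = new SparkSession(null);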
        }
    }
}

example.csproj:

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>netcoreapp3.1</TargetFramework>
  </PropertyGroup>

  <PropertyGroup>
    <InternalsAssemblyNames>Microsoft.Spark</InternalsAssemblyNames>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="Microsoft.Spark" Version="0.12.1" />
    <PackageReference Include="Moq" Version="4.14.5" />
    <PackageReference Include="IgnoresAccessChecksToGenerator" Version="0.4.0" PrivateAssets="All" />
  </ItemGroup>

</Project>

2 reactions
ccard commented, Aug 21, 2020

@suhsteve your suggestions and help were fantastic. I now have it working with the code snippet below for Microsoft.Spark version 0.9.0.0. Thank you for your help, I really appreciate it.

var mockJvm = new Mock<IJvmBridge>();
mockJvm
    .Setup(m => m.CallStaticJavaMethod(
        It.IsAny<string>(),
        It.IsAny<string>(),
        It.IsAny<object>()))
    .Returns(
        new JvmObjectReference("result", mockJvm.Object));
mockJvm
    .Setup(m => m.CallStaticJavaMethod(
        It.IsAny<string>(),
        It.IsAny<string>(),
        It.IsAny<object>(),
        It.IsAny<object>()))
    .Returns(
        new JvmObjectReference("result", mockJvm.Object));
mockJvm
    .Setup(m => m.CallStaticJavaMethod(
        It.IsAny<string>(),
        It.IsAny<string>(),
        It.IsAny<object[]>()))
    .Returns(
        new JvmObjectReference("result", mockJvm.Object));

mockJvm
    .Setup(m => m.CallNonStaticJavaMethod(
        It.IsAny<JvmObjectReference>(),
        It.IsAny<string>(),
        It.IsAny<object>()))
    .Returns(
        new JvmObjectReference("result", mockJvm.Object));
mockJvm
    .Setup(m => m.CallNonStaticJavaMethod(
        It.IsAny<JvmObjectReference>(),
        It.IsAny<string>(),
        It.IsAny<object>(),
        It.IsAny<object>()))
    .Returns(
        new JvmObjectReference("result", mockJvm.Object));
mockJvm
    .Setup(m => m.CallNonStaticJavaMethod(
        It.IsAny<JvmObjectReference>(),
        It.IsAny<string>(),
        It.IsAny<object[]>()))
    .Returns(
        new JvmObjectReference("result", mockJvm.Object));

SparkEnvironment.JvmBridge = mockJvm.Object;

// SparkSession is accessible.
SparkSession s = SparkSession.Active();

var data = new List<GenericRow>();
data.Add(new GenericRow(new object[] { "Alice", new Date(2020, 1, 1) }));
data.Add(new GenericRow(new object[] { "Bob", new Date(2020, 1, 2) }));

var schema = new StructType(new List<StructField>()
    {
        new StructField("name", new StringType()),
        new StructField("date", new DateType()),
    });

DataFrame df = s.CreateDataFrame(data, schema);