Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

"Stream is closed" error, but program finishes executing correctly

See original GitHub issue

I’m running a .NET Spark app that batch-processes a selection of GitHub project data. My program runs as expected up through making a Spark SQL call:

// ...Code creating spark session, reading into data frame, doing some sorting...
// Then move on to creating UDF, SQL call:
spark.Udf().Register<string, bool>("MyUDF", (date) => DateTest(date));
cleanedProjects.CreateOrReplaceTempView("dateView");
DataFrame dateDf = spark.Sql("SELECT *, MyUDF(dateView.updated_at) AS datebefore FROM dateView");
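
The DateTest helper isn’t shown in the post; a minimal sketch of such a predicate (assuming it simply checks whether updated_at parses as a date earlier than some cutoff) might look like:

// Hypothetical helper, not from the original post: true when the updated_at
// string parses as a date before an arbitrary cutoff.
private static bool DateTest(string date)
{
    return DateTime.TryParse(date, out DateTime parsed)
        && parsed < new DateTime(2020, 1, 1); // cutoff chosen only for illustration
}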

After the SQL query, I perform a Filter() and a Drop() for some final processing:

DataFrame filteredDates = dateDf.Filter(dateDf["datebefore"] == true);
filteredDates.Show();
filteredDates.Drop("datebefore").Show();

After I Show() each of these modified DataFrames, I get the error ProcessStream() failed with exception: System.ArgumentException: The stream is closed. However, even with that error, both calls to Show() still execute successfully and produce the correct output.

Why might I be receiving this error (twice)? Since I get the correct output, the error doesn’t seem to be affecting anything.

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
luciantimar commented, Nov 13, 2020

Hi,

I’m hitting a similar issue with Spark .NET 1.0, but only when I use UDF functions. The Show method still displays the correct results.

Is this something I should worry about, or is it just a cleanup exception? The log shows “RunSimpleWorker() finished successfully” and also “DotnetRunner: .NET application exited successfully”.

Environment:

  • Java 8
  • Spark 2.4.1 or Spark 3.0
  • .NET Core 3.1
  • Microsoft.Spark.Worker 1.0.0

The UDF functions are:


// UDF: parse a time string such as "00:07:00" into total seconds (0 if it doesn't parse).
Func<Column, Column> toTimeSpan = Functions.Udf<string, double>(str =>
{
    TimeSpan ts;
    if (TimeSpan.TryParse(str, out ts))
    {
        return ts.TotalSeconds;
    }
    return 0;
});

// UDF: duration in seconds between two time strings (0 if either doesn't parse).
Func<Column, Column, Column> toDuration = Functions.Udf<string, string, double>((str1, str2) =>
{
    TimeSpan ts1;
    TimeSpan ts2;
    if (TimeSpan.TryParse(str1, out ts1) && TimeSpan.TryParse(str2, out ts2))
    {
        return ts2.TotalSeconds - ts1.TotalSeconds;
    }
    return 0;
});

...

DataFrame df1 = dataFrame
    .WithColumn("Start", toTimeSpan(dataFrame.Col("SessionStartTime")))
    .WithColumn("End", toTimeSpan(dataFrame.Col("SessionEndTime")))
    .WithColumn("Duration", toDuration(Functions.Col("SessionStartTime"), Functions.Col("SessionEndTime")));

[truncated Show() output; the visible sample row has SessionStartTime 00:07:00 and SessionEndTime 00:15:00, with computed Start 420.0, End 900.0 and Duration 480.0]
only showing the top 20 rows

[2020-11-13T09:03:06.7403736Z] [N-20N3PF1C0NK5] [Error] [TaskRunner] [0] ProcessStream() failed with exception: System.IO.IOException: Unable to write data to the transport connection: An established connection was aborted by the software in your host machine..
 ---> System.Net.Sockets.SocketException (10053): An established connection was aborted by the software in your host machine.
   at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset, Int32 size)
   --- End of inner exception stack trace ---
   at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset, Int32 size)
   at System.IO.BufferedStream.Write(Byte[] array, Int32 offset, Int32 count)
   at Microsoft.Spark.Interop.Ipc.SerDe.Write(Stream s, Byte[] value, Int32 count) in /_/src/csharp/Microsoft.Spark/Interop/Ipc/SerDe.cs:line 283
   at Microsoft.Spark.Worker.Command.PicklingSqlCommandExecutor.WriteOutput(Stream stream, IEnumerable`1 rows, Int32 sizeHint) in D:\a\1\s\src\csharp\Microsoft.Spark.Worker\Command\SqlCommandExecutor.cs:line 191
   at Microsoft.Spark.Worker.Command.PicklingSqlCommandExecutor.ExecuteCore(Stream inputStream, Stream outputStream, SqlCommand[] commands) in D:\a\1\s\src\csharp\Microsoft.Spark.Worker\Command\SqlCommandExecutor.cs:line 158
   at Microsoft.Spark.Worker.Command.SqlCommandExecutor.Execute(Version version, Stream inputStream, Stream outputStream, PythonEvalType evalType, SqlCommand[] commands) in D:\a\1\s\src\csharp\Microsoft.Spark.Worker\Command\SqlCommandExecutor.cs:line 76
   at Microsoft.Spark.Worker.Command.CommandExecutor.Execute(Stream inputStream, Stream outputStream, Int32 splitIndex, CommandPayload commandPayload) in D:\a\1\s\src\csharp\Microsoft.Spark.Worker\Command\CommandExecutor.cs:line 65
   at Microsoft.Spark.Worker.TaskRunner.ProcessStream(Stream inputStream, Stream outputStream, Version version, Boolean& readComplete) in D:\a\1\s\src\csharp\Microsoft.Spark.Worker\TaskRunner.cs:line 154
[2020-11-13T09:03:06.7410655Z] [N-20N3PF1C0NK5] [Error] [TaskRunner] [0] Exiting with exception: System.IO.IOException: Unable to write data to the transport connection: An established connection was aborted by the software in your host machine..
20/11/13 11:03:06 INFO SparkUI: Stopped Spark web UI at http://N-20N3PF1C0NK5.nsn-intra.net:4040
 ---> System.Net.Sockets.SocketException (10053): An established connection was aborted by the software in your host machine.
   at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset, Int32 size)
   --- End of inner exception stack trace ---
   at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset, Int32 size)
   at System.IO.BufferedStream.Write(Byte[] array, Int32 offset, Int32 count)
   at Microsoft.Spark.Interop.Ipc.SerDe.Write(Stream s, Byte[] value, Int32 count) in /_/src/csharp/Microsoft.Spark/Interop/Ipc/SerDe.cs:line 283
   at Microsoft.Spark.Worker.Command.PicklingSqlCommandExecutor.WriteOutput(Stream stream, IEnumerable`1 rows, Int32 sizeHint) in D:\a\1\s\src\csharp\Microsoft.Spark.Worker\Command\SqlCommandExecutor.cs:line 191
   at Microsoft.Spark.Worker.Command.PicklingSqlCommandExecutor.ExecuteCore(Stream inputStream, Stream outputStream, SqlCommand[] commands) in D:\a\1\s\src\csharp\Microsoft.Spark.Worker\Command\SqlCommandExecutor.cs:line 158
   at Microsoft.Spark.Worker.Command.SqlCommandExecutor.Execute(Version version, Stream inputStream, Stream outputStream, PythonEvalType evalType, SqlCommand[] commands) in D:\a\1\s\src\csharp\Microsoft.Spark.Worker\Command\SqlCommandExecutor.cs:line 76
   at Microsoft.Spark.Worker.Command.CommandExecutor.Execute(Stream inputStream, Stream outputStream, Int32 splitIndex, CommandPayload commandPayload) in D:\a\1\s\src\csharp\Microsoft.Spark.Worker\Command\CommandExecutor.cs:line 65
   at Microsoft.Spark.Worker.TaskRunner.ProcessStream(Stream inputStream, Stream outputStream, Version version, Boolean& readComplete) in D:\a\1\s\src\csharp\Microsoft.Spark.Worker\TaskRunner.cs:line 154
   at Microsoft.Spark.Worker.TaskRunner.Run() in D:\a\1\s\src\csharp\Microsoft.Spark.Worker\TaskRunner.cs:line 66
[2020-11-13T09:03:06.7448127Z] [N-20N3PF1C0NK5] [Warn] [TaskRunner] [0] Exception while closing socket: System.IO.IOException: Unable to write data to the transport connection: An established connection was aborted by the software in your host machine..
 ---> System.Net.Sockets.SocketException (10053): An established connection was aborted by the software in your host machine.
   at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset, Int32 size)
   --- End of inner exception stack trace ---
   at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset, Int32 size)
   at System.IO.BufferedStream.Flush()
   at System.IO.BufferedStream.Dispose(Boolean disposing)
   at System.IO.Stream.Close()
   at System.IO.Stream.Dispose()
   at Microsoft.Spark.Network.DefaultSocketWrapper.Dispose() in /_/src/csharp/Microsoft.Spark/Network/DefaultSocketWrapper.cs:line 56
   at Microsoft.Spark.Worker.TaskRunner.Run() in D:\a\1\s\src\csharp\Microsoft.Spark.Worker\TaskRunner.cs:line 115
[2020-11-13T09:03:06.7449393Z] [N-20N3PF1C0NK5] [Info] [TaskRunner] [0] Finished running 0 task(s).
[2020-11-13T09:03:06.7449660Z] [N-20N3PF1C0NK5] [Info] [SimpleWorker] RunSimpleWorker() finished successfully
0 reactions
elvaliuliuliu commented, Oct 17, 2019

Not sure, but I don’t think Show would cause this error. It could be a UDF-related issue.
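
A quick way to test that hunch (a sketch only, reusing the spark, dateView, and MyUDF names from the original question; nothing here is from the thread) is to run the same Show() with and without the UDF column:

// Sketch: compare a plain projection with a UDF projection over the same view.
DataFrame plain = spark.Sql("SELECT * FROM dateView");
plain.Show();   // if ProcessStream() does not complain here...

DataFrame withUdf = spark.Sql("SELECT *, MyUDF(updated_at) AS datebefore FROM dateView");
withUdf.Show(); // ...but does complain here, the .NET worker round trip for the UDF is the trigger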

Read more comments on GitHub >

Top Results From Across the Web

"Stream Closed" Error caused apparently by multiple submits
My java code works most of the time, but sometimes randomly I get the Stream Closed exception. I suspect this may be if...
Read more >
Processing Error
StreamClosedError: Stream is closed During handling of the above exception, another exception occurred: Traceback (most recent call last): File ...
Read more >
java.io.IOException: Stream is closed! Error in HDInsight ...
I have been in touch with the development team regarding this error I have been getting when running a command to write a...
Read more >
1014393 – Stream closed exception in resetStream on IBM ...
IOException: Stream closed` error when using IBM JDK 16, 17 or `xerces:xercesImpl:2.9.1-redhat-x` (provided by EAP 6) as a dependency in a resteasy 2.3.6....
Read more >
The try-with-resources Statement - Exceptions
A resource is an object that must be closed after the program is finished with it. The try -with-resources statement ensures that each...
Read more >
