pyspark - timestamp with microseconds causes exception on .save()
When a column contains a timestamp with non-zero microseconds, .save() fails with a generic "com.microsoft.sqlserver.jdbc.SQLServerException: The connection is closed." exception.
Truncating the microseconds to 0 works around the problem. The table created by .save() has a column of the "datetime" data type, so I presume the failure is related to rounding the microseconds to satisfy the precision limits of "datetime".
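For context on that presumption (background I'm adding, not stated in the issue itself): SQL Server's legacy datetime type stores time-of-day in 1/300-second ticks, so fractional seconds are rounded to increments of .000, .003, or .007 seconds, while datetime2 has 100 ns precision and can hold microseconds exactly. A purely illustrative Python sketch of that rounding (sqlserver_datetime_round is a hypothetical helper, not part of any API):

from datetime import datetime, timedelta

def sqlserver_datetime_round(ts):
    # Hypothetical helper, illustration only: mimic how SQL Server's legacy
    # 'datetime' type rounds fractional seconds (1/300-second ticks).
    ticks = round(ts.microsecond * 300 / 1_000_000)  # 300 ticks per second
    micros = round(ticks * 1_000_000 / 300)
    if micros >= 1_000_000:  # rounding can carry into the next second
        return ts.replace(microsecond=0) + timedelta(seconds=1)
    return ts.replace(microsecond=micros)

print(sqlserver_datetime_round(datetime(2020, 1, 1, 12, 0, 0, 123456)))
# -> 2020-01-01 12:00:00.123333, which SQL Server would display as .123

A timestamp with arbitrary microseconds therefore cannot round-trip through a datetime column, whereas datetime2 can represent it.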
env:
- SQL Spark connector 1.1
- Spark 2.4.5
- Databricks Runtime 6.4
how to reproduce:
from datetime import datetime
from pyspark.sql.functions import lit

batchTimestamp = datetime.now()
#
# uncomment to truncate microseconds; only truncation to 0 works
#batchTimestamp = batchTimestamp.replace(microsecond = 0)
print(batchTimestamp.isoformat(sep=' '))
df = spark \
.createDataFrame([("a", 1), ("b", 2), ("c", 3)], ["Col1", "Col2"]) \
.withColumn('ts', lit(batchTimestamp))
df.show()
df \
.write \
.format("com.microsoft.sqlserver.jdbc.spark") \
.mode("overwrite") \
.option("url", sql_url) \
.option("dbtable", 'test_table') \
.option("user", sql_username) \
.option("password", sql_password) \
.save()
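A Spark-side workaround consistent with the truncation observation above (a sketch I'm adding, not part of the original report): drop the sub-second part of the column with date_trunc before writing, since whole seconds are exactly representable in datetime:

from pyspark.sql.functions import date_trunc, col

# Truncate the timestamp column to whole seconds before running the
# same .write shown above. Sub-second precision is lost, so this is
# only acceptable when the fractional part does not matter.
df = df.withColumn('ts', date_trunc('second', col('ts')))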
Top GitHub Comments
As @shivsood mentioned, this issue occurs due to the mapping of timestamp to DateTime instead of DateTime2. As mentioned in #83, the issue is with datetime2(0), but datetime2(x) works.
This is not specific to the connector itself, and a PR will be made to Spark for a fix. We will update this issue once it is created.
@chopraarjun I think it might be another issue there. If your date format only has 3 fractional-second digits (milliseconds), using either datetime or datetime2 will not raise the errors related to #39. Can you please open a new issue and add details about your environment and a repro, thank you.
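Given that diagnosis, one way to sidestep the lossy datetime mapping (a sketch I'm adding, not suggested in the thread; it assumes the connector follows the usual Spark JDBC behavior of inserting into an existing table in append mode): pre-create the target table in SQL Server with a datetime2 column, so .save() never creates the table with datetime:

# Assumes test_table was created beforehand in SQL Server, e.g.:
#   CREATE TABLE test_table (Col1 nvarchar(10), Col2 int, ts datetime2(7));
df \
  .write \
  .format("com.microsoft.sqlserver.jdbc.spark") \
  .mode("append") \
  .option("url", sql_url) \
  .option("dbtable", 'test_table') \
  .option("user", sql_username) \
  .option("password", sql_password) \
  .save()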