
pyspark - timestamp with microseconds, causes exception on .save()

See original GitHub issue

When a column contains a timestamp with non-zero microseconds, .save() fails with a generic “com.microsoft.sqlserver.jdbc.SQLServerException: The connection is closed.” exception.

Truncating the microseconds to 0 works around the problem. The output table created by .save() has a column of the “datetime” data type, so I presume the failure is related to rounding microseconds to satisfy the precision limits of “datetime”.

Environment:

  • sql spark connector version 1.1
  • Spark 2.4.5
  • Databricks 6.4 runtime

How to reproduce:

from datetime import datetime
from pyspark.sql.functions import lit

batchTimestamp = datetime.now()

#
# uncomment to truncate microseconds; only truncation to 0 works around the error
#batchTimestamp = batchTimestamp.replace(microsecond=0)

print(batchTimestamp.isoformat(sep=' '))

df = spark \
  .createDataFrame([("a", 1), ("b", 2), ("c", 3)], ["Col1", "Col2"]) \
  .withColumn('ts', lit(batchTimestamp))

df.show()

df \
  .write \
  .format("com.microsoft.sqlserver.jdbc.spark") \
  .mode("overwrite") \
  .option("url", sql_url) \
  .option("dbtable", 'test_table') \
  .option("user", sql_username) \
  .option("password", sql_password) \
  .save()
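
If the timestamp already lives in a DataFrame column rather than in a Python datetime built before createDataFrame, the same truncation workaround can be applied with date_trunc (available since Spark 2.3). A minimal sketch, reusing the 'ts' column from the repro above:

from pyspark.sql.functions import date_trunc, col

# Drop sub-second precision from the existing 'ts' column before .save(),
# mirroring the microsecond=0 workaround in the repro above.
df_truncated = df.withColumn('ts', date_trunc('second', col('ts')))

The write call itself stays the same as above, just on df_truncated.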

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 15 (3 by maintainers)

Top GitHub Comments

3 reactions
rajmera3 commented, Mar 1, 2021

As @shivsood mentioned, this issue occurs due to the mapping of timestamp to DateTime instead of DateTime2. As mentioned in #83, the issue is with datetime2(0), but datetime2(x) works.

This is not specific to the connector itself, and a PR will be made to Spark for a fix. We will update this issue once that is created.
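
Given that comment (datetime2(0) fails but datetime2(x) works), one workaround to try while waiting for the fix is to create the target table up front with a datetime2 column and append to it, so the connector never has to choose a SQL type for the timestamp column. This is a sketch under that assumption, reusing the names from the repro above, not something confirmed by the maintainers in this thread:

# Assumes test_table was pre-created in SQL Server, e.g.:
#   CREATE TABLE test_table (Col1 nvarchar(10), Col2 int, ts datetime2(6));
# Appending keeps the existing column definitions instead of letting the
# connector map TimestampType to "datetime" when it creates the table.
df \
  .write \
  .format("com.microsoft.sqlserver.jdbc.spark") \
  .mode("append") \
  .option("url", sql_url) \
  .option("dbtable", 'test_table') \
  .option("user", sql_username) \
  .option("password", sql_password) \
  .save()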

0 reactions
luxu1-ms commented, Jul 1, 2021

@chopraarjun I think it might be a different issue there. If your date format only has 3 digits of fractional seconds, using either datetime or datetime2 will not raise errors related to #39. Can you please open a new issue and add details about your environment and repro? Thank you.

Read more comments on GitHub >

Top Results From Across the Web

pyspark to_timestamp does not include milliseconds
Reason: pyspark to_timestamp parses only up to seconds, while TimestampType has the ability to hold milliseconds.
Read more >
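
As a side note on that result, one way to keep sub-second precision when parsing timestamp strings is a plain cast to timestamp, which preserves the fractional part. A minimal sketch with an illustrative column name:

from pyspark.sql.functions import col

# Casting an ISO-style string to timestamp keeps the fractional seconds
# in the resulting TimestampType column.
df2 = spark.createDataFrame([("2021-03-01 12:34:56.789123",)], ["ts_str"])
df2.select(col("ts_str").cast("timestamp").alias("ts")).show(truncate=False)
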
Migration Guide: SQL, Datasets and DataFrame - Apache Spark
A runtime exception will be thrown if the value is out-of-range for the data type of the column. In Spark version 2.4 and...
Read more >
Configuration - Spark 3.1.2 Documentation
SparkConf allows you to configure some of the common properties (e.g. master URL and application name), as well as arbitrary key-value pairs through...
Read more >
pyspark.sql.functions.from_utc_timestamp - Apache Spark
This is a common function for databases supporting TIMESTAMP WITHOUT TIMEZONE. This function takes a timestamp which is timezone-agnostic, and interprets it as ......
Read more >
pyspark.sql module - Apache Spark
createDataFrame(l).collect() [Row(_1='Alice', _2=1)] >>> spark. ... For performance reasons, Spark SQL or the external data source library it uses might ...
Read more >
