pyspark - timestamp with microseconds causes exception on .save()
When a column contains a timestamp with non-zero microseconds, .save() fails with a generic "com.microsoft.sqlserver.jdbc.SQLServerException: The connection is closed." exception.
Truncating the microseconds to 0 works around the problem. The table created by .save() has a column of the "datetime" data type, so I presume the failure is related to rounding the microseconds to satisfy the precision limits of "datetime".
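For context on that presumption (background I'm adding, not stated in the issue itself): SQL Server's legacy datetime type stores time-of-day in 1/300-second ticks, so fractional seconds are rounded to increments of .000, .003, or .007 seconds, while datetime2 has 100 ns precision and can hold microseconds exactly. A purely illustrative Python sketch of that rounding (sqlserver_datetime_round is a hypothetical helper, not part of any API):

from datetime import datetime, timedelta

def sqlserver_datetime_round(ts):
    # Hypothetical helper, illustration only: mimic how SQL Server's legacy
    # 'datetime' type rounds fractional seconds (1/300-second ticks).
    ticks = round(ts.microsecond * 300 / 1_000_000)  # 300 ticks per second
    micros = round(ticks * 1_000_000 / 300)
    if micros >= 1_000_000:  # rounding can carry into the next second
        return ts.replace(microsecond=0) + timedelta(seconds=1)
    return ts.replace(microsecond=micros)

print(sqlserver_datetime_round(datetime(2020, 1, 1, 12, 0, 0, 123456)))
# -> 2020-01-01 12:00:00.123333, which SQL Server would display as .123

A timestamp with arbitrary microseconds therefore cannot round-trip through a datetime column, whereas datetime2 can represent it.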
env:
- SQL Spark connector 1.1
- Spark 2.4.5
- Databricks Runtime 6.4
how to reproduce:
from datetime import datetime
from pyspark.sql.functions import lit

batchTimestamp = datetime.now()
#
# uncomment to truncate microseconds; only truncation to 0 works
#batchTimestamp = batchTimestamp.replace(microsecond = 0)
print(batchTimestamp.isoformat(sep=' '))
df = spark \
.createDataFrame([("a", 1), ("b", 2), ("c", 3)], ["Col1", "Col2"]) \
.withColumn('ts', lit(batchTimestamp))
df.show()
df \
.write \
.format("com.microsoft.sqlserver.jdbc.spark") \
.mode("overwrite") \
.option("url", sql_url) \
.option("dbtable", 'test_table') \
.option("user", sql_username) \
.option("password", sql_password) \
.save()
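A Spark-side workaround consistent with the truncation observation above (a sketch I'm adding, not part of the original report): drop the sub-second part of the column with date_trunc before writing, since whole seconds are exactly representable in datetime:

from pyspark.sql.functions import date_trunc, col

# Truncate the timestamp column to whole seconds before running the
# same .write shown above. Sub-second precision is lost, so this is
# only acceptable when the fractional part does not matter.
df = df.withColumn('ts', date_trunc('second', col('ts')))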
Top GitHub Comments
As @shivsood mentioned, this issue occurs due to the mapping of timestamp to DateTime instead of DateTime2. As mentioned in #83, the issue is with datetime2(0), but datetime2(x) works.
This is not specific to the connector itself, and a PR will be made to Spark for a fix. We will update this issue once it is created.
@chopraarjun I think it might be another issue there. If your date format only has 3 fractional-second digits (milliseconds), using either datetime or datetime2 will not raise the errors related to #39. Can you please open a new issue and add details about your environment and a repro, thank you.
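Given that diagnosis, one way to sidestep the lossy datetime mapping (a sketch I'm adding, not suggested in the thread; it assumes the connector follows the usual Spark JDBC behavior of inserting into an existing table in append mode): pre-create the target table in SQL Server with a datetime2 column, so .save() never creates the table with datetime:

# Assumes test_table was created beforehand in SQL Server, e.g.:
#   CREATE TABLE test_table (Col1 nvarchar(10), Col2 int, ts datetime2(7));
df \
  .write \
  .format("com.microsoft.sqlserver.jdbc.spark") \
  .mode("append") \
  .option("url", sql_url) \
  .option("dbtable", 'test_table') \
  .option("user", sql_username) \
  .option("password", sql_password) \
  .save()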