Add Spark JDBC read
For enterprise use, I’d like to propose extending the read methods to JDBC, given that the drivers are available in the Spark context.
Current Solutions
from pyspark.sql import SparkSession
import databricks.koalas as ks

spark = SparkSession.builder.getOrCreate()

# Generic JDBC source: pass the driver class and table/query to the Spark
# DataFrame reader, then wrap the result in a Koalas DataFrame.
jdbc_options = dict()
jdbc_options["driver"] = "<my-driver>"
df = ks.DataFrame(
    spark.read.format("jdbc").options(**jdbc_options).option("dbtable", "<sql>").load()
)

# Snowflake: same pattern with the Snowflake source and its connection options.
sf_options = dict()
df = ks.DataFrame(
    spark.read.format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("dbtable", "<sql>")
    .load()
)
New Solution
import databricks.koalas as ks
df = ks.read_jdbc(dbtable="<sql>", driver="<my-driver>", **options)
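A minimal sketch of how such a helper could be built on top of the existing Spark JDBC reader, assuming roughly the signature proposed above; read_jdbc, its url/dbtable/driver parameters, and the pass-through of extra options are illustrative here, not an existing Koalas API:

from pyspark.sql import SparkSession
import databricks.koalas as ks

def read_jdbc(url, dbtable, driver=None, **options):
    """Read a JDBC table (or pushdown query) into a Koalas DataFrame."""
    spark = SparkSession.builder.getOrCreate()
    reader = spark.read.format("jdbc").option("url", url).option("dbtable", dbtable)
    if driver is not None:
        reader = reader.option("driver", driver)
    # Remaining JDBC properties (user, password, fetchsize, ...) pass straight through.
    return ks.DataFrame(reader.options(**options).load())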
Issue Analytics
- State:
- Created 3 years ago
- Comments: 6 (1 by maintainers)
Top Results From Across the Web
JDBC To Other Databases - Spark 3.3.1 Documentation

| Property Name | Default | Scope      |
|---------------|---------|------------|
| url           | (none)  | read/write |
| dbtable       | (none)  | read/write |
| query         | (none)  | read/write |
How to use JDBC source to write and read data in (Py)Spark?
Choose the desired mode. The Spark JDBC writer supports the following modes, e.g. append: append the contents of this DataFrame to existing data.

Query databases using JDBC | Databricks on AWS
When writing to databases using JDBC, Apache Spark uses the number of partitions in memory to control parallelism. You can repartition data ...

Read JDBC Table to Spark DataFrame
Read JDBC Table to Spark DataFrame · Step 1 – Identify the Spark Connector to use · Step 2 – Add the dependency...

How to read and write from Database in Spark using pyspark.
The Spark class `pyspark.sql.DataFrameReader` provides the interface method to perform the JDBC-specific operations. The method jdbc takes the ...
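Tying those snippets together, a small reader/writer example along the lines they describe; the JDBC URL, table names, credentials, and the PostgreSQL driver below are placeholders, not values from this issue:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

jdbc_url = "jdbc:postgresql://db-host:5432/mydb"  # placeholder connection URL
props = {"user": "<user>", "password": "<password>", "driver": "org.postgresql.Driver"}

# pyspark.sql.DataFrameReader.jdbc reads the table into a Spark DataFrame.
df = spark.read.jdbc(url=jdbc_url, table="public.orders", properties=props)

# The number of partitions controls write parallelism; mode="append" adds rows.
df.repartition(8).write.jdbc(
    url=jdbc_url, table="public.orders_copy", mode="append", properties=props
)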
Top GitHub Comments
Yes, that seems to work. I was able to create a ks DataFrame using the OP’s method above:
Thanks @HyukjinKwon. I did check that doc before, though it doesn’t say how to pass the driver. Should I just create a separate Spark session with the driver config (spark.jars), and will Koalas detect that Spark session automatically instead of creating its own?
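One plausible way to do what the comment suggests, assuming the driver jar is available locally (the jar path, URL, and driver class are placeholders): build the Spark session with spark.jars yourself, and Koalas should pick up the active session, since it calls SparkSession.builder.getOrCreate() internally rather than spawning a new one.

from pyspark.sql import SparkSession
import databricks.koalas as ks

spark = (
    SparkSession.builder
    .config("spark.jars", "/path/to/driver.jar")  # placeholder path to the JDBC driver jar
    .getOrCreate()
)

# Koalas wraps the already-active session, so the driver jar is visible to its reads.
df = ks.DataFrame(
    spark.read.format("jdbc")
    .option("url", "jdbc:<db>://<host>:<port>/<database>")  # placeholder URL
    .option("driver", "<driver-class>")
    .option("dbtable", "<sql>")
    .load()
)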