
Add Spark JDBC read

See original GitHub issue

For enterprise use, I’d like to gauge interest in extending the read methods to support JDBC, given that the drivers are available in the Spark context.

Current Solutions

from pyspark.sql import SparkSession
import databricks.koalas as ks

spark = SparkSession.builder.getOrCreate()

# JDBC: the Spark JDBC source also needs a url option to connect.
jdbc_options = dict()
jdbc_options["url"] = "<jdbc-url>"
jdbc_options["driver"] = "<my-driver>"

df = ks.DataFrame(
    spark.read.format("jdbc").options(**jdbc_options).option("dbtable", "<sql>").load()
)

# Snowflake
sf_options = dict()
df = ks.DataFrame(
    spark.read.format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("dbtable", "<sql>")
    .load()
)

New Solution

import databricks.koalas as ks

df = ks.read_jdbc(dbtable="<sql>", driver="<my-driver>", **options)
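A minimal sketch of how such a helper could wrap the existing JDBC read path (the read_jdbc name and signature come from the proposal above and are not an existing Koalas API; the url and any other connection options are assumed to be forwarded through **options):

import databricks.koalas as ks
from pyspark.sql import SparkSession

def read_jdbc(dbtable, driver, **options):
    # Hypothetical helper: forwards all extra options (url, user, password, ...)
    # to the Spark JDBC data source and wraps the result in a Koalas DataFrame.
    spark = SparkSession.builder.getOrCreate()
    sdf = (
        spark.read.format("jdbc")
        .option("dbtable", dbtable)
        .option("driver", driver)
        .options(**options)
        .load()
    )
    return ks.DataFrame(sdf)

df = read_jdbc(dbtable="<sql>", driver="<my-driver>", url="<jdbc-url>")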

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

1 reaction
vaibhavsingh007 commented, Dec 10, 2020

Yes, that seems to work. I was able to create a Koalas DataFrame using the OP’s method above:

df = ks.DataFrame(
    spark.read.format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("dbtable", "<sql>")
    .load()
)

0 reactions
vaibhavsingh007 commented, Dec 10, 2020

Thanks @HyukjinKwon. I did check that doc before, though it doesn’t say how to pass the driver. Should I just create a separate Spark session with the driver config (spark.jars), and will Koalas detect that Spark session automatically instead of creating its own?
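For reference, a sketch of what that could look like (the jar path, table name, driver class, and JDBC URL are placeholders; this assumes Koalas reuses the already-active Spark session and that read_sql_table forwards extra keyword options to the Spark JDBC reader):

from pyspark.sql import SparkSession
import databricks.koalas as ks

# Put the JDBC driver jar on the classpath before the session is created,
# so the session Koalas picks up already has it available.
spark = (
    SparkSession.builder
    .config("spark.jars", "/path/to/driver.jar")
    .getOrCreate()
)

# Extra keyword options (e.g. driver) are passed through to the JDBC source.
df = ks.read_sql_table(
    "<table>",
    con="jdbc:<subprotocol>://<host>/<db>",
    driver="<my-driver>",
)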

Read more comments on GitHub >

Top Results From Across the Web

  • JDBC To Other Databases - Spark 3.3.1 Documentation
  • How to use JDBC source to write and read data in (Py)Spark?
  • Query databases using JDBC | Databricks on AWS
  • Read JDBC Table to Spark DataFrame
  • How to read and write from Database in Spark using pyspark
