Connection Refused Error when writing to BigQuery with gcpAccessToken
While writing data to BigQuery from an on-prem Spark cluster, we are hitting a Connection refused error.
The trace shows the connector trying to fetch a credential from the GCE metadata server (which, of course, is not running on the on-prem machines). Should the gcpAccessToken option be used to create the credential instead of inferring credentials from the GCE metadata server? This seems to happen specifically when making a request to GCS. Is this user error, or a bug in how the connector authenticates requests to GCS?
Usage:
df.write.format("bigquery")
.option("gcpAccessToken", gcpAccessToken)
.option("temporaryGcsBucket","[redacted bucket id]")
.option("parentProject", projectName)
.save(s"$projectName.[redacted dataset id ].[redacted table id]")
Error:
21/02/26 15:54:37 DEBUG CredentialFactory: getCredentialFromMetadataServiceAccount()
21/02/26 15:54:38 DEBUG ProtobufRpcEngine: Call: getApplicationReport took 1ms
21/02/26 15:54:38 WARN HttpTransport: exception thrown while executing request
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
at sun.net.www.http.HttpClient.New(HttpClient.java:339)
at sun.net.www.http.HttpClient.New(HttpClient.java:357)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1202)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1138)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1032)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:966)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:104)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:981)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.CredentialFactory$ComputeCredentialWithRetry.executeRefreshToken(CredentialFactory.java:168)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.auth.oauth2.Credential.refreshToken(Credential.java:489)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.CredentialFactory.getCredentialFromMetadataServiceAccount(CredentialFactory.java:218)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.CredentialConfiguration.getCredential(CredentialConfiguration.java:75)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.configure(GoogleHadoopFileSystemBase.java:1869)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:1058)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:1021)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2796)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2830)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2812)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at com.google.cloud.spark.bigquery.BigQueryWriteHelper.<init>(BigQueryWriteHelper.scala:62)
at com.google.cloud.spark.bigquery.BigQueryInsertableRelation.insert(BigQueryInsertableRelation.scala:41)
at com.google.cloud.spark.bigquery.BigQueryRelationProvider.createRelation(BigQueryRelationProvider.scala:108)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:654)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:225)
at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:42)
at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:47)
at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:49)
at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:51)
at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:53)
at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:55)
at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:57)
at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:59)
at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:61)
at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:63)
at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:65)
at $line29.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:67)
at $line29.$read$$iw$$iw$$iw$$iw.<init>(<console>:69)
at $line29.$read$$iw$$iw$$iw.<init>(<console>:71)
at $line29.$read$$iw$$iw.<init>(<console>:73)
at $line29.$read$$iw.<init>(<console>:75)
at $line29.$read.<init>(<console>:77)
at $line29.$read$.<init>(<console>:81)
at $line29.$read$.<clinit>(<console>)
at $line29.$eval$.$print$lzycompute(<console>:7)
at $line29.$eval$.$print(<console>:6)
at $line29.$eval.$print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:807)
at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:681)
at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:395)
at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:415)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:923)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
at scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)
at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:909)
at org.apache.spark.repl.Main$.doMain(Main.scala:76)
at org.apache.spark.repl.Main$.main(Main.scala:56)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:906)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
My understanding of what is going on in this stack trace is as follows:
The gcpAccessToken option in your Spark code is actually only used to authenticate requests to BigQuery, not to GCS. The GCS connector is configured at the cluster level with a service account key (see INSTALL.md and CONFIGURATION.md for more details). If that GCS connector authentication configuration is missing, the libraries fall back to the common GCP defaults for finding credentials automatically: first looking for the GOOGLE_APPLICATION_CREDENTIALS environment variable, and finally falling back to a call to the GCE metadata server to obtain a short-lived token for that VM's service account (the GCE metadata server runs like a sidecar locally on every GCP VM). Of course, your on-prem machines are not running the GCE metadata server, which is why you get this connection refused.
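For reference, here is a minimal sketch, assuming a spark-shell session, of what that cluster-level GCS connector configuration could look like when a service account JSON key file is available on every node; the property names follow the GCS connector documentation and may differ slightly between connector versions, and /path/to/key.json is a placeholder:

// Hedged sketch: point the GCS connector (not the BigQuery connector) at a
// service account key file so it stops falling back to the GCE metadata server.
// Property names may vary by GCS connector version.
spark.sparkContext.hadoopConfiguration.set("google.cloud.auth.service.account.enable", "true")
spark.sparkContext.hadoopConfiguration.set("google.cloud.auth.service.account.json.keyfile", "/path/to/key.json")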
Looking at the GCS connector config properties, it looks like this can also be solved with fs.gs.auth.access.token.provider.impl.
@davidrabinowitz could the connector be enhanced so that, when the user passes the gcpAccessToken option, the connector sets fs.gs.auth.access.token.provider.impl on the Spark application and supplies a static token provider that simply returns the value the user passed as gcpAccessToken to authenticate the GCS requests? This seems to be the user expectation.
At a minimum, we should improve the error message in this scenario.
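To illustrate the proposal, here is a minimal sketch of what such a static provider could look like, assuming the AccessTokenProvider interface from the GCS connector's util library (class and method names may vary by connector version, and with the shaded connector jar the interface may live under the repackaged namespace seen in the stack trace). The class name StaticAccessTokenProvider and the fs.gs.auth.static.token key are hypothetical; nothing like this ships with the connector today.

import org.apache.hadoop.conf.Configuration
import com.google.cloud.hadoop.util.AccessTokenProvider

// Hypothetical provider: hands GCS a token that was placed in the Hadoop
// configuration (e.g. copied there from the user's gcpAccessToken option).
class StaticAccessTokenProvider extends AccessTokenProvider {
  private var conf: Configuration = _

  override def setConf(c: Configuration): Unit = { conf = c }
  override def getConf: Configuration = conf

  // Return the fixed token; the expiration is a rough placeholder, since the
  // caller owns the token's lifecycle.
  override def getAccessToken: AccessTokenProvider.AccessToken =
    new AccessTokenProvider.AccessToken(
      conf.get("fs.gs.auth.static.token"),               // hypothetical key
      System.currentTimeMillis() + 3600 * 1000L)

  // Nothing to refresh for a static token.
  override def refresh(): Unit = ()
}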
Following @jaketf's suggestion, I managed to solve my issue (#471) by using a custom GCSAccessTokenProvider class: first setting the required spark.hadoop variables, then having the GCSAccessTokenProvider read credentialsFile and initialize GoogleCredentials, which is used to generate a token for Spark. The full implementation is here.
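Since the exact spark.hadoop settings and the provider implementation are not reproduced above, here is a hedged reconstruction of that approach, assuming the AccessTokenProvider interface from the GCS connector's util library and google-auth-library's GoogleCredentials; the fs.gs.auth.custom.credentials.file key, the com.example package, and the key-file path are illustrative only:

package com.example

import java.io.FileInputStream
import java.util.Collections
import org.apache.hadoop.conf.Configuration
import com.google.auth.oauth2.GoogleCredentials
import com.google.cloud.hadoop.util.AccessTokenProvider

// Hedged reconstruction: load the service account key file named in the Hadoop
// configuration and mint short-lived access tokens from it for the GCS connector.
class GCSAccessTokenProvider extends AccessTokenProvider {
  private var conf: Configuration = _
  private var credentials: GoogleCredentials = _

  // The connector instantiates the provider reflectively and calls setConf first.
  override def setConf(c: Configuration): Unit = {
    conf = c
    val credentialsFile = conf.get("fs.gs.auth.custom.credentials.file") // illustrative key
    credentials = GoogleCredentials
      .fromStream(new FileInputStream(credentialsFile))
      .createScoped(Collections.singletonList("https://www.googleapis.com/auth/cloud-platform"))
  }

  override def getConf: Configuration = conf

  // Return the current token, refreshing it through google-auth if it has expired.
  override def getAccessToken: AccessTokenProvider.AccessToken = {
    credentials.refreshIfExpired()
    val token = credentials.getAccessToken
    new AccessTokenProvider.AccessToken(token.getTokenValue, token.getExpirationTime.getTime)
  }

  override def refresh(): Unit = credentials.refresh()
}

The provider is then wired in before the write, for example:

spark.sparkContext.hadoopConfiguration.set(
  "fs.gs.auth.access.token.provider.impl", "com.example.GCSAccessTokenProvider")
spark.sparkContext.hadoopConfiguration.set(
  "fs.gs.auth.custom.credentials.file", "/path/to/key.json")

The provider class (and google-auth-library, if it is not already on the classpath) must be available to both the driver and the executors.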