
Connection Refused Error when writing to BigQuery with gcpAccessToken


While writing data to BigQuery from an on-prem Spark cluster, I'm getting a Connection refused error. The stack trace shows the connector trying to fetch a credential from the GCE metadata server (which, of course, is not running on the on-prem machines). Shouldn't the gcpAccessToken option be used to create the credential instead of inferring credentials from the GCE metadata server? This seems to happen specifically when making a request to GCS. Is this user error, or a bug in how the connector authenticates requests to GCS?

Usage:

df.write.format("bigquery")

.option("gcpAccessToken", gcpAccessToken)

.option("temporaryGcsBucket","[redacted bucket id]")

.option("parentProject", projectName)

.save(s"$projectName.[redacted dataset id ].[redacted table id]")

Error:

21/02/26 15:54:37 DEBUG CredentialFactory: getCredentialFromMetadataServiceAccount()
21/02/26 15:54:38 DEBUG ProtobufRpcEngine: Call: getApplicationReport took 1ms
21/02/26 15:54:38 WARN HttpTransport: exception thrown while executing request
java.net.ConnectException: Connection refused (Connection refused)
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
    at sun.net.www.http.HttpClient.New(HttpClient.java:339)
    at sun.net.www.http.HttpClient.New(HttpClient.java:357)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1202)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1138)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1032)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:966)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:104)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:981)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.CredentialFactory$ComputeCredentialWithRetry.executeRefreshToken(CredentialFactory.java:168)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.auth.oauth2.Credential.refreshToken(Credential.java:489)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.CredentialFactory.getCredentialFromMetadataServiceAccount(CredentialFactory.java:218)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.CredentialConfiguration.getCredential(CredentialConfiguration.java:75)
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.configure(GoogleHadoopFileSystemBase.java:1869)
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:1058)
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:1021)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2796)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2830)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2812)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
    at com.google.cloud.spark.bigquery.BigQueryWriteHelper.<init>(BigQueryWriteHelper.scala:62)
    at com.google.cloud.spark.bigquery.BigQueryInsertableRelation.insert(BigQueryInsertableRelation.scala:41)
    at com.google.cloud.spark.bigquery.BigQueryRelationProvider.createRelation(BigQueryRelationProvider.scala:108)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:654)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:225)
    at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:42)
    at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:47)
    at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:49)
    at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:51)
    at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:53)
    at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:55)
    at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:57)
    at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:59)
    at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:61)
    at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:63)
    at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:65)
    at $line29.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:67)
    at $line29.$read$$iw$$iw$$iw$$iw.<init>(<console>:69)
    at $line29.$read$$iw$$iw$$iw.<init>(<console>:71)
    at $line29.$read$$iw$$iw.<init>(<console>:73)
    at $line29.$read$$iw.<init>(<console>:75)
    at $line29.$read.<init>(<console>:77)
    at $line29.$read$.<init>(<console>:81)
    at $line29.$read$.<clinit>(<console>)
    at $line29.$eval$.$print$lzycompute(<console>:7)
    at $line29.$eval$.$print(<console>:6)
    at $line29.$eval.$print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
    at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
    at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
    at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
    at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
    at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
    at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
    at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:807)
    at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:681)
    at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:395)
    at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:415)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:923)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
    at scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)
    at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:909)
    at org.apache.spark.repl.Main$.doMain(Main.scala:76)
    at org.apache.spark.repl.Main$.main(Main.scala:56)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:906)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

1 reaction
jaketf commented, Mar 4, 2021

My understanding of what is going on in this stack trace is as follows:

  1. When a temporary bucket is specified for writing to BigQuery, the spark-bigquery-connector relies on the GCS connector to make requests to GCS (including verifying that the temporary bucket exists).
  2. When you supply gcpAccessToken in your Spark code, it is actually only used to authenticate requests to BigQuery, not GCS. The GCS connector is configured at the cluster level with a service account key (see INSTALL.md and CONFIGURATION.md for more details, and the sketch after this list).
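
For reference, below is a minimal sketch of that cluster-level GCS connector configuration expressed as Spark properties. The google.cloud.auth.* property names follow the GCS connector's documented configuration keys, but verify them against your connector version; the key-file path is a placeholder.

import org.apache.spark.sql.SparkSession

// Sketch: point the GCS connector at a service account key file so it does
// not fall back to the GCE metadata server. "/path/to/key.json" is a
// placeholder for a key file distributed to the cluster nodes.
val spark = SparkSession.builder()
  .appName("bigquery-write-onprem")
  .config("spark.hadoop.google.cloud.auth.service.account.enable", "true")
  .config("spark.hadoop.google.cloud.auth.service.account.json.keyfile", "/path/to/key.json")
  .getOrCreate()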

If this configuration for authenticating GCS requests is missing, the libraries fall back to the common GCP default of finding credentials automatically: first looking for the GOOGLE_APPLICATION_CREDENTIALS environment variable, then finally calling the GCE metadata server to get a short-term token for that VM's service account (the GCE metadata server runs like a sidecar, locally on every GCP VM). Your on-prem machines are of course not running the GCE metadata server, which is why you get this connection refused.
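
To make that fallback chain concrete, the google-auth-library (which the connector repackages) exposes the same lookup directly; this one-line sketch fails with an IOException on an on-prem host that has none of the credential sources available:

import com.google.auth.oauth2.GoogleCredentials

// Application Default Credentials lookup: checks GOOGLE_APPLICATION_CREDENTIALS,
// then the gcloud well-known file, then the GCE metadata server. With none of
// these present (the on-prem case), this throws IOException -- the same
// situation that surfaces as "Connection refused" in the stack trace above.
val credentials = GoogleCredentials.getApplicationDefault()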

Looking at the GCS Connector config properties, it looks like this can be solved with fs.gs.auth.access.token.provider.impl:

The implementation of the AccessTokenProvider interface used for GCS Connector.

@davidrabinowitz could the connector be enhanced so that, when the user passes the gcpAccessToken option, it sets fs.gs.auth.access.token.provider.impl on the Spark application to a static token provider that simply returns the value the user passed as gcpAccessToken for authenticating the GCS requests? This seems to be the user expectation; a sketch of such a provider follows.
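
For illustration, here is a hypothetical Scala sketch of what such a static provider could look like. The "gcp.access.token" config key is invented for this example, and the AccessTokenProvider interface shape should be checked against the hadoop-connectors version in use.

import org.apache.hadoop.conf.Configuration
import com.google.cloud.hadoop.util.AccessTokenProvider

// Hypothetical static provider: hands back whatever token was placed under
// an agreed-upon Hadoop config key (invented name: "gcp.access.token").
class StaticAccessTokenProvider extends AccessTokenProvider {
  private var conf: Configuration = _

  override def setConf(c: Configuration): Unit = { conf = c }
  override def getConf: Configuration = conf

  override def getAccessToken: AccessTokenProvider.AccessToken =
    new AccessTokenProvider.AccessToken(conf.get("gcp.access.token"), java.lang.Long.MAX_VALUE)

  // A static token cannot be renewed; nothing to do here.
  override def refresh(): Unit = ()
}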

At a minimum, we should improve the error message in this scenario.

0 reactions
ismailsimsek commented, Nov 8, 2021

Following @jaketf's suggestion, I managed to solve my issue (#471) by using a custom GCSAccessTokenProvider class.

First, set the following spark.hadoop variables:

sparkconf.set("spark.hadoop.fs.gs.auth.access.token.provider.impl", "my.spark.GCSAccessTokenProvider");
sparkconf.set("spark.hadoop.credentialsFile", "/location/to/cred.json"));

Then, in the GCSAccessTokenProvider, read credentialsFile and initialize GoogleCredentials, which is used to generate the token for Spark:

  @Override
  public void setConf(Configuration config) {
    this.config = config;
    File credFile = new File(config.get("credentialsFile"));
    // fromStream(...) throws IOException, which setConf cannot declare, so wrap it.
    try (FileInputStream in = new FileInputStream(credFile)) {
      this.googleCredentials = GoogleCredentials.fromStream(in);
    } catch (IOException e) {
      throw new UncheckedIOException("Unable to read GCS credentials file", e);
    }
  }

The full implementation is available here.
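
For readers who cannot follow the link, here is a minimal Scala sketch of the same idea. It is not the author's exact code, and it assumes the hadoop-connectors AccessTokenProvider interface plus the credentialsFile key set above.

import java.io.FileInputStream
import java.util.Collections
import org.apache.hadoop.conf.Configuration
import com.google.auth.oauth2.GoogleCredentials
import com.google.cloud.hadoop.util.AccessTokenProvider

// Sketch of a credentials-file-backed token provider for the GCS connector.
class GCSAccessTokenProvider extends AccessTokenProvider {
  private var conf: Configuration = _
  private var credentials: GoogleCredentials = _

  override def setConf(c: Configuration): Unit = {
    conf = c
    // Load the service account key referenced by spark.hadoop.credentialsFile.
    val in = new FileInputStream(c.get("credentialsFile"))
    try {
      credentials = GoogleCredentials
        .fromStream(in)
        .createScoped(Collections.singleton("https://www.googleapis.com/auth/cloud-platform"))
    } finally in.close()
  }

  override def getConf: Configuration = conf

  // Hand the GCS connector a fresh token derived from the key file.
  override def getAccessToken: AccessTokenProvider.AccessToken = {
    credentials.refreshIfExpired()
    val token = credentials.getAccessToken
    new AccessTokenProvider.AccessToken(token.getTokenValue, token.getExpirationTime.getTime)
  }

  override def refresh(): Unit = credentials.refresh()
}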


