question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

"Internal Error" : com.google.cloud.spark.bigquery.repackaged.io.grpc.StatusRuntimeException: INTERNAL: request failed: internal error

See original GitHub issue

Hello All,

I am trying to read table metadata from BigQuery using below code.

Spark BigQuery Lib

spark-bigquery-with-dependencies_2.11-0.15.1-beta.jar
spark
.read
.bigquery("big-query-200217.132235582.__TABLES__")
.show(false)

I am getting below exception & Any kind of help is highly appreciated.

com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.rpc.InternalException: com.google.cloud.spark.bigquery.repackaged.io.grpc.StatusRuntimeException: INTERNAL: request failed: internal error
	at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:67)
	at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:72)
	at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:60)
	at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.onFailure(GrpcExceptionCallable.java:97)
	at com.google.cloud.spark.bigquery.repackaged.com.google.api.core.ApiFutures$1.onFailure(ApiFutures.java:68)
	at com.google.cloud.spark.bigquery.repackaged.com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1074)
	at com.google.cloud.spark.bigquery.repackaged.com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
	at com.google.cloud.spark.bigquery.repackaged.com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
	at com.google.cloud.spark.bigquery.repackaged.com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
	at com.google.cloud.spark.bigquery.repackaged.com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:771)
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.stub.ClientCalls$GrpcFuture.setException(ClientCalls.java:563)
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:533)
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.DelayedClientCall$DelayedListener$3.run(DelayedClientCall.java:463)
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.DelayedClientCall$DelayedListener.delayOrExecute(DelayedClientCall.java:427)
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.DelayedClientCall$DelayedListener.onClose(DelayedClientCall.java:460)
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:553)
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:68)
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:739)
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:718)
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
	Suppressed: com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.rpc.AsyncTaskException: Asynchronous task failed
		at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.rpc.ApiExceptions.callAndTranslateApiException(ApiExceptions.java:57)
		at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.rpc.UnaryCallable.call(UnaryCallable.java:112)
		at com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.storage.v1.BigQueryReadClient.createReadSession(BigQueryReadClient.java:230)
		at com.google.cloud.spark.bigquery.direct.DirectBigQueryRelation.buildScan(DirectBigQueryRelation.scala:134)
		at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$11.apply(DataSourceStrategy.scala:277)
		at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$11.apply(DataSourceStrategy.scala:277)
		at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:321)
		at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:320)
		at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProjectRaw(DataSourceStrategy.scala:401)
		at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProject(DataSourceStrategy.scala:316)
		at org.apache.spark.sql.execution.datasources.DataSourceStrategy.apply(DataSourceStrategy.scala:273)
		at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:62)
		at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:62)
		at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
		at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
		at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
		at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:92)
		at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:77)
		at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:74)
		at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
		at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
		at scala.collection.Iterator$class.foreach(Iterator.scala:893)
		at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
		at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
		at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1336)
		at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:74)
		at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:66)
		at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
		at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
		at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:92)
		at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:84)
		at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:80)
		at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:89)
		at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:89)
		at org.apache.spark.sql.Dataset.withAction(Dataset.scala:2832)
		at org.apache.spark.sql.Dataset.head(Dataset.scala:2153)
		at org.apache.spark.sql.Dataset.take(Dataset.scala:2366)
		at org.apache.spark.sql.Dataset.showString(Dataset.scala:245)
		at org.apache.spark.sql.Dataset.show(Dataset.scala:646)
		at org.apache.spark.sql.Dataset.show(Dataset.scala:623)
		
Caused by: com.google.cloud.spark.bigquery.repackaged.io.grpc.StatusRuntimeException: INTERNAL: request failed: internal error
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.Status.asRuntimeException(Status.java:535)
	... 17 more

Issue Analytics

  • State:closed
  • Created 2 years ago
  • Reactions:1
  • Comments:19 (3 by maintainers)

github_iconTop GitHub Comments

1reaction
chajathcommented, Jul 19, 2021

@kmjung I’m in the same boat. Having trouble querying project_id.data_set.INFORMATION_SCHEMA.TABLES since it breaks some hard-wired assumption about the table naming. I get the following error:

[error] {
[error]   "code" : 400,
[error]   "errors" : [ {
[error]     "domain" : "global",
[error]     "message" : "Invalid project ID 'project_id:data_set'. Project IDs must contain 6-63 lowercase letters, digits, or dashes. Some project IDs also include domain name separated by a colon. IDs must start with a letter and may not end with a dash.",
[error]     "reason" : "invalid"
[error]   } ],
[error]   "message" : "Invalid project ID 'project_id:data_set'. Project IDs must contain 6-63 lowercase letters, digits, or dashes. Some project IDs also include domain name separated by a colon. IDs must start with a letter and may not end with a dash.",
[error]   "status" : "INVALID_ARGUMENT"
[error] }

Is there a way to work around this to query INFORMATION_SCHEMA.TABLES ?

0reactions
davidrabinowitzcommented, Jul 21, 2021

The connector has switched to use Apache Arrow as the default wire format as it is more performant. As the support for this class had been added in spark 2.3, you can try to add the following configuration to your spark application:

spark.conf.set("readDataFormat", "AVRO")
Read more comments on GitHub >

github_iconTop Results From Across the Web

request failed: internal error' when retrieving __TABLES__ ...
Notice that the __TABLES__ is not an actual BigQuery table, but a view to its metadata. A way to overcome this is to...
Read more >
Error messages | BigQuery - Google Cloud
Error message HTTP code Description stopped 200 This status code returns when a job is canceled. timeout 400 The job timed out.
Read more >
Partition elimination not working from SparkSQL when using ...
StatusRuntimeException : INVALID_ARGUMENT: request failed: Cannot query over table ... at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.
Read more >
1622320 - bgbb_pred_dataproc subdag failing after changing project
InternalException: com.google.cloud.spark.bigquery.repackaged.io.grpc.StatusRuntimeException: INTERNAL: request failed: internal error was a transient error ...
Read more >
Connection finished with error io.grpc.StatusRuntimeException
Jul 06, 2022 7:04:40 PM com.google.cloud.bigquery.storage.v1.StreamWriter close. INFO: User closing stream: projects/.
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found