K8s launcher fails via job service

Expected Behavior

The following cell from the minimal_ride_hailing.ipynb notebook should succeed when using the k8s Spark launcher through the job service:

# get_historical_features will return immediately once the Spark job has been submitted successfully.
job = client.get_historical_features(
    feature_refs=[
        "driver_statistics:avg_daily_trips",
        "driver_statistics:conv_rate",
        "driver_statistics:acc_rate",
        "driver_trips:trips_today",
    ],
    entity_source=entities_with_timestamp,
)

Current Behavior

Running the same cell raises the following error:
---------------------------------------------------------------------------
_InactiveRpcError                         Traceback (most recent call last)
<ipython-input-40-43e5c2d3cdd4> in <module>
      4     "driver_statistics:acc_rate", "driver_trips:trips_today"
      5 ],
----> 6                                      entity_source=entities_with_timestamp)

~/.local/lib/python3.7/site-packages/feast/client.py in get_historical_features(self, feature_refs, entity_source, output_location)
   1069                     output_location=output_location,
   1070                 ),
-> 1071                 **self._extra_grpc_params(),
   1072             )
   1073             return RemoteRetrievalJob(

/usr/local/lib/python3.7/dist-packages/grpc/_channel.py in __call__(self, request, timeout, metadata, credentials, wait_for_ready, compression)
    824         state, call, = self._blocking(request, timeout, metadata, credentials,
    825                                       wait_for_ready, compression)
--> 826         return _end_unary_response_blocking(state, call, False, None)
    827 
    828     def with_call(self,

/usr/local/lib/python3.7/dist-packages/grpc/_channel.py in _end_unary_response_blocking(state, call, with_call, deadline)
    727             return state.response
    728     else:
--> 729         raise _InactiveRpcError(state)
    730 
    731 

_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNKNOWN
	details = "Exception calling application: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Fri, 29 Jan 2021 19:32:55 GMT', 'Transfer-Encoding': 'chunked'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"SparkApplication in version \"v1beta2\" cannot be handled as a SparkApplication: unmarshalerDecoder: Object 'Kind' is missing in '{\"metadata\": {\"labels\": {\"feast.dev/jobid\": \"feast-a6xrisxw\", \"feast.dev/type\": \"HISTORICAL_RETRIEVAL_JOB\"}, \"name\": \"feast-a6xrisxw\", \"namespace\": \"feast-dev\"}, \"spec\": {\"mainApplicationFile\": \"s3a://tmp-data-viaduct-ai/feast/staging/f61c0f705cd03fa561baf45da451f2b2b970c5de51f39920909f63ebabc6ac37.py\", \"arguments\": [\"--feature-tables\", \"W3siZmVhdHVyZXMiOiBbeyJuYW1lIjogImNvbnZfcmF0ZSIsICJ0eXBlIjogIkZMT0FUIn0sIHsibmFtZSI6ICJhdmdfZGFpbHlfdHJpcHMiLCAidHlwZSI6ICJJTlQzMiJ9LCB7Im5hbWUiOiAiYWNjX3JhdGUiLCAidHlwZSI6ICJGTE9BVCJ9XSwgInByb2plY3QiOiAiZGVmYXVsdCIsICJuYW1lIjogImRyaXZlcl9zdGF0aXN0aWNzIiwgImVudGl0aWVzIjogW3sibmFtZSI6ICJkcml2ZXJfaWQiLCAidHlwZSI6ICJJTlQ2NCJ9XSwgIm1heF9hZ2UiOiBudWxsLCAibGFiZWxzIjoge319LCB7ImZlYXR1cmVzIjogW3sibmFtZSI6ICJ0cmlwc190b2RheSIsICJ0eXBlIjogIklOVDMyIn1dLCAicHJvamVjdCI6ICJkZWZhdWx0IiwgIm5hbWUiOiAiZHJpdmVyX3RyaXBzIiwgImVudGl0aWVzIjogW3sibmFtZSI6ICJkcml2ZXJfaWQiLCAidHlwZSI6ICJJTlQ2NCJ9XSwgIm1heF9hZ2UiOiBudWxsLCAibGFiZWxzIjoge319XQ==\", \"--feature-tables-sources\", \"W3siZmlsZSI6IHsiZmllbGRfbWFwcGluZyI6IHt9LCAiZXZlbnRfdGltZXN0YW1wX2NvbHVtbiI6ICJkYXRldGltZSIsICJjcmVhdGVkX3RpbWVzdGFtcF9jb2x1bW4iOiAiY3JlYXRlZCIsICJkYXRlX3BhcnRpdGlvbl9jb2x1bW4iOiAiZGF0ZSIsICJwYXRoIjogInMzOi8vdG1wLWRhdGEtdmlhZHVjdC1haS9mZWFzdC9zdGFnaW5nL3Rlc3RfZGF0YS9kcml2ZXJfc3RhdGlzdGljcyIsICJmb3JtYXQiOiB7Impzb25fY2xhc3MiOiAiUGFycXVldEZvcm1hdCJ9fX0sIHsiZmlsZSI6IHsiZmllbGRfbWFwcGluZyI6IHt9LCAiZXZlbnRfdGltZXN0YW1wX2NvbHVtbiI6ICJkYXRldGltZSIsICJjcmVhdGVkX3RpbWVzdGFtcF9jb2x1bW4iOiAiY3JlYXRlZCIsICJkYXRlX3BhcnRpdGlvbl9jb2x1bW4iOiAiZGF0ZSIsICJwYXRoIjogInMzOi8vdG1wLWRhdGEtdmlhZHVjdC1haS9mZWFzdC9zdGFnaW5nL3Rlc3RfZGF0YS9kcml2ZXJfdHJpcHMiLCAiZm9ybWF0IjogeyJqc29uX2NsYXNzIjogIlBhcnF1ZXRGb3JtYXQifX19XQ==\", \"--entity-source\", 
\"eyJmaWxlIjogeyJmaWVsZF9tYXBwaW5nIjoge30sICJldmVudF90aW1lc3RhbXBfY29sdW1uIjogImV2ZW50X3RpbWVzdGFtcCIsICJjcmVhdGVkX3RpbWVzdGFtcF9jb2x1bW4iOiAiIiwgImRhdGVfcGFydGl0aW9uX2NvbHVtbiI6ICIiLCAicGF0aCI6ICJzM2E6Ly90bXAtZGF0YS12aWFkdWN0LWFpL2ZlYXN0L3N0YWdpbmcvNzU2ODE3ZTUtZDc0Mi00NzQwLWI4NjUtMTQ5MjhiMDgwNTc1IiwgImZvcm1hdCI6IHsianNvbl9jbGFzcyI6ICJQYXJxdWV0Rm9ybWF0In19fQ==\", \"--destination\", \"eyJmb3JtYXQiOiAicGFycXVldCIsICJwYXRoIjogInMzYTovL3RtcC1kYXRhLXZpYWR1Y3QtYWkvZmVhc3QvaGlzdG9yaWNhbC82YTdiMDMzZS02NThmLTQ1Y2UtODUzZi1kZDNhODI4ZDgxMzMifQ==\"], \"sparkConf\": {\"dev.feast.outputuri\": \"s3a://tmp-data-viaduct-ai/feast/historical/6a7b033e-658f-45ce-853f-dd3a828d8133\"}}}', error found in #10 byte of ...|8d8133\"}}}|..., bigger context ...|istorical/6a7b033e-658f-45ce-853f-dd3a828d8133\"}}}|...","reason":"BadRequest","code":400}

"
	debug_error_string = "{"created":"@1611948775.375230564","description":"Error received from peer ipv4:100.64.125.189:6568","file":"src/core/lib/surface/call.cc","file_line":1061,"grpc_message":"(same as details above)","grpc_status":2}"
>
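The key clue is in the API server's message: the SparkApplication body starts directly with `metadata` and `spec` and carries no `apiVersion` or `kind`, so the server rejects it with "Object 'Kind' is missing". Below is a minimal sketch for checking a manifest before it is submitted. It assumes the Spark operator's `sparkoperator.k8s.io/v1beta2` API (the version `v1beta2` is quoted in the error), and the helpers `decode_feast_arg`, `missing_type_meta` and `with_type_meta` are hypothetical, not part of Feast. It also decodes one of the base64 job arguments, which are plain JSON, to confirm what the launcher actually sent.

```python
import base64
import json
from typing import Any, Dict, List

# Assumption: Spark operator CRD group/version. "v1beta2" comes from the error
# message; "sparkoperator.k8s.io" is the operator's API group.
SPARK_APP_API_VERSION = "sparkoperator.k8s.io/v1beta2"
SPARK_APP_KIND = "SparkApplication"

# The --destination argument copied verbatim from the rejected manifest.
DESTINATION_ARG = "eyJmb3JtYXQiOiAicGFycXVldCIsICJwYXRoIjogInMzYTovL3RtcC1kYXRhLXZpYWR1Y3QtYWkvZmVhc3QvaGlzdG9yaWNhbC82YTdiMDMzZS02NThmLTQ1Y2UtODUzZi1kZDNhODI4ZDgxMzMifQ=="


def decode_feast_arg(encoded: str) -> Any:
    """Decode one base64-encoded JSON job argument (hypothetical helper)."""
    return json.loads(base64.b64decode(encoded).decode("utf-8"))


def missing_type_meta(manifest: Dict[str, Any]) -> List[str]:
    """Return the top-level type fields the API server needs but the manifest lacks."""
    return [key for key in ("apiVersion", "kind") if not manifest.get(key)]


def with_type_meta(manifest: Dict[str, Any]) -> Dict[str, Any]:
    """Return a shallow copy of the manifest with apiVersion/kind filled in if absent."""
    fixed = dict(manifest)
    if not fixed.get("apiVersion"):
        fixed["apiVersion"] = SPARK_APP_API_VERSION
    if not fixed.get("kind"):
        fixed["kind"] = SPARK_APP_KIND
    return fixed


# Trimmed copy of the body rejected above (mainApplicationFile and arguments omitted).
rejected = {
    "metadata": {
        "labels": {"feast.dev/jobid": "feast-a6xrisxw", "feast.dev/type": "HISTORICAL_RETRIEVAL_JOB"},
        "name": "feast-a6xrisxw",
        "namespace": "feast-dev",
    },
    "spec": {
        "sparkConf": {
            "dev.feast.outputuri": "s3a://tmp-data-viaduct-ai/feast/historical/6a7b033e-658f-45ce-853f-dd3a828d8133"
        }
    },
}

if __name__ == "__main__":
    print(missing_type_meta(rejected))                  # ['apiVersion', 'kind']
    print(missing_type_meta(with_type_meta(rejected)))  # []
    dest = decode_feast_arg(DESTINATION_ARG)
    print(dest["format"], dest["path"])
    # parquet s3a://tmp-data-viaduct-ai/feast/historical/6a7b033e-658f-45ce-853f-dd3a828d8133
```

The decoded destination path matches the `dev.feast.outputuri` value in `sparkConf`, so the job arguments look consistent; the server objects only to the missing type information. Where those two fields get dropped is not established by this issue.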

Steps to reproduce

pip install feast==0.9.0

Run minimal_ride_hailing.ipynb up to the cell that calls get_historical_features.

Environment variable configuration for the job service:

    FEAST_SPARK_LAUNCHER: "k8s"
    FEAST_SPARK_STAGING_LOCATION: "s3a://my-bucket/feast/spark-staging/"
    FEAST_SPARK_K8S_NAMESPACE: "feast-dev"
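Before digging into the operator, it can help to confirm that the job service process actually sees the launcher settings listed above. A minimal sketch: it checks only the three variables quoted in this report (Feast may read others, so the list is not authoritative), and `check_launcher_env` is a hypothetical helper, not a Feast API.

```python
import os
from typing import List, Mapping, Optional

# Only the variables quoted in this issue; not the full set Feast reads.
REQUIRED_VARS = (
    "FEAST_SPARK_LAUNCHER",
    "FEAST_SPARK_STAGING_LOCATION",
    "FEAST_SPARK_K8S_NAMESPACE",
)


def check_launcher_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return human-readable problems found in the launcher configuration."""
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    launcher = env.get("FEAST_SPARK_LAUNCHER")
    # This report is specifically about the k8s launcher.
    if launcher and launcher != "k8s":
        problems.append(f"FEAST_SPARK_LAUNCHER is {launcher!r}, expected 'k8s'")
    return problems
```

Calling `check_launcher_env()` inside the job service container returns an empty list when all three values are present.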

Specifications

  • Version: 0.9.0

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

2 reactions
woop commented, Feb 16, 2021

@beatgeek The SparkOp unfortunately doesn't produce very descriptive exceptions yet, which is definitely something to work on. It's worth debugging the operator itself by looking at the jobs it creates. The problem may be related to a missing service account, for example https://github.com/feast-dev/feast/tree/v0.9.3/tests
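Following the suggestion to look at the jobs the operator creates, here is a minimal sketch using the official `kubernetes` Python client. It assumes the operator's CRD is `sparkapplications` in group `sparkoperator.k8s.io`, version `v1beta2`, the `feast-dev` namespace and the `feast.dev/type` label seen in the error above, and the v1beta2 fields `status.applicationState.state`, `status.applicationState.errorMessage` and `spec.driver.serviceAccount`. `summarize` and `fetch_feast_apps` are hypothetical helpers.

```python
from typing import Any, Dict, List, Tuple


def summarize(resp: Dict[str, Any]) -> List[Tuple[str, str, str, str]]:
    """Turn a SparkApplication list response into (name, state, error, serviceAccount) rows."""
    rows = []
    for item in resp.get("items", []):
        app_state = item.get("status", {}).get("applicationState", {})
        driver = item.get("spec", {}).get("driver", {})
        rows.append((
            item.get("metadata", {}).get("name", "?"),
            app_state.get("state", "UNKNOWN"),  # no status yet: operator hasn't processed it
            app_state.get("errorMessage", ""),
            driver.get("serviceAccount", ""),   # empty may point to a missing service account
        ))
    return rows


def fetch_feast_apps(namespace: str = "feast-dev") -> Dict[str, Any]:
    """List SparkApplications carrying a feast.dev/type label (needs cluster access)."""
    from kubernetes import client, config  # requires `pip install kubernetes`

    config.load_kube_config()  # use config.load_incluster_config() inside a pod
    return client.CustomObjectsApi().list_namespaced_custom_object(
        group="sparkoperator.k8s.io",
        version="v1beta2",
        namespace=namespace,
        plural="sparkapplications",
        label_selector="feast.dev/type",  # label-existence selector
    )
```

Usage, with kubeconfig access to the cluster: `for row in summarize(fetch_feast_apps()): print(row)`. Keeping the parsing separate from the API call lets the summary logic be checked without a cluster.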

1 reaction
woop commented, Jan 30, 2021

Thanks for raising this issue @jpugliesi. I’ll try to reproduce this problem today.
