Connecting Superset to Hive with Kerberos failing (GitHub issue #4951)

See original GitHub issue

Make sure these boxes are checked before submitting your issue - thank you!

  • I have checked the superset logs for python stacktraces and included it here as text if any
  • I have reproduced the issue with at least the latest released version of superset
  • I have checked the issue tracker for the same issue and I haven’t found one similar

Superset version

0.22.1

I am trying to connect Superset to a Kerberized Hive cluster; however, it fails with the Kerberos error below.

2018-05-08 01:59:37,051:ERROR:root:Could not start SASL: Error in sasl_client_start (-1)
SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure.
Minor code may provide more information (No Kerberos credentials available (default cache: /tmp/krb5cc_0))
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/superset/views/core.py", line 1507, in testconn
    engine.connect()
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2091, in connect
    return self._connection_cls(self, **kwargs)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 90, in __init__
    if connection is not None else engine.raw_connection()
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2177, in raw_connection
    self.pool.unique_connection, _connection)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2147, in _wrap_pool_connect
    return fn()
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 328, in unique_connection
    return _ConnectionFairy._checkout(self)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 766, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 516, in checkout
    rec = pool._do_get()
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 1138, in _do_get
    self._dec_overflow()
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 66, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 1135, in _do_get
    return self._create_connection()
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 333, in _create_connection
    return _ConnectionRecord(self)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 461, in __init__
    self.__connect(first_connect_check=True)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 651, in __connect
    connection = pool._invoke_creator(self)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/strategies.py", line 105, in connect
    return dialect.connect(*cargs, **cparams)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 393, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/usr/lib/python2.7/site-packages/pyhive/hive.py", line 64, in connect
    return Connection(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/pyhive/hive.py", line 159, in __init__
    self._transport.open()
  File "/usr/lib/python2.7/site-packages/thrift_sasl/__init__.py", line 79, in open
    message=("Could not start SASL: %s" % self.sasl.getError()))
TTransportException: Could not start SASL: Error in sasl_client_start (-1) SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (No Kerberos credentials available (default cache: /tmp/krb5cc_0))

Below is the setup and things I have tried:

  1. Installed Superset on an AWS EC2 instance, following the docs.
  2. Started the Superset web server as root on port 80.
  3. Installed the necessary Kerberos packages.
  4. Created a user (let us call this user X), got a keytab, and was able to do kinit.
  5. Creating a new data source from the UI with the connection string below fails:
     hive://xx.xx.xx.xx:10000/default?auth=KERBEROS&kerberos_service_name=hive
  6. Have tried both with "Impersonate the logged-on user" and without it.

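Before blaming Kerberos itself, it can help to rule out a malformed URI. A minimal sketch (my addition, not from the original report) that uses SQLAlchemy's own URL parser, so no Hive connection or dialect is needed; the host is the same placeholder as in step 5:

```python
from sqlalchemy.engine.url import make_url

# The exact string pasted into the Superset UI in step 5 above.
uri = "hive://xx.xx.xx.xx:10000/default?auth=KERBEROS&kerberos_service_name=hive"
url = make_url(uri)

# Confirm that host, port, database and the Kerberos query options
# all parsed as expected.
print(url.drivername, url.host, url.port, url.database)
print(dict(url.query))
```

If the query options print correctly here, the URI syntax is not the problem and attention can move to the credential cache.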
I am able to connect to hive from the Python shell using the user X and SQLAlchemy:

import sqlalchemy
engine = sqlalchemy.create_engine('hive://xx.xx.xx.xx:10000/default',
                                  connect_args={'auth': 'KERBEROS', 'kerberos_service_name': 'hive'})
c = engine.connect()
result = c.execute('SELECT count(*) from my_schema.my_table')
result.fetchall()
[(5132,)]

By the looks of the error message, it seems that Superset is looking for a Kerberos credential cache belonging to the user root: No Kerberos credentials available (default cache: /tmp/krb5cc_0).
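The cache filename encodes the numeric user id. A small sketch (based on standard MIT Kerberos default-cache naming, stated here as an assumption rather than anything Superset-specific) showing why a root-owned process ends up at /tmp/krb5cc_0:

```python
import os

# MIT Kerberos's default file-based ticket cache is typically
# /tmp/krb5cc_<uid>. A web server started as root (uid 0) therefore
# looks at /tmp/krb5cc_0, not at the cache created by kinit'ing
# as user X in a separate shell.
default_cache = "/tmp/krb5cc_%d" % os.getuid()
print(default_cache)
```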

Am I missing something here?

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Reactions: 1
  • Comments: 9 (3 by maintainers)

Top GitHub Comments

1 reaction
mayukhghoshme commented, May 9, 2018

Okay, I figured it out. As I mentioned earlier, it was looking for the Kerberos cache of the user 'root': No Kerberos credentials available (default cache: /tmp/krb5cc_0). The hint was in the file name, since Kerberos ticket caches are usually suffixed with the uid of the user; in this case it is _0, which is root.

So the resolution is:

  1. All the steps below should be performed by a user who can authenticate itself against the KDC, not a user like 'root'.

  2. Using Impyla turned out to be more elegant than PyHive. Install impyla using pip as root:

pip install impyla

  3. You need the keytab file for the user from step 1. Do a kinit using it.

  4. Configure the connection in Superset something like this:

SQLAlchemy URI = impala://<hive_host>:10000/default

In the Extra field:

{
    "metadata_params": {},
    "engine_params": {
        "connect_args": {
            "auth_mechanism": "GSSAPI",
            "kerberos_service_name": "hive"
        }
    }
}
  5. If something goes wrong with the packages, you can connect to the Superset server and use the Python shell to check that the modules work, like this:

import sqlalchemy
engine = sqlalchemy.create_engine('impala://<hive_host>:10000/default',
                                  connect_args={'auth_mechanism': 'GSSAPI', 'kerberos_service_name': 'hive'})
c = engine.connect()
result = c.execute('SELECT count(*) from my_table')
print result.fetchall()
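A related knob worth knowing (an assumption based on standard MIT Kerberos behavior, not something stated in this thread): the KRB5CCNAME environment variable overrides the default cache location, so setting it in the environment that launches the Superset web server points GSSAPI at an existing ticket cache. The path below is purely illustrative:

```python
import os

# Illustrative cache path; in practice this would be the cache that
# `klist` reports after kinit'ing as the Kerberos-enabled user.
# The variable must be set in the Superset server's environment
# before the process starts, not just in an interactive shell.
os.environ["KRB5CCNAME"] = "FILE:/tmp/krb5cc_1000"
print(os.environ["KRB5CCNAME"])
```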
0 reactions
juthikashenoy commented, Jun 7, 2018

OK, I will do that. How about the jdbc+hive dialect, is it supposed to work?
