Connecting Superset to Hive with Kerberos failing
Make sure these boxes are checked before submitting your issue - thank you!
- I have checked the superset logs for python stacktraces and included it here as text if any
- I have reproduced the issue with at least the latest released version of superset
- I have checked the issue tracker for the same issue and I haven’t found one similar
Superset version
0.22.1
I am trying to connect Superset to a Kerberised Hive cluster, but the connection test fails with the following Kerberos error:
2018-05-08 01:59:37,051:ERROR:root:Could not start SASL: Error in sasl_client_start (-1) SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (No Kerberos credentials available (default cache: /tmp/krb5cc_0))
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/superset/views/core.py", line 1507, in testconn
    engine.connect()
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2091, in connect
    return self._connection_cls(self, **kwargs)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 90, in __init__
    if connection is not None else engine.raw_connection()
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2177, in raw_connection
    self.pool.unique_connection, _connection)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2147, in _wrap_pool_connect
    return fn()
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 328, in unique_connection
    return _ConnectionFairy._checkout(self)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 766, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 516, in checkout
    rec = pool._do_get()
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 1138, in _do_get
    self._dec_overflow()
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 66, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 1135, in _do_get
    return self._create_connection()
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 333, in _create_connection
    return _ConnectionRecord(self)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 461, in __init__
    self.__connect(first_connect_check=True)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 651, in __connect
    connection = pool._invoke_creator(self)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/strategies.py", line 105, in connect
    return dialect.connect(*cargs, **cparams)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 393, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/usr/lib/python2.7/site-packages/pyhive/hive.py", line 64, in connect
    return Connection(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/pyhive/hive.py", line 159, in __init__
    self._transport.open()
  File "/usr/lib/python2.7/site-packages/thrift_sasl/__init__.py", line 79, in open
    message=("Could not start SASL: %s" % self.sasl.getError()))
TTransportException: Could not start SASL: Error in sasl_client_start (-1) SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (No Kerberos credentials available (default cache: /tmp/krb5cc_0))
Below is my setup and what I have tried:
- Installed Superset on an AWS EC2 instance, following the docs.
- Started the Superset web server as root on port 80.
- Installed the necessary Kerberos packages.
- Created a user (let us call it X), obtained its keytab, and was able to run kinit.
- Creating a new data source from the UI with the connection string below fails: hive://xx.xx.xx.xx:10000/default?auth=KERBEROS&kerberos_service_name=hive
- Tried both with and without “Impersonate the Logged on user”.
I am able to connect to Hive from the Python shell as user X using SQLAlchemy:
import sqlalchemy
engine = sqlalchemy.create_engine('hive://xx.xx.xx.xx:10000/default', connect_args={'auth': 'KERBEROS','kerberos_service_name': 'hive'})
c = engine.connect()
result = c.execute('SELECT count(*) from my_schema.my_table')
result.fetchall()
[(5132,)]
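One quick sanity check is whether the URI entered in the UI carries the same options as the connect_args used in the shell. The sketch below only compares strings with the standard library and never contacts the server; the helper name uri_options is made up for illustration, and it assumes the Hive dialect forwards URI query parameters as connection arguments.

```python
from urllib.parse import urlsplit, parse_qsl


def uri_options(uri):
    """Return the query-string options of a SQLAlchemy-style URI as a dict."""
    return dict(parse_qsl(urlsplit(uri).query))


# URI used in the Superset UI (host masked as in the report above).
ui_uri = "hive://xx.xx.xx.xx:10000/default?auth=KERBEROS&kerberos_service_name=hive"

# connect_args used in the Python shell test that succeeded.
shell_connect_args = {"auth": "KERBEROS", "kerberos_service_name": "hive"}

same_options = uri_options(ui_uri) == shell_connect_args
print(same_options)
```

If the options match, the difference between the two runs is more likely the OS user each process runs as than the connection string itself.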
Judging by the error message, it seems to be looking for a Kerberos credential cache for the user root: No Kerberos credentials available (default cache: /tmp/krb5cc_0)
Am I missing something here?
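To see which credential cache a given process is likely to look for, a rough sketch such as the one below can be run as the same OS user that starts the Superset web server. It assumes the common default of /tmp/krb5cc_<uid> unless the KRB5CCNAME environment variable is set; a default_ccache_name setting in krb5.conf could change this, and the sketch ignores it. The function name expected_ccache is hypothetical.

```python
import os


def expected_ccache(uid, environ):
    """Best guess at the Kerberos credential cache a process would use.

    Assumption: KRB5CCNAME wins if set, otherwise /tmp/krb5cc_<uid>.
    Any default_ccache_name in krb5.conf is not taken into account.
    """
    name = environ.get("KRB5CCNAME")
    if name:
        return name
    return "/tmp/krb5cc_%d" % uid


# Run this as the user that launches Superset, not as the user that ran kinit.
print(expected_ccache(os.getuid(), os.environ))
```

For a server started as root with no KRB5CCNAME set, this yields /tmp/krb5cc_0, which matches the path in the error message.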
Top GitHub Comments
Okay, I figured it out. As I mentioned earlier, it was looking for the Kerberos credential cache of the user ‘root’:
No Kerberos credentials available (default cache: /tmp/krb5cc_0)
I got the hint from the file name, since the name of a Kerberos credential cache file usually ends with the uid of the user. In this case it is _0, which is ‘root’. So the resolution is:
1. Install impyla using pip as root: pip install impyla
2. You need the keytab file for the user used in step 1. Run kinit with it.
3. Configure the connection string in Superset, something like this:
SQLAlchemy URI = impala://<hive_host>:10000/default
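The kinit step can be sketched as follows, to be run as the same OS user that starts Superset. The keytab path and principal are placeholders and not values from this thread, and the helper names are made up for illustration. The sketch relies on the MIT Kerberos command-line tools: kinit -kt to obtain a ticket from a keytab, and klist -s to check silently whether the cache holds a valid ticket.

```python
import subprocess


def kinit_command(keytab, principal):
    """Build the kinit invocation that obtains a ticket from a keytab."""
    return ["kinit", "-kt", keytab, principal]


def has_valid_ticket():
    """Return True if `klist -s` reports a readable, unexpired cache."""
    try:
        return subprocess.call(["klist", "-s"]) == 0
    except OSError:
        # klist is not installed or not on PATH.
        return False


# Placeholder values: substitute your own keytab and principal.
cmd = kinit_command("/path/to/user.keytab", "user@EXAMPLE.COM")
print(" ".join(cmd))

# Uncomment to actually obtain the ticket, then verify it:
# subprocess.check_call(cmd)
# print(has_valid_ticket())
```

Tickets expire, so a long-running server would need this renewed periodically, for example from cron.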
OK, I will do that. How about the jdbc+hive dialect, is it supposed to work?