`TypeError: 'unicode' does not have the buffer interface`, using hue on Databricks clusters with SQLAlchemy interface and the Hive connector
When connecting to Databricks clusters from Hue using the SQLAlchemy interface and the Hive connector, we received `TypeError: 'unicode' does not have the buffer interface`.
After some days of debugging, we realised that the beeswax application, which is installed and configured as part of Hue and lets you run queries on Apache Hive, ships custom autogenerated Thrift Python code for integrating with HiveServer2. Because we were using the Hive connector, whenever Hue established a connection to Databricks or ran a statement, this bundled Thrift library tried to encode the SQL statement and failed with the TypeError below:
[15/Feb/2020 17:11:22 -0800] sql_alchemy ERROR Query Error
Traceback (most recent call last):
File "/usr/share/hue/desktop/libs/notebook/src/notebook/connectors/sql_alchemy.py", line 85, in decorator
return func(*args, **kwargs)
File "/usr/share/hue/desktop/libs/notebook/src/notebook/connectors/sql_alchemy.py", line 139, in execute
connection = engine.connect()
File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2209, in connect
return self._connection_cls(self, **kwargs)
File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 103, in __init__
else engine.raw_connection()
File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2307, in raw_connection
self.pool.unique_connection, _connection
File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2276, in _wrap_pool_connect
return fn()
File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/pool/base.py", line 303, in unique_connection
return _ConnectionFairy._checkout(self)
File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/pool/base.py", line 773, in _checkout
fairy = _ConnectionRecord.checkout(pool)
File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/pool/base.py", line 492, in checkout
rec = pool._do_get()
File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/pool/impl.py", line 139, in _do_get
self._dec_overflow()
File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
compat.reraise(exc_type, exc_value, exc_tb)
File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/pool/impl.py", line 136, in _do_get
return self._create_connection()
File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/pool/base.py", line 308, in _create_connection
return _ConnectionRecord(self)
File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/pool/base.py", line 437, in __init__
self.__connect(first_connect_check=True)
File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/pool/base.py", line 652, in __connect
connection = pool._invoke_creator(self)
File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/engine/strategies.py", line 114, in connect
return dialect.connect(*cargs, **cparams)
File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 489, in connect
return self.dbapi.connect(*cargs, **cparams)
File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/databricks_dbapi/databricks.py", line 61, in connect
return hive.connect(database=database, thrift_transport=transport)
File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/pyhive/hive.py", line 94, in connect
return Connection(*args, **kwargs)
File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/pyhive/hive.py", line 205, in __init__
cursor.execute('USE `{}`'.format(database))
File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/pyhive/hive.py", line 364, in execute
response = self._connection.client.ExecuteStatement(req)
File "/usr/share/hue/apps/beeswax/gen-py/TCLIService/TCLIService.py", line 298, in ExecuteStatement
self.send_ExecuteStatement(req)
File "/usr/share/hue/apps/beeswax/gen-py/TCLIService/TCLIService.py", line 305, in send_ExecuteStatement
args.write(self._oprot)
File "/usr/share/hue/apps/beeswax/gen-py/TCLIService/TCLIService.py", line 1882, in write
self.req.write(oprot)
File "/usr/share/hue/apps/beeswax/gen-py/TCLIService/ttypes.py", line 4460, in write
oprot.writeBinary(self.statement)
File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/thrift/protocol/TBinaryProtocol.py", line 131, in writeBinary
self.trans.write(str)
File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/thrift/transport/THttpClient.py", line 142, in write
self.__wbuf.write(buf)
TypeError: 'unicode' does not have the buffer interface
[15/Feb/2020 17:11:22 -0800] decorators ERROR Error running execute
We understood that line 4460 of /usr/share/hue/apps/beeswax/gen-py/TCLIService/ttypes.py does not properly handle the encoding of unicode strings when writing them out as binary under Python 2.7. Line 4460 reads:
oprot.writeString(self.statement)
Instead of:
oprot.writeString(self.statement.encode('utf-8') if sys.version_info[0] == 2 else self.statement)
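The failure is easy to reproduce outside of Hue. Below is a minimal sketch of the underlying Python 2.7 behaviour (illustrative only, not taken from the Hue or thrift code): thrift's THttpClient buffers the outgoing request body in an io.BytesIO object, and writing a unicode object into such a buffer raises exactly this TypeError, while writing UTF-8 encoded bytes succeeds.
# Python 2.7 only: reproduces the failure seen inside THttpClient.write(),
# which buffers the request body in an io.BytesIO object.
import io

buf = io.BytesIO()
statement = u'USE `default`'            # the SQL text arrives as a unicode object

try:
    buf.write(statement)                # unicode object -> TypeError
except TypeError as e:
    print(e)                            # 'unicode' does not have the buffer interface

buf.write(statement.encode('utf-8'))    # UTF-8 encoded bytes are accepted
print(buf.getvalue())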
To work around this problem, we upgraded the pip version that ships with Hue and then installed the databricks-dbapi[sqlalchemy] package, which pulls in an up-to-date thrift library that handles unicode encoding correctly:
RUN ./build/env/bin/pip install --upgrade pip
RUN ./build/env/bin/pip install databricks-dbapi[sqlalchemy]
We then remove the native Hue Thrift library so that connections fall back to the newly installed Thrift library.
RUN rm -rf /usr/share/hue/apps/beeswax/gen-py
The complete Dockerfile looks like this:
FROM gethue/hue:<latest-stable-hue-version>
USER root
# Upgrade pip and install the Databricks DBAPI with its SQLAlchemy dialect
RUN ./build/env/bin/pip install --upgrade pip
RUN ./build/env/bin/pip install databricks-dbapi[sqlalchemy]
# Ship our Hue configuration
ADD hue.ini /usr/share/hue/desktop/conf/z-hue.ini
# Remove custom hue thrift library
RUN rm -rf /usr/share/hue/apps/beeswax/gen-py
EXPOSE 8888
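After rebuilding the image, one possible sanity check (a hypothetical snippet, assuming the pip-installed PyHive ships its own top-level TCLIService package) is to confirm from inside the Hue virtualenv that the Thrift modules now resolve to site-packages rather than the removed gen-py directory:
# Run with the Hue virtualenv interpreter (./build/env/bin/python).
# If the fallback works, the printed path should live under site-packages,
# not under /usr/share/hue/apps/beeswax/gen-py.
from TCLIService import ttypes
print(ttypes.__file__)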
What our hue.ini config looked like
We used the hive interpreter in the Hue config, which PyHive extends. This is what the databricks+pyhive dialect/driver installed by databricks-dbapi uses with SQLAlchemy to establish the connection to Databricks.
[[[hive]]]
name=Databricks
interface=sqlalchemy
options='{"url":"databricks+pyhive://token:<personal_token>@<host>:<port>/default","connect_args":{"cluster":"<cluster_name>"}}'
https://issues.cloudera.org/browse/HUE-9175
Using the latest master code and looking at the problem again, there seems to be a discrepancy between the py-hive Thrift code generated with the current thrift compiler (0.13.0) and the one Hue maintains, which was generated with 0.9.3. Is there any reason why Hue is still maintaining an older version of the thrift-generated py-hive code?
Thrift version 0.9.3 compiler
Thrift version 0.13.0 compiler