
`TypeError: 'unicode' does not have the buffer interface`, using hue on Databricks clusters with SQLAlchemy interface and the Hive connector


When connecting to Databricks clusters from Hue using the SQLAlchemy interface and the Hive connector, we received `TypeError: 'unicode' does not have the buffer interface`.

After some days of debugging, we realised that the beeswax application, which is installed and configured as part of Hue and lets you run queries on Apache Hive, ships custom autogenerated Thrift Python code for integrating with HiveServer2. Because we were using the Hive connector, whenever Hue established a connection to Databricks or ran a statement, this custom Thrift library tried to encode the SQL statement and failed with the TypeError below:

[15/Feb/2020 17:11:22 -0800] sql_alchemy  ERROR    Query Error
Traceback (most recent call last):
  File "/usr/share/hue/desktop/libs/notebook/src/notebook/connectors/sql_alchemy.py", line 85, in decorator
    return func(*args, **kwargs)
  File "/usr/share/hue/desktop/libs/notebook/src/notebook/connectors/sql_alchemy.py", line 139, in execute
    connection = engine.connect()
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2209, in connect
    return self._connection_cls(self, **kwargs)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 103, in __init__
    else engine.raw_connection()
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2307, in raw_connection
    self.pool.unique_connection, _connection
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2276, in _wrap_pool_connect
    return fn()
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/pool/base.py", line 303, in unique_connection
    return _ConnectionFairy._checkout(self)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/pool/base.py", line 773, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/pool/base.py", line 492, in checkout
    rec = pool._do_get()
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/pool/impl.py", line 139, in _do_get
    self._dec_overflow()
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/pool/impl.py", line 136, in _do_get
    return self._create_connection()
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/pool/base.py", line 308, in _create_connection
    return _ConnectionRecord(self)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/pool/base.py", line 437, in __init__
    self.__connect(first_connect_check=True)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/pool/base.py", line 652, in __connect
    connection = pool._invoke_creator(self)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/engine/strategies.py", line 114, in connect
    return dialect.connect(*cargs, **cparams)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 489, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/databricks_dbapi/databricks.py", line 61, in connect
    return hive.connect(database=database, thrift_transport=transport)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/pyhive/hive.py", line 94, in connect
    return Connection(*args, **kwargs)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/pyhive/hive.py", line 205, in __init__
    cursor.execute('USE `{}`'.format(database))
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/pyhive/hive.py", line 364, in execute
    response = self._connection.client.ExecuteStatement(req)
  File "/usr/share/hue/apps/beeswax/gen-py/TCLIService/TCLIService.py", line 298, in ExecuteStatement
    self.send_ExecuteStatement(req)
  File "/usr/share/hue/apps/beeswax/gen-py/TCLIService/TCLIService.py", line 305, in send_ExecuteStatement
    args.write(self._oprot)
  File "/usr/share/hue/apps/beeswax/gen-py/TCLIService/TCLIService.py", line 1882, in write
    self.req.write(oprot)
  File "/usr/share/hue/apps/beeswax/gen-py/TCLIService/ttypes.py", line 4460, in write
    oprot.writeBinary(self.statement)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/thrift/protocol/TBinaryProtocol.py", line 131, in writeBinary
    self.trans.write(str)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/thrift/transport/THttpClient.py", line 142, in write
    self.__wbuf.write(buf)
TypeError: 'unicode' does not have the buffer interface
[15/Feb/2020 17:11:22 -0800] decorators   ERROR    Error running execute

We understood that line 4460 of /usr/share/hue/apps/beeswax/gen-py/TCLIService/ttypes.py did not properly encode unicode strings when writing them to the binary protocol under Python 2.7.

Line 4460 in /usr/share/hue/apps/beeswax/gen-py/TCLIService/ttypes.py

oprot.writeString(self.statement)

Instead of:

oprot.writeString(self.statement.encode('utf-8') if sys.version_info[0] == 2 else self.statement)
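The failure is easy to reproduce in miniature: writing a text string into a binary buffer raises the same class of TypeError. Below is a hedged Python 3 analogue (Python 3 raises a comparable TypeError for str written into a bytes buffer; the statement text is illustrative, not taken from the issue):

```python
import io

# The Thrift HTTP transport buffers outgoing bytes, roughly like this:
wbuf = io.BytesIO()

statement = u"USE `default`"  # unicode SQL statement, as sent by PyHive

try:
    wbuf.write(statement)  # text into a binary buffer -> TypeError
except TypeError as exc:
    print("write failed:", exc)

# Encoding first, as the corrected generated code does, succeeds:
wbuf.write(statement.encode("utf-8"))
print(wbuf.getvalue())
```

The fix mirrors this exactly: encode the unicode statement to UTF-8 bytes before handing it to the binary transport.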

To work around this problem, we upgraded the pip version that ships with Hue and installed the databricks-dbapi[sqlalchemy] package, which pulls in a compatible, up-to-date thrift library that handles unicode encoding correctly:

RUN ./build/env/bin/pip install --upgrade pip
RUN ./build/env/bin/pip install databricks-dbapi[sqlalchemy]

We then remove the bundled Hue Thrift code so that connections fall back to the newly installed thrift library.

RUN rm -rf /usr/share/hue/apps/beeswax/gen-py

The complete Dockerfile looks like this:

FROM gethue/hue:<latest-stable-hue-version>

USER root

RUN ./build/env/bin/pip install --upgrade pip
RUN ./build/env/bin/pip install databricks-dbapi[sqlalchemy]

ADD hue.ini /usr/share/hue/desktop/conf/z-hue.ini

# Remove custom hue thrift library 
RUN rm -rf /usr/share/hue/apps/beeswax/gen-py

EXPOSE 8888

What our hue.ini config looked like

We used the hive interpreter section in the Hue config, which PyHive extends. The databricks+pyhive dialect/driver, installed with databricks-dbapi, is what SQLAlchemy uses to establish the connection to Databricks.

[[[hive]]]
    name=Databricks
    interface=sqlalchemy
    options='{"url":"databricks+pyhive://token:<personal_token>@<host>:<port>/default","connect_args":{"cluster":"<cluster_name>"}}'
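For clarity, the options value is a JSON document that Hue hands through to SQLAlchemy. A minimal sketch of how those fields map onto create_engine (the URL, token, and cluster name below are placeholders, and the create_engine call is shown only as a comment since it requires the databricks-dbapi dialect to be installed):

```python
import json

# An options value in the shape of the [[[hive]]] section above
# (all credentials and hostnames are placeholders):
options = (
    '{"url": "databricks+pyhive://token:TOKEN@example.cloud.databricks.com:443/default",'
    ' "connect_args": {"cluster": "my-cluster"}}'
)

opts = json.loads(options)

# Hue passes these through to SQLAlchemy roughly as:
#   engine = create_engine(opts["url"], connect_args=opts["connect_args"])
print(opts["url"].split("://")[0])      # dialect+driver part of the URL
print(opts["connect_args"]["cluster"])  # cluster name forwarded to the DBAPI
```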

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 3
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction
ebessah commented, Feb 21, 2020

Using the latest master code and looking at the problem again, there seems to be a discrepancy between the PyHive-generated Thrift code, produced with the current Thrift compiler (0.13.0), and the one Hue maintains, generated with 0.9.3.

Any reason why Hue is still maintaining an older version of the Thrift-generated py-hive code?

Thrift version 0.9.3 compiler

if self.statement is not None:
    oprot.writeFieldBegin('statement', TType.STRING, 2)
    if sys.version_info[0] > 2:
        oprot.writeBinary(self.statement)
    else:
        oprot.writeString(self.statement)  # Line 4457
    oprot.writeFieldEnd()

Thrift version 0.13.0 compiler

if self.statement is not None:
    oprot.writeFieldBegin('statement', TType.STRING, 2)
    oprot.writeString(self.statement.encode('utf-8') if sys.version_info[0] == 2 else self.statement)  # Line 3967
    oprot.writeFieldEnd()
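The single expression emitted by the 0.13.0 compiler can be checked on its own: under Python 3 the statement passes through as text (writeString handles encoding downstream), while under Python 2 it would be encoded to bytes up front. A small sketch (the statement text is illustrative):

```python
import sys

statement = u"USE `default`"

# Version-guarded encode, as generated by the Thrift 0.13.0 compiler:
payload = statement.encode("utf-8") if sys.version_info[0] == 2 else statement

print(type(payload).__name__)  # 'str' on Python 3, 'bytes' on Python 2
```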
