BUG: [HiveServer2Error] when connecting impala.connect

import ibis

webhdfs_host = 'localhost'
webhdfs_port = '50070'

# Connect to HDFS through WebHDFS
hdfs = ibis.hdfs_connect(host=webhdfs_host, port=webhdfs_port,
                         auth_mechanism='PLAIN', user='hive')

hdfs.ls('.')  # returns ['warehouse']

I connected to HDFS successfully.

After that, I tried connecting to HiveServer2 as shown below:

impala_host = 'localhost'
impala_port = 10000
client = ibis.impala.connect(host=impala_host, port=impala_port,
                             database='default', user='406449',
                             password='right-password',
                             auth_mechanism='PLAIN',
                             hdfs_client=hdfs)

It returns the following error:

----------------------------------------------------------------------
HiveServer2Error                     Traceback (most recent call last)
<ipython-input-54-f6492f045c37> in <module>()
      5                               database='default',user='406449', password='right-password',
      6                                auth_mechanism='PLAIN'
----> 7                              , hdfs_client=hdfs
      8                             )

/Users/hyundai/.pyenv/versions/3.6.0/envs/py3/lib/python3.6/site-packages/ibis/impala/api.py in connect(host, port, database, timeout, use_ssl, ca_cert, user, password, auth_mechanism, kerberos_service_name, pool_size, hdfs_client)
     96     con = ImpalaConnection(pool_size=pool_size, **params)
     97     try:
---> 98         client = ImpalaClient(con, hdfs_client=hdfs_client)
     99 
    100         if options.default_backend is None:

/Users/hyundai/.pyenv/versions/3.6.0/envs/py3/lib/python3.6/site-packages/ibis/impala/client.py in __init__(self, con, hdfs_client, **params)
    475         self._temp_objects = weakref.WeakValueDictionary()
    476 
--> 477         self._ensure_temp_db_exists()
    478 
    479     def _build_ast(self, expr):

/Users/hyundai/.pyenv/versions/3.6.0/envs/py3/lib/python3.6/site-packages/ibis/impala/client.py in _ensure_temp_db_exists(self)
   1027                       ' may be disabled')
   1028             else:
-> 1029                 self.create_database(name, path=path, force=True)
   1030 
   1031     def _wrap_new_table(self, name, database, persist):

/Users/hyundai/.pyenv/versions/3.6.0/envs/py3/lib/python3.6/site-packages/ibis/impala/client.py in create_database(self, name, path, force)
    616             self.hdfs.mkdir(path)
    617         statement = ddl.CreateDatabase(name, path=path, can_exist=force)
--> 618         return self._execute(statement)
    619 
    620     def drop_database(self, name, force=False):

/Users/hyundai/.pyenv/versions/3.6.0/envs/py3/lib/python3.6/site-packages/ibis/client.py in _execute(self, query, results)
    152 
    153     def _execute(self, query, results=False):
--> 154         cur = self.con.execute(query)
    155         if results:
    156             return cur

/Users/hyundai/.pyenv/versions/3.6.0/envs/py3/lib/python3.6/site-packages/ibis/impala/client.py in execute(self, query, async)
    117 
    118         try:
--> 119             cursor.execute(query, async=async)
    120         except:
    121             exc = traceback.format_exc()

/Users/hyundai/.pyenv/versions/3.6.0/envs/py3/lib/python3.6/site-packages/ibis/impala/client.py in execute(self, stmt, async)
    227 
    228     def execute(self, stmt, async=False):
--> 229         self._cursor.execute_async(stmt)
    230         if async:
    231             return

/Users/hyundai/.pyenv/versions/3.6.0/envs/py3/lib/python3.6/site-packages/impala/hiveserver2.py in execute_async(self, operation, parameters, configuration)
    341             self._last_operation = op
    342 
--> 343         self._execute_async(op)
    344 
    345     def _debug_log_state(self):

/Users/hyundai/.pyenv/versions/3.6.0/envs/py3/lib/python3.6/site-packages/impala/hiveserver2.py in _execute_async(self, operation_fn)
    360         self._reset_state()
    361         self._debug_log_state()
--> 362         operation_fn()
    363         self._last_operation_active = True
    364         self._debug_log_state()

/Users/hyundai/.pyenv/versions/3.6.0/envs/py3/lib/python3.6/site-packages/impala/hiveserver2.py in op()
    338             op = self.session.execute(self._last_operation_string,
    339                                       configuration,
--> 340                                       async=True)
    341             self._last_operation = op
    342 

/Users/hyundai/.pyenv/versions/3.6.0/envs/py3/lib/python3.6/site-packages/impala/hiveserver2.py in execute(self, statement, configuration, async)
   1025                                    confOverlay=configuration,
   1026                                    runAsync=async)
-> 1027         return self._operation('ExecuteStatement', req)
   1028 
   1029     def get_databases(self, schema='.*'):

/Users/hyundai/.pyenv/versions/3.6.0/envs/py3/lib/python3.6/site-packages/impala/hiveserver2.py in _operation(self, kind, request)
    955 
    956     def _operation(self, kind, request):
--> 957         resp = self._rpc(kind, request)
    958         return self._get_operation(resp.operationHandle)
    959 

/Users/hyundai/.pyenv/versions/3.6.0/envs/py3/lib/python3.6/site-packages/impala/hiveserver2.py in _rpc(self, func_name, request)
    923         response = self._execute(func_name, request)
    924         self._log_response(func_name, response)
--> 925         err_if_rpc_not_ok(response)
    926         return response
    927 

/Users/hyundai/.pyenv/versions/3.6.0/envs/py3/lib/python3.6/site-packages/impala/hiveserver2.py in err_if_rpc_not_ok(resp)
    702             resp.status.statusCode != TStatusCode.SUCCESS_WITH_INFO_STATUS and
    703             resp.status.statusCode != TStatusCode.STILL_EXECUTING_STATUS):
--> 704         raise HiveServer2Error(resp.status.errorMessage)
    705 
    706 

HiveServer2Error: Error while compiling statement: FAILED: ParseException line 1:30 cannot recognize input near '__ibis_tmp' 'LOCATION' ''/tmp/ibis'' in create database statement
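
The `line 1:30` in the message is a position inside the statement that HiveServer2 received. The following is a minimal sketch, not the actual ibis code: the DDL text is an assumption reconstructed from the error message and from the `create_database(name, path=path, force=True)` call in the traceback, and the column is assumed to be zero-based. Under those assumptions, the position lands exactly on the temporary database name:

```python
# Assumed reconstruction of the DDL that ibis sent; not copied from the ibis source.
statement = "CREATE DATABASE IF NOT EXISTS __ibis_tmp LOCATION '/tmp/ibis'"

# Assumption: the parser reports a zero-based column, so "line 1:30"
# corresponds to index 30 of the statement string.
column = 30

accepted = statement[:column]   # text before the reported position
rejected = statement[column:]   # text starting at the reported position

print(repr(accepted))
print(repr(rejected))
```

If the reconstruction is right, the parser stopped at the database name itself rather than at the `LOCATION` clause, which suggests the name `__ibis_tmp` is what HiveServer2 could not accept.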

But when I tried without hdfs_client, no error occurred.

client = ibis.impala.connect(host='localhost', port=10000, auth_mechanism='PLAIN')
Without an HDFS connection, certain functionality may be disabled

Without hdfs_client, I can get database and table information, but I cannot insert rows into a table.

table = client.table("u_data", database='default')
table  # works well

import pandas as pd

data = pd.DataFrame({'foo': [1, 2, 3, 4], 'bar': ['a', 'b', 'c', 'd']})
db.create_table('pandas_table', obj=data)  # raises the error below
IbisError: No HDFS connection; must pass connection using the hdfs_client argument to ibis.impala.connect

If possible, a detailed document on hdfs_connect and impala.connect would be great for a newbie like me.

Because hdfs.ls('.') returns a result, I think the HDFS connection itself is fine. Is there an error in how I connect to HiveServer2? It seems to work fine when connecting to Impala.

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

sushant3 commented, Nov 29, 2017

Check my question on Stack Overflow. I ended up solving this by changing the default staging location. This seems to depend on the security setup and structure of the individual Hadoop system.

https://stackoverflow.com/questions/47445099/inserting-data-to-impala-table-using-ibis-python/47543691#47543691
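
Changing the default staging location, as this comment describes, might look like the sketch below. It assumes the ibis Impala backend exposes these settings as `ibis.options.impala.temp_db` and `ibis.options.impala.temp_hdfs_path`; verify those names against your installed ibis version. The helper `set_staging_location` and the replacement values are hypothetical, and a stand-in object is used so the sketch runs without ibis installed.

```python
from types import SimpleNamespace


def set_staging_location(options, temp_db, temp_hdfs_path):
    """Point the temporary database and its HDFS path somewhere else.

    `options` is expected to be `ibis.options`. The attribute names used
    here are an assumption about the ibis version in use.
    """
    options.impala.temp_db = temp_db
    options.impala.temp_hdfs_path = temp_hdfs_path
    return options


# Stand-in for `ibis.options`, pre-filled with the values seen in the error message.
# With ibis installed, pass `ibis.options` instead, before calling impala.connect.
fake_options = SimpleNamespace(
    impala=SimpleNamespace(temp_db='__ibis_tmp', temp_hdfs_path='/tmp/ibis')
)

# Hypothetical replacement values: a name without leading underscores
# and a path the connecting user is allowed to write to.
set_staging_location(fake_options, 'ibis_tmp', '/user/hive/ibis_tmp')
print(fake_options.impala.temp_db, fake_options.impala.temp_hdfs_path)
```

The settings need to be in place before the connection is created, because the traceback shows the temporary database being created inside the client constructor.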

Another thing: I'm a newbie to Hadoop, but I was told that '__' at the beginning of a table or database name is not allowed in Cloudera. Could this be another cause?
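
The leading-underscore theory can be illustrated with a small helper. This is a sketch under stated assumptions, not part of ibis or impyla: it assumes Hive rejects unquoted identifiers that begin with an underscore and accepts them when wrapped in backticks. The function name `quote_hive_identifier` is hypothetical.

```python
import re

# Assumed rule: an unquoted identifier starts with a letter or digit and
# continues with letters, digits, or underscores.
_UNQUOTED_OK = re.compile(r"[A-Za-z0-9][A-Za-z0-9_]*\Z")


def quote_hive_identifier(name):
    """Return `name` unchanged if it looks safe unquoted, else backtick-quote it."""
    if _UNQUOTED_OK.match(name):
        return name
    # Assumption: a literal backtick inside a quoted identifier is doubled.
    return "`" + name.replace("`", "``") + "`"


print(quote_hive_identifier("default"))
print(quote_hive_identifier("__ibis_tmp"))
```

Under that assumption, `default` passes through unchanged while `__ibis_tmp` needs quoting, which is consistent with the parser stopping at the database name in the error above.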

cpcloud commented, Oct 23, 2017

@sanghkaang @jwaligora Can either or both of you show the versions of ibis and thrift_sasl that you have installed?
