Get metadata from the Hive metastore
See original GitHub issue

Many big data deployments keep metadata about their data artifacts in the Hive metastore, even if they don't use Hive itself. We've frequently received requests to support this. I believe that this would involve building a small `dask-hive` Python project that communicates with the Hive metastore over Thrift.
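As a rough illustration of what such a Thrift connection could look like, here is a minimal sketch using the third-party `hmsclient` package (one of several Thrift wrappers for the metastore; the host, port, and table layout below are assumptions for a default deployment, not part of the proposal in the issue):

```python
def fetch_table_columns(database, table, host="localhost", port=9083):
    """Return [(column_name, column_type), ...] for a metastore table.

    Assumes a metastore Thrift service on host:port (9083 is the
    conventional default). `hmsclient` is imported lazily since it
    is an optional third-party dependency.
    """
    from hmsclient import hmsclient
    client = hmsclient.HMSClient(host=host, port=port)
    with client as c:
        t = c.get_table(database, table)
        # Column metadata lives on the table's storage descriptor.
        return [(col.name, col.type) for col in t.sd.cols]


def columns_to_schema(columns):
    """Turn [(name, type), ...] pairs into a simple dict schema."""
    return {name: typ for name, typ in columns}
```

A caller would then do something like `columns_to_schema(fetch_table_columns("default", "events"))` to get a plain `{column: type}` mapping to hand to a Dask reader.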
Issue Analytics
- Created 6 years ago
- Comments: 14 (12 by maintainers)
Top Results From Across the Web
How to get metadata of hive tables, columns, views, constraint ...
If you have Hue available you can go to Metastore Tables from the top menu Data Browsers. There you can find metadata for...
Read more >

Viewing Hive Schema and Table Metadata - Vertica
When using Hive, you access metadata about schemas and tables by executing statements written in HiveQL (Hive's version of SQL) such as SHOW...
Read more >

Hive Metadata - StreamSets Documentation
Use the Hive Metadata processor for records to be written to HDFS or MapR FS when you want the Hive Metastore destination to...
Read more >

7.2 Hive Metadata Provider
The Hive Metadata Provider is used to retrieve the table metadata from a Hive metastore. The metadata will be retrieved from Hive for...
Read more >

Useful queries for the Hive metastore - Analytics Anvil
The Hive metastore stores metadata about objects within Hive. Usually this metastore sits within a relational database such as MySQL.
Read more >
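The last result above describes querying the metastore's backing relational database directly. The metastore schema keeps databases in a `DBS` table and tables in a `TBLS` table, joined on `DB_ID`. The sketch below illustrates that join pattern against an in-memory SQLite stand-in (the real metastore usually sits in MySQL or PostgreSQL, and the column subset here is simplified for illustration):

```python
import sqlite3

# In-memory stand-in for the metastore's backing database, using a
# simplified subset of the real DBS and TBLS columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DBS  (DB_ID INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, DB_ID INTEGER,
                       TBL_NAME TEXT, TBL_TYPE TEXT);
    INSERT INTO DBS  VALUES (1, 'default');
    INSERT INTO TBLS VALUES (10, 1, 'events', 'MANAGED_TABLE');
""")

# The typical "list all tables with their database" metastore query:
rows = conn.execute("""
    SELECT d.NAME, t.TBL_NAME, t.TBL_TYPE
    FROM TBLS t JOIN DBS d ON t.DB_ID = d.DB_ID
""").fetchall()
print(rows)  # → [('default', 'events', 'MANAGED_TABLE')]
```

The same `SELECT` works against a real metastore database, subject to whatever access the DBA grants; going through Thrift (as proposed in the issue) avoids needing that direct database access at all.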
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
I would also love to see that, and I especially love your solution, @mariusvniekerk! Is there a particular reason you were writing your “own” (Thrift) connection and not using something like pyhive or sqlalchemy? In principle, the information you are requesting via Thrift can also be queried with a `DESCRIBE FORMATTED <table>` call, and then all the version compatibility and authentication is already handled by another package. That might simplify testing. (Side note: the output format is then a bit confusing, but can be parsed.)

I know that blazingSQL needed to implement something similar (but only “similar”, because they have their own input functions), and I have also implemented something like this in dask-sql (honestly, because I was not aware of your repo!) using pyhive/sqlalchemy. Do you think it would make sense to combine these efforts (I have seen your last commit was in 2017)? With Thrift, or pyhive, or both? I would be happy to help here.
I’ll give it a look, thanks!
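The `DESCRIBE FORMATTED` approach discussed in the comment above returns rows of `(col_name, data_type, comment)` strings, with the column list first and a `# Detailed Table Information` section after it. A rough parser sketch for that output (the sample rows below are illustrative, not captured from a real session, and real output may vary by Hive version):

```python
def parse_describe_formatted(rows):
    """Extract a {column: type} dict from DESCRIBE FORMATTED-style rows.

    `rows` is an iterable of (col_name, data_type, comment) tuples, as
    returned e.g. by a pyhive cursor. Section headers start with '#',
    blank rows separate sections; only the leading column section is kept.
    """
    schema = {}
    for name, dtype, _comment in rows:
        name = (name or "").strip()
        if name.startswith("#"):
            if schema:          # reached '# Detailed Table Information'
                break
            continue            # leading '# col_name' header row
        if not name:
            continue            # blank separator row
        schema[name] = (dtype or "").strip()
    return schema


sample = [
    ("# col_name", "data_type", "comment"),
    ("id", "bigint", ""),
    ("name", "string", ""),
    ("", None, None),
    ("# Detailed Table Information", None, None),
    ("Database:", "default", ""),
]
print(parse_describe_formatted(sample))  # → {'id': 'bigint', 'name': 'string'}
```

This is the kind of post-processing the comment alludes to when it says the output "is then a bit confusing, but can be parsed"; the Thrift route returns structured objects and needs no parsing, at the cost of managing the connection and version compatibility itself.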