
Get metadata from the Hive metastore


Many big data deployments keep metadata about their data artifacts in the Hive metastore, even if they don’t use Hive itself. We’ve frequently received requests to support this. I believe this would involve building a small dask-hive Python project that communicates with the Hive metastore through Thrift.
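Once a client has pulled a table's storage location, InputFormat class, and column names out of the metastore, it still has to decide which dask reader to call. The sketch below shows one way that step might look. The InputFormat class names are standard Hive classes and `read_parquet`, `read_orc`, and `read_csv` are real `dask.dataframe` readers, but the mapping, the `plan_read` helper, and the glob handling are hypothetical assumptions, not part of any existing dask-hive package.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from Hive InputFormat class names (as stored in the
# metastore's storage descriptor) to dask.dataframe reader function names.
INPUT_FORMAT_TO_READER: Dict[str, str] = {
    "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat": "read_parquet",
    "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat": "read_orc",
    "org.apache.hadoop.mapred.TextInputFormat": "read_csv",
}


def plan_read(
    location: str,
    input_format: str,
    column_names: List[str],
    field_delim: str = "\x01",
) -> Tuple[str, str, dict]:
    """Return (reader name, path, keyword arguments) for a Hive table.

    Hypothetical helper: raises ValueError for InputFormats it does not know.
    """
    reader = INPUT_FORMAT_TO_READER.get(input_format.strip())
    if reader is None:
        raise ValueError(f"unsupported InputFormat: {input_format}")

    if reader == "read_csv":
        # Hive text tables have no header row; \x01 is Hive's default field delimiter.
        path = location.rstrip("/") + "/*"
        kwargs = {"sep": field_delim, "header": None, "names": list(column_names)}
    elif reader == "read_orc":
        # Assumption: glob the table directory to pick up every ORC file in it.
        path = location.rstrip("/") + "/*"
        kwargs = {"columns": list(column_names)}
    else:
        # read_parquet can be pointed at the table directory directly.
        path = location
        kwargs = {"columns": list(column_names)}
    return reader, path, kwargs
```

With dask installed, the plan could be executed as `getattr(dask.dataframe, reader)(path, **kwargs)`. Keeping the planning step free of I/O makes it easy to test without a running metastore.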

cc @mariusvniekerk @martindurant @seibert

Issue Analytics

  • State: open
  • Created 6 years ago
  • Comments: 14 (12 by maintainers)

Top GitHub Comments

1 reaction
nils-braun commented, Feb 4, 2021

I would also love to see this, and I especially love your solution, @mariusvniekerk! Is there a particular reason you wrote your “own” (Thrift) connection instead of using something like pyhive or sqlalchemy? In principle, the information you request via Thrift can also be queried with a DESCRIBE FORMATTED <table> call, and then version compatibility and authentication are already handled by another package. That might also simplify testing. (Side note: the output format is a bit confusing, but it can be parsed.)
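To make the DESCRIBE FORMATTED suggestion concrete, here is a minimal sketch of parsing its output. It assumes the rows come back as 3-tuples (`col_name`, `data_type`, `comment`) with padded strings, section headers starting with `#`, and blank separator rows; the exact layout varies between Hive versions, so the parser, the sample rows, and the helper names are all hypothetical. `pyhive.hive.connect` is a real PyHive call, imported lazily so the parser itself runs without PyHive.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Row = Tuple[Optional[str], Optional[str], Optional[str]]


def describe_formatted(host: str, table: str, port: int = 10000) -> List[Row]:
    """Fetch DESCRIBE FORMATTED rows from HiveServer2 via PyHive (not called below)."""
    from pyhive import hive  # lazy import: only needed when talking to a live server

    conn = hive.connect(host=host, port=port)
    try:
        cursor = conn.cursor()
        # DESCRIBE cannot take bound parameters, so `table` must be a trusted identifier.
        cursor.execute(f"DESCRIBE FORMATTED {table}")
        return cursor.fetchall()
    finally:
        conn.close()


def parse_describe_formatted(rows: Sequence[Row]):
    """Split rows into (columns, partition columns, detail key/values).

    "Table Parameters" rows (empty first field) are ignored in this sketch.
    """
    columns: List[Tuple[str, str]] = []
    partition_columns: List[Tuple[str, str]] = []
    details: Dict[str, Optional[str]] = {}
    section = "columns"
    for raw in rows:
        name, value, _comment = (c.strip() if c is not None else None for c in raw)
        if not name and not value:
            continue  # blank separator row
        if name and name.startswith("#"):
            header = name.lstrip("#").strip()
            if header == "col_name":
                continue  # column header row; stay in the current section
            section = "partitions" if header == "Partition Information" else "details"
            continue
        if section == "columns":
            columns.append((name, value))
        elif section == "partitions":
            partition_columns.append((name, value))
        elif name:
            details[name.rstrip(":")] = value
    return columns, partition_columns, details


# Hypothetical sample output for a partitioned Parquet table.
SAMPLE_ROWS: List[Row] = [
    ("# col_name            ", "data_type           ", "comment             "),
    ("", None, None),
    ("id                  ", "int                 ", ""),
    ("name                ", "string              ", ""),
    ("", None, None),
    ("# Partition Information", None, None),
    ("# col_name            ", "data_type           ", "comment             "),
    ("", None, None),
    ("ds                  ", "string              ", ""),
    ("", None, None),
    ("# Detailed Table Information", None, None),
    ("Database:           ", "default             ", None),
    ("Location:           ", "hdfs://nn/warehouse/t", None),
    ("", None, None),
    ("# Storage Information", None, None),
    ("InputFormat:        ", "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat", None),
]
```

Against a live server, `parse_describe_formatted(describe_formatted("hive-host", "db.table"))` would return the same three structures, which is the trade-off the comment describes: authentication and protocol compatibility are left to PyHive, at the cost of parsing a human-oriented text layout.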

I know that BlazingSQL needed to implement something similar (but only “similar”, because they have their own input functions), and I have also implemented something like this in dask-sql using pyhive/sqlalchemy (honestly, because I was not aware of your repo!). Do you think it would make sense to combine these efforts (I see your last commit was in 2017)? With Thrift, pyhive, or both? I would be happy to help here.

1 reaction
mariusvniekerk commented, Mar 2, 2018

I’ll give it a look, thanks.


Top Results From Across the Web

How to get metadata of hive tables, columns, views, constraint ...
if you have Hue available you can go to Metastore Tables from the top menu Data Browsers. There you can find metadata for...

Viewing Hive Schema and Table Metadata - Vertica
When using Hive, you access metadata about schemas and tables by executing statements written in HiveQL (Hive's version of SQL) such as SHOW...

Hive Metadata - StreamSets Documentation
Use the Hive Metadata processor for records to be written to HDFS or MapR FS when you want the Hive Metastore destination to...

7.2 Hive Metadata Provider
The Hive Metadata Provider is used to retrieve the table metadata from a Hive metastore. The metadata will be retrieved from Hive for...

Useful queries for the Hive metastore - Analytics Anvil
The Hive metastore stores metadata about objects within Hive. Usually this metastore sits within a relational database such as MySQL.
