Get metadata from the Hive metastore
See original GitHub issue

Many big data deployments keep metadata about their data artifacts in the Hive metastore, even if they don't use Hive itself. We've frequently received requests to support this. I believe that this would involve building a small `dask-hive` Python project that communicates with the Hive metastore over Thrift.
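As a rough illustration of what such a Thrift connection could look like, here is a minimal sketch using the third-party `hmsclient` package (one of several Thrift wrappers for the metastore; the host, port, and table layout below are assumptions for a default deployment, not part of the proposal in the issue):

```python
def fetch_table_columns(database, table, host="localhost", port=9083):
    """Return [(column_name, column_type), ...] for a metastore table.

    Assumes a metastore Thrift service on host:port (9083 is the
    conventional default). `hmsclient` is imported lazily since it
    is an optional third-party dependency.
    """
    from hmsclient import hmsclient
    client = hmsclient.HMSClient(host=host, port=port)
    with client as c:
        t = c.get_table(database, table)
        # Column metadata lives on the table's storage descriptor.
        return [(col.name, col.type) for col in t.sd.cols]


def columns_to_schema(columns):
    """Turn [(name, type), ...] pairs into a simple dict schema."""
    return {name: typ for name, typ in columns}
```

A caller would then do something like `columns_to_schema(fetch_table_columns("default", "events"))` to get a plain `{column: type}` mapping to hand to a Dask reader.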
Issue Analytics
- Created 6 years ago
- Comments: 14 (12 by maintainers)
Top Results From Across the Web
How to get metadata of hive tables, columns, views, constraint ...
If you have Hue available you can go to Metastore Tables from the top menu Data Browsers. There you can find metadata for...
Read more >

Viewing Hive Schema and Table Metadata - Vertica
When using Hive, you access metadata about schemas and tables by executing statements written in HiveQL (Hive's version of SQL) such as SHOW...
Read more >

Hive Metadata - StreamSets Documentation
Use the Hive Metadata processor for records to be written to HDFS or MapR FS when you want the Hive Metastore destination to...
Read more >

7.2 Hive Metadata Provider
The Hive Metadata Provider is used to retrieve the table metadata from a Hive metastore. The metadata will be retrieved from Hive for...
Read more >

Useful queries for the Hive metastore - Analytics Anvil
The Hive metastore stores metadata about objects within Hive. Usually this metastore sits within a relational database such as MySQL.
Read more >
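The last result above describes querying the metastore's backing relational database directly. The metastore schema keeps databases in a `DBS` table and tables in a `TBLS` table, joined on `DB_ID`. The sketch below illustrates that join pattern against an in-memory SQLite stand-in (the real metastore usually sits in MySQL or PostgreSQL, and the column subset here is simplified for illustration):

```python
import sqlite3

# In-memory stand-in for the metastore's backing database, using a
# simplified subset of the real DBS and TBLS columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DBS  (DB_ID INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, DB_ID INTEGER,
                       TBL_NAME TEXT, TBL_TYPE TEXT);
    INSERT INTO DBS  VALUES (1, 'default');
    INSERT INTO TBLS VALUES (10, 1, 'events', 'MANAGED_TABLE');
""")

# The typical "list all tables with their database" metastore query:
rows = conn.execute("""
    SELECT d.NAME, t.TBL_NAME, t.TBL_TYPE
    FROM TBLS t JOIN DBS d ON t.DB_ID = d.DB_ID
""").fetchall()
print(rows)  # → [('default', 'events', 'MANAGED_TABLE')]
```

The same `SELECT` works against a real metastore database, subject to whatever access the DBA grants; going through Thrift (as proposed in the issue) avoids needing that direct database access at all.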
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
I would also love to see that, and I especially love your solution, @mariusvniekerk! Is there a particular reason you were writing your “own” (Thrift) connection and not using something like pyhive or sqlalchemy? In principle, the information you are requesting via Thrift can also be queried with a `DESCRIBE FORMATTED <table>` call, and then all the version compatibility and authentication is already handled by another package. That might simplify testing. (Side note: the output format is then a bit confusing, but can be parsed.)

I know that blazingSQL needed to implement something similar (but only “similar”, because they have their own input functions), and I have also implemented something like this in dask-sql (honestly, because I was not aware of your repo!) using pyhive/sqlalchemy. Do you think it would make sense to combine these efforts (I have seen your last commit was in 2017)? With Thrift, or pyhive, or both? I would be happy to help here.
I’ll give it a look, thanks!
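The `DESCRIBE FORMATTED` approach discussed in the comment above returns rows of `(col_name, data_type, comment)` strings, with the column list first and a `# Detailed Table Information` section after it. A rough parser sketch for that output (the sample rows below are illustrative, not captured from a real session, and real output may vary by Hive version):

```python
def parse_describe_formatted(rows):
    """Extract a {column: type} dict from DESCRIBE FORMATTED-style rows.

    `rows` is an iterable of (col_name, data_type, comment) tuples, as
    returned e.g. by a pyhive cursor. Section headers start with '#',
    blank rows separate sections; only the leading column section is kept.
    """
    schema = {}
    for name, dtype, _comment in rows:
        name = (name or "").strip()
        if name.startswith("#"):
            if schema:          # reached '# Detailed Table Information'
                break
            continue            # leading '# col_name' header row
        if not name:
            continue            # blank separator row
        schema[name] = (dtype or "").strip()
    return schema


sample = [
    ("# col_name", "data_type", "comment"),
    ("id", "bigint", ""),
    ("name", "string", ""),
    ("", None, None),
    ("# Detailed Table Information", None, None),
    ("Database:", "default", ""),
]
print(parse_describe_formatted(sample))  # → {'id': 'bigint', 'name': 'string'}
```

This is the kind of post-processing the comment alludes to when it says the output "is then a bit confusing, but can be parsed"; the Thrift route returns structured objects and needs no parsing, at the cost of managing the connection and version compatibility itself.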