Setting up local debug environment with Hive metastore

I was trying to set up a local debug environment with the Presto source code, following the instructions in README.md. I set up IntelliJ and ran PrestoServer as advised. The README says that if I don’t have a Hive metastore, passing -Dhive.metastore.uri=thrift://localhost:9083 as a VM option should work. However, with that setup, the metastore connection does not work.

$ $PRESTO_HOME/presto-cli/target/presto-cli-*-executable.jar --server localhost:8080 --catalog hive
presto> show schemas;
Query 20190303_205534_00000_wrjah failed: Failed connecting to Hive metastore: [localhost:9083]
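
The error above only says that Presto could not connect to localhost:9083. One plausible cause is that nothing is listening on that port, since the VM option tells Presto where to look for a metastore rather than starting one. A minimal sketch for checking this from outside Presto follows; the helper name is_port_open is hypothetical, and the two ports are simply the ones that appear in this issue.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 9083 comes from the README's VM option; 9080 is the port used
    # for the Derby-backed metastore later in this issue.
    for port in (9083, 9080):
        state = "listening" if is_port_open("localhost", port) else "nothing listening"
        print(f"localhost:{port}: {state}")
```

A successful TCP connection does not prove that the listener is a working Thrift metastore, only that something accepts connections on that port.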

To work around this problem, I set up a local Derby-based metastore in Hive and tried hooking that up. That “partially” worked.

$ ${HIVE_HOME}/bin/schematool -initSchema -dbType derby
$ $HIVE_HOME/bin/hive
hive> #Created table namesdata and inserted data through Hive
$ $HIVE_HOME/bin/hive --service metastore -p 9080 &
2019-03-03 22:01:15: Starting Hive Metastore Server
$ # Added -Dhive.metastore.uri=thrift://localhost:9080 in IntelliJ
$ $PRESTO_HOME/presto-cli/target/presto-cli-*-executable.jar --server localhost:8080 --catalog hive
presto> show schemas;
       Schema
--------------------
 default
 information_schema
(2 rows)

Query 20190303_210236_00000_8u9zb, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:01 [2 rows, 35B] [1 rows/s, 30B/s]

presto> show tables in default; # This is weird problem
Query 20190303_210253_00002_8u9zb failed: localhost:9080: java.net.SocketTimeoutException: Read timed out

presto> select * from default.namesdata limit 5;
 state | gender | year |  name   | number
-------+--------+------+---------+--------
 AK    | F      | 1910 | Dorothy |      5
 AK    | F      | 1910 | Annie   |     12
 AL    | F      | 1910 | Louise  |    138
 AL    | F      | 1910 | Alice   |    112
 AL    | F      | 1910 | Ida     |     95
(5 rows)

Query 20190303_210329_00003_8u9zb, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:01 [16K rows, 303KB] [14.7K rows/s, 279KB/s] 

Can you please tell me why show tables failed but the select query did not? Also, is there a more elegant way to set up a development environment locally?

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

ssaumitra commented, Mar 30, 2019

@wenleix Thanks for the suggestion. I just tried HiveQueryRunner. With that, show tables works, but select queries do not.

presto-cli/target/presto-cli-*-executable.jar --server localhost:8080 --catalog hive
presto> show schemas;
       Schema
--------------------
 information_schema
 tpch
 tpch_bucketed
(3 rows)

Query 20190330_012406_00000_hramy, FINISHED, 4 nodes
Splits: 22 total, 22 done (100.00%)
0:02 [3 rows, 50B] [1 rows/s, 32B/s]

presto> show tables in tpch;
  Table
----------
 customer
 lineitem
 nation
 orders
 part
 partsupp
 region
 supplier
(8 rows)

Query 20190330_012426_00001_hramy, FINISHED, 4 nodes
Splits: 22 total, 22 done (100.00%)
0:01 [8 rows, 166B] [11 rows/s, 237B/s]

presto> select * from tpch.customer limit 10; # Weird error
Query 20190330_012444_00002_hramy failed: Access Denied: Cannot select from table tpch.customer

presto> select count(*) from tpch.nation limit 10;
Query 20190330_012510_00004_hramy failed: Access Denied: Cannot select from table tpch.nation

presto> select * from tpch_bucketed.nation limit 10;
Query 20190330_012638_00006_hramy failed: Access Denied: Cannot select from table tpch_bucketed.nation

The good news, though, is that I could create another table, and select worked for that table:

presto> use default;
USE
presto:default> create table namesdata (state varchar, gender varchar, year int, name varchar, number int) with (format = 'TEXTFILE', external_location = 'file:///Users/saumitra/Documents/opensource/data/usa_names');
CREATE TABLE
presto:default> select * from namesdata limit 5;
             state             | gender | year | name | number
-------------------------------+--------+------+------+--------
 state,gender,year,name,number | NULL   | NULL | NULL | NULL
 AK,F,1910,Dorothy,5           | NULL   | NULL | NULL | NULL
 AK,F,1910,Annie,12            | NULL   | NULL | NULL | NULL
 AL,F,1910,Louise,138          | NULL   | NULL | NULL | NULL
 AL,F,1910,Alice,112           | NULL   | NULL | NULL | NULL
(5 rows)

Query 20190330_015007_00028_hramy, FINISHED, 1 node
Splits: 6 total, 6 done (100.00%)
0:00 [16K rows, 303KB] [50.1K rows/s, 950KB/s]
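
The single-column result above is consistent with Hive's TEXTFILE format using Ctrl-A (\x01) as its default field delimiter instead of a comma, with the first row being the file's header line read as data. Assuming that is what happened here, one possible workaround is to rewrite the comma-separated file with Ctrl-A delimiters and no header before pointing the external table at it. The function name csv_to_ctrl_a is hypothetical; this is a sketch, not something proposed in the thread.

```python
import csv


def csv_to_ctrl_a(src_path: str, dst_path: str, skip_header: bool = True) -> int:
    """Rewrite a comma-separated file using \\x01 as the field delimiter.

    Returns the number of data rows written.
    """
    count = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        if skip_header:
            # Drop the header so it is not read back as a data row.
            next(reader, None)
        for row in reader:
            dst.write("\x01".join(row) + "\n")
            count += 1
    return count
```

Because external_location refers to a directory, the converted file would need to go into its own directory, separate from the original CSV, so that both files are not read as table data.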

wenleix commented, Jul 23, 2020

@ssaumitra: Looks like this issue is resolved. Feel free to reopen it if you have any further questions 😃
