Error using spark adapter in thrift mode
Moving https://github.com/fishtown-analytics/dbt-spark/pull/20#issuecomment-497518244 here:
I tried to use dbt-spark with this pull request, but I get the following error:
```
2019-05-30 18:17:30,493 (MainThread): Encountered an error:
2019-05-30 18:17:30,493 (MainThread): not enough values to unpack (expected 3, got 1)
2019-05-30 18:17:30,532 (MainThread): Traceback (most recent call last):
  File "/home/paul/.local/lib/python3.6/site-packages/dbt/main.py", line 79, in main
    results, succeeded = handle_and_check(args)
  File "/home/paul/.local/lib/python3.6/site-packages/dbt/main.py", line 153, in handle_and_check
    task, res = run_from_args(parsed)
  File "/home/paul/.local/lib/python3.6/site-packages/dbt/main.py", line 209, in run_from_args
    results = run_from_task(task, cfg, parsed)
  File "/home/paul/.local/lib/python3.6/site-packages/dbt/main.py", line 217, in run_from_task
    result = task.run()
  File "/home/paul/.local/lib/python3.6/site-packages/dbt/task/runnable.py", line 256, in run
    self.before_run(adapter, selected_uids)
  File "/home/paul/.local/lib/python3.6/site-packages/dbt/task/run.py", line 85, in before_run
    self.populate_adapter_cache(adapter)
  File "/home/paul/.local/lib/python3.6/site-packages/dbt/task/run.py", line 23, in populate_adapter_cache
    adapter.set_relations_cache(self.manifest)
  File "/home/paul/.local/lib/python3.6/site-packages/dbt/adapters/base/impl.py", line 331, in set_relations_cache
    self._relations_cache_for_schemas(manifest)
  File "/home/paul/.local/lib/python3.6/site-packages/dbt/adapters/base/impl.py", line 313, in _relations_cache_for_schemas
    for relation in self.list_relations_without_caching(db, schema):
  File "/home/paul/dbt-spark/dbt/adapters/spark/impl.py", line 75, in list_relations_without_caching
    for _database, name, _ in results:
ValueError: not enough values to unpack (expected 3, got 1)
```
If I add a `print(results[0])` right above that line, it looks like each row in `results` contains a single value instead of three:

```
<agate.Row: ('mytable')>
```
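For context, here is my reading of the traceback (not an explanation given in the thread itself): Spark SQL's `SHOW TABLES` returns three columns (`database`, `tableName`, `isTemporary`), while HiveServer2's `SHOW TABLES` returns a single `tab_name` column, so the three-way tuple unpack in `list_relations_without_caching` fails. A minimal sketch of the mismatch:

```python
# Illustrative sketch only; not dbt's actual code path.
# Spark SQL's SHOW TABLES yields rows shaped like (database, tableName, isTemporary);
# HiveServer2's SHOW TABLES yields a single tab_name column.
spark_row = ("default", "mytable", False)
hive_row = ("mytable",)

_database, name, _ = spark_row  # ok: three values to unpack
_database, name, _ = hive_row   # ValueError: not enough values to unpack (expected 3, got 1)
```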
I couldn't get Spark connecting in HTTP mode (i.e. without this pull request), so I'm not sure whether the issue is with this pull request or something more general.
This is connecting to an EMR 5.20.0 cluster, and the Thrift server was started with `sudo /usr/lib/spark/sbin/start-thriftserver.sh --master yarn-client`.

AWS support helped me figure out the issue: on my EMR cluster, port 10000 is for Hive and port 10001 is for Spark. When I changed to 10001 it worked (after running `start-thriftserver.sh`).

@rhousewright Should we maybe mention this port difference in the docs as part of your PR? https://github.com/fishtown-analytics/dbt-spark/pull/20/files#diff-04c6e90faac2675aa89e2176d2eec7d8R22
Here’s my profile now:
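(The profile itself did not survive the page extraction. Below is a minimal sketch of what a dbt-spark `profiles.yml` using the `thrift` method and the corrected port might look like; the host, schema, and user values are placeholders, not the reporter's actual settings.)

```yaml
# Hypothetical reconstruction; not the reporter's actual profile.
spark:
  target: dev
  outputs:
    dev:
      type: spark
      method: thrift
      host: <emr-master-node-dns>  # placeholder
      port: 10001                  # Spark Thrift Server on EMR; 10000 is HiveServer2
      schema: analytics            # placeholder
      user: hadoop                 # placeholder
```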
So this is super interesting. I tried running a similar thing against an EMR 5.21.0 cluster, with the following generated SQL for my model, and it worked just fine for me. So that's weird?

There's nothing in the 5.21.0 release notes that would indicate any relevant changes (vs 5.20.0), and I'm not doing anything unusual or relevant in terms of cluster config (I am using the Glue catalog, in case that matters). The only thing I did differently than you, I think, is to start the Thrift server with `sudo /usr/lib/spark/sbin/start-thriftserver.sh` (without the `--master yarn-client`).

I will note that I only have Spark installed on the cluster (I don't have Hive installed) - do you have both installed? If so, is it possible that installing Hive in some way takes over the HiveServer2 connection to the Spark backend? I haven't had the chance to test that theory yet, though. Config I'm using right now is:
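(The config itself is missing from this copy of the thread. As for the Hive theory: one way to check it, not something suggested in the thread itself, is to see which process is bound to each Thrift port on the EMR master node.)

```sh
# Hypothetical diagnostic, not from the thread: show which process is
# listening on each candidate Thrift port on the EMR master node.
# On EMR, HiveServer2 typically binds 10000 and the Spark Thrift Server
# (started via start-thriftserver.sh) binds 10001.
sudo netstat -tlnp | grep -E ':1000[01]'
```

If HiveServer2 turns out to own 10000, pointing dbt at 10001 (the Spark Thrift Server) matches the fix reported above.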
In general, I'm hoping to get some dedicated time to work on dbt-spark in the near future; I'm trying to set aside time in an upcoming sprint to see if we can get a POC working in our space. Hopefully I'll learn a lot, and possibly generate some pull requests, through that process!