
Error using spark adapter in thrift mode

See original GitHub issue

Moving https://github.com/fishtown-analytics/dbt-spark/pull/20#issuecomment-497518244 here:

I tried to use the Spark adapter with this pull request, but I get the following error:

2019-05-30 18:17:30,493 (MainThread): Encountered an error:
2019-05-30 18:17:30,493 (MainThread): not enough values to unpack (expected 3, got 1)
2019-05-30 18:17:30,532 (MainThread): Traceback (most recent call last):
  File "/home/paul/.local/lib/python3.6/site-packages/dbt/main.py", line 79, in main
    results, succeeded = handle_and_check(args)
  File "/home/paul/.local/lib/python3.6/site-packages/dbt/main.py", line 153, in handle_and_check
    task, res = run_from_args(parsed)
  File "/home/paul/.local/lib/python3.6/site-packages/dbt/main.py", line 209, in run_from_args
    results = run_from_task(task, cfg, parsed)
  File "/home/paul/.local/lib/python3.6/site-packages/dbt/main.py", line 217, in run_from_task
    result = task.run()
  File "/home/paul/.local/lib/python3.6/site-packages/dbt/task/runnable.py", line 256, in run
    self.before_run(adapter, selected_uids)
  File "/home/paul/.local/lib/python3.6/site-packages/dbt/task/run.py", line 85, in before_run
    self.populate_adapter_cache(adapter)
  File "/home/paul/.local/lib/python3.6/site-packages/dbt/task/run.py", line 23, in populate_adapter_cache
    adapter.set_relations_cache(self.manifest)
  File "/home/paul/.local/lib/python3.6/site-packages/dbt/adapters/base/impl.py", line 331, in set_relations_cache
    self._relations_cache_for_schemas(manifest)
  File "/home/paul/.local/lib/python3.6/site-packages/dbt/adapters/base/impl.py", line 313, in _relations_cache_for_schemas
    for relation in self.list_relations_without_caching(db, schema):
  File "/home/paul/dbt-spark/dbt/adapters/spark/impl.py", line 75, in list_relations_without_caching
    for _database, name, _ in results:
ValueError: not enough values to unpack (expected 3, got 1)

If I add a print(results[0]) right above that line, it looks like each row in results has a single value instead of three: <agate.Row: ('mytable')>
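
For context, here is a minimal sketch of the failing unpack, illustrative only: the one-value row shape is taken from the print above, and the three-name unpack mirrors the adapter line in the traceback.

# Hypothetical repro of the adapter's failing unpack (not its actual code).
results = [("mytable",)]  # shape observed via print(results[0])

try:
    for _database, name, _ in results:  # expects three values per row
        print(name)
except ValueError as exc:
    print(exc)  # not enough values to unpack (expected 3, got 1)

A likely explanation, borne out later in the thread, is that the server answering was HiveServer2 rather than the Spark Thrift server: Hive’s SHOW TABLES returns a single column, while Spark SQL’s returns three (database, tableName, isTemporary).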

I couldn’t get Spark to connect in http mode (i.e. without this pull request), so I’m not sure whether the issue is with this pull request or something more general.

This is connecting to an EMR 5.20.0 cluster, and the Thrift server was started with sudo /usr/lib/spark/sbin/start-thriftserver.sh --master yarn-client.
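
One way to isolate whether the problem is in dbt or in the Thrift endpoint itself is to run SHOW TABLES against the server directly with PyHive (the library the thrift connection method relies on). A rough sketch, assuming the server started above is reachable on localhost:10000:

from pyhive import hive

# Host and port are assumptions; match them to your cluster.
conn = hive.connect(host="127.0.0.1", port=10000)
cursor = conn.cursor()
cursor.execute("show tables")
for row in cursor.fetchall():
    print(row)  # Spark SQL yields three columns; HiveServer2 yields one

If the rows come back with a single column, the endpoint is answering as Hive rather than Spark, which matches the unpack error above.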

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 17 (8 by maintainers)

Top GitHub Comments

2 reactions
pgr0ss commented, Jun 18, 2019

AWS support helped me figure out the issue:

On my EMR cluster, port 10000 is for Hive and 10001 is for Spark. When I changed to 10001, it worked (after running start-thriftserver.sh).

@rhousewright Should we maybe mention this port difference in the docs as part of your PR? https://github.com/fishtown-analytics/dbt-spark/pull/20/files#diff-04c6e90faac2675aa89e2176d2eec7d8R22

Here’s my profile now:

default:
  target: dev
  outputs:
    dev:
      type: spark
      method: thrift
      schema: experiments
      host: 127.0.0.1
      port: 10001
      threads: 4
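
To sanity-check which of the two ports is actually listening before pointing dbt at one, a quick probe could look like this (host and ports are the ones from this thread):

from socket import AF_INET, SOCK_STREAM, socket

# Probe the Hive (10000) and Spark Thrift (10001) ports mentioned above.
for port in (10000, 10001):
    with socket(AF_INET, SOCK_STREAM) as s:
        s.settimeout(2)
        status = "open" if s.connect_ex(("127.0.0.1", port)) == 0 else "closed"
        print(f"port {port}: {status}")

An open port alone doesn’t say whether Hive or Spark is answering; the column count from SHOW TABLES (see the PyHive sketch earlier) is the telltale.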

1 reaction
rhousewright commented, Jun 13, 2019

So this is super interesting. I tried running a similar thing against a cluster on EMR 5.21.0, with the following generated SQL for my model, and it worked just fine for me. So that’s weird?

create table dbt_test_db.my_first_dbt_model
    using parquet
    partitioned by (id)
    as
select 1 as id, 2 as not_id

There’s nothing in the 5.21.0 release notes (vs. 5.20.0) that would indicate any relevant changes, and I’m not doing anything unusual or relevant in terms of cluster config (I am using the Glue catalog, in case that matters). The only thing I think I did differently than you is starting the thrift server with sudo /usr/lib/spark/sbin/start-thriftserver.sh (without the --master yarn-client).

I will note that I only have Spark installed on the cluster (I don’t have Hive installed). Do you have both installed? If so, is it possible that installing Hive somehow takes over the HiveServer2 connection to the Spark backend? I haven’t had the chance to test that theory yet, though. The config I’m using right now is in the attached screenshot (Screen Shot 2019-06-13 at 4.50.54 PM).

In general, I’m hoping to get some dedicated time to work on dbt-spark in the next little bit, trying to set aside some time in an upcoming sprint to see if we can get a POC working in our space. Hopefully I’ll learn a lot, and possibly generate some pull requests, through that process!

Read more comments on GitHub
