
Getting an error running spark-submit job

See original GitHub issue

Hello,

I am trying to follow the instructions here: https://github.com/facebookresearch/Horizon/blob/master/docs/usage.md

When I run this script:

    /usr/local/spark/bin/spark-submit \
      --class com.facebook.spark.rl.Preprocessor preprocessing/target/rl-preprocessing-1.1.jar \
      "`cat ml/rl/workflow/sample_configs/discrete_action/timeline.json`"
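
A note on the last argument: the backticks make the shell substitute the file's contents in place, so the whole timeline.json document reaches the job as a single program argument. Below is a minimal sketch of a job consuming its config that way; it is illustrative only (ConfigFromArg is a made-up name, not Horizon's actual Preprocessor):

    import org.apache.spark.sql.SparkSession

    // Sketch of a Spark job that takes an entire JSON config as its first
    // command-line argument, the way the backtick-cat invocation above passes
    // timeline.json. Illustrative only, not the real Preprocessor.
    object ConfigFromArg {
      def main(args: Array[String]): Unit = {
        require(args.nonEmpty, "expected the JSON config as the first argument")
        val configJson = args(0) // the shell substituted the file contents here
        val spark = SparkSession.builder().appName("config-from-arg").getOrCreate()
        println(s"Received ${configJson.length} characters of config")
        spark.stop()
      }
    }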

I am getting the following output:

    2019-02-27 00:57:03 INFO HiveMetaStore:746 - 0: get_database: global_temp
    2019-02-27 00:57:03 INFO audit:371 - ugi=root ip=unknown-ip-addr cmd=get_database: global_temp
    2019-02-27 00:57:03 WARN ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
    Exception in thread "main" org.apache.spark.sql.AnalysisException: grouping expressions sequence is empty, and 'source_table.mdp_id' is not an aggregate function. Wrap '()' in windowing function(s) or wrap 'source_table.mdp_id' in first() (or first_value) if you don't care which value you get.;;
    'Sort ['HASH('mdp_id, 'sequence_number) ASC NULLS FIRST], false
    +- 'RepartitionByExpression ['HASH('mdp_id, 'sequence_number)], 200
       +- 'Project [mdp_id#1, state_features#4, action#5, action_probability#3, reward#6, next_state_features#24, next_action#25, sequence_number#2, sequence_number_ordinal#26, time_diff#27, possible_actions#7, possible_next_actions#28, metrics#8]
          +- 'Project [mdp_id#1, state_features#4, action#5, action_probability#3, reward#6, sequence_number#2, possible_actions#7, metrics#8, next_state_features#24, next_action#25, sequence_number_ordinal#26, _we3#30, possible_next_actions#28, next_state_features#24, next_action#25, sequence_number_ordinal#26, (coalesce(_we3#30, sequence_number#2) - sequence_number#2) AS time_diff#27, possible_next_actions#28]
             +- 'Window [lead(state_features#4, 1, null) windowspecdefinition(mdp_id#1, mdp_id#1 ASC NULLS FIRST, sequence_number#2 ASC NULLS FIRST, specifiedwindowframe(RowFrame, 1, 1)) AS next_state_features#24, lead(action#5, 1, null) windowspecdefinition(mdp_id#1, mdp_id#1 ASC NULLS FIRST, sequence_number#2 ASC NULLS FIRST, specifiedwindowframe(RowFrame, 1, 1)) AS next_action#25, row_number() windowspecdefinition(mdp_id#1, mdp_id#1 ASC NULLS FIRST, sequence_number#2 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS sequence_number_ordinal#26, lead(sequence_number#2, 1, null) windowspecdefinition(mdp_id#1, mdp_id#1 ASC NULLS FIRST, sequence_number#2 ASC NULLS FIRST, specifiedwindowframe(RowFrame, 1, 1)) AS _we3#30, lead(possible_actions#7, 1, null) windowspecdefinition(mdp_id#1, mdp_id#1 ASC NULLS FIRST, sequence_number#2 ASC NULLS FIRST, specifiedwindowframe(RowFrame, 1, 1)) AS possible_next_actions#28], [mdp_id#1], [mdp_id#1 ASC NULLS FIRST, sequence_number#2 ASC NULLS FIRST]
                +- 'Filter isnotnull('next_state_features)
                   +- Aggregate [mdp_id#1, state_features#4, action#5, action_probability#3, reward#6, sequence_number#2, possible_actions#7, metrics#8]
                      +- SubqueryAlias source_table
                         +- Project [mdp_id#1, state_features#4, action#5, action_probability#3, reward#6, sequence_number#2, possible_actions#7, metrics#8]
                            +- Filter ((ds#0 >= 2019-01-01) && (ds#0 <= 2019-01-01))
                               +- SubqueryAlias cartpole_discrete
                                  +- Relation[ds#0,mdp_id#1,sequence_number#2,action_probability#3,state_features#4,action#5,reward#6,possible_actions#7,metrics#8] json
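
The decisive line is the AnalysisException: Spark considers the aggregate's grouping-expression list empty while a bare column (source_table.mdp_id) is still being selected. The same class of error can be reproduced outside Horizon in a few lines; this is a minimal sketch with made-up data and names, not the Preprocessor's actual query:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    // Minimal repro of the error class, not Horizon's actual query.
    object GroupingErrorRepro {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("grouping-error-repro")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        val df = Seq(
          ("mdp-1", 1, 0.5),
          ("mdp-1", 2, 1.0)
        ).toDF("mdp_id", "sequence_number", "reward")

        // Mixing an aggregate with a bare column and no groupBy() raises the
        // same AnalysisException ("grouping expressions sequence is empty, and
        // 'mdp_id' is not an aggregate function"):
        // df.select($"mdp_id", max($"sequence_number")).show()

        // Fix 1: group by the bare column.
        df.groupBy($"mdp_id").agg(max($"sequence_number")).show()

        // Fix 2: wrap the bare column in first(), as the message suggests.
        df.select(first($"mdp_id"), max($"sequence_number")).show()

        spark.stop()
      }
    }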

I followed the steps after manually installing HBase (this step is missing from the documentation; please let me know if you want me to add it).

I am using the Docker-on-Mac instructions (https://github.com/facebookresearch/Horizon/blob/master/docs/installation.md) to get going. Can anyone please help me figure out how to move forward?

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 13 (4 by maintainers)

Top GitHub Comments

1 reaction
sureshakella commented, Mar 1, 2019

Awesome, thank you. The job ran successfully after applying #104.

1 reaction
MisterTea commented, Mar 1, 2019

With the changes in https://github.com/facebookresearch/Horizon/pull/104 and using Spark 2.3.3, I was able to go through the whole usage doc.
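
For anyone landing here later: the working combination above was PR #104 plus Spark 2.3.3, so it is worth confirming which Spark runtime actually executes the job when several are installed. A small sketch of such a check (the object name and version assertion are mine, not from the issue):

    import org.apache.spark.sql.SparkSession

    // Print and assert the Spark runtime version a job actually runs under;
    // useful when more than one Spark installation is on the machine.
    object SparkVersionCheck {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("version-check")
          .master("local[*]")
          .getOrCreate()
        println(s"Running under Spark ${spark.version}")
        require(spark.version.startsWith("2.3"), s"expected Spark 2.3.x, got ${spark.version}")
        spark.stop()
      }
    }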


