Train job stuck executing

I’ve followed the quick start guide: https://actionml.com/docs/h_ur_quickstart

My config.json is as follows:

{
    "engineId": "2",
    "engineFactory": "com.actionml.engines.ur.UREngine",
    "sparkConf": {
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
        "spark.kryo.referenceTracking": "false",
        "spark.kryoserializer.buffer": "300m",
        "spark.executor.memory": "3g",
        "spark.driver.memory": "3g",
        "spark.es.index.auto.create": "true",
        "spark.es.nodes": "harness-docker-compose_elasticsearch_1",
        "spark.es.nodes.wan.only": "true"
    },
    "algorithm":{
        "indicators": [
            {
                "name": "buy"
            },{
                "name": "view"
            }
        ]
    }
}

I run harness-cli train 2, and when I check harness-cli status engines 2, I see:

/harness-cli/harness-cli/harness-status: line 10: /harness-cli/harness-cli/RELEASE: No such file or directory
Harness CLI v settings
==================================================================
HARNESS_CLI_HOME ........................ /harness-cli/harness-cli
HARNESS_CLI_SSL_ENABLED .................................... false
HARNESS_CLI_AUTH_ENABLED ................................... false
HARNESS_SERVER_ADDRESS ................................... harness
HARNESS_SERVER_PORT ......................................... 9090
==================================================================
Harness Server status: OK
Status for engine-id: 2
{
    "engineParams": {
        "algorithm": {
            "indicators": [
                {
                    "name": "buy"
                },
                {
                    "name": "view"
                }
            ]
        },
        "engineFactory": "com.actionml.engines.ur.UREngine",
        "engineId": "2",
        "sparkConf": {
            "spark.driver.memory": "3g",
            "spark.es.index.auto.create": "true",
            "spark.es.nodes": "harness-docker-compose_elasticsearch_1",
            "spark.es.nodes.wan.only": "true",
            "spark.executor.memory": "3g",
            "spark.kryo.referenceTracking": "false",
            "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
            "spark.kryoserializer.buffer": "300m",
            "spark.serializer": "org.apache.spark.serializer.KryoSerializer"
        }
    },
    "jobStatuses": {
        "ed0becbf-ce7d-4e62-822d-2e3f2138e235": {
            "comment": "Spark job",
            "jobId": "ed0becbf-ce7d-4e62-822d-2e3f2138e235",
            "status": {
                "name": "executing"
            }
        }
    }
}

The job never moves out of the executing state. Is there any way I can debug why this is happening?

I’ve set Harness up by following https://actionml.com/docs/harness_container_guide
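
For anyone who wants to watch the job from a script rather than rereading the status dump by hand, here is a minimal sketch. It assumes the output of harness-cli status engines has the shape shown above: a plain-text settings banner followed by a JSON body that runs to the end of the output. The function name extract_job_statuses is hypothetical and not part of Harness.

```python
import json


def extract_job_statuses(status_output: str) -> dict:
    """Map each jobId to its status name, given the text printed by
    `harness-cli status engines <engine-id>`.

    Assumption: the JSON body starts at the first line that is exactly "{"
    and continues to the end of the output, as in the dump above.
    """
    lines = status_output.splitlines()
    start = next(
        (i for i, line in enumerate(lines) if line.strip() == "{"),
        None,
    )
    if start is None:
        raise ValueError("no JSON body found in status output")

    report = json.loads("\n".join(lines[start:]))
    return {
        job_id: job["status"]["name"]
        for job_id, job in report.get("jobStatuses", {}).items()
    }
```

The text to pass in can be captured with subprocess.run(["harness-cli", "status", "engines", "2"], capture_output=True, text=True).stdout. This only reports the state; it does not explain why a job stays in executing.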

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 2
  • Comments: 7

Top GitHub Comments

dataedgehungary commented, Dec 17, 2019 (5 reactions)

The error

org.apache.spark.SparkException: A master URL must be set in your configuration

is resolved by adding "master": "local" to sparkConf, as described in the documentation:

https://actionml.com/docs/h_ur_config#spark-parameters-codesparkconfcode

{
   "engineId": "ecom_ur",
   "engineFactory": "com.actionml.engines.ur.UREngine",
   "sparkConf": {
       "master": "local",
       "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
       "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
       "spark.kryo.referenceTracking": "false",
       "spark.kryoserializer.buffer": "300m",
       "spark.executor.memory": "20g",
       "spark.driver.memory": "10g",
       "spark.es.index.auto.create": "true",
       "spark.es.nodes": "localhost",
       "spark.es.nodes.wan.only": "true"
   },
   "algorithm":{
       "indicators": [ 
           {
               "name": "buy"
           }
       ]
   }
}
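
A quick way to catch this before starting a train job is to check the engine config for the key. The sketch below is only an illustration: the function name missing_spark_master is hypothetical, and it checks nothing beyond the presence of "master" inside sparkConf.

```python
import json


def missing_spark_master(config_text: str) -> bool:
    """Return True when the engine config's sparkConf has no "master" entry.

    Assumption: config_text is the content of an engine config file such as
    the config.json shown in the question.
    """
    config = json.loads(config_text)
    return "master" not in config.get("sparkConf", {})
```

For example, missing_spark_master(open("config.json").read()) returns True for the config in the question and False for the one above.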

mick912 commented, Dec 18, 2019 (0 reactions)

Thank you!
