Help for debugging "HerokuStartupError: Failed to start for unknown reason"
Certain kinds of Dallinger errors take the form of HerokuStartupError: Failed to start for unknown reason, and are difficult to debug because they provide no traceback for the underlying failure. Example:
Traceback (most recent call last):
  File "/home/frank/.virtualenvs/dlgr_env/bin/dallinger", line 33, in <module>
    sys.exit(load_entry_point('dallinger', 'console_scripts', 'dallinger')())
  File "/home/frank/.virtualenvs/dlgr_env/lib/python3.8/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/frank/.virtualenvs/dlgr_env/lib/python3.8/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/frank/.virtualenvs/dlgr_env/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/frank/.virtualenvs/dlgr_env/lib/python3.8/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/frank/.virtualenvs/dlgr_env/lib/python3.8/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/frank/projects/Dallinger/dallinger/command_line.py", line 346, in wrapper
    return f(**kwargs)
  File "/home/frank/projects/Dallinger/dallinger/command_line.py", line 436, in debug
    debugger.run()
  File "/home/frank/projects/Dallinger/dallinger/deployment.py", line 598, in run
    with HerokuLocalWrapper(
  File "/home/frank/projects/Dallinger/dallinger/heroku/tools.py", line 529, in __enter__
    self.start()
  File "/home/frank/projects/Dallinger/dallinger/heroku/tools.py", line 421, in start
    raise HerokuStartupError(
dallinger.heroku.tools.HerokuStartupError: Failed to start for unknown reason: [
"5:35:56 PM worker.1 | 2021-01-19 17:35:56,763 RQ GEVENT worker (Greenlet pool size=20) 'rq:worker:6222fd7846eb4efb8d77dcccaa8e55f9' started, version 1.0\n",
"5:35:57 PM web.1 | /home/frank/.virtualenvs/dlgr_env/lib/python3.8/site-packages/sqlalchemy/orm/mapper.py:1092: SAWarning: Reassigning polymorphic association for identity 'participant' from <Mapper at 0x7f1ecd737280; Participant> to <Mapper at 0x7f1e9b81fbe0; Participant>: Check for duplicate use of 'participant' as value for polymorphic_identity.\n",
'5:35:57 PM web.1 | util.warn(\n',
"5:35:57 PM web.1 | /home/frank/.virtualenvs/dlgr_env/lib/python3.8/site-packages/sqlalchemy/orm/mapper.py:1092: SAWarning: Reassigning polymorphic association for identity 'custom_network' from <Mapper at 0x7f1e900b7b20; CustomNetwork> to <Mapper at 0x7f1e9006ce80; CustomNetwork>: Check for duplicate use of 'custom_network' as value for polymorphic_identity.\n",
'5:35:57 PM web.1 | util.warn(\n',
"5:35:57 PM web.1 | /home/frank/.virtualenvs/dlgr_env/lib/python3.8/site-packages/sqlalchemy/orm/mapper.py:1092: SAWarning: Reassigning polymorphic association for identity 'custom_trial' from <Mapper at 0x7f1e900b7a30; CustomTrial> to <Mapper at 0x7f1e90078190; CustomTrial>: Check for duplicate use of 'custom_trial' as value for polymorphic_identity.\n",
'5:35:57 PM web.1 | util.warn(\n',
"5:35:57 PM web.1 | /home/frank/.virtualenvs/dlgr_env/lib/python3.8/site-packages/sqlalchemy/orm/mapper.py:1092: SAWarning: Reassigning polymorphic association for identity 'custom_node' from <Mapper at 0x7f1e900b7ee0; CustomNode> to <Mapper at 0x7f1e900782e0; CustomNode>: Check for duplicate use of 'custom_node' as value for polymorphic_identity.\n",
'5:35:57 PM web.1 | util.warn(\n',
"5:35:57 PM web.1 | /home/frank/.virtualenvs/dlgr_env/lib/python3.8/site-packages/sqlalchemy/orm/mapper.py:1092: SAWarning: Reassigning polymorphic association for identity 'custom_source' from <Mapper at 0x7f1e900d6160; CustomSource> to <Mapper at 0x7f1e900786a0; CustomSource>: Check for duplicate use of 'custom_source' as value for polymorphic_identity.\n",
'5:35:57 PM web.1 | util.warn(\n',
'5:35:57 PM web.1 | [2021-01-19 17:35:57 +0100] [793381] [DEBUG] Current configuration:\n',
'5:35:57 PM web.1 | config: None\n',
"5:35:57 PM web.1 | bind: ['0.0.0.0:5000']\n",
'5:35:57 PM web.1 | backlog: 2048\n',
'5:35:57 PM web.1 | workers: 13\n',
'5:35:57 PM web.1 | worker_class: geventwebsocket.gunicorn.workers.GeventWebSocketWorker\n',
'5:35:57 PM web.1 | threads: 1\n',
'5:35:57 PM web.1 | worker_connections: 1000\n',
'5:35:57 PM web.1 | max_requests: 0\n',
'5:35:57 PM web.1 | max_requests_jitter: 0\n',
'5:35:57 PM web.1 | timeout: 30\n',
'5:35:57 PM web.1 | graceful_timeout: 30\n',
'5:35:57 PM web.1 | keepalive: 2\n',
'5:35:57 PM web.1 | limit_request_line: 0\n',
'5:35:57 PM web.1 | limit_request_fields: 100\n',
'5:35:57 PM web.1 | limit_request_field_size: 8190\n',
'5:35:57 PM web.1 | reload: False\n',
'5:35:57 PM web.1 | reload_engine: auto\n',
'5:35:57 PM web.1 | reload_extra_files: []\n',
'5:35:57 PM web.1 | spew: False\n',
'5:35:57 PM web.1 | check_config: False\n',
'5:35:57 PM web.1 | preload_app: False\n',
'5:35:57 PM web.1 | sendfile: None\n',
'5:35:57 PM web.1 | reuse_port: False\n',
'5:35:57 PM web.1 | chdir: /tmp/tmpikvu_5lw/49b8d1eb-488a-375e-d8a3-3ae0abee20be\n',
'5:35:57 PM web.1 | daemon: False\n',
'5:35:57 PM web.1 | raw_env: []\n',
'5:35:57 PM web.1 | pidfile: None\n',
'5:35:57 PM web.1 | worker_tmp_dir: None\n',
'5:35:57 PM web.1 | user: 1000\n',
'5:35:57 PM web.1 | group: 1000\n',
'5:35:57 PM web.1 | umask: 0\n',
'5:35:57 PM web.1 | initgroups: False\n',
'5:35:57 PM web.1 | tmp_upload_dir: None\n',
"5:35:57 PM web.1 | secure_scheme_headers: {'X-FORWARDED-PROTOCOL': 'ssl', 'X-FORWARDED-PROTO': 'https', 'X-FORWARDED-SSL': 'on'}\n",
"5:35:57 PM web.1 | forwarded_allow_ips: ['127.0.0.1']\n",
'5:35:57 PM web.1 | accesslog: -\n',
'5:35:57 PM web.1 | disable_redirect_access_to_syslog: False\n',
'5:35:57 PM web.1 | access_log_format: %(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s"\n',
'5:35:57 PM web.1 | errorlog: -\n',
'5:35:57 PM web.1 | loglevel: debug\n',
'5:35:57 PM web.1 | capture_output: False\n',
'5:35:57 PM web.1 | logger_class: gunicorn.glogging.Logger\n',
'5:35:57 PM web.1 | logconfig: None\n',
'5:35:57 PM web.1 | logconfig_dict: {}\n',
'5:35:57 PM web.1 | syslog_addr: udp://localhost:514\n',
'5:35:57 PM web.1 | syslog: False\n',
'5:35:57 PM web.1 | syslog_prefix: None\n',
'5:35:57 PM web.1 | syslog_facility: user\n',
'5:35:57 PM web.1 | enable_stdio_inheritance: False\n',
'5:35:57 PM web.1 | statsd_host: None\n',
'5:35:57 PM web.1 | statsd_prefix: \n',
'5:35:57 PM web.1 | proc_name: dallinger_experiment_server\n',
'5:35:57 PM web.1 | default_proc_name: gunicorn\n',
'5:35:57 PM web.1 | pythonpath: None\n',
'5:35:57 PM web.1 | paste: None\n',
'5:35:57 PM web.1 | on_starting: <function OnStarting.on_starting at 0x7f1e9ba2c280>\n',
'5:35:57 PM web.1 | on_reload: <function OnReload.on_reload at 0x7f1e9ba2c3a0>\n',
'5:35:57 PM web.1 | when_ready: <function when_ready at 0x7f1ecf76f670>\n',
'5:35:57 PM web.1 | pre_fork: <function Prefork.pre_fork at 0x7f1e9ba2c5e0>\n',
'5:35:57 PM web.1 | post_fork: <function Postfork.post_fork at 0x7f1e9ba2c700>\n',
'5:35:57 PM web.1 | post_worker_init: <function PostWorkerInit.post_worker_init at 0x7f1e9ba2c820>\n',
'5:35:57 PM web.1 | worker_int: <function WorkerInt.worker_int at 0x7f1e9ba2c940>\n',
'5:35:57 PM web.1 | worker_abort: <function WorkerAbort.worker_abort at 0x7f1e9ba2ca60>\n',
'5:35:57 PM web.1 | pre_exec: <function PreExec.pre_exec at 0x7f1e9ba2cb80>',
'5:35:57 PM web.1 | pre_request: <function PreRequest.pre_request at 0x7f1e9ba2cca0>',
'5:35:57 PM web.1 | post_request: <function PostRequest.post_request at 0x7f1e9ba2cd30>\n',
'5:35:57 PM web.1 | child_exit: <function ChildExit.child_exit at 0x7f1e9ba2ce50>',
'5:35:57 PM web.1 | worker_exit: <function WorkerExit.worker_exit at 0x7f1e9ba2cf70>',
'5:35:57 PM web.1 | nworkers_changed: <function NumWorkersChanged.nworkers_changed at 0x7f1e9ba410d0>\n',
'5:35:57 PM web.1 | on_exit: <function OnExit.on_exit at 0x7f1e9ba411f0>\n',
'5:35:57 PM web.1 | proxy_protocol: False',
"5:35:57 PM web.1 | proxy_allow_ips: ['127.0.0.1']",
'5:35:57 PM web.1 | keyfile: None\n',
'5:35:57 PM web.1 | certfile: None\n',
'5:35:57 PM web.1 | ssl_version: 2',
'5:35:57 PM web.1 | cert_reqs: 0\n',
'5:35:57 PM web.1 | ca_certs: None\n',
'5:35:57 PM web.1 | suppress_ragged_eofs: True',
'5:35:57 PM web.1 | do_handshake_on_connect: False\n',
'5:35:57 PM web.1 | ciphers: TLSv1\n',
'5:35:57 PM web.1 | raw_paste_global_conf: []\n',
'5:35:57 PM web.1 | [2021-01-19 17:35:57 +0100] [793381] [INFO] Starting gunicorn 19.9.0\n',
'5:35:57 PM web.1 | [2021-01-19 17:35:57 +0100] [793381] [DEBUG] Arbiter booted\n',
'5:35:57 PM web.1 | [2021-01-19 17:35:57 +0100] [793381] [INFO] Listening at: http://0.0.0.0:5000 (793381)\n',
'5:35:57 PM web.1 | [2021-01-19 17:35:57 +0100] [793381] [INFO] Using worker: geventwebsocket.gunicorn.workers.GeventWebSocketWorker\n',
'5:35:57 PM web.1 | WARNING:/home/frank/projects/Dallinger/dallinger/experiment_server/gunicorn.py:Ready.\n',
"5:35:57 PM web.1 | /usr/lib/python3.8/os.py:1023: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used\n",
'5:35:57 PM web.1 | return io.open(fd, *args, **kwargs)\n',
'5:35:57 PM web.1 | [2021-01-19 17:35:57 +0100] [793418] [INFO] Booting worker with pid: 793418\n',
'5:35:57 PM web.1 | [2021-01-19 17:35:57 +0100] [793418] [ERROR] Exception in worker process\n']
This issue proposes making such errors easier to debug. @alecpm gave some helpful advice here:
Generally what I do in this case is look in the output for the temp directory created for the experiment. You should be able to cd into that directory and run dallinger_heroku_web or dallinger_heroku_worker if that’s the process that’s failing. Those will generally give you good tracebacks and point you to the actual cause. You will need to make sure the shell you are using in the tmp directory has the same python you used to run Dallinger, and you’ll probably also need to explicitly do “export PORT=5000” before running those commands.
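The manual steps in the quote can also be scripted. The Python sketch below is one way to do it, under two assumptions not confirmed by Dallinger itself: that the experiment's temp directory is the "chdir:" value printed in gunicorn's configuration dump (as in the log above), and that putting the current interpreter's bin directory first on PATH is enough to run the console scripts from the same environment. The helper names find_app_dir and rerun_in_app_dir are hypothetical, not part of Dallinger.

```python
import os
import re
import subprocess
import sys
from typing import Iterable, List, Optional

# Assumption: the experiment directory appears in gunicorn's config dump as
# "chdir: /tmp/...". The character class stops at whitespace, quotes and
# backslashes, so it also works on lines pasted in their repr form ('...\n',).
CHDIR_PATTERN = re.compile(r"\bchdir:\s*([^\s'\"\\]+)")


def find_app_dir(log_lines: Iterable[str]) -> Optional[str]:
    """Return the first 'chdir:' path in the captured startup output, or None."""
    for line in log_lines:
        match = CHDIR_PATTERN.search(line)
        if match:
            return match.group(1)
    return None


def rerun_in_app_dir(
    app_dir: str,
    command: List[str],
    port: int = 5000,
    timeout: Optional[float] = None,
) -> subprocess.CompletedProcess:
    """Re-run one Heroku process command inside the experiment directory.

    PORT is set explicitly, and the directory containing the current Python
    interpreter is placed first on PATH so that console scripts resolve from
    the same environment (on POSIX, subprocess searches the PATH in `env`).
    """
    env = dict(os.environ)
    env["PORT"] = str(port)
    interpreter_bin = os.path.dirname(sys.executable)
    env["PATH"] = os.pathsep.join([interpreter_bin, env.get("PATH", "")])
    # capture_output keeps stdout/stderr so the traceback can be inspected;
    # on timeout, subprocess.run kills the child and raises TimeoutExpired.
    return subprocess.run(
        command,
        cwd=app_dir,
        env=env,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

For example, with the lines of the HerokuStartupError message in a list called startup_log, calling rerun_in_app_dir(find_app_dir(startup_log), ["dallinger_heroku_web"], timeout=60) may surface the web process's traceback in result.stderr. A TimeoutExpired exception instead suggests the process started and kept running, so the failure probably lies elsewhere (for instance in dallinger_heroku_worker).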
One approach would simply be to add this information to the Dallinger documentation. A more ambitious option would be a better wrapper for catching and debugging these errors. It seems like it shouldn't be too hard to wrap the app launch call in a try/except block and log any errors before re-raising, but perhaps the developers have already considered this. Either way, I hope we can find a way to make this debugging easier, because otherwise these errors are quite hard to track down.
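A minimal sketch of the try/except idea, assuming the wrapper is applied to whatever function builds or launches the app. The decorator log_startup_errors and the logger name "startup_debug" are hypothetical and not part of Dallinger; where exactly to apply the decorator inside Dallinger is left open.

```python
import functools
import logging
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])

# Hypothetical logger name; any logger whose output reaches the console works.
logger = logging.getLogger("startup_debug")


def log_startup_errors(func: F) -> F:
    """Log the full traceback of any exception raised by `func`, then re-raise it."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            return func(*args, **kwargs)
        except Exception:
            # logger.exception logs at ERROR level and appends the current traceback.
            logger.exception("Startup failed in %s", func.__qualname__)
            raise  # re-raise the original exception unchanged

    return wrapper  # type: ignore[return-value]
```

Decorating a hypothetical app factory (for example, @log_startup_errors above a create_app function) would then write the traceback to the logs before the exception propagates, instead of relying on the caller to report it.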
Issue Analytics
- Created 3 years ago
- Comments:5 (2 by maintainers)
Top GitHub Comments
Hi @alecpm, apologies for this; it must have been frustrating to try to debug this without a reproducible example!
I am not entirely sure of the best way to reproduce these errors. The last time I saw one was with an older version of Dallinger, possibly linked to some route registration problems we had earlier.
Here’s one way to get the error:
- Branch issues/2432-extra-params, v.1.10.0 (here's the repository)
- cd demos/timeline
- dallinger debug --verbose
You get an error like this:
That sounds like some masterful debugging! Thank you!