
Anomaly in WORKERS_PER_CORE in Kubernetes cluster

See original GitHub issue

Hello! First of all, I would like to thank @tiangolo for this awesome project, and my sincere thanks to its contributors. As a team, we decided to build our microservices on FastAPI. More than 15 applications are ready, and we are now on a difficult road: the deployment stage. Everything worked perfectly until we deployed the applications on Kubernetes.

We have tried to read all the issues related to this topic, but unfortunately we are still stuck with errors, and we are hoping someone with more experience can help us.

For deployment we are using @tiangolo’s tiangolo/uvicorn-gunicorn-fastapi:python3.7 image. Our Dockerfile initially looked like this:

FROM tiangolo/uvicorn-gunicorn-fastapi:python3.7

COPY . /app

WORKDIR /app

ENV settings=prod
ENV WORKERS_PER_CORE=2

RUN apt-get update -y && pip install --upgrade pip && \
    pip install -r requirements.txt && \
    apt-get install -y postgresql-client

The problem occurs after the pod starts successfully: the workers keep restarting every 2-3 seconds, and the application cannot handle HTTP requests. The log file is attached below. The strange part is that requests come in while the application has not yet finished starting up; Gino then raises "Gino engine is not initialized", and at that moment the worker restarts. We use Gino as our ORM.

The Gino error is included in the logs below:

[2020-04-10 10:31:48 +0000] [124] [INFO] Waiting for application startup.
{"loglevel": "info", "workers": 8, "bind": "0.0.0.0:80", "workers_per_core": 2.0, "host": "0.0.0.0", "port": "80"}                                                                                                
10.44.6.8
10.44.6.8:56568 - "POST /token/generator/default HTTP/1.1" 500
[2020-04-10 10:31:48 +0000] [127] [ERROR] Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/uvicorn/protocols/http/httptools_impl.py", line 385, in run_asgi                                                                                                   
    result = await app(self.scope, self.receive, self.send)
  File "/usr/local/lib/python3.7/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__                                                                                                         
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.7/site-packages/fastapi/applications.py", line 140, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.7/site-packages/starlette/applications.py", line 134, in __call__
    await self.error_middleware(scope, receive, send)
  File "/usr/local/lib/python3.7/site-packages/starlette/middleware/errors.py", line 178, in __call__
    raise exc from None
  File "/usr/local/lib/python3.7/site-packages/starlette/middleware/errors.py", line 156, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.7/site-packages/starlette/middleware/base.py", line 25, in __call__
    response = await self.dispatch_func(request, self.call_next)
  File "/usr/local/lib/python3.7/site-packages/starlette_prometheus/middleware.py", line 47, in dispatch
    raise e from None
  File "/usr/local/lib/python3.7/site-packages/starlette_prometheus/middleware.py", line 43, in dispatch
    response = await call_next(request)
  File "/usr/local/lib/python3.7/site-packages/starlette/middleware/base.py", line 45, in call_next
    task.result()
  File "/usr/local/lib/python3.7/site-packages/starlette/middleware/base.py", line 38, in coro
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.7/site-packages/starlette/middleware/cors.py", line 76, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.7/site-packages/starlette/middleware/base.py", line 25, in __call__
    response = await self.dispatch_func(request, self.call_next)
  File "/app/app/main.py", line 63, in dispatch
    response = await call_next(request)
  File "/usr/local/lib/python3.7/site-packages/starlette/middleware/base.py", line 45, in call_next
    task.result()
  File "/usr/local/lib/python3.7/site-packages/starlette/middleware/base.py", line 38, in coro
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.7/site-packages/gino/ext/starlette.py", line 72, in __call__
    scope['connection'] = await self.db.acquire(lazy=True)
  File "/usr/local/lib/python3.7/site-packages/gino/api.py", line 520, in acquire
    return self.bind.acquire(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/gino/api.py", line 540, in __getattribute__
    raise self._exception
gino.exceptions.UninitializedError: Gino engine is not initialized.
[2020-04-10 10:31:49 +0000] [127] [INFO] Application startup complete.
[2020-04-10 10:31:49 +0000] [125] [INFO] Started server process [125]
[2020-04-10 10:31:49 +0000] [125] [INFO] Waiting for application startup.
[2020-04-10 10:31:50 +0000] [126] [INFO] Application startup complete.
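
For context, the UninitializedError above is raised when a request reaches the Gino middleware before the startup handler that creates the engine has finished. A minimal sketch of how the extension is typically wired, assuming gino 0.8.x's built-in Starlette extension (matching gino/ext/starlette.py in the traceback) and a hypothetical DSN:

from fastapi import FastAPI
from gino.ext.starlette import Gino

app = FastAPI()

# The extension registers a startup handler that creates the engine (the
# "bind") and a middleware that acquires a lazy connection per request.
# A request that arrives before the startup handler has run finds no bind
# and raises gino.exceptions.UninitializedError.
db = Gino(
    app,
    dsn="postgresql://user:password@db-host:5432/mydb",  # hypothetical DSN
)

In Kubernetes this window is easy to hit: without a readiness probe, the Service can route traffic to the pod as soon as the container starts, before application startup completes.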

We thought about changing the timeout, as some people suggested in other issues, so we overrode the Gunicorn config as well, but it still did not help. We overrode start.sh and the gunicorn_conf.py file and added WORKERS_PER_CORE. gunicorn_conf.py:

import json
import multiprocessing
import os

workers_per_core_str = os.getenv("WORKERS_PER_CORE", "1")
web_concurrency_str = os.getenv("WEB_CONCURRENCY", None)
host = os.getenv("HOST", "0.0.0.0")
port = os.getenv("PORT", "80")
bind_env = os.getenv("BIND", None)
use_loglevel = os.getenv("LOG_LEVEL", "info")
if bind_env:
    use_bind = bind_env
else:
    use_bind = f"{host}:{port}"

cores = multiprocessing.cpu_count()
workers_per_core = float(workers_per_core_str)
default_web_concurrency = workers_per_core * cores
if web_concurrency_str:
    web_concurrency = int(web_concurrency_str)
    assert web_concurrency > 0
else:
    web_concurrency = max(int(default_web_concurrency), 2)

# Gunicorn config variables
loglevel = use_loglevel
workers = web_concurrency
bind = use_bind
keepalive = 120
errorlog = "-"
timeout = 300

# For debugging and testing
log_data = {
    "loglevel": loglevel,
    "workers": workers,
    "bind": bind,
    # Additional, non-gunicorn variables
    "workers_per_core": workers_per_core,
    "host": host,
    "port": port,
}
print(json.dumps(log_data))
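
For reference, the "workers": 8 line in the log above is consistent with this arithmetic on a 4-core machine. Note that multiprocessing.cpu_count() reports the cores of the node, not the pod's CPU limit:

# Worked example of the computation above, assuming a 4-core node
# (which would match the "workers": 8 log line).
cores = 4                        # multiprocessing.cpu_count() on the node
workers_per_core = 2.0           # WORKERS_PER_CORE=2
workers = max(int(workers_per_core * cores), 2)
assert workers == 8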

Dockerfile:

FROM tiangolo/uvicorn-gunicorn-fastapi:python3.7

COPY . /app
COPY start.sh /
COPY gunicorn_conf.py /

WORKDIR /app

ENV settings=prod
# ENV WORKERS_PER_CORE=2
ENV WEB_CONCURRENCY=1

RUN apt-get update -y && pip install --upgrade pip && \
    pip install -r requirements.txt && \
    apt-get install -y postgresql-client

If we comment out WORKERS_PER_CORE=2 and instead set ENV WEB_CONCURRENCY=1, the app behaves normally.

Any suggestions? Thanks in advance!
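
One plausible explanation for the difference: multiprocessing.cpu_count() in the config above sees the node's CPUs, not the pod's CPU limit, so WORKERS_PER_CORE can start far more workers than the pod's cgroup quota can sustain, and CPU-starved workers may then miss Gunicorn's heartbeat and be restarted by the master. A hedged sketch, assuming cgroup v1 (the Kubernetes default at the time), of deriving the pod's actual CPU allowance instead:

# Hedged sketch, assuming cgroup v1: derive the pod's CPU limit from the
# CFS quota instead of multiprocessing.cpu_count(), which reports the
# node's cores.
def pod_cpu_limit(default: int = 1) -> int:
    try:
        with open("/sys/fs/cgroup/cpu/cpu.cfs_quota_us") as f:
            quota = int(f.read())
        with open("/sys/fs/cgroup/cpu/cpu.cfs_period_us") as f:
            period = int(f.read())
        if quota > 0:  # quota is -1 when no CPU limit is set
            return max(quota // period, 1)
    except OSError:
        pass
    return default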

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 10 (5 by maintainers)

Top GitHub Comments

1 reaction
Turall commented, Jun 10, 2020

Check using WEB_CONCURRENCY=2 instead of WORKERS_PER_CORE=2. That would help narrow down whether the problem is that you have many cores (and therefore too many app processes running), or whether it has to do with running more than one process at all.

Also, keep in mind that PostgreSQL accepts a maximum number of connections (100 by default, I think). If the app starts more processes than that, they will create at least that many connections and crash.
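
To put rough numbers on that point (assuming asyncpg's default pool size of 10 connections per process, which Gino uses underneath; verify against your own pool settings):

# Rough estimate with assumed defaults: connections opened vs. PostgreSQL's limit.
workers = 8                      # from the log above
pool_max_per_worker = 10         # asyncpg default max_size (assumed)
postgres_max_connections = 100   # PostgreSQL default

total = workers * pool_max_per_worker
print(total, "of", postgres_max_connections)  # 80 of 100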

Thank you for your quick response. We will test this configuration and report back soon. Thanks a lot for helping our FastAPI community.

0 reactions
github-actions[bot] commented, Jun 21, 2020

Assuming the original issue was solved, it will be automatically closed now. But feel free to add more comments or create new issues.


