2026, Jan 04 15:00

Why Flask Apps on Gunicorn Hang: Single Sync Worker Deadlocks on Internal Loopback Requests

Learn why Flask apps on Gunicorn hang with a single sync worker: internal HTTP loopback requests deadlock and time out. The fast fix: run two or more workers.

When a Flask app is served by gunicorn with a single sync worker, an endpoint that makes an HTTP call back into the same process can hang until a worker timeout. Direct requests work, but an internal loopback HTTP call stalls. This guide explains why that happens and how to resolve it in a way that aligns with gunicorn’s worker model.

Minimal example that reproduces the issue

The structure is straightforward.

% tree your_flask_app
your_flask_app
├── __init__.py
├── app.py
└── routes
    ├── __init__.py
    ├── data.py
    └── shared.py

your_flask_app/app.py

from flask import Flask
from routes.data import api_bp

site = Flask(__name__)

# Register blueprint
site.register_blueprint(api_bp, url_prefix='/')

if __name__ == "__main__":
    site.run(debug=True)

your_flask_app/routes/shared.py

# shared dictionary for storing session data
state_cache = {}

your_flask_app/routes/data.py

from flask import Blueprint, jsonify, request
import uuid
import requests
from routes.shared import state_cache

api_bp = Blueprint('svc', __name__)

@api_bp.route('/hello_world', methods=['POST'])
def say_hello():
    # Generate a new UUID and store sample data
    sid = str(uuid.uuid4())
    payload = {"message": "Hello, World!", "session_id": sid}
    state_cache[sid] = payload
    return jsonify({"session_id": sid}), 201

@api_bp.route('/get_hello_world', methods=['GET'])
def fetch_hello():
    # Get session_id from query string
    sid = request.args.get('session_id')
    if not sid or sid not in state_cache:
        return jsonify({"error": "Invalid or missing session_id"}), 404
    return jsonify(state_cache[sid]), 200

@api_bp.route('/call_hello_world', methods=['POST'])
def invoke_hello():
    # Make API call to /hello_world endpoint on localhost
    try:
        resp = requests.post('http://localhost:5000/hello_world')
        return (resp.content, resp.status_code, resp.headers.items())
    except Exception as e:
        return jsonify({"error": str(e)}), 500

With one gunicorn worker, a direct POST /hello_world and GET /get_hello_world succeed. POST /call_hello_world, which internally POSTs to /hello_world on localhost, hangs and eventually triggers a worker timeout in the gunicorn logs.

What’s actually going wrong

The problem is scheduling, not HTTP semantics. The client sends a request, and the only sync worker picks it up. Before returning a response, the handler issues another HTTP request back to the same service. That second connection lands in the listen socket's backlog, but no worker is free to accept it. The nested request cannot be served until the first handler returns, and the first handler cannot return because it is blocked waiting on the nested request. The deadlock holds until gunicorn's worker timeout kills the worker.

When you increase the worker count to two or more, at least one worker remains idle while the first is blocked on the internal HTTP call. The idle worker accepts the nested connection, and both requests complete successfully.
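The same trap can be sketched with a standard-library thread pool standing in for gunicorn's workers: a task that submits a nested task to its own pool and blocks on the result deadlocks at pool size one but succeeds at size two. This is an analogy to illustrate the scheduling cycle, not gunicorn's actual dispatch mechanism.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def inner():
    # Stands in for the /hello_world handler.
    return "hello"

def outer(pool):
    # Stands in for /call_hello_world: it blocks on a nested task
    # submitted to the same pool, like the loopback HTTP call.
    return pool.submit(inner).result(timeout=1)

# One worker: outer() occupies the only slot, so inner() never starts
# and the nested wait times out.
with ThreadPoolExecutor(max_workers=1) as pool:
    try:
        pool.submit(outer, pool).result(timeout=2)
        deadlocked = False
    except TimeoutError:
        deadlocked = True

# Two workers: inner() runs on the idle worker and outer() completes.
with ThreadPoolExecutor(max_workers=2) as pool:
    result = pool.submit(outer, pool).result(timeout=2)

print(deadlocked, result)
```

With one worker the nested `.result()` call times out; with two, the nested task lands on the idle worker, exactly mirroring the gunicorn behavior above.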

Solution

Run the app with at least two gunicorn workers so there is an idle worker to serve re-entrant HTTP calls to the same process.

./venv/bin/gunicorn -w 2 -b 0.0.0.0:5000 app:site

This aligns with the observed behavior: with 2+ workers, the internal POST to /hello_world proceeds without blocking the original request. One caveat: with multiple workers, the in-memory state_cache is per-process, so a session created on one worker is invisible to the others; durable session state belongs in an external store.
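If you prefer a config file over command-line flags, a hypothetical gunicorn.conf.py equivalent to the command above might look like this (workers, bind, and timeout are standard gunicorn settings; the timeout value shown is gunicorn's default):

```python
# gunicorn.conf.py -- hypothetical equivalent of: gunicorn -w 2 -b 0.0.0.0:5000 app:site
bind = "0.0.0.0:5000"
workers = 2    # at least two, so a loopback request finds an idle worker
timeout = 30   # seconds before a stuck worker is killed (gunicorn's default)
```

Run it with ./venv/bin/gunicorn -c gunicorn.conf.py app:site.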

Why this matters

It is easy to introduce an internal HTTP call and not account for the worker model. A single sync worker will serialize all work, and a same-process loopback call creates a dependency cycle that cannot be resolved without additional capacity. Understanding how gunicorn schedules requests helps avoid hidden deadlocks and timeouts in production when endpoints call back into the service.
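Relatedly, many loopback calls can be avoided altogether by factoring the handler body into a plain function that both endpoints call in-process. A minimal sketch, using a hypothetical create_hello() helper in place of the HTTP round trip:

```python
import uuid

# Per-process cache, as in shared.py.
state_cache = {}

def create_hello():
    # The /hello_world handler logic as a plain function: the route and
    # /call_hello_world can both call this directly, with no HTTP round
    # trip and therefore no dependency on a second idle worker.
    sid = str(uuid.uuid4())
    payload = {"message": "Hello, World!", "session_id": sid}
    state_cache[sid] = payload
    return payload

result = create_hello()
```

The route handlers then become thin wrappers that jsonify the return value, and the worker count stops being a correctness requirement for this code path.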

Conclusion

If an endpoint needs to make an HTTP request to the same service, ensure gunicorn has more than one worker so at least one remains idle to process the nested request. With one worker, the initial request holds the only worker while waiting, causing a timeout. Planning worker counts with this access pattern in mind prevents avoidable outages and keeps response times predictable.