2025, Nov 12 21:00
Python embedded in a C++ task graph: why it hangs and how to fix it
Running embedded Python inside a concurrent C++ task graph looks straightforward until it suddenly freezes. On Windows 10 with MSVC and Python 3.13.5, initializing the interpreter succeeds, but a task that touches Python never returns. The process stops reacting and even Ctrl+C does nothing. The root cause is subtle but well-defined: how the GIL is handled across threads.
Problem setup
The following program initializes Python, wires up a couple of paths, and schedules a few tasks using Taskflow. One of the tasks acquires the GIL to run Python-bound work. The application hangs when that task runs.
#include <iostream>
#include <taskflow/taskflow.hpp>
#define PY_SSIZE_T_CLEAN
#include <Python.h>

int main(int argc, char* argv[]) {
    // Point the interpreter at the virtual environment and start it.
    // Py_SetPythonHome is deprecated in recent CPython releases (PyConfig is the
    // modern replacement) but still works for this setup.
    wchar_t venvRoot[] = L".venv";
    Py_SetPythonHome(venvRoot);
    Py_Initialize();
    PyEval_InitThreads();  // deprecated no-op since Python 3.9
    if (!Py_IsInitialized()) {
        std::cerr << "Python failed to initialize\n";
        return 1;
    }

    // Make the venv's standard library and site-packages importable.
    PyRun_SimpleString(
        "import sys\n"
        "sys.path.insert(0, '.venv/Lib')\n"
        "sys.path.insert(0, '.venv/Lib/site-packages')\n"
    );
    PyRun_SimpleString(
        "from time import time, ctime\n"
        "print('Today is', ctime(time()))\n"
    );

    // Handles to __main__ and its globals, for running Python code against later.
    PyObject* py_main_mod = PyImport_AddModule("__main__");
    PyObject* globals_dict = PyModule_GetDict(py_main_mod);

    // Build and run the task graph. TaskA hangs here: the main thread still owns
    // the GIL, so PyGILState_Ensure on the worker thread never returns.
    tf::Executor runPool;
    tf::Taskflow graph;
    auto [T1, T2, T3, T4] = graph.emplace(
        [] () { std::cout << "TaskA\n"; PyGILState_STATE s = PyGILState_Ensure(); PyGILState_Release(s); },
        [] () { std::cout << "TaskB\n"; },
        [] () { std::cout << "TaskC\n"; },
        [] () { std::cout << "TaskD\n"; }
    );
    T1.precede(T2, T3);
    T4.succeed(T2, T3);
    runPool.run(graph).wait();

    if (Py_FinalizeEx() < 0) {
        return 120;
    }
    return 0;
}
What is actually going wrong
The embedded interpreter starts with the main thread owning the GIL: Py_Initialize leaves the calling thread holding it. When a Taskflow worker later calls PyGILState_Ensure, it must acquire the GIL before it can proceed. If the main thread never releases its initial ownership, the worker blocks forever and the program stalls. Every acquisition must be paired with a release, and after initialization the main thread must explicitly give up the GIL so background threads can make progress.
In other words, after initializing Python, the main thread should explicitly release the initial GIL with PyEval_SaveThread. Then, any thread that needs to run Python code temporarily acquires the GIL using PyGILState_Ensure and releases it with PyGILState_Release. Before finalizing the interpreter, the main thread should restore its thread state with PyEval_RestoreThread.
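In C++ it is convenient to wrap that Ensure/Release pair in a small RAII guard so the GIL is released on every exit path, including exceptions. The class below is a sketch with an illustrative name, not part of the CPython API:

// Minimal RAII wrapper around PyGILState_Ensure / PyGILState_Release.
class GILGuard {
public:
    GILGuard() : state_(PyGILState_Ensure()) {}   // acquire the GIL and a thread state
    ~GILGuard() { PyGILState_Release(state_); }   // always release, even on exceptions
    GILGuard(const GILGuard&) = delete;
    GILGuard& operator=(const GILGuard&) = delete;
private:
    PyGILState_STATE state_;
};

// Usage inside a task:
// [] () { GILGuard gil; PyRun_SimpleString("print('TaskA')"); }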
Fixing the hang
The correction is minimal: release the initial GIL after initialization, acquire and release it around Python work in tasks, and restore it before finalization. PyEval_InitThreads has been a deprecated no-op since Python 3.9 and has no bearing on this logic.
#include <iostream>
#include <taskflow/taskflow.hpp>
#define PY_SSIZE_T_CLEAN
#include <Python.h>

int main(int argc, char* argv[]) {
    wchar_t venvRoot[] = L".venv";
    Py_SetPythonHome(venvRoot);
    Py_Initialize();
    PyEval_InitThreads();  // deprecated no-op since Python 3.9; harmless here
    if (!Py_IsInitialized()) {
        std::cerr << "Python failed to initialize\n";
        return 1;
    }

    PyRun_SimpleString(
        "import sys\n"
        "sys.path.insert(0, '.venv/Lib')\n"
        "sys.path.insert(0, '.venv/Lib/site-packages')\n"
    );
    PyRun_SimpleString(
        "from time import time, ctime\n"
        "print('Today is', ctime(time()))\n"
    );

    PyObject* py_main_mod = PyImport_AddModule("__main__");
    PyObject* globals_dict = PyModule_GetDict(py_main_mod);

    // Release the GIL acquired during Py_Initialize so worker threads can take it.
    PyThreadState* main_state = PyEval_SaveThread();

    tf::Executor runPool;
    tf::Taskflow graph;
    auto [T1, T2, T3, T4] = graph.emplace(
        // Acquire the GIL only for the Python-bound part of the task, then release it.
        [] () { std::cout << "TaskA\n"; PyGILState_STATE g = PyGILState_Ensure(); PyGILState_Release(g); },
        [] () { std::cout << "TaskB\n"; },
        [] () { std::cout << "TaskC\n"; },
        [] () { std::cout << "TaskD\n"; }
    );
    T1.precede(T2, T3);
    T4.succeed(T2, T3);
    runPool.run(graph).wait();

    // Re-attach the main thread state before finalizing the interpreter.
    PyEval_RestoreThread(main_state);
    if (Py_FinalizeEx() < 0) {
        return 120;
    }
    return 0;
}
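With the ownership pattern in place, a task can run real Python work between Ensure and Release. The snippet below is a hypothetical TaskA body that assumes the lambda captures globals_dict from main(); the evaluated expression is only an example:

// Evaluate a Python expression against __main__'s globals while holding the GIL.
PyGILState_STATE g = PyGILState_Ensure();
PyObject* result = PyRun_String("sum(range(10))", Py_eval_input,
                                globals_dict, globals_dict);
if (result == nullptr) {
    PyErr_Print();                                // report the Python exception
} else {
    std::cout << "TaskA result: " << PyLong_AsLong(result) << "\n";
    Py_DECREF(result);
}
PyGILState_Release(g);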
There is also a scoped alternative for sections that already hold the GIL and must temporarily release it on the same thread: the Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS macro pair, which expands to PyEval_SaveThread and PyEval_RestoreThread under the hood, as sketched below.
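A minimal sketch of that pattern, assuming a task that needs Python before and after a long-running, purely native step (the sleep stands in for real work and requires <thread> and <chrono>):

// Release the GIL around blocking C++ work inside a section that holds it.
PyGILState_STATE g = PyGILState_Ensure();
PyRun_SimpleString("print('before the native work')");

Py_BEGIN_ALLOW_THREADS                                   // GIL released here
std::this_thread::sleep_for(std::chrono::seconds(1));    // stand-in for long native work
Py_END_ALLOW_THREADS                                     // GIL re-acquired here

PyRun_SimpleString("print('after the native work')");
PyGILState_Release(g);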
Related options you may consider
Multiple interpreters in a single process are possible through the subinterpreters API. This is similar in spirit to the multiprocessing module but stays in one process; data passed between interpreters needs to be pickled and unpickled.
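A minimal sketch of the classic C API for this, using Py_NewInterpreter and Py_EndInterpreter; it must run on a thread that currently holds the GIL, and the printed string is just an example:

// Create, use, and tear down a sub-interpreter from a GIL-holding thread.
PyThreadState* main_ts = PyThreadState_Get();   // remember the current thread state
PyThreadState* sub_ts  = Py_NewInterpreter();   // the new interpreter becomes current
PyRun_SimpleString("print('hello from a sub-interpreter')");
Py_EndInterpreter(sub_ts);                      // destroy it (it must be current)
PyThreadState_Swap(main_ts);                    // switch back to the main interpreter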
Python 3.13 also offers an experimental free-threaded build (PEP 703) that removes the GIL. It eliminates the lock's serializing behavior, but each thread still needs PyGILState_Ensure (or an equivalent call) to set up its thread state before touching Python. This approach only works with compatible extension modules; if you cannot control your dependencies, acquire and release the GIL in each task instead.
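If you need to know at compile time which flavor of CPython you are building against, the free-threaded build defines the Py_GIL_DISABLED macro in its headers; a small sketch:

// Detect the free-threaded build at compile time via its configuration macro.
#ifdef Py_GIL_DISABLED
    std::cout << "Built against a free-threaded (no-GIL) CPython\n";
#else
    std::cout << "Built against a standard CPython with the GIL\n";
#endif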
Why this detail matters
GIL management defines whether your embedded runtime cooperates with your C++ scheduler. If the main thread keeps the GIL, background tasks will block and your executor appears stuck. Making GIL ownership explicit avoids deadlocks, enables predictable task execution, and lets you reason about where Python can safely run. If behavior is unclear, reducing the setup to a minimal reproducible example with just two threads and the embedding calls makes troubleshooting straightforward.
Takeaways
Initialize Python, then immediately release the initial GIL with PyEval_SaveThread. In every worker that interacts with Python, wrap the Python-bound sections with PyGILState_Ensure and PyGILState_Release. Before shutting down, restore the main thread state with PyEval_RestoreThread and call Py_FinalizeEx. If you need more isolation, look into sub-interpreters. If you can adopt a compatible stack on Python 3.13, the free-threaded build can remove the GIL bottleneck; when that is not an option, disciplined acquire-and-release around each task is the reliable path.