2025, Nov 11 19:00

Call GSL from Pure Python with Numba: cacheable, ctypes-free ExternalFunction linking and tuple-based gsl_function params

Learn a ctypes-free way to call GSL from Python with Numba and llvmlite: cacheable ExternalFunction linking and tuple ABI packing for gsl_function callbacks.

Wrapping GSL from pure Python and keeping Numba’s cacheability usually collide at two points: calling into a shared library without ctypes/cffi, and passing structured state to GSL callbacks without bespoke C glue. Below is a compact path that addresses both. It shows how to call GSL directly with llvmlite and numba.types.ExternalFunction and how to package arguments for gsl_function by lowering a Numba Tuple to the same ABI as a simple C struct.

Problem overview

The goal is to wrap selected GSL routines, avoid handwritten C glue where possible, and keep the result cacheable with njit(cache=True). A minimal layout often starts like this and uses ctypes to load a tiny C shim:

// gsl_bridge.c
#include <gsl/gsl_integration.h>
#include <stdint.h>
typedef double (*gsl_cb_t)(double x, void* params);
double qag_bridge(
        gsl_cb_t cb,
        void* user_state,
        int which,
        double a,
        double b,
        double epsabs,
        double epsrel,
        int limit,
        int key) {
    gsl_integration_workspace* work = gsl_integration_workspace_alloc(limit);
    double res, err;
    gsl_function fn_proxy;
    fn_proxy.function = cb;
    fn_proxy.params = user_state;
    gsl_integration_qag(&fn_proxy, a, b, epsabs, epsrel, limit, key, work, &res, &err);
    gsl_integration_workspace_free(work);
    return res;
}
# integration.py
import ctypes as ct
from pathlib import Path
from numba import njit
def find_ext(lib_base: str) -> str:
    here = Path(__file__).parent
    patt = f"*{lib_base}.*"
    found = here.glob(patt)
    try:
        return str(next(found))
    except StopIteration:
        return None
_so_path = find_ext('gsl_integration')
_so_handle = ct.CDLL(_so_path)
qag_fn = _so_handle.qag
qag_fn.argtypes = [ct.c_void_p, ct.c_void_p,
                   ct.c_double, ct.c_double,
                   ct.c_double, ct.c_double,
                   ct.c_int, ct.c_int]
qag_fn.restype = ct.c_double
@njit
def qag_jit(cb_addr: int,
            a: float,
            b: float,
            epsabs: float = 1.49e-08,
            epsrel: float = 1.49e-08,
            limit: int = 50,
            key: int = 1,
            user_ptr: int = 0) -> float:
    return qag_fn(cb_addr, user_ptr, a, b, epsabs, epsrel, limit, key)
# setup.py
from setuptools import setup, Extension, find_packages
ext_mod = Extension(
    "numba_gsl.gsl_integration",
    sources=["numba_gsl/gsl_integration.c"],
    libraries=["gsl", "gslcblas"],
    extra_compile_args=["-O3"],
)
setup(
    name="numba_gsl",
    version="0.1.0",
    description="Numba-friendly wrappers for GSL routines",
    author="No Name",
    packages=find_packages(),
    ext_modules=[ext_mod],
    install_requires=["numba", "llvmlite"],
    python_requires=">=3.12",
)
# demo.py
import math
from numba import cfunc, types
from numba_gsl.integration import qag_jit
from scipy.integrate import quad
@cfunc(types.float64(types.float64, types.voidptr))
def sinc_cb(x, _):
    return math.sin(x) / x if x != 0.0 else 1.0
def py_sinc(x):
    return math.sin(x) / x if x != 0.0 else 1.0
addr = sinc_cb.address
res_numba = qag_jit(addr, a=1e-8, b=3.14)
res_scipy = quad(py_sinc, a=1e-8, b=3.14)[0]
print("numba_gsl quad result:", res_numba)
print("scipy     quad result:", res_scipy)

While the approach works for simple cases, it relies on ctypes. That blocks njit(cache=True), and it leaves complex GSL structs like gsl_function and workspace to be handled on the Python side or via ad-hoc C shims, which limits extensibility.

What causes the friction

Two independent issues surface. First, calling into a DLL/SO from Numba in a way that is cacheable. This is solved by loading the library once via llvmlite.binding.load_library_permanently and binding symbols through numba.types.ExternalFunction instead of ctypes or cffi. Second, providing GSL with a callback and a parameter block that looks like gsl_function on the C side. Numba’s Tuple lowers to the same LLVM representation as a C struct. That means packing a Tuple as a tiny struct and casting it to void* can represent the gsl_function payload. This must be done carefully because C compilers can choose different ABI layouts for complicated structs. For simple shapes like a pair of pointers, this tends to be stable across platforms.

Solution: cacheable linking and ABI-friendly tuple lowering

The following pieces rely on three ideas. Load the vendor library only once into the process, bind C symbols as ExternalFunction, and bridge values to pointers and back via tiny intrinsics. A helper converts a Tuple into a pointer-shaped blob so GSL can treat it as the params field of gsl_function.

from numba import types, typeof, njit, cfunc
from numba.extending import intrinsic
from numba.core import cgutils
import numba as nb
import numpy as np
@intrinsic
def stack_ptr_from_val(typingctx, item):
    def impl(context, builder, signature, args):
        tmp = cgutils.alloca_once_value(builder, args[0])
        return tmp
    sig = types.CPointer(typeof(item).instance_type)(typeof(item).instance_type)
    return sig, impl
@intrinsic
def deref_ptr(typingctx, arrptr):
    def impl(context, builder, signature, args):
        val = builder.load(args[0])
        return val
    sig = arrptr.dtype(types.CPointer(arrptr.dtype))
    return sig, impl
@intrinsic
def sizeof_tuple_abi(typingctx, tup_t):
    if not isinstance(tup_t, types.Tuple):
        raise TypeError("Argument must be a tuple")
    def codegen(context, builder, sig, args):
        ty = context.get_value_type(tup_t)
        base_ptr = cgutils.alloca_once(builder, ty)
        next_ptr = builder.gep(base_ptr, [context.get_constant(types.int32, 1)])
        intptr = context.get_value_type(types.intp)
        base = builder.ptrtoint(base_ptr, intptr)
        nxt = builder.ptrtoint(next_ptr, intptr)
        sz = builder.sub(nxt, base)
        return sz
    sig = types.intp(tup_t)
    return sig, codegen
@intrinsic
def tuple_as_voidptr(typingctx, tup_t):
    if not isinstance(tup_t, types.Tuple):
        raise TypeError("Argument must be a tuple")
    sig = types.voidptr(tup_t)
    def codegen(context, builder, signature, args):
        (val,) = args
        slot = cgutils.alloca_once_value(builder, val)
        void_ty = context.get_value_type(types.voidptr)
        casted = builder.bitcast(slot, void_ty)
        return casted
    return sig, codegen
def make_cfunc_and_sig(fn, arg_pack, cache=False):
    if not callable(fn) and not isinstance(fn, nb.core.registry.CPUDispatcher):
        raise TypeError("fn must be callable")
    if not isinstance(arg_pack, tuple):
        raise TypeError("arg_pack must be a tuple")
    if isinstance(fn, nb.core.registry.CPUDispatcher):
        fn = fn.py_func
    nb_inner = nb.njit(types.double(types.double, nb.typeof(arg_pack)), inline="always", cache=cache)(fn)
    sig = types.double(types.double, types.CPointer(nb.typeof(arg_pack)))
    @cfunc(sig, error_model="numpy", cache=True)
    def cfun(x, pack_ptr):
        return nb_inner(x, pack_ptr[0])
    return cfun, sig
def make_cfunc_with_blob(fn, arg_pack, cache=False):
    if not callable(fn) and not isinstance(fn, nb.core.registry.CPUDispatcher):
        raise TypeError("fn must be callable")
    if not isinstance(arg_pack, tuple):
        raise TypeError("arg_pack must be a tuple")
    if isinstance(fn, nb.core.registry.CPUDispatcher):
        fn = fn.py_func
    @njit(cache=cache)
    def lower_tuple_blob(tup):
        size = sizeof_tuple_abi(tup)
        blob_ptr = tuple_as_voidptr(tup)
        view = nb.carray(blob_ptr, (size,), dtype=np.uint8)
        copy = np.copy(view)
        return copy
    nb_inner = nb.njit(types.double(types.double, nb.typeof(arg_pack)), inline="always", cache=cache)(fn)
    @cfunc(types.double(types.double, types.CPointer(nb.typeof(arg_pack))), error_model="numpy", cache=cache)
    def cfun(x, pack_ptr):
        return nb_inner(x, pack_ptr[0])
    return cfun, lower_tuple_blob
@njit(inline="always")
def pack_gsl_fun(cb_addr, arg_pack):
    arg_ptr = tuple_as_voidptr(arg_pack)
    return tuple_as_voidptr((cb_addr, arg_ptr))

With the intrinsics in place, bind directly to GSL using load_library_permanently and ExternalFunction. This enables caching and avoids ctypes entirely.

from llvmlite import binding
from numba import types
import numba as nb
import gsl_intrinsics as gi
binding.load_library_permanently(r'path_to_gsl.dll')
f64   = types.double
f64_p = types.CPointer(f64)
i32   = types.intc
sz_t  = types.size_t
vptr  = types.voidptr
name_qag = 'gsl_integration_qag'
sig_qag = i32(
    vptr,   # gsl_function*
    f64,    # a
    f64,    # b
    f64,    # epsabs
    f64,    # epsrel
    sz_t,   # limit
    i32,    # key
    vptr,   # workspace*
    f64_p,  # result*
    f64_p   # abserr*
)
qag_ext = types.ExternalFunction(name_qag, sig_qag)
name_alloc = 'gsl_integration_workspace_alloc'
sig_alloc = vptr(sz_t,)
ws_alloc = types.ExternalFunction(name_alloc, sig_alloc)
name_free = 'gsl_integration_workspace_free'
sig_free = types.none(vptr,)
ws_free = types.ExternalFunction(name_free, sig_free)
@nb.njit(cache=True)
def call_gsl_qag(cb_addr, arg_pack, a, b, epsabs, epsrel, limit, key):
    params_ptr = gi.tuple_as_voidptr(arg_pack)
    gsl_fn = gi.tuple_as_voidptr((cb_addr, params_ptr))
    work = ws_alloc(limit)
    res_ptr = gi.stack_ptr_from_val(nb.double(0))
    err_ptr = gi.stack_ptr_from_val(nb.double(0))
    _ret = qag_ext(
        gsl_fn, a, b,
        epsabs, epsrel, limit,
        key, work,
        res_ptr, err_ptr,
    )
    ws_free(work)
    return gi.deref_ptr(res_ptr), gi.deref_ptr(err_ptr)

The usage mirrors the ctypes-based example but runs entirely through Numba’s external call path and can be cached. For illustration, one variant builds a cfunc from a njit function and a concrete tuple of arguments; another writes the cfunc signature explicitly.

import numba as nb
from numba import types
import numpy as np
import gsl_intrinsics as gi
import math
arg_tuple = (5.0, np.ones((3, 3)), 8.0)
@nb.njit()
def sinc_plus_bias(x, pack):
    return (math.sin(x) / x + pack[2]) if x != 0.0 else 1.0
cfun_auto, sig_auto = make_cfunc_and_sig(sinc_plus_bias, arg_tuple, cache=True)
inner = (types.float64, types.Array(types.float64, 2, 'C'), types.float64)
@nb.cfunc(types.double(types.double, types.CPointer(types.Tuple(inner))), cache=True)
def cfun_manual(x, p):
    return sinc_plus_bias(x, p[0])
a = 1e-8
b = 3.14
epsabs = 1.49e-08
epsrel = 1.49e-08
limit = 50
key = 1
print(call_gsl_qag(cfun_manual.address, arg_tuple, a, b, epsabs, epsrel, limit, key))
print("Cache hits call_gsl_qag:", call_gsl_qag.stats.cache_hits)
print("Cache misses call_gsl_qag:", call_gsl_qag.stats.cache_misses)
print("")
print("Cache hits cfun_manual:", cfun_manual.cache_hits)

Why the trick with tuples works

Numba lowers a Tuple to an LLVM struct, and simple C structs lower to LLVM structs as well. When the struct layout is uncomplicated—think a pair of pointers—the ABI chosen by mainstream C compilers typically matches what’s needed at the LLVM level. Casting a Tuple to void* yields a pointer that GSL can keep as gsl_function.params. The function pointer itself is passed alongside and packed the same way. The direct ExternalFunction binding ensures Numba can treat the call like any other compiled external and cache the resulting machine code.

Why this is worth knowing

It enables cacheable, njit-heavy pipelines that call into vendor libraries without the ctypes/cffi tax and without hand-maintaining glue for every struct definition. It also keeps the wrapping code extensible: once the tuple-packing and pointer helpers are in place, adding more GSL calls follows the same pattern. There is an important caveat. While simple tuple-as-struct shapes are generally robust, complex C structs can be compiler- and platform-dependent, so expanding the technique beyond trivial layouts should be done deliberately.

Takeaways

If you need Numba caching, do not call through ctypes or cffi. Load the shared object once with llvmlite.binding.load_library_permanently and bind functions via numba.types.ExternalFunction. For passing state into GSL callbacks, lower a Numba Tuple to a struct-like blob and cast it to void*. Keep the struct shape simple to avoid ABI pitfalls. If you prefer a more traditional route, generating low-level interfaces with Cython is also viable. For cases that require rich C struct manipulation, it can reduce Python-side boilerplate and is a pragmatic choice.

The article is based on a question from StackOverflow by Olibarer and an answer by max9111.