2025, Dec 08 11:00

How to mix CVXPY with NumPy safely: avoid embedding variables in arrays and use a selector matrix

Learn why embedding CVXPY variables into NumPy arrays causes type errors, and how a selector matrix fixes it. Practical guide with constraints and NaN handling.

When mixing NumPy arrays with CVXPY, a common trap is trying to embed CVXPY variables directly into a NumPy array and then using that array in constraints and objectives. This looks natural but fails at runtime with a TypeError or ValueError as soon as NumPy tries to store the variable. Below is a minimal walkthrough of what goes wrong and how to restructure the code so the optimization runs cleanly.

Reproducing the issue

The goal is to treat NaNs as unknowns, optimize those entries with nonnegativity constraints, and enforce bounds on the product A @ x. The following snippet illustrates the pattern that leads to the error.

import cvxpy as cp
import numpy as np

def solve_vector_naive(M, seed_vals):
    seed_vals = np.array(seed_vals, dtype=float)
    miss_mask = np.isnan(seed_vals)
    z_free = cp.Variable(np.sum(miss_mask), nonneg=True)
    x_all = np.copy(seed_vals)
    idx_free = np.where(miss_mask)[0]
    for i, u in enumerate(idx_free):
        # Fails here: a float ndarray slot cannot hold a CVXPY expression.
        x_all[u] = z_free[i]
    y = M @ x_all
    cons = [y >= 1.0, y <= 2.0]
    obj = cp.Minimize(cp.sum(z_free))
    prob = cp.Problem(obj, cons)
    prob.solve()
    if prob.status != cp.OPTIMAL:
        raise ValueError(f"Optimization failed: {prob.status}")
    out = np.copy(x_all)
    for i, u in enumerate(idx_free):
        out[u] = z_free.value[i]
    return out, y.value

Whenever seed_vals contains at least one NaN, this approach crashes at the assignment x_all[u] = z_free[i], with messages similar to “ValueError: setting an array element with a sequence” and “TypeError: float() argument must be a string or a real number, not 'index'”.

What actually breaks

A NumPy ndarray with dtype float expects concrete numerical values. A CVXPY variable or expression is symbolic until the problem is solved. Assigning z_free[i] into a NumPy array slot forces NumPy to coerce a symbolic expression into a float, which it cannot do. That is the root cause of the reported ValueError and TypeError.
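
The coercion step can be reproduced without a solver. The sketch below uses a hypothetical stand-in class, SymbolicStub, in place of a CVXPY expression. It assumes only that the object, like a CVXPY expression, offers no conversion to float; the exact message CVXPY produces may differ.

```python
import numpy as np


class SymbolicStub:
    """Hypothetical stand-in for a symbolic expression: it defines no __float__."""


x_all = np.array([1.0, np.nan, 3.0])  # float dtype, one unknown entry

try:
    # To store the right-hand side, NumPy has to convert it to a float.
    x_all[1] = SymbolicStub()
    error_name = None
except (TypeError, ValueError) as err:
    error_name = type(err).__name__

print(error_name)
```

The array is left untouched by the failed assignment, so the NaN placeholder is still there afterwards.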

The fix

Instead of inserting CVXPY variables into a NumPy array, build a selection matrix that maps the compact vector of unknowns to the full-sized vector. Keep the known entries as numeric data, replace their NaNs with zeros, and form the decision-dependent vector using linear algebra. This keeps all symbolic pieces in CVXPY’s expression graph and all numeric pieces in NumPy, without mixing them at the element level.

import cvxpy as cp
import numpy as np

def solve_vector_with_selector(M, seed_vals):
    seed_vals = np.array(seed_vals, dtype=float)
    miss_mask = np.isnan(seed_vals)
    idx_free = np.where(miss_mask)[0]
    # One nonnegative decision variable per NaN entry.
    z_free = cp.Variable(int(idx_free.size), nonneg=True)
    # Selector: column i has a single 1 in row idx_free[i].
    S = np.zeros((seed_vals.size, idx_free.size))
    S[idx_free, np.arange(idx_free.size)] = 1.0
    # Known values stay numeric; unknown slots become 0 and are filled by S @ z_free.
    x_base = np.where(miss_mask, 0.0, seed_vals)
    y = M @ (x_base + S @ z_free)
    cons = [y >= 1.0, y <= 2.0]
    obj = cp.Minimize(cp.sum(z_free))
    prob = cp.Problem(obj, cons)
    prob.solve()
    if prob.status != cp.OPTIMAL:
        raise ValueError(f"Optimization failed: {prob.status}")
    # After the solve, z_free.value is a plain ndarray, so this is ordinary NumPy.
    x_full_val = x_base + S @ z_free.value
    return np.asarray(x_full_val).ravel(), y.value

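Two key steps make the difference. First, the selector S “inflates” the unknown vector z_free to the full dimension without attempting item assignment into NumPy: column i of S has a single 1 in row idx_free[i], so S @ z_free places z_free[i] at that position and zeros everywhere else. Second, NaNs in the known vector are replaced with zeros so x_base stays purely numeric and the sum x_base + S @ z_free takes each unknown entry from z_free alone.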
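
To see the mapping in isolation, here is a small NumPy-only sketch. The input vector and the numbers in z_stub are made up for illustration; z_stub simply stands in for what z_free.value would hold after a solve.

```python
import numpy as np

seed_vals = np.array([1.0, np.nan, 3.0, np.nan])
miss_mask = np.isnan(seed_vals)
idx_free = np.flatnonzero(miss_mask)  # positions of the unknowns: 1 and 3

# One column per unknown, with a single 1 in the row that unknown belongs to.
S = np.zeros((seed_vals.size, idx_free.size))
S[idx_free, np.arange(idx_free.size)] = 1.0

# Known values stay as they are; unknown slots are zeroed.
x_base = np.where(miss_mask, 0.0, seed_vals)

# Illustrative values standing in for z_free.value.
z_stub = np.array([10.0, 20.0])
x_full = x_base + S @ z_stub  # entries: 1, 10, 3, 20
print(x_full)
```

Each column of S sums to one, so every unknown lands in exactly one slot of the full vector, and the zeros in x_base guarantee that nothing else is added there.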

Why this matters

In constrained optimization with CVXPY, every symbolic piece must remain in CVXPY’s world until the problem is solved. Crossing that boundary prematurely by writing CVXPY expressions into NumPy arrays leads to brittle code and opaque errors. The selection-matrix pattern preserves a clean separation between numeric data and symbolic variables, making the computational graph explicit and solver-friendly.

Closing advice

Keep CVXPY variables inside expressions, not inside NumPy arrays. When you must position unknowns within a larger structure, use linear algebra to map them via a selector rather than in-place assignment. If something still goes sideways, share the full stack trace and shrink your setup to a minimal reproducible example. It will make the failure mode obvious and the fix straightforward.