2025, Oct 19 12:00

How to Optimize Rounded, Non-Smooth Objectives in SciPy: Increase eps, Use COBYQA/Nelder-Mead, or Square the Objective

Rounding creates flat plateaus that stall L-BFGS-B on non-smooth objectives. Fix by increasing eps, using COBYQA or Nelder-Mead, or squaring the objective.

Rounding inside an objective is a classic way to confuse gradient-based optimizers. The discrete plateaus and kinks it introduces make numerical differentiation unreliable, and methods like L-BFGS-B can report convergence at the starting point even though a better minimum exists.

Problem setup

Consider an objective that returns values rounded to six decimals. This mirrors real-world pipelines where geometry math is computed with roughly 1e-6 precision.

import numpy as np
def metric_snap6(u):
    # V-shaped (abs) objective, rounded to six decimal places
    return np.round(abs(-0.3757609503198057 * (u - 0.2) + 0.03785161636761336), 6)

Optimizing it with bounds using SciPy’s default choice (L-BFGS-B when bounds are provided) may stop immediately:

from scipy.optimize import minimize
res = minimize(metric_snap6, 1, bounds=((0, np.inf),))
print(res.message)

CONVERGENCE: NORM OF PROJECTED GRADIENT <= PGTOL

Zero iterations and a zero Jacobian at the start indicate that the finite-difference gradient was evaluated on a flat, rounded plateau.
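
You can confirm that reading by inspecting the returned OptimizeResult directly; the sketch below just reruns the call and prints the relevant fields (nit, jac, and x):

import numpy as np
from scipy.optimize import minimize
def metric_snap6(u):
    return np.round(abs(-0.3757609503198057 * (u - 0.2) + 0.03785161636761336), 6)
res = minimize(metric_snap6, 1, bounds=((0, np.inf),))
print(res.nit)  # 0: no iterations were taken
print(res.jac)  # [0.]: the finite-difference gradient looks flat at the start
print(res.x)    # [1.]: still the starting point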

What actually goes wrong

Rounding to six decimals creates large regions where small input changes do not alter the output at all. L-BFGS-B estimates gradients numerically with a small step (its eps), and if both samples land on the same rounded value, the difference looks like zero. The method concludes there is no descent direction and stops. If your objective also behaves like abs(), there is an abrupt derivative change at the kink; that shape can violate Wolfe conditions used by line search in methods like L-BFGS-B, and it also challenges algorithms that fit a quadratic model.
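
A quick way to see the plateau is to evaluate the rounded objective at the starting point and at offsets comparable to the default finite-difference step (about 1e-8) versus a larger one (1e-5); the values in the comments are what this particular objective returns:

import numpy as np
def metric_snap6(u):
    return np.round(abs(-0.3757609503198057 * (u - 0.2) + 0.03785161636761336), 6)
u0 = 1.0
print(metric_snap6(u0))         # 0.262757
print(metric_snap6(u0 + 1e-8))  # 0.262757 -> same rounded value, so the gradient estimate is 0
print(metric_snap6(u0 + 1e-5))  # 0.262761 -> a 1e-5 step escapes the plateau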

Practical fixes

There are three straightforward ways to make progress with low-precision, non-smooth objectives like this one.

First, increase the finite-difference step used by L-BFGS-B. A larger eps makes it more likely that the two finite-difference samples land on different rounded values, so the estimated gradient is no longer zero.

from scipy.optimize import minimize
import numpy as np
def metric_snap6(u):
    return np.round(abs(-0.3757609503198057 * (u - 0.2) + 0.03785161636761336), 6)
print("L-BFGS-B with larger eps")
print(minimize(
    metric_snap6,
    1,
    bounds=((0, np.inf),),
    method="L-BFGS-B",
    options=dict(eps=1e-5)  # default is 1e-8
))

Second, switch to derivative-free methods that do not rely on gradients. COBYQA is a strong general-purpose choice for this situation, and Nelder-Mead is another option. In COBYQA you can tune the initial step via initial_tr_radius. Nelder-Mead starts with an initial step of 5% for nonzero parameters and 0.00025 for zero parameters; you can override that by providing an initial_simplex.

from scipy.optimize import minimize
import numpy as np
def metric_snap6(u):
    return np.round(abs(-0.3757609503198057 * (u - 0.2) + 0.03785161636761336), 6)
print("COBYQA (derivative-free)")
print(minimize(
    metric_snap6,
    1,
    bounds=((0, np.inf),),
    method="COBYQA",
    options=dict(initial_tr_radius=1.0)
))
print("Nelder-Mead (derivative-free)")
print(minimize(
    metric_snap6,
    1,
    bounds=((0, np.inf),),
    method="Nelder-Mead"
))
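
To override Nelder-Mead's default starting step mentioned above, pass initial_simplex, an (N+1, N) array of starting vertices. The sketch below uses 1.0 and 2.0 as an arbitrary pair of vertices for this one-parameter problem:

from scipy.optimize import minimize
import numpy as np
def metric_snap6(u):
    return np.round(abs(-0.3757609503198057 * (u - 0.2) + 0.03785161636761336), 6)
print("Nelder-Mead with an explicit initial simplex")
print(minimize(
    metric_snap6,
    1,
    bounds=((0, np.inf),),
    method="Nelder-Mead",
    options=dict(initial_simplex=[[1.0], [2.0]])  # shape (N+1, N); here N=1
))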

Third, if your objective is abs()-like (piecewise linear with a sharp minimum), optimize the squared version instead. Squaring replaces the sharp kink at the minimum with a smooth, parabola-like bottom, which helps line-search-based methods satisfy the Wolfe conditions and suits quadratic-model methods like COBYQA. This change can substantially reduce the number of evaluations for the same task.

from scipy.optimize import minimize
import numpy as np
def metric_snap6(u):
    return np.round(abs(-0.3757609503198057 * (u - 0.2) + 0.03785161636761336), 6)
def metric_snap6_sq(u):
    return metric_snap6(u) ** 2
print("L-BFGS-B on squared objective")
print(minimize(
    metric_snap6_sq,
    1,
    bounds=((0, np.inf),),
    method="L-BFGS-B",
    options=dict(eps=1e-5)
))
print("COBYQA on squared objective")
print(minimize(
    metric_snap6_sq,
    1,
    bounds=((0, np.inf),),
    method="COBYQA",
    options=dict(initial_tr_radius=1.0)
))
print("Nelder-Mead on squared objective")
print(minimize(
    metric_snap6_sq,
    1,
    bounds=((0, np.inf),),
    method="Nelder-Mead"
))

In this example, L-BFGS-B converges in 6 evaluations instead of 144 and COBYQA in 16 instead of 29, while Nelder-Mead's evaluation count is essentially unchanged.
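
To reproduce this kind of comparison yourself, tally the nfev field of each result. The sketch below reruns all six combinations; the exact counts may vary slightly across SciPy versions.

from scipy.optimize import minimize
import numpy as np
def metric_snap6(u):
    return np.round(abs(-0.3757609503198057 * (u - 0.2) + 0.03785161636761336), 6)
def metric_snap6_sq(u):
    return metric_snap6(u) ** 2
method_options = {
    "L-BFGS-B": dict(eps=1e-5),
    "COBYQA": dict(initial_tr_radius=1.0),
    "Nelder-Mead": {},
}
for label, fun in [("plain", metric_snap6), ("squared", metric_snap6_sq)]:
    for method, options in method_options.items():
        res = minimize(fun, 1, bounds=((0, np.inf),), method=method, options=options)
        print(f"{label} objective, {method}: nfev={res.nfev}, x={res.x}")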

Why this matters

Low-precision, quantized, or piecewise-linear objectives pop up in real systems, especially where the math is constrained by fixed rounding or discretization. Knowing how gradient estimates behave on flat plateaus, why abs()-shaped kinks cause trouble for line searches, and how trust-region or simplex methods respond lets you pick a method that actually makes progress. It also helps you avoid chasing phantom “convergences” that are artifacts of the rounding rather than real optima.

Takeaways

If an optimizer halts right away with a projected gradient of zero, suspect rounding or non-smoothness. For L-BFGS-B, increase eps to get a usable finite-difference gradient. When gradients are unreliable, try derivative-free strategies like COBYQA or Nelder-Mead, tuning initial_tr_radius or the initial simplex when needed. And if the objective behaves like abs(), minimize its square to improve curvature for line search and quadratic models. These small changes are often enough to turn a stalled run into a reliable search for the true minimum.

The article is based on a question from Stack Overflow by Logan Pageler and an answer by Nick ODell.