    A Gentle Introduction to Nonlinear Constrained Optimization with Piecewise Linear Approximations

    By Awais · March 21, 2026 · 18 Mins Read

    In a constrained optimization problem, the goal is to find the best (maximum or minimum) value of an objective function by selecting real variables that satisfy a set of equality and inequality constraints.

    A general constrained optimization problem is to select \(n\) real decision variables \(x_0,x_1,\dots,x_{n-1}\) from a given feasible region in such a way as to optimize (minimize or maximize) a given objective function

    \[f(x_0,x_1,\dots,x_{n-1}).\]

    We usually let \(x\) denote the vector of \(n\) real decision variables \(x_0,x_1,\dots,x_{n-1}\). That is, \(x=(x_0,x_1,\dots,x_{n-1})\), and we write the general nonlinear program as:

    \[\begin{aligned}\text{ Maximize }&f(x)\\\text{ Subject to }&g_i(x)\leq b_i&&\qquad(i=0,1,\dots,m-1)\\&x\in\mathbb{R}^n\end{aligned}\]

    where each of the constraint functions \(g_0\) through \(g_{m-1}\) is given, and each \(b_i\) is a constant (Bradley et al., 1977).

    This is only one possible way to write the problem. Minimizing \(f(x)\) is equivalent to maximizing \(-f(x)\). Likewise, an equality constraint \(h(x)=b\) can be expressed as the pair of inequalities \(h(x)\leq b\) and \(-h(x)\leq-b\). Moreover, by adding a slack variable, any inequality can be converted into an equality constraint (Bradley et al., 1977).
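    To illustrate these equivalences with a small worked example of our own (not taken from the references), minimizing \(x_0^2\) subject to the equality constraint \(x_0+x_1=2\) can be rewritten in the canonical maximization form above:

    \[\begin{aligned}\text{ Minimize }&x_0^2\\\text{ Subject to }&x_0+x_1=2\end{aligned}\qquad\Longleftrightarrow\qquad\begin{aligned}\text{ Maximize }&-x_0^2\\\text{ Subject to }&x_0+x_1\leq2\\&-x_0-x_1\leq-2.\end{aligned}\]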

    These types of problems appear across many application areas, for example, in business settings where a company aims to maximize profit or minimize costs while operating under resource or funding constraints (Cherry, 2016).

    If \(f(x)\) is a linear function and the constraints are linear, then the problem is called a linear programming problem (LP) (Cherry, 2016).

    “The problem is called a nonlinear programming problem (NLP) if the objective function is nonlinear and/or the feasible region is determined by nonlinear constraints.” (Bradley et al., 1977, p. 410)

    The assumptions and approximations used in linear programming can sometimes produce a suitable model for the decision variable ranges of interest. In other cases, however, nonlinear behavior in the objective function and/or the constraints is essential to formulate the application as a mathematical program accurately (Bradley et al., 1977).

    Nonlinear programming refers to methods for solving an NLP. Although many mature solvers exist for LPs, NLPs, especially those involving higher-order nonlinearities, are typically more difficult to solve (Cherry, 2016).

    Challenging nonlinear programming problems arise in areas such as electronic circuit design, financial portfolio optimization, gas network optimization, and chemical process design (Cherry, 2016).

    One way to tackle nonlinear programming problems is to linearize them and use an LP solver to obtain a good approximation. This is attractive for several reasons: for instance, linear models are usually faster to solve and can be more numerically stable.

    To move from theory to a working Python model, we will walk through the following sections.

    This text assumes some familiarity with mathematical programming. After defining Separable Functions, we present a separable nonlinear maximization Example and outline an approach based on piecewise linear (PWL) approximations of the nonlinear objective function.

    Next, we define Special Ordered Sets of Type 2 and explain how they support the numerical formulation.

    We then introduce the theoretical background in On Convex and Concave Functions, providing tools used throughout the remainder of this work on nonlinear programming (NLP).

    Finally, we present a Python Implementation procedure that uses Gurobipy and PWL approximations of the nonlinear objective function, enabling LP/MIP solvers to obtain useful approximations for fairly large NLPs.


    Separable Functions

    This article’s solution procedure is for separable programs. Separable programs are optimization problems of the form:

    \[\begin{aligned}\text{ Maximize }&\sum\limits_{j=0}^{n-1}f_j(x_j)\\\text{ Subject to }&\sum\limits_{j=0}^{n-1} g_{ij}(x_j)\leq 0\qquad(i=0,1,\dots,m-1)\\&x\in\mathbb{R}^n\end{aligned}\]

    where each of the functions \(f_j\) and \(g_{ij}\) is known. These problems are called separable because the decision variables appear separately: one in each constraint function \(g_{ij}\) and one in each objective-function term \(f_j\) (Bradley et al., 1977).


    Example

    Consider the following problem from Bradley et al. (1977) that arises in portfolio selection:

    \[\begin{aligned}\text{ Maximize }&f(x)=20x_0+16x_1-2x_0^2-x_1^2-(x_0+x_1)^2\\\text{ Subject to }&x_0+x_1\leq5\\&x_0\geq0,x_1\geq0.\end{aligned}\]

    Objective function (Animation by the author).
    Objective function over the feasible region (Animation by the author).

    As stated, the problem is not separable because of the term \((x_0+x_1)^2\) in the objective function. However, nonseparable problems can frequently be reduced to a separable form using various formulation tricks. In particular, by letting \(x_2=x_0+x_1\), we can re-express the problem above in separable form as:

    \[\begin{aligned}\text{ Maximize }&f(x)=20x_0+16x_1-2x_0^2-x_1^2-x_2^2\\\text{ Subject to }&x_0+x_1\leq5\\&x_0+x_1-x_2=0\\&x_0\geq0,x_1\geq0,x_2\geq 0.\end{aligned}\]

    Clearly, the linear constraints are separable, and the objective function can now be written as \(f(x)=f_0(x_0)+f_1(x_1)+f_2(x_2)\), where

    \[\begin{aligned}f_0(x_0)&=20x_0-2x_0^2\\f_1(x_1)&=16x_1-x_1^2\end{aligned}\]

    and

    \[f_2(x_2)=-x_2^2.\]

    Nonlinear functions (Image by the author).
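    A quick numerical sanity check (a sketch of ours, not part of the original formulation) confirms that the substitution \(x_2=x_0+x_1\) leaves the objective value unchanged:

```python
# Original (nonseparable) objective and its separable re-expression
def f_original(x0, x1):
    return 20*x0 + 16*x1 - 2*x0**2 - x1**2 - (x0 + x1)**2

def f_separable(x0, x1):
    x2 = x0 + x1            # substitution that makes the problem separable
    f0 = 20*x0 - 2*x0**2
    f1 = 16*x1 - x1**2
    f2 = -x2**2
    return f0 + f1 + f2

# The two forms agree on a grid of feasible points
for x0 in [0.0, 1.0, 2.5]:
    for x1 in [0.0, 1.5, 2.5]:
        assert abs(f_original(x0, x1) - f_separable(x0, x1)) < 1e-12
```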

    To form the approximation problem, we approximate each nonlinear term \(f_j(x_j)\) by a piecewise linear function \(\hat f_j(x_j)\) using a specified number of breakpoints.

    Each PWL function \(\hat f_j(x_j)\) is defined over an interval \([l,u]\) and consists of line segments whose endpoints lie on the nonlinear function \(f_j(x_j)\), starting at \((l,f_j(l))\) and ending at \((u,f_j(u))\). We represent the PWL functions using the following parametrization. One can write any \(l\leq x_j\leq u\) as:

    \[x_j=\sum\limits_{i=0}^{r-1}\theta_j^i a_i,\quad \hat f_j(x_j)=\sum\limits_{i=0}^{r-1}\theta_j^if_j(a_i),\quad\sum\limits_{i=0}^{r-1}\theta_j^i=1,\quad\theta_j=(\theta_j^0,\theta_j^1,\dots\theta_j^{r-1})\in\mathbb{R}_{\geq0}^r\]

    for each \(j=0,1,2\), where the PWL functions are specified by the points \(\{(a_i,f_j(a_i))\}\) for \(i=0,1,\dots,r-1\) (Cherry, 2016). Here, we use 4 uniformly distributed breakpoints to approximate each \(f_j(x_j)\). Also, since \(x_0+x_1\leq5\), \(x_0+x_1=x_2\), and \(x_0,x_1,x_2\geq0\), we have that

    \[0\leq x_0\leq5,\qquad0\leq x_1\leq 5,\quad\text{ and }\quad0\leq x_2\leq5.\]

    And so, our approximations need not extend beyond these variable bounds. Therefore, the PWL function \(\hat f_j(x_j)\) will be defined by the points \((0,f_j(0))\), \((5/3,f_j(5/3))\), \((10/3,f_j(10/3))\), and \((5,f_j(5))\) for \(j=0,1,2\).

    Any \(x_0\in[0,5]\cap\mathbb{R}\) can be written as

    \[\begin{aligned}x_0&=\theta_0^0\cdot0+\theta_0^1\cdot\frac{5}{3}+\theta_0^2\cdot\frac{10}{3}+\theta_0^3\cdot5\\&=\theta_0^1\cdot\frac{5}{3}+\theta_0^2\cdot\frac{10}{3}+\theta_0^3\cdot 5\text{ s.t. }\sum\limits_{i=0}^3\theta_0^i=1\end{aligned}\]

    and f0(x0)f_0(x_0) can be approximated as

    \[\begin{aligned}\hat f_0(x_0)&=\theta_0^0f_0(0)+\theta_0^1f_0\left(\frac{5}{3}\right)+\theta_0^2f_0\left(\frac{10}{3}\right)+\theta_0^3f_0(5)\\&=\theta_0^1\cdot\frac{250}{9}+\theta_0^2\cdot\frac{400}{9}+\theta_0^3\cdot50.\end{aligned}\]

    The same reasoning follows for \(x_1\) and \(x_2\). For example, evaluating the approximation at \(x_0=1.5\) gives:

    \[\begin{aligned}\hat f_0(1.5)&=\frac{1}{10}f_0(0)+\frac{9}{10}f_0\left(\frac{5}{3}\right)\\&=\frac{1}{10}\cdot 0+\frac{9}{10}\cdot\frac{250}{9}\\&=25\end{aligned}\]

    since

    \[1.5=\frac{1}{10}\cdot0+\frac{9}{10}\cdot\frac{5}{3}.\]

    PWL approximation curves (Image by the author).
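    The hand computation above can be reproduced with a small helper (a sketch of ours; `pwl_eval` is a hypothetical name, not part of any library):

```python
def pwl_eval(f, breakpoints, x):
    """Evaluate the PWL approximation of f at x via the theta parametrization."""
    for a_k, a_k1 in zip(breakpoints, breakpoints[1:]):
        if a_k <= x <= a_k1:
            theta = (a_k1 - x) / (a_k1 - a_k)      # weight on the left breakpoint
            return theta * f(a_k) + (1 - theta) * f(a_k1)
    raise ValueError("x outside [l, u]")

f0 = lambda x: 20*x - 2*x**2
a = [0.0, 5/3, 10/3, 5.0]        # 4 uniform breakpoints on [0, 5]
print(pwl_eval(f0, a, 1.5))      # ≈ 25, matching the hand computation above
```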

    Such weighted sums are called convex combinations, i.e., linear combinations of points with non-negative coefficients that sum to 1. Now, since \(f_0(x_0)\), \(f_1(x_1)\), and \(f_2(x_2)\) are concave functions, one may ignore additional restrictions, and the problem can be solved as an LP (Bradley et al., 1977). However, in general, such a formulation does not suffice. In particular, we still need to enforce the adjacency condition: at most two \(\theta_j^i\) weights are positive, and if two weights are positive, then they are adjacent, i.e., of the form \(\theta_j^i\) and \(\theta_j^{i+1}\). A similar restriction applies to each approximation. For instance, if the weights are \(\theta_0^0=2/5\) and \(\theta_0^2=3/5\), then the approximation gives

    \[\begin{aligned}\hat f_0(x_0)&=\frac{2}{5}\cdot 0+\frac{3}{5}\cdot\frac{400}{9}=\frac{80}{3}\\x_0&=\frac{2}{5}\cdot0+\frac{3}{5}\cdot\frac{10}{3}=2.\end{aligned}\]

    In contrast, at \(x_0=2\), the approximation curve gives

    \[\begin{aligned}\hat f_0(x_0)&=\frac{4}{5}\cdot\frac{250}{9}+\frac{1}{5}\cdot\frac{400}{9}=\frac{280}{9}\\x_0&=\frac{4}{5}\cdot\frac{5}{3}+\frac{1}{5}\cdot\frac{10}{3}=2.\end{aligned}\]

    Need for adjacency condition (Image by the author).

    One can enforce the adjacency condition by introducing additional binary variables into the model, as formulated in Cherry (2016). However, we instead leverage Gurobipy's SOS (Special Ordered Set) constraint object.

    The complete program is as follows:

    \[\begin{alignat}{2}\text{ Maximize }&\hat f_0(x_0)+\hat f_1(x_1)+\hat f_2(x_2)\\\text{ Subject to }&x_0=\sum\limits_{i=0}^{r-1}\theta_0^ia_i\\&x_1=\sum\limits_{i=0}^{r-1}\theta_1^ia_i\\&x_2=\sum\limits_{i=0}^{r-1}\theta_2^ia_i\\&\hat f_0=\sum\limits_{i=0}^{r-1}\theta_0^if_0(a_i)\\&\hat f_1=\sum\limits_{i=0}^{r-1}\theta_1^if_1(a_i)\\&\hat f_2=\sum\limits_{i=0}^{r-1}\theta_2^if_2(a_i)\\&\sum\limits_{i=0}^{r-1}\theta_j^i=1&&\qquad j=0,1,2\\&\{\theta_j^0,\theta_j^1,\dots\theta_j^{r-1}\}\text{ SOS type 2 }&&\qquad j=0,1,2\\&x_0,x_1,x_2\in[0,5]\cap\mathbb{R}\\&\hat f_0\in[0,50]\cap\mathbb{R}\\&\hat f_1\in[0,55]\cap\mathbb{R}\\&\hat f_2\in [-25,0]\cap\mathbb{R}\\&\theta_j^i\in[0,1]\cap\mathbb{R}&&\qquad j=0,1,2,\;i=0,1,\dots,{r-1}.\end{alignat}\]


    Special Ordered Sets of Type 2

    “A set of variables \(\{x_0,x_1,\dots,x_{n-1}\}\) is a special ordered set of type 2 (SOS type 2) if \(x_ix_j=0\) whenever \(|i-j|\geq2\), i.e., at most two variables in the set can be nonzero, and if two variables are nonzero they must be adjacent in the set.” (de Farias et al., 2000, p. 1)

    In our formulation, the adjacency condition on the \(\theta_j^i\) variables matches exactly an SOS type 2 condition: at most two \(\theta_j^i\) can be positive, and they must correspond to adjacent breakpoints. Using SOS type 2 constraints lets us enforce adjacency directly in Gurobi, which applies specialized branching strategies that capture the SOS structure automatically, instead of introducing extra binary variables by hand.


    On Convex and Concave Functions

    For the following discussion, we take a general separable constrained maximization problem instance, as previously defined in the Separable Functions section:

    \[\begin{aligned}\text{ Maximize }&\sum\limits_{j=0}^{n-1}f_j(x_j)\\\text{ Subject to }&\sum\limits_{j=0}^{n-1} g_{ij}(x_j)\leq 0\qquad(i=0,1,\dots,m-1)\\&x\in\mathbb{R}^n.\end{aligned}\]

    where each of the functions \(f_j\) and \(g_{ij}\) is known.

    Throughout, the term piecewise-linear (PWL) approximation refers to the secant interpolant obtained by connecting ordered breakpoint pairs \((a_k,f(a_k))\) with line segments.

    Convex and concave are among the most essential functional forms in mathematical optimization. Formally, a function \(f(x)\) is called convex if, for every \(y\) and \(z\) and every \(0\leq\lambda\leq1\),

    \[f[\lambda y+(1-\lambda)z]\leq\lambda f(y)+(1-\lambda)f(z).\]

    This definition implies that the sum of convex functions is convex, and nonnegative multiples of convex functions are also convex. Concave functions are simply the negatives of convex functions; for them the definition above holds with the inequality reversed. Of course, some functions are neither convex nor concave.

    Convex and concave functions (Eric W. Weisstein on MathWorld)

    It is easy to see that linear functions are both convex and concave. This property is essential since it allows us to optimize (maximize or minimize) linear objective functions using computationally efficient techniques, such as the simplex method in linear programming (Bradley et al., 1977).

    Additionally, a single-variable differentiable function \(f(x)\) is convex on an interval exactly when its derivative \(f^\prime(x)\) is nondecreasing, meaning the slope does not go down. Likewise, a differentiable \(f(x)\) is concave on an interval exactly when \(f^\prime(x)\) is nonincreasing, meaning the slope does not go up.

    From the definition above, one can conclude that PWL approximations overestimate convex functions, since every line segment joining two points on the graph of a convex function lies on or above the graph. Similarly, PWL approximations underestimate concave functions.
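    A short numeric check of this claim (our own illustration), comparing a single-segment secant with the underlying function at the midpoint of an interval:

```python
# Secant (single-segment PWL) over [l, u], evaluated at the midpoint
def secant_at_midpoint(f, l, u):
    return 0.5 * f(l) + 0.5 * f(u)

convex  = lambda x: x**2          # convex: secant lies on or above the graph
concave = lambda x: 16*x - x**2   # concave: secant lies on or below the graph

assert secant_at_midpoint(convex, 0, 4) >= convex(2)    # 8 >= 4: overestimate
assert secant_at_midpoint(concave, 0, 4) <= concave(2)  # 24 <= 28: underestimate
```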

    Nonlinear Objective Function

    This notion helps explain why we can omit the SOS type 2 constraints from a problem such as the one in the Example section. By concavity, the PWL approximation curve lies above the segment joining any two non-adjacent points, so the maximization will automatically select representations with only adjacent positive weights. A similar argument applies if three or more weights are positive (Bradley et al., 1977). Formally, suppose that each \(f_j(x_j)\) is a concave function and each known \(g_{ij}\) is a linear function in \(x\). Then we can apply the same procedure as in the Example section. We let the breakpoints satisfy

    \[l= a_0\lt a_1\lt \dots\lt a_{r-1}= u,\qquad r\geq 2.\]

    for real numbers \(l\leq x_j\leq u\). Also, let \(\theta_j\in\mathbb{R}_+^r\) be such that

    \[\sum\limits_{i=0}^{r-1}\theta_j^i=1\]

    and assign

    \[x_j=\sum\limits_{i=0}^{r-1}\theta_j^i\cdot a_i,\qquad\hat f_j=\sum\limits_{i=0}^{r-1}\theta_j^if_j(a_i)\]

    for each \(j=0,1,\dots,n-1\). And so, the transformed LP is as follows

    \[\begin{alignat}{2}\text{ Maximize }&\sum\limits_{j=0}^{n-1}\hat f_j\\\text{ Subject to }&\sum\limits_{j=0}^{n-1} g_{ij}(x_j)\leq 0&&\qquad(i=0,1,\dots,m-1)\\&x_j=\sum\limits_{i=0}^{r-1}\theta_j^i\cdot a_i&&\qquad(j=0,1,\dots,n-1)\\&\hat f_j=\sum\limits_{i=0}^{r-1}\theta_j^if_j(a_i)&&\qquad(j=0,1,\dots,n-1)\\&\sum_{i=0}^{r-1}\theta_j^i=1&&\qquad(j=0,1,\dots,n-1)\\&\theta_j\in\mathbb{R}_{\geq 0}^r&&\qquad(j=0,1,\dots,n-1)\\&x\in\mathbb{R}^n.\end{alignat}\]

    Now, let \(p_j(x_j)\) be the PWL curve that passes through the points \((a_i,f_j(a_i))\), i.e., for each \(x_j\in[a_k,a_{k+1}]\) and \(j=0,1,\dots,n-1\):

    \[p_j(x_j)=\frac{a_{k+1}-x_j}{a_{k+1}-a_k}f_j(a_k)+\frac{x_j-a_k}{a_{k+1}-a_k}f_j(a_{k+1})\]

    Concavity of \(f_j(x_j)\) implies that its secant slopes are non-increasing, and since \(p_j(x_j)\) is constructed by joining ordered breakpoints on the graph of \(f_j(x_j)\), the function \(p_j(x_j)\) is also concave. And so, by Jensen’s inequality,

    \[p_j(x_j)=p_j\left(\sum\limits_{i=0}^{r-1}\theta_j^i\cdot a_i\right)\geq\sum\limits_{i=0}^{r-1}\theta_j^i\cdot p_j(a_i)=\sum\limits_{i=0}^{r-1}\theta_j^if_j(a_i)=\hat f_j.\]

    Moreover, for all \(x_j\) in any feasible solution \(((x_0,\hat f_0,\theta_0),(x_1,\hat f_1,\theta_1),\dots,(x_{n-1},\hat f_{n-1},\theta_{n-1}))\), one can choose \(k\) such that \(x_j\in[a_k,a_{k+1}]\) and define the vector \(\hat\theta_j\) by

    \[\hat\theta_j^k=\frac{a_{k+1}-x_j}{a_{k+1}-a_k},\qquad \hat\theta_j^{k+1}=\frac{x_j-a_k}{a_{k+1}-a_k},\qquad\hat\theta_j^i=0,\;(i\notin \{k,k+1\})\qquad\]

    with \(\hat\theta_j^k,\hat\theta_j^{k+1}\geq0\) and \(\hat\theta_j^k+\hat\theta_j^{k+1}=1\). Then \(\hat\theta_j^k\cdot a_k+\hat\theta_j^{k+1}\cdot a_{k+1}=x_j\), which gives

    \[\hat f_j^\text{new}=\hat\theta_j^k f_j(a_k)+\hat\theta_j^{k+1}f_j(a_{k+1})=p_j(x_j)\geq\hat f_j.\]

    Because the constraints depend on \(\theta_j\) only through the induced value \(x_j\), replacing each \(\theta_j\) by \(\hat\theta_j\) preserves feasibility (it leaves each \(x_j\) unchanged) and weakly improves the objective (it increases each \(\hat f_j\) to \(p_j(x_j)\)). Therefore, the SOS type 2 constraints that enforce the adjacency condition do not change the optimal value in separable maximization problems with concave objective functions. A similar argument shows that they do not change the optimal value in separable minimization problems with convex objective functions.

    Note, however, that this shows that there exists an optimal solution in which only weights corresponding to adjacent breakpoints are positive. It does not guarantee that all optimal solutions of the LP satisfy adjacency. Therefore, one may still include SOS type 2 constraints to enforce a consistent representation.
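    The replacement argument can be checked numerically for \(f_0\) with the breakpoints from the Example (our own sketch): the non-adjacent weights \((2/5,0,3/5,0)\) and the adjacent weights \((0,4/5,1/5,0)\) induce the same \(x_0=2\), but the adjacent choice yields a weakly larger \(\hat f_0\).

```python
f0 = lambda x: 20*x - 2*x**2
a = [0.0, 5/3, 10/3, 5.0]

def induced(theta):
    """x and f-hat induced by a weight vector over the breakpoints."""
    x = sum(t*ai for t, ai in zip(theta, a))
    f_hat = sum(t*f0(ai) for t, ai in zip(theta, a))
    return x, f_hat

x_bad,  f_bad  = induced([2/5, 0.0, 3/5, 0.0])  # non-adjacent weights
x_good, f_good = induced([0.0, 4/5, 1/5, 0.0])  # adjacent (SOS2-feasible) weights

assert abs(x_bad - 2.0) < 1e-9 and abs(x_good - 2.0) < 1e-9  # same induced x
assert f_good >= f_bad  # 280/9 >= 80/3: adjacency weakly improves the objective
```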

    In practice, solving models with concave objective functions via PWL approximations can yield an objective value that underestimates the true optimal value. Conversely, solving models with convex objective functions via PWL approximations can yield an objective value that overestimates the true optimal value.

    However, although these approximations can distort the objective value, in models where the original linear constraints remain unchanged, feasibility is not affected by the PWL approximations. Any variable assignment found by solving the PWL model satisfies the same linear constraints and therefore lies in the original feasible region. Consequently, one can evaluate the original nonlinear objective \(f(x)\) at the returned solution to obtain its objective value, even though this value need not equal the true nonlinear optimum.
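    For example (our own check, assuming the 16-breakpoint solution reported later in the Python Implementation section is exactly \(x_0=7/3\), \(x_1=8/3\)), evaluating the original nonlinear objective at the returned point recovers the true optimum \(139/3\):

```python
# Original nonlinear objective
f = lambda x0, x1: 20*x0 + 16*x1 - 2*x0**2 - x1**2 - (x0 + x1)**2

x0, x1 = 7/3, 8/3                     # returned solution (x0 ≈ 2.33, x1 ≈ 2.67)
assert x0 + x1 <= 5 + 1e-12           # the point satisfies the linear constraint
assert abs(f(x0, x1) - 139/3) < 1e-9  # true objective value at the returned point
```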

    Nonlinear Constraints

    What requires a different approach is the PWL approximation of nonlinear constraints, since such approximations can change the original feasible region. We define the feasible region (set) of a problem like the one above by:

    \[F=\bigcap\limits_{i=0}^{m-1}\{x\in\mathbb{R}^n:G_i(x)\leq0\},\]

    where

    \[G_i(x):=\sum\limits_{j=0}^{n-1}g_{ij}(x_j).\]

    Now, suppose (for notational simplicity) that all \(G_i(x)\) are nonlinear functions. Approximate each univariate \(g_{ij}(x_j)\) by a PWL function \(\hat g_{ij}(x_j)\) and define:

    \[\hat G_i(x):=\sum\limits_{j=0}^{n-1}\hat g_{ij}(x_j).\]

    Then

    \[\hat F=\bigcap\limits_{i=0}^{m-1}\{x\in\mathbb{R}^n:\hat G_i(x)\leq0\},\]

    is the feasible set of the transformed LP that uses PWL approximations. To avoid truly infeasible solutions, we want

    \[\hat F\subseteq F.\]

    If, instead, \(F\subseteq\hat F\), then there may exist \(x\in\hat F\setminus F\), meaning the model that uses PWL approximations could accept points that violate the original nonlinear constraints.

    Feasible sets (Image by the author).

    Given the way we construct the PWL approximations, a sufficient condition for \(\hat F\subseteq F\) is that each \(g_{ij}(x_j)\) is convex for constraints of the form \(G_i(x)\leq0\), and each \(g_{ij}(x_j)\) is concave for constraints of the form \(G_i(x)\geq0\).

    To see why this holds, take any \(x\in\hat F\). First, consider constraints of the form \(G_i(x)\leq0\) where each \(g_{ij}(x_j)\) is convex. Because PWL approximations overestimate convex functions, for any fixed \(i=0,1,\dots,m-1\), we have:

    \[(\forall j,\;\hat g_{ij}(x_j)\geq g_{ij}(x_j))\rightarrow\sum\limits_{j=0}^{n-1}\hat g_{ij}(x_j)\geq\sum\limits_{j=0}^{n-1}g_{ij}(x_j)\rightarrow\hat G_i(x)\geq G_i(x)\]

    Since \(x\in\hat F\), it satisfies \(\hat G_i(x)\leq0\). Together with \(\hat G_i(x)\geq G_i(x)\), this implies \(G_i(x)\leq0\).

    Next, consider constraints of the form \(G_i(x)\geq0\) where each \(g_{ij}(x_j)\) is concave. Because PWL approximations underestimate concave functions, for any fixed \(i=0,1,\dots,m-1\), we have:

    \[(\forall j,\;\hat g_{ij}(x_j)\leq g_{ij}(x_j))\rightarrow\sum\limits_{j=0}^{n-1}\hat g_{ij}(x_j)\leq\sum\limits_{j=0}^{n-1}g_{ij}(x_j)\rightarrow\hat G_i(x)\leq G_i(x).\]

    Again, since \(x\in\hat F\), it satisfies \(\hat G_i(x)\geq0\). Combined with \(\hat G_i(x)\leq G_i(x)\), this implies \(G_i(x)\geq0\).

    This motivates the notion of a convex set: a set \(C\) is convex if, for any two points \(x,y\in C\), the entire line segment connecting them lies within \(C\). Formally, \(C\) is convex if for all \(x,y\in C\) and any \(\lambda\in[0,1]\), the point \(\lambda x+(1-\lambda)y\) is also in \(C\).


    In particular, if \(G_i:\mathbb{R}^n\to\mathbb{R}\) is convex, then

    \[S_i:=\{x\in\mathbb{R}^n:G_i(x)\leq b\}\]

    is convex for any \(b\in\mathbb{R}\). Proof: take any \(x_1,x_2\in S_i\); then \(G_i(x_1)\leq b\) and \(G_i(x_2)\leq b\). Since \(G_i(x)\) is convex, for any \(\lambda\in[0,1]\) we have that

    \[G_i(\lambda x_1+(1-\lambda)x_2)\leq\lambda G_i(x_1)+(1-\lambda)G_i(x_2)\leq\lambda b+(1-\lambda)b=b.\]

    Hence, \(\lambda x_1+(1-\lambda)x_2\in S_i\), and so \(S_i\) is convex. A similar argument shows that

    \[T_i:=\{x\in\mathbb{R}^n:G_i(x)\geq b\}\]

    is convex for any \(b\in\mathbb{R}\) if \(G_i:\mathbb{R}^n\to\mathbb{R}\) is concave.

    And since the intersection of any collection of convex sets is also convex,

    “for a convex feasible set we want convex functions for less-than-or-equal-to constraints and concave functions for greater-than-or-equal to constraints.” (Bradley et al., 1977, p.418)

    Nonlinear optimization problems in which the goal is to minimize a convex objective (or maximize a concave one) over a convex set of constraints are called convex nonlinear problems. These problems are generally easier to tackle than nonconvex ones.

    Indeed, if \(F\subseteq\hat F\), then variable assignments may lead to constraint violations. In these cases, one would need to rely on other techniques, such as the simple MAP modification to the Frank–Wolfe algorithm provided in Bradley et al. (1977), and/or introduce tolerances that allow constraint violations to some degree.


    Python Implementation

    Before presenting the code, it is useful to consider how our modeling approach will scale as the problem size increases. For a separable program with \(n\) decision variables and \(r\) breakpoints per variable, the model introduces \(n\cdot r\) continuous variables (for the convex combinations) and \(n\) SOS type 2 constraints. Both the variable and constraint counts grow linearly in \(n\) and \(r\), but their product can quickly lead to large models when either parameter is increased.

    As previously mentioned, we will use Gurobipy to numerically solve the program above. Generally, the procedure is as follows:

    1. Choose bounds \([l,u]\)
    2. Choose breakpoints \(a_i\)
    3. Add \(\theta\) and the convex-combination constraints
    4. Add SOS type 2 constraints
    5. Solve
    6. Evaluate the nonlinear objective at the returned \(x\)
    7. Refine breakpoints if needed

    After importing the necessary libraries and dependencies, we declare and instantiate our nonlinear functions \(f_0\), \(f_1\), and \(f_2\):

    # Imports
    import gurobipy as gp
    from gurobipy import GRB
    import numpy as np
    
    # Nonlinear functions
    f0=lambda x0: 20.0*x0-2.0*x0**2
    f1=lambda x1: 16.0*x1-x1**2
    f2=lambda x2: -x2**2

    Next, we choose the number of breakpoints. To construct the PWL approximations, we need to distribute them across the interval \([0,5]\cap\mathbb{R}\). Many strategies exist to do this. For instance, one could cluster more points near the ends or even rely on adaptive procedures as in Cherry (2016). For simplicity, we stick to uniform sampling:

    lb,ub=0.0,5.0
    r=4  # Number of breakpoints
    a_i=np.linspace(start=lb,stop=ub,num=r)  # Distribute breakpoints uniformly

    Then, we compute the values of the functions \(f_j\) at the chosen breakpoints; these values serve as the coefficients of the PWL approximations:

    # Compute function values at breakpoints
    f0_hat_r=f0(a_i)
    f1_hat_r=f1(a_i)
    f2_hat_r=f2(a_i)

    The next step is to declare and instantiate a Gurobipy optimization model:

    model=gp.Model()

    After creating our model, we need to add the decision variables. Gurobipy provides methods to add multiple decision variables to a model at once, and we leverage this feature to declare and instantiate some of our decision variables, namely the coefficients of the PWL approximations. Note that all decision variables in our model are continuous and can take any real value within the provided bounds. Such models can be solved efficiently using modern software packages such as Gurobi.

    # Decision variables
    x0=model.addVar(lb=0.0,ub=5.0,name='x0',vtype=GRB.CONTINUOUS)
    x1=model.addVar(lb=0.0,ub=5.0,name='x1',vtype=GRB.CONTINUOUS)
    x2=model.addVar(lb=0.0,ub=5.0,name='x2',vtype=GRB.CONTINUOUS)
    
    f0_hat=model.addVar(lb=0.0,ub=50.0,name='f0_hat',vtype=GRB.CONTINUOUS)
    f1_hat=model.addVar(lb=0.0,ub=55.0,name='f1_hat',vtype=GRB.CONTINUOUS)
    f2_hat=model.addVar(lb=-25.0,ub=0.0,name='f2_hat',vtype=GRB.CONTINUOUS)
    
    theta=model.addVars(range(0,3),range(0,r),lb=0.0,ub=1.0,name='theta',vtype=GRB.CONTINUOUS)

    Additionally, we need to add constraints that limit the values our decision variables may take. First, we add the constraints that tie each decision variable and each approximated function value to its convex-combination weights:

    # Constraints
    model.addConstr(x0==gp.quicksum(theta[0,i]*a_i[i] for i in range(0,r)))
    model.addConstr(x1==gp.quicksum(theta[1,i]*a_i[i] for i in range(0,r)))
    model.addConstr(x2==gp.quicksum(theta[2,i]*a_i[i] for i in range(0,r)))
    
    model.addConstr(f0_hat==gp.quicksum(theta[0,i]*f0_hat_r[i] for i in range(0,r)))
    model.addConstr(f1_hat==gp.quicksum(theta[1,i]*f1_hat_r[i] for i in range(0,r)))
    model.addConstr(f2_hat==gp.quicksum(theta[2,i]*f2_hat_r[i] for i in range(0,r)))
    
    model.addConstr(gp.quicksum(theta[0,i] for i in range(0,r))==1)
    model.addConstr(gp.quicksum(theta[1,i] for i in range(0,r))==1)
    model.addConstr(gp.quicksum(theta[2,i] for i in range(0,r))==1)

    Second, we add the constraints that define the original problem, namely the restrictions in the nonlinear formulation:

    model.addConstr(x0+x1<=5.0)
    model.addConstr(x0+x1-x2==0.0)

    And finally, we add the SOS Type 2 constraints to ensure the adjacency condition:

    # Special ordered sets
    model.addSOS(GRB.SOS_TYPE2,[theta[0,i] for i in range(0,r)])
    model.addSOS(GRB.SOS_TYPE2,[theta[1,i] for i in range(0,r)])
    model.addSOS(GRB.SOS_TYPE2,[theta[2,i] for i in range(0,r)])

    All that remains is to define the objective function, which we would like to maximize:

    model.setObjective(f0_hat+f1_hat+f2_hat,GRB.MAXIMIZE)  # Objective function

    The model is now complete and we can use the Gurobi solver to retrieve the optimal value and optimal variable assignments. The code to do this is shown below:

    model.optimize()  # Optimize!
    
    obj_val=model.ObjVal
    
    res={}
    
    all_vars=model.getVars()
    values=model.getAttr('X',all_vars)
    names=model.getAttr('VarName',all_vars)
    
    # Retrieve values of variables after optimizing
    for name,val in zip(names,values):
        res[name]=val

    After executing the code above and printing attributes to the screen, the output is as follows:

    - Status: 2
    - Number of breakpoints: 4
    - Optimal objective value: 45.00
    - Optimal x0, f0(x0) value (PWL approximation): 1.67, 27.78
    - Optimal x1, f1(x1) value (PWL approximation): 3.33, 42.22
    - Optimal x2, f2(x2) value (PWL approximation): 5.00, -25.00
    PWL approximation curves (3 segments) with optimal values (Image by the author).

    The difference between our computed solution and the true optimal value of \(139/3\) is \(4/3\). To increase accuracy and obtain a better solution, one could add more breakpoints, thereby increasing the number of segments used to approximate the nonlinear functions. For instance, the output below shows how increasing the number of breakpoints to 16 leads to an approximation with practically no error relative to the true optimal value:

    - Status: 2
    - Number of breakpoints: 16
    - Optimal objective value: 46.33
    - Optimal x0, f0(x0) value (PWL approximation): 2.33, 35.78
    - Optimal x1, f1(x1) value (PWL approximation): 2.67, 35.56
    - Optimal x2, f2(x2) value (PWL approximation): 5.00, -25.00
    PWL approximation curves (15 segments) with optimal values (Image by the author).
    Objective function over the feasible region with optimal value using PWL approximation curves consisting of 15 segments (Animation by the author).
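    The effect of adding breakpoints can also be seen directly in the approximation error (a standalone sketch of ours, independent of the solver; `max_pwl_gap` is a hypothetical helper): for the concave \(f_0\), the worst-case gap between \(f_0\) and its uniform PWL interpolant shrinks as \(r\) grows.

```python
f0 = lambda x: 20*x - 2*x**2

def max_pwl_gap(f, l, u, r, samples=1000):
    """Max gap between f and its uniform r-breakpoint PWL interpolant on [l, u]."""
    h = (u - l) / (r - 1)
    a = [l + i*h for i in range(r)]
    gap = 0.0
    for s in range(samples + 1):
        x = l + s*(u - l)/samples
        k = min(int((x - l) // h), r - 2)          # segment index containing x
        t = (a[k+1] - x) / (a[k+1] - a[k])         # weight on the left breakpoint
        p = t*f(a[k]) + (1 - t)*f(a[k+1])          # PWL interpolant value
        gap = max(gap, abs(f(x) - p))
    return gap

for r in (4, 8, 16):
    print(r, max_pwl_gap(f0, 0.0, 5.0, r))  # gap shrinks roughly like h^2
```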

    You can find the complete Python code in this GitHub repository: link.


    Further Readings

    For a more advanced, robust algorithm that uses PWL functions to approximate the nonlinear objective function, Cherry (2016) is a valuable reference. It presents a procedure that begins with coarse approximations, which are refined iteratively using the optimal solution to the LP relaxation from the previous step.

    For a deeper understanding of nonlinear programming, including methods, applied examples, and an explanation of why nonlinear problems are intrinsically more difficult to solve, the reader may refer to Nonlinear Programming: Applied Mathematical Programming (Bradley et al., 1977).

    Finally, de Farias et al. (2000) present computational experiments that use SOS constraints within a branch-and-cut scheme to solve a generalized assignment problem arising in fiber-optic cable production scheduling.


    Conclusion

    Nonlinear programming is a relevant area in numerical optimization — with applications ranging from financial portfolio optimization to chemical process control. This article presented a procedure for tackling separable nonlinear programs by replacing nonlinear terms with PWL approximations and solving the resulting model using LP/MIP solvers. We also provided theoretical background on convex and concave functions and explained how these notions relate to nonlinear programming. With these elements, a reader should be able to model and solve LP/MIP-compatible approximations of many nonlinear programming instances without relying on dedicated nonlinear solvers.

    Thank you for reading!


    References

    Cherry, A., 2016. Piecewise Linear Approximation for Nonlinear Programming Problems. A dissertation. Texas Tech University. Available at: https://ttu-ir.tdl.org/server/api/core/bitstreams/2ffce0e6-87e9-4e4c-b41a-60ea670c5a13/content (Accessed: 27 January 2026).

    Bradley, S. P., Hax, A. C. & Magnanti, T. L., 1977. Nonlinear Programming. In: Applied Mathematical Programming. Reading, MA: Addison-Wesley Publishing Company, pp. 410–464. Available at: https://web.mit.edu/15.053/www/AMP-Chapter-13.pdf (Accessed: 27 January 2026).

    de Farias, I. R., Johnson, E. L. & Nemhauser, G. L., 2000. A generalized assignment problem with special ordered sets: a polyhedral approach. Mathematical Programming, 89(1), pp. 187–203. Available at: https://doi.org/10.1007/PL00011392 (Accessed: 27 January 2026).

