
Problem 1

Consider an iteration of the projected gradient method (PGM)

$$x^+ = \mathrm{proj}_{\mathcal{X}}\big(x - \gamma \nabla f(x)\big) = x - \gamma g(x),$$

where $x \in \mathcal{X}$ and $g(x) := (x - x^+)/\gamma$ is the gradient mapping vector (see Page 4 of Lecture 4). We will assume that $f \in C^{1}_{L}(\mathbb{R}^n)$ and $\mathcal{X} \subseteq \mathbb{R}^n$ is a closed and convex set.
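For concreteness, a minimal MATLAB sketch of a single PGM iteration and the associated gradient mapping is given below; `gradf`, `projX`, `L`, and `x` are placeholder names for $\nabla f$, $\mathrm{proj}_{\mathcal{X}}$, the smoothness constant, and the current iterate, not objects provided by the assignment.

```matlab
% Hypothetical one-step illustration of PGM and the gradient mapping g(x).
% gradf, projX, L, and x are placeholders assumed to be defined elsewhere.
gamma = 1/L;                          % any step-size with 0 < gamma <= 1/L
xPlus = projX(x - gamma*gradf(x));    % projected gradient step x^+
g     = (x - xPlus)/gamma;            % gradient mapping vector g(x)
```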

(a) Show that $(y - \mathrm{proj}_{\mathcal{X}}(y))^T(z - \mathrm{proj}_{\mathcal{X}}(y)) \le 0$ for all $z \in \mathcal{X}$ and all $y \in \mathbb{R}^n$.

Hint: Use optimality conditions for constrained convex optimization.

(b) Using (a), show that $\nabla f(x)^T(x^+ - z) \le g(x)^T(x^+ - z)$ for all $z \in \mathcal{X}$.

(c) Using (b), show that for any $0 < \gamma \le 1/L$, we have

$$f(x^+) \le f(x) - \frac{\gamma}{2}\,\|g(x)\|^2.$$

(d) Using (c), show that $g(x_t) \to 0$ as $t \to +\infty$. How can this result be interpreted for $f \in \mathcal{F}^{1}_{L}(\mathbb{R}^n)$?

Problem 2

(a) Show the following inequality: $|x^T y| \le \|x\|\,\|y\|_*$ for all $x, y \in \mathbb{R}^n$, where $\|\cdot\|_*$ is the dual norm of $\|\cdot\|$ over $\mathbb{R}^n$.

(b) Characterize the subdifferential of the $\ell_\infty$-norm: $f(x) = \|x\|_\infty = \max\{|x_1|, \ldots, |x_n|\}$.

Hint: Use the fact that the $\ell_1$ and $\ell_\infty$ norms are duals of each other.

Problem 3

Let $f(x) = \|x\|$ be a norm over $\mathbb{R}^n$ and $\partial f(x)$ its subdifferential. Consider the set $Z(x) := \{z \in \mathbb{R}^n : z^T x = \|x\|,\ \|z\|_* \le 1\}$, where $\|\cdot\|_*$ is the dual norm of $\|\cdot\|$.

(a) Show that $Z(x) \subseteq \partial f(x)$.

Hint: Use the subgradient inequality and Problem 2(a).

(b) Show that $\partial f(x) \subseteq Z(x)$.

Hint: Use $\max_{z}\{x^T z - \|z\|\} = \mathbb{1}_{\{\|\cdot\|_* \le 1\}}(x)$ for all $x \in \mathbb{R}^n$, where $\mathbb{1}_{\mathcal{X}}$ is the indicator function of $\mathcal{X} \subseteq \mathbb{R}^n$.

(c) Using (a) and (b), conclude that

$$\partial\|x\| = \arg\max_{\|z\|_* \le 1}\{z^T x\}.$$

Problem 4

We consider the minimization of the least-squares function

$f(x) = \frac{1}{2}\|Ax - b\|_2^2$, where $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. In this problem, we will work with the dataset stored in dataset.mat. This file, along with the initial script, is stored within the archive assignment1.zip.

(a) Comment on the smoothness and convexity of $f$. What is the value of the Lipschitz constant of $\nabla f$?
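The constant can be read off directly from the data. Below is a hedged MATLAB sketch, assuming dataset.mat stores the matrix under the variable name A (an assumption about the dataset's contents):

```matlab
% Assumes dataset.mat contains A; norm(A) is the largest singular value,
% so norm(A)^2 equals the largest eigenvalue of A'*A.
load('dataset.mat');
L = norm(A)^2;   % Lipschitz constant of the gradient A'*(A*x - b)
```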

(b) Download and run gm.m, providing the initial script for implementing the gradient method (GM). Complete the script to implement GM using the step-size $\gamma$ = stepSize = $1/L$, where $L > 0$ is the Lipschitz constant from (a). Keep the rest of the parameters as provided: x0 = xInit = 0, maxIter = 200, and tol = $10^{-4}$. Include the generated plot with your solutions. Submit the PDF of your implementation.
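For orientation, a minimal sketch of what the completed GM loop might look like is given below. It is not the provided gm.m; the stopping rule and any variable names beyond those listed in (b) are illustrative assumptions.

```matlab
% Sketch of a gradient-method loop for f(x) = 0.5*||A*x - b||_2^2.
load('dataset.mat');                 % assumed to provide A and b
L        = norm(A)^2;                % Lipschitz constant of the gradient
stepSize = 1/L;
x        = zeros(size(A,2),1);       % xInit = 0
maxIter  = 200;
tol      = 1e-4;
for t = 1:maxIter
    grad = A'*(A*x - b);             % gradient of f at the current iterate
    x    = x - stepSize*grad;        % gradient step
    if norm(grad) <= tol             % illustrative stopping rule
        break;
    end
end
```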

(c) What is the final error obtained by your implementation in (b)? Why can't your algorithm accurately compute the true solution xtrue? Suppose you were to minimize $f$ over the nonnegative orthant $\mathbb{R}^n_+ = \{x \in \mathbb{R}^n : x_i \ge 0,\ i \in \{1, \ldots, n\}\}$ instead of $\mathbb{R}^n$. What would you expect?

(d) Create the script pgm.m for minimizing $f$ over $\mathbb{R}^n_+$ using the projected gradient method (PGM). Keep all the parameters as in (b). Include the generated plot with your solutions. Submit the PDF of your implementation.
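Since projection onto $\mathbb{R}^n_+$ is a componentwise maximum with zero, pgm.m differs from the GM sketch above essentially in one line. The following is again an illustrative sketch under the same assumptions, not the required script:

```matlab
% Sketch of a projected-gradient loop over the nonnegative orthant.
load('dataset.mat');                     % assumed to provide A and b
L        = norm(A)^2;
stepSize = 1/L;
x        = zeros(size(A,2),1);           % xInit = 0
maxIter  = 200;
tol      = 1e-4;
for t = 1:maxIter
    grad = A'*(A*x - b);
    xNew = max(x - stepSize*grad, 0);    % gradient step + projection onto R^n_+
    if norm(xNew - x) <= tol             % illustrative stopping rule
        x = xNew; break;
    end
    x = xNew;
end
```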

(e) What can we conclude about the solutions computed by the algorithms in (b) and (d)?

Problem 5

We want to implement the conditional gradient method (CGM) for solving the following problem

$$\text{(P)}\qquad \text{minimize } f(x) = \frac{1}{2}\|Ax - b\|_2^2 \quad \text{subject to } \|x\|_1 \le \tau,$$

where $\tau \ge 0$, $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. In this problem, we will work with the same dataset used in Problem 4. The dataset and the initial script are stored in the archive assignment1.zip.

(a) Show that for any $x \ne 0$ and any $i \in \{1, \ldots, n\}$ with $|x_i| = \max_j |x_j|$, we have $\mathrm{sgn}(x_i)\, e_i \in \partial\|x\|_\infty$, where $\mathrm{sgn}(\cdot)$ is the sign function, $\|\cdot\|_\infty$ is the $\ell_\infty$-norm, and $\{e_i\}$ is the canonical basis of $\mathbb{R}^n$.

Hint: See the result of Problem 2.

(b) Show that

$$\arg\min_{\|s\|_1 \le \tau} \nabla f(x_{t-1})^T s = -\tau\, \partial\|\nabla f(x_{t-1})\|_\infty.$$

Hint: See the result of Problem 3.

(c) Show that CGM for (P) can be efficiently implemented as

$$s_t = -\tau\, [\mathrm{sgn}(\nabla f(x_{t-1}))]_{i_t}\, e_{i_t}, \quad \text{where } i_t \in \arg\max_{i \in \{1, \ldots, n\}} \big|[\nabla f(x_{t-1})]_i\big|,$$

$$x_t = (1 - \gamma_t)\, x_{t-1} + \gamma_t\, s_t.$$

Note that the notation $[\cdot]_i$ denotes the $i$th element of the input vector.
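This update touches only one coordinate of $s_t$ per iteration, which is what makes the implementation efficient. A hedged MATLAB sketch of the loop follows; the values of tau, maxIter, and the step-size rule $\gamma_t = 2/(t+1)$ are assumptions for illustration, not the assignment's settings:

```matlab
% Illustrative CGM loop for (P); parameter values are assumptions.
load('dataset.mat');                  % assumed to provide A and b
tau     = 1;                          % hypothetical radius of the l1 ball
maxIter = 200;
x = zeros(size(A,2),1);
for t = 1:maxIter
    grad    = A'*(A*x - b);           % gradient of f at x_{t-1}
    [~, it] = max(abs(grad));         % coordinate with largest |[grad]_i|
    s       = zeros(size(x));
    s(it)   = -tau*sign(grad(it));    % minimizer of the linearization over the l1 ball
    gamma   = 2/(t+1);                % one common CGM step-size rule
    x       = (1 - gamma)*x + gamma*s;
end
```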

(d) Show that the exact line search yields the following step-size for CGM

$$\gamma_t = \arg\min_{\gamma \in [0,1]} f\big((1-\gamma)\, x_{t-1} + \gamma\, s_t\big).$$
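Since $f$ is quadratic along the segment and the constraint on $\gamma$ is a simple interval, one way to see where the computation is heading is the sketch below: it minimizes the one-dimensional quadratic and clips the result to $[0,1]$. This is a hedged illustration, not the closed-form expression the assignment asks you to derive; x and s are placeholders for the current iterate $x_{t-1}$ and the CGM vertex $s_t$.

```matlab
% Exact line search along the CGM segment for f(x) = 0.5*||A*x - b||_2^2.
% x (current iterate) and s (CGM vertex) are assumed to be defined already.
d      = s - x;                              % search direction
Ad     = A*d;
gammaU = -((A*x - b)'*Ad) / (Ad'*Ad);        % unconstrained 1-D minimizer
gamma  = min(1, max(0, gammaU));             % clip to the interval [0,1]
```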