Title: Convex and Structured Nonconvex Optimization for Modern Machine Learning: Complexity and Algorithms
Advisor: Dr. Guanghui (George) Lan, Dr. Santanu Dey
Committee:
Dr. Renato Monteiro, School of ISYE, Georgia Institute of Technology
Dr. Arkadi Nemirovski, School of ISYE, Georgia Institute of Technology
Dr. Santosh Vempala, School of Computer Science, Georgia Institute of Technology
Reader: Dr. Arkadi Nemirovski, ISYE.
Summary:
In this thesis, we investigate various optimization problems motivated by applications in modern-day machine learning. In the first part, we study the computational complexity of training ReLU neural networks. We consider the following problem: given a fully connected ReLU neural network with two hidden layers, two ReLU nodes in the first hidden layer and one ReLU node in the second, do there exist edge weights such that the neural network fits the given data? We show that this decision problem is NP-hard. The main contribution is the design of a gadget that reduces the Separation by Two Hyperplanes problem to the ReLU neural network training problem.
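Concretely, the decision problem can be stated as follows (the notation here is illustrative; the precise formulation is given in the thesis): given labeled data $(x^i, y^i) \in \mathbb{R}^d \times \mathbb{R}$, $i = 1, \dots, m$, do there exist weights $w_1, w_2 \in \mathbb{R}^d$ and scalars $a_1, a_2, b_1, b_2, b$ such that
\[
\max\Big\{0,\; a_1 \max\{0, \langle w_1, x^i\rangle + b_1\} + a_2 \max\{0, \langle w_2, x^i\rangle + b_2\} + b\Big\} \;=\; y^i
\quad \text{for all } i = 1, \dots, m\,?
\]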
In the second part of the thesis, we study the design and complexity analysis of algorithms for function constrained optimization problems in both convex and nonconvex settings. These problems are increasingly important in machine learning due to their applications in multi-objective optimization and risk-averse learning, among others. For convex function constrained optimization problems, we propose a novel Constraint Extrapolation (ConEx) method, which uses linear approximations of the constraint functions to define the extrapolation (or acceleration) step. We show that this method is a unified algorithm that achieves the best-known rate of convergence for solving different function constrained convex composite problems, including convex or strongly convex, and smooth or nonsmooth problems with a stochastic objective and/or stochastic constraints. Many of these convergence rates were obtained for the first time in the literature. Moreover, ConEx is a single-loop algorithm that does not involve any penalty subproblems. Contrary to existing dual methods, it does not require projecting the Lagrange multipliers onto a (possibly unknown) bounded set. Furthermore, in the stochastic constraint setting, this is the first method that requires only bounded variance of the noise, a major relaxation of the restrictive sub-Gaussian noise assumption made by existing algorithms.
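To illustrate the idea behind the extrapolation step, consider the deterministic case with a constraint $g(x) \le 0$ and let $\ell_t$ denote the linearization of $g$ at $x_{t-1}$ evaluated at $x_t$. A schematic, simplified single-loop primal-dual iteration of this type reads
\[
\begin{aligned}
\ell_t &= g(x_{t-1}) + \langle g'(x_{t-1}),\, x_t - x_{t-1}\rangle,\\
s_t &= \ell_t + \theta_t(\ell_t - \ell_{t-1}),\\
y_{t+1} &= \big[\,y_t + \tau_t\, s_t\,\big]_+,\\
x_{t+1} &= \arg\min_{x \in X}\Big\{ \big\langle f'(x_t) + g'(x_t)^{\top} y_{t+1},\, x\big\rangle + \tfrac{\eta_t}{2}\|x - x_t\|^2 \Big\}.
\end{aligned}
\]
This sketch is illustrative rather than the precise ConEx updates; the exact step-size rules, extrapolation weights, and stochastic estimators are specified in the thesis. Its purpose is to highlight that the dual step acts on extrapolated constraint linearizations rather than on the constraint values themselves, which is what removes the need for penalty subproblems or for projecting the multipliers onto a bounded set.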
In the third part of this thesis, we investigate nonconvex nonsmooth function constrained optimization problems, for which we introduce a new proximal point method that transforms the initial nonconvex problem into a sequence of convex function constrained subproblems. For this algorithm, we establish asymptotic convergence as well as rates of convergence to KKT points under different constraint qualifications. For practical use, we present inexact variants of this algorithm in which approximate solutions of the subproblems are computed using the aforementioned ConEx method, and we establish their rates of convergence under a strong feasibility constraint qualification.
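As an illustration, assuming the objective $f$ and the constraints $g_i$ are $\rho$-weakly convex, one natural form of the convexified subproblem at iterate $x_k$ is
\[
x_{k+1} \approx \arg\min_{x}\ \Big\{ f(x) + \tfrac{\gamma}{2}\|x - x_k\|^2 \;:\; g_i(x) + \tfrac{\gamma}{2}\|x - x_k\|^2 \le 0,\ i = 1, \dots, m \Big\},
\qquad \gamma \ge \rho,
\]
so that each subproblem is a convex function constrained problem that can be solved inexactly by ConEx; the exact proximal terms, constraint shifts, and accuracy requirements used in the thesis may differ from this sketch.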
In the fourth part, we identify an important class of nonconvex function constrained problems for statistical machine learning applications where sparsity is imperative. We consider various nonconvex sparsity-inducing constraints, which approximate the $\ell_0$-norm more tightly than the convex $\ell_1$-norm relaxation. For this class of problems, we relax the strong feasibility constraint qualification to a weaker, well-known constraint qualification and still prove convergence to KKT points at the rate achieved by gradient descent on nonconvex regularized problems. This work systematically studies the structure of nonconvex sparsity-inducing constraints in order to bound the Lagrange multipliers and to solve certain subproblems faster, thereby matching the convergence rates of the nonconvex regularized version under a relaxed constraint qualification that is satisfied almost all the time.
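As one representative example (the thesis treats a broader family of such constraints), a sparsity-inducing constraint may take the form $\sum_{j=1}^{n} p_{\lambda,\gamma}(x_j) \le \eta$, where $p_{\lambda,\gamma}$ is the minimax concave penalty (MCP)
\[
p_{\lambda,\gamma}(t) \;=\;
\begin{cases}
\lambda |t| - \dfrac{t^2}{2\gamma}, & |t| \le \gamma\lambda,\\[4pt]
\dfrac{\gamma\lambda^2}{2}, & |t| > \gamma\lambda,
\end{cases}
\]
a bounded, nonconvex surrogate that tracks $\lambda|t|$ near the origin and flattens out for large $|t|$, thereby approximating the $\ell_0$-norm more closely than the $\ell_1$-norm does.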
In the fifth part, we present a faster algorithm for solving mixed packing and covering (MPC) linear programs. The proposed algorithm belongs to the family of primal-dual methods, similar to ConEx. Here, the main challenge comes from the feasible set of the primal variables, which is an $\ell_{\infty}$ ball for a general MPC. The diameter of this ball is $\Omega(\sqrt{n})$, where $n$ is the dimension of the LP, and this large diameter enters the complexity bound. We give this problem a specialized treatment and use a new regularization function that is weaker than a strongly convex one yet still yields an accelerated convergence rate. Using this regularizer, we replace the $\mathrm{poly}(n)$ term in the complexity with a logarithmic term.
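In a standard normalized form (the notation here is illustrative), the MPC feasibility problem asks, for nonnegative matrices $P$ and $C$, whether there exists
\[
x \in [0,1]^n \quad \text{such that} \quad Px \le \mathbf{1} \quad \text{and} \quad Cx \ge \mathbf{1};
\]
the box $[0,1]^n$ is an $\ell_{\infty}$ ball whose Euclidean diameter is $\sqrt{n}$, which is the source of the polynomial dependence on $n$ that the new regularizer removes.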