Online Optimization Specialization (4/4): Review of 'Projection-free Online Learning'
Specialization Introduction
This specialization covers five selected foundational papers in online optimization. In each blog post, I discuss one paper and aim to include:
- A brief introduction to and summary of the paper
- Key takeaways from the paper
Note that all of the discussion and summaries in this specialization are based on the reviewed papers; none of the algorithms or theorems are my own.
Summary
Paper Detail
Title: Projection-free Online Learning
Author(s): Elad Hazan, Satyen Kale
Abstract
The computational bottleneck in applying online learning to massive data sets is usually the projection step. We present efficient online learning algorithms that eschew projections in favor of much more efficient linear optimization steps using the Frank-Wolfe technique. We obtain a range of regret bounds for online convex optimization, with better bounds for specific cases such as stochastic online smooth convex optimization. Besides the computational advantage, other desirable features of our algorithms are that they are parameter-free in the stochastic case and produce sparse decisions. We apply our algorithms to computationally intensive applications of collaborative filtering, and show the theoretical improvements to be clearly visible on standard datasets.
Advantage of the Projection-free Algorithm
- Parameter-free: The algorithm is parameter-free in the basic setting and extends naturally to the stochastic setting while remaining parameter-free.
- Efficient Representation and Sparsity: The algorithm maintains a distribution over the vertices of the decision set, thereby eliminating the need for any further decomposition. This gives a form of sparsity.
Preliminary Knowledge
Here, I list the preliminary knowledge that we did not cover in class.
Smoothed Functions
Let $\mathbb{B}$ and $\mathbb{S}$ denote the unit ball and unit sphere in $\mathbb{R}^n$ respectively. Given $\delta > 0$, let the $\delta$-smoothing of a function $f$ be:
$$
\hat{f_\delta}(x) = E_{u \in \mathbb{B}}[f(x+\delta u)],
$$
where $u$ is chosen uniformly at random from $\mathbb{B}$. The following lemma shows that $\hat{f_\delta}$ is a good smooth approximation of $f$.
Lemma 2.1
If $f$ is convex and L-Lipschitz, then the function $\hat{f_\delta}$ has the following properties:
- $\hat{f_\delta}$ is convex and L-Lipschitz.
- For any $x \in \mathcal{K}$, $\nabla\hat{f_\delta}(x) = \frac{n}{\delta}E_{u \in \mathbb{S}}[f(x+\delta u)u]$, where $u$ is drawn uniformly from the unit sphere $\mathbb{S}$.
- For any $x \in \mathcal{K}$, $\| \nabla \hat{f_\delta}(x) \| \leq nL$.
- $\hat{f_\delta}$ is $\frac{nL}{\delta}$-smooth.
- For any $x \in \mathcal{K}, |f(x)-\hat{f_\delta}(x)| \leq \delta L$.
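To make the smoothing concrete, here is a minimal Python sketch (my own illustration, not code from the paper) that estimates $\hat{f_\delta}(x)$ and its gradient by Monte Carlo sampling, following the definition above and the gradient identity in Lemma 2.1. The function name, the sample count, and the loss `f` are assumptions made only for illustration.

```python
import numpy as np

def smoothed_value_and_grad(f, x, delta, num_samples=1000, rng=None):
    """Monte Carlo estimates of the delta-smoothed value and gradient:

    f_hat(x)      = E_{u ~ unit ball}   [ f(x + delta * u) ]
    grad f_hat(x) = (n / delta) * E_{u ~ unit sphere} [ f(x + delta * u) * u ]
    """
    rng = np.random.default_rng() if rng is None else rng
    n = x.shape[0]

    # Directions sampled uniformly from the unit sphere.
    sphere = rng.normal(size=(num_samples, n))
    sphere /= np.linalg.norm(sphere, axis=1, keepdims=True)

    # Uniform samples from the unit ball: scale sphere points by r^(1/n).
    radii = rng.uniform(size=(num_samples, 1)) ** (1.0 / n)
    ball = sphere * radii

    value = np.mean([f(x + delta * u) for u in ball])
    grad = (n / delta) * np.mean([f(x + delta * u) * u for u in sphere], axis=0)
    return value, grad
```

This kind of one-point estimator is what makes the smoothed function usable when only zeroth-order evaluations of $f$ are available.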
Sparsity
Let $\mathcal{K} \subseteq \mathbb{R}^n$ be a convex, compact set and let $x \in \mathcal{K}$. We say that $x$ is $t$-sparse w.r.t. $\mathcal{K}$ if it can be written as a convex combination of $t$ boundary points of $\mathcal{K}$.
All the algorithms in this paper produce a $t$-sparse prediction at iteration $t$ w.r.t. the underlying decision set $\mathcal{K}$.
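As a quick illustration (my own example, not from the paper): on the probability simplex, whose vertices are the standard basis vectors $e_i$, the point
$$
x = 0.5\,e_1 + 0.3\,e_4 + 0.2\,e_7
$$
is 3-sparse, since it is a convex combination of three vertices (boundary points). This is exactly the structure a Frank-Wolfe update produces: each round moves the iterate toward a single output of the linear-optimization oracle, so after $t$ rounds the iterate is a convex combination of at most $t$ points of $\mathcal{K}$.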
Algorithm and Analysis
Algorithm 1: Online Frank-Wolfe
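The original post showed the algorithm as a figure. Below is a minimal Python sketch of the Online Frank-Wolfe update as I read it from the paper: maintain an aggregate objective $F_t$ over the observed losses, replace the projection with a linear-optimization step over $\mathcal{K}$, and take a convex-combination step of size $t^{-a}$. The running-average form of $F_t$, the oracle name `linear_opt`, and all other identifiers are my own assumptions, so treat this as an illustration rather than the authors' exact pseudocode.

```python
import numpy as np

def online_frank_wolfe(grad_fns, linear_opt, x1, a, T):
    """Sketch of Online Frank-Wolfe (OFW).

    grad_fns[t-1]: gradient of the loss f_t, revealed after x_t is played.
    linear_opt(g): linear-optimization oracle returning argmin_{x in K} <g, x>.
    x1: initial point in K;  a: step-size exponent;  T: number of rounds.
    """
    x = x1.copy()
    played = []

    for t in range(1, T + 1):
        played.append(x.copy())  # play x_t, then observe f_t

        # Gradient of the aggregate objective F_t(x) = (1/t) * sum_{tau<=t} f_tau(x),
        # evaluated at the current iterate (the averaging form is an assumption here).
        grad_F = np.mean([grad_fns[tau](x) for tau in range(t)], axis=0)

        # Linear optimization over K replaces the projection step.
        v = linear_opt(grad_F)

        # Frank-Wolfe convex-combination update with step size t^{-a};
        # it stays inside K and adds at most one new vertex per round.
        gamma = t ** (-a)
        x = (1.0 - gamma) * x + gamma * v

    return played
```

For instance, when $\mathcal{K}$ is a trace-norm ball, as in the paper's collaborative-filtering application, the linear-optimization step amounts to computing a top singular vector pair, which is far cheaper than projecting onto $\mathcal{K}$.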
Theorem 3.1
Assume that for $t = 1,2,...,T$, the function $f_t$ is L-Lipschitz, $Bt^{-b}$-smooth for some constants $b \in [-1,1/2]$ and $B \geq 0$, and $St^{-s}$-strongly convex for some constants $s \in [0,1)$ and $S \geq 0$. Then in Algorithm 1, for all $t > 1$, we have
$$
\Delta_t \leq Ct^{-d},
$$
for both of the following values of $C$ and $d$:
$$
(C,d) = \left(\max\{9D^2B, 3LD\}, \frac{1+b}{2}\right)
$$
and
$$
(C,d) = \left(\max\{9D^2B, 36L^2/S, 3LD\}, \frac{2+2b-s}{3}\right).
$$
Here, $\Delta_t = F_t(x_t) - F_t(x_t^*)$, where $F_t$ is the aggregate objective maintained by Algorithm 1, $x_t^* = \arg\min_{x \in \mathcal{K}} F_t(x)$, and $D$ denotes the diameter of $\mathcal{K}$. In either case, the bound is obtained by setting $a = d - b$ in Algorithm 1.
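As a quick worked instance of the first pair (my own check, not from the paper), take losses whose smoothness parameter is constant in $t$, i.e. $b = 0$:
$$
(C,d) = \left(\max\{9D^2B, 3LD\}, \tfrac{1}{2}\right)
\quad\Longrightarrow\quad
\Delta_t \leq \frac{\max\{9D^2B, 3LD\}}{\sqrt{t}},
$$
so the suboptimality of $x_t$ on $F_t$ decays at the rate $O(1/\sqrt{t})$, achieved with step-size exponent $a = d - b = 1/2$.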
Regret Bound
Similar to what we covered in class, the authors discuss and prove regret bounds for Algorithm 1. The discussion is divided into the smooth stochastic case and the non-smooth stochastic case; in both cases, the algorithm achieves strong regret guarantees while replacing projections with linear optimization steps.