Online Optimization Specialization (1/4): Review of 'Online Convex Programming and Generalized Infinitesimal Gradient Ascent' by Martin Zinkevich
Specialization Introduction
This specialization covers five selected foundational papers in online optimization. In each blog post, I discuss one paper, aiming to include
- a brief introduction to and summary of the paper
- the key takeaways of the paper
Note that all of the discussion and summaries in this specialization are based on the reviewed papers; none of the algorithms or theorems are my own.
Summary
Paper Detail
Title: Online Convex Programming and Generalized Infinitesimal Gradient Ascent
Author(s): Martin Zinkevich
URL: https://www.aaai.org/Library/ICML/2003/icml03-120.php
Abstract
Convex programming involves a convex set $F \subseteq R^n$ and a convex function $c : F \to \mathbb{R}$. The goal of convex programming is to find a point in F which minimizes c. In this paper, we introduce online convex programming. In online convex programming, the convex set is known in advance, but in each step of some repeated optimization problem, one must select a point in F before seeing the cost function for that step. This can be used to model factory production, farm production, and many other industrial optimization problems where one is unaware of the value of the items produced until they have already been constructed. We introduce an algorithm for this domain, apply it to repeated games, and show that it is really a generalization of infinitesimal gradient ascent, and the results here imply that generalized infinitesimal gradient ascent (GIGA) is universally consistent.
Definitions
Online Convex Programming Problem
An online convex programming problem consists of a feasible set $F \subseteq \mathbb{R}^n$ and an infinite sequence $\{c^1, c^2, ...\}$ where each $c^t: F \to \mathbb{R}$ is a convex function. At each time step t, an online convex programming algorithm selects a vector $x^t \in F$. Only after the vector is selected does the algorithm receive the cost function $c^t$.
Norm, Distance, and Projection
Define the norm $\| x \| = \sqrt{x \cdot x}$ and the distance $d(x,y) = \| x - y \|$. Define the projection $P(y) = \arg\min \limits_{x \in F} d(x,y)$. Define $\| F \| = \max\limits_{x,y \in F} d(x,y)$ and $\| \nabla c \| = \sup \limits_{x\in F,\, t \in \{1,2,...\}} \| \nabla c^t(x) \|$.
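To make the projection operator concrete, here is a minimal sketch in Python. The feasible set is an illustrative assumption (a Euclidean ball of radius r centred at the origin, for which $P$ has a closed form); the paper itself allows any bounded, closed, nonempty convex set.

```python
import math

# Euclidean projection P(y) = argmin_{x in F} d(x, y), for the assumed
# feasible set F = {x : ||x|| <= r}, a ball of radius r centred at the origin.
def project_ball(y, r=1.0):
    norm = math.sqrt(sum(v * v for v in y))
    if norm <= r:
        return list(y)                    # points inside F are their own projection
    return [r * v / norm for v in y]      # otherwise rescale onto the boundary
```

For example, $y = (3, 4)$ projects onto the unit ball at $(0.6, 0.8)$; for this choice of $F$, $\| F \| = 2r$, the diameter of the ball.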
Cost and Regret
Given an algorithm A and an online convex programming problem $(F,\{c^1, c^2,...\})$, if $\{x^1, x^2,...\}$ are the vectors selected by A, then the cost of A until time T is
$$
C_A(T) = \sum_{t=1}^T c^t(x^t).
$$
The cost of a static feasible solution $x \in F$ until time T is
$$
C_x(T) = \sum_{t=1}^T c^t(x).
$$
The regret of algorithm A until time T is
$$
R_A(T) = C_A(T) - \min \limits_{x \in F} C_x(T).
$$
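A small worked example may help fix these definitions. The costs, feasible set, and the algorithm's plays below are all illustrative assumptions (hypothetical quadratic costs $c^t(x) = (x - z^t)^2$ on $F = [0,1]$), not from the paper:

```python
# Regret of a toy algorithm that always plays x^t = 0 against costs
# c^t(x) = (x - z_t)^2 on F = [0, 1]; the targets z_t are made up.
targets = [0.0, 1.0, 0.2, 0.8]
played = [0.0, 0.0, 0.0, 0.0]             # the algorithm's choices x^1..x^T

cost_A = sum((x - z) ** 2 for x, z in zip(played, targets))   # C_A(T)

# Best fixed point in hindsight: for a sum of quadratics this is the
# mean of the targets (which lies inside F here).
best_x = sum(targets) / len(targets)
cost_best = sum((best_x - z) ** 2 for z in targets)           # min_x C_x(T)

regret = cost_A - cost_best               # R_A(T), approximately 1.0 here
```

Note that the benchmark is a single fixed vector, not the per-round minimizer; this is what makes sublinear regret achievable.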
Universally Consistent
A behavior $\sigma$ is universally consistent if for any $\epsilon > 0 $ there exists a $T$ such that for all $\rho$:
$$
{Pr}_{h \in F_{\sigma,\rho}} \left[\exists t > T:\ \frac{R(h|_t)}{t} > \epsilon\right] < \epsilon
$$
In other words, after some time, with high probability, the average regret never again exceeds $\epsilon$, and this convergence over time is uniform over all environments.
Self-Oblivious
A behavior $\sigma$ is self-oblivious if there exists a function $f: Y^* \to \triangle (A)$ such that for all $h \in H$, $\sigma (h) = f(\Pi_2(h))$.
In other words, a behavior $\sigma$ is self-oblivious if it only depends on the history of actions of the environment. Here, $Y^* = \cup_{i =0}^\infty Y^i$, and $\Pi_2: H \to Y^*$ is such that for all $h \in H$,
$$
\Pi_2(h) = \{ h_{1,2}, h_{2,2}, h_{3,2}, ..., h_{|h|,2} \}.
$$
Self-oblivious algorithms tend to be robust against adaptive adversaries, those that change their technique based on past actions of the behavior.
Comment
Note that in an online convex programming problem, the cost function is revealed only after each decision is made, so not all information is available up front. Thus, online algorithms do not reach 'solutions.' Instead, we compare the online algorithm against the best fixed decision in hindsight, which knows all of the cost functions and selects one fixed vector.
Key assumptions in the paper:
- The feasible set F is bounded, closed, and nonempty.
- For all t, $c^t$ is differentiable, and the gradient norm $\| \nabla c^t(x) \|$ is bounded.
Algorithms
Algorithm 1 Greedy Projection
Select an arbitrary $x^1 \in F$ and a sequence of learning rates $\eta_1, \eta_2, ... \in \mathbb{R}^+$. In time step t, after receiving the cost function, select the next vector $x^{t+1}$ according to:
$$
x^{t+1} = P(x^t - \eta_t \nabla c^t(x^t)).
$$
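The update above can be sketched in a few lines of Python. The feasible set ($F = [-1, 1]$), the quadratic costs $c^t(x) = (x - z^t)^2$, and the learning rate $\eta_t = t^{-1/2}$ are illustrative assumptions for this one-dimensional toy setting:

```python
import math

def greedy_projection(targets, x1=0.0):
    """Greedy Projection on F = [-1, 1] with assumed costs c^t(x) = (x - z_t)^2."""
    x = x1                                    # arbitrary x^1 in F
    played = []
    for t, z in enumerate(targets, start=1):
        played.append(x)
        grad = 2.0 * (x - z)                  # gradient of c^t at x^t
        eta = 1.0 / math.sqrt(t)              # decreasing learning rate eta_t
        x = min(1.0, max(-1.0, x - eta * grad))   # P clips back into F = [-1, 1]
    return played
```

Against a fixed target (say $z^t = 0.5$ for all t) the iterates settle near 0.5; more generally, the paper's Theorem 1 bounds the regret of this scheme by $O(\sqrt{T})$.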
Algorithm 2 Lazy Projection
Select an arbitrary $x^1 \in F$ and a sequence of learning rates $\eta_1, \eta_2, ... \in \mathbb{R}^+$. Define $y^1 = x^1$. In time step t, after receiving a cost function, define $y^{t+1}$:
$$
y^{t+1} = y^t - \eta_t \nabla c^t(x^t)
$$
and select the vector:
$$
x^{t+1} = P(y^{t+1}).
$$
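Under the same hypothetical setup ($F = [-1, 1]$, costs $c^t(x) = (x - z^t)^2$), Lazy Projection can be sketched as follows; note that the gradient steps accumulate on the unprojected iterate $y$, and that the paper's regret analysis for this algorithm uses a fixed learning rate:

```python
def lazy_projection(targets, eta=0.1, x1=0.0):
    """Lazy Projection on F = [-1, 1] with assumed costs c^t(x) = (x - z_t)^2."""
    x, y = x1, x1                            # y^1 = x^1
    played = []
    for z in targets:
        played.append(x)
        y = y - eta * 2.0 * (x - z)          # update the unprojected iterate y
        x = min(1.0, max(-1.0, y))           # x^{t+1} = P(y^{t+1})
    return played
```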
Comparing Greedy Projection and Lazy Projection
In Lazy Projection, the author introduces an auxiliary variable $y$, which need not equal $x$ except at initialization ($y^1 = x^1$). At each time step t, Greedy Projection applies the gradient step directly to $x^t$, whereas Lazy Projection performs the update on $y^t$ and only projects when selecting the next vector to play.
Algorithm 3 Generalized Infinitesimal Gradient Ascent
Choose a sequence of learning rates $\{ \eta_1, \eta_2, ...\}$. Begin with an arbitrary vector $x^1 \in F$. Then for each round t:
- Play according to $x^t$: play action i with probability $x_i^t$.
- Observe the action $h_{t,2}$ of the other player and calculate:
$$
y_i^{t+1} = x_i^t + \eta_t\mu (i,h_{t,2})
$$
$$
x^{t+1} = P(y^{t+1})
$$
where $P(y) = \arg\min \limits_{x \in F} d(x,y)$, as before.
Understanding GIGA
GIGA is motivated by the infinitesimal gradient ascent algorithm, which was proposed for designing strategies in a repeated game setting. Suppose a set of actions $A = \{ 1,...,n \}$. In each step of the repeated game, a distribution over the actions is selected, represented by an n-dimensional vector $x \in \mathbb{R}^n$ with $x_i \geq 0$ and $\sum_{i=1}^n x_i = 1$. The goal of GIGA is to maximize utility by gradient ascent. For this purpose, there are two steps in each iteration. In step one, the algorithm samples an action from the distribution $x^t$. In step two, it applies a gradient ascent update to obtain $y^{t+1}$, and then projects $y^{t+1}$ back onto the simplex to obtain $x^{t+1}$.
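The two steps above can be sketched in Python for a two-player game. The payoff matrix and the opponent's play below are illustrative assumptions; the simplex projection here uses the standard sort-based algorithm, which the paper does not prescribe:

```python
import math

def project_simplex(y):
    """Euclidean projection onto the probability simplex {x : x >= 0, sum(x) = 1}."""
    u = sorted(y, reverse=True)
    css, theta = 0.0, 0.0
    for i, ui in enumerate(u):
        css += ui
        if ui + (1.0 - css) / (i + 1) > 0:   # last index where this holds wins
            theta = (1.0 - css) / (i + 1)
    return [max(v + theta, 0.0) for v in y]

def giga(mu, opponent_actions, n):
    """GIGA: gradient ascent on the utility mu(i, j), projected back to the simplex."""
    x = [1.0 / n] * n                        # arbitrary initial mixed strategy x^1
    for t, j in enumerate(opponent_actions, start=1):
        eta = 1.0 / math.sqrt(t)
        y = [x[i] + eta * mu[i][j] for i in range(n)]   # y^{t+1} = x^t + eta_t * mu(., h)
        x = project_simplex(y)               # x^{t+1} = P(y^{t+1})
    return x
```

With a matching-pennies-style payoff matrix and an opponent who always plays the first action, the returned mixture quickly concentrates on the best response.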
Note that GIGA is self-oblivious and universally consistent.
Expert Algorithm and Online Linear Programming Algorithm
An experts problem consists of a set of experts $E = \{ e_1, ..., e_n \}$ and a sequence of cost vectors $c^1, c^2, ...$ where each $c^t \in \mathbb{R}^n$. On each round t, an expert algorithm (EA) first selects a distribution $D^t \in \triangle (E)$, and then observes the cost vector $c^t$.
The author defines an online linear programming algorithm (OLPA) similarly, replacing the set of experts $E$ with a polytope $F \subseteq \mathbb{R}^n$, and shows that an OLPA can be constructed from an EA.
Takeaways
- In online convex programming, the convex set is known in advance, but in each step of some repeated optimization problem, one must select a point in F before seeing the cost function for that step.
- When measuring the performance of online algorithms, we compare the online algorithm with the best fixed decision in hindsight.
- The key definitions and algorithms of online convex programming are described above.
- GIGA is self-oblivious and universally consistent, i.e., the average regret of GIGA approaches zero, and the algorithm is robust against adaptive adversaries.
- Online convex programming algorithms are motivated by older algorithms such as infinitesimal gradient ascent and the experts algorithm. In Section 4, the paper shows how to translate algorithms for mixing experts into algorithms for online linear programs, and online linear programming algorithms into algorithms for online convex programs.