Mirror Descent: From Curved Geometry to Game Theory

Ziyi Zhu / October 17, 2025

Mirror descent is an elegant generalization of gradient descent that has found surprising applications in game theory and multi-agent learning. While gradient descent takes steps in Euclidean space, mirror descent allows us to work in curved geometries that better match the structure of our problem. This seemingly abstract idea has powerful practical consequences, particularly for finding Nash equilibria in games.

The Mirror Descent Algorithm

At its core, mirror descent replaces the Euclidean distance with a more general notion of "distance" called a Bregman divergence. Let's say we want to minimize a function $f(x)$ over some convex set $\mathcal{X}$. Standard gradient descent updates via:

$$x_{t+1} = \Pi_{\mathcal{X}}(x_t - \eta \nabla f(x_t))$$

where $\Pi_{\mathcal{X}}$ projects back onto the feasible set and $\eta$ is the step size.

Mirror descent instead uses a mirror map $\Phi: \mathcal{X} \to \mathbb{R}$, which is a strongly convex function that acts as a "lens" through which we view the space. The algorithm proceeds in three steps:

  1. Gradient step in dual space: Compute $y_{t+1} = \nabla \Phi(x_t) - \eta \nabla f(x_t)$
  2. Mirror step: Set $x_{t+1} = (\nabla \Phi)^{-1}(y_{t+1})$
  3. Project if necessary: Ensure $x_{t+1} \in \mathcal{X}$

This can be written more compactly using the Bregman divergence $D_\Phi(x, y) = \Phi(x) - \Phi(y) - \langle \nabla \Phi(y), x - y \rangle$:

$$x_{t+1} = \arg\min_{x \in \mathcal{X}} \left\{ \langle \nabla f(x_t), x \rangle + \frac{1}{\eta} D_\Phi(x, x_t) \right\}$$

The choice of $\Phi$ determines the geometry. For example, when $\Phi(x) = \frac{1}{2}\|x\|^2$, we recover standard gradient descent. For probability simplices, the negative entropy $\Phi(x) = \sum_i x_i \log x_i$ leads to multiplicative updates.
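To make this concrete, here is a minimal sketch of one entropic mirror descent step on the probability simplex (Python/NumPy; the function and variable names are my own, not from any particular library). Note how the dual-space gradient step, the inverse mirror map, and the renormalization together collapse into a multiplicative update.

```python
import numpy as np

def entropic_md_step(x, grad, eta):
    """One mirror descent step with the negative-entropy mirror map
    Phi(x) = sum_i x_i log x_i on the probability simplex, following
    the three steps in the text."""
    # 1. Gradient step in dual space: grad Phi(x) = 1 + log(x)
    y = (1.0 + np.log(x)) - eta * grad
    # 2. Mirror step: (grad Phi)^{-1}(y) = exp(y - 1)
    x_new = np.exp(y - 1.0)
    # 3. Project back onto the simplex (here just a renormalization)
    return x_new / x_new.sum()

# Example: one step from the uniform distribution with an illustrative gradient.
x = np.array([1/3, 1/3, 1/3])
grad = np.array([0.1, -0.3, 0.2])
print(entropic_md_step(x, grad, eta=0.1))  # mass shifts toward the low-loss action
```

Written out, this step is exactly $x_{t+1}^{(i)} \propto x_t^{(i)} e^{-\eta \, \mathrm{grad}_i}$, the multiplicative update mentioned above.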

Recovering Standard Gradient Descent

When $\Phi(x) = \frac{1}{2}\|x\|^2$, we have $\nabla \Phi(x) = x$ and the Bregman divergence becomes:

$$D_\Phi(x, y) = \frac{1}{2}\|x\|^2 - \frac{1}{2}\|y\|^2 - \langle y, x - y \rangle = \frac{1}{2}\|x - y\|^2$$

This is just the squared Euclidean distance. Plugging into the mirror descent update:

$$x_{t+1} = \arg\min_{x} \left\{ \langle \nabla f(x_t), x \rangle + \frac{1}{2\eta} \|x - x_t\|^2 \right\}$$

Taking the gradient and setting it to zero gives $\nabla f(x_t) + \frac{1}{\eta}(x_{t+1} - x_t) = 0$, which rearranges to:

$$x_{t+1} = x_t - \eta \nabla f(x_t)$$

This is exactly standard gradient descent—the Euclidean mirror map gives us Euclidean geometry.
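As a quick numerical sanity check, the sketch below (Python, assuming NumPy and SciPy are available; the quadratic objective is purely illustrative) solves the proximal form of the update with $\Phi(x) = \frac{1}{2}\|x\|^2$ using a generic solver and compares it with a plain gradient step. The two coincide up to solver tolerance.

```python
import numpy as np
from scipy.optimize import minimize

# Gradient of an illustrative quadratic objective f(x) = ||x - c||^2.
c = np.array([1.0, -2.0])
grad_f = lambda x: 2.0 * (x - c)

x_t, eta = np.array([0.0, 0.0]), 0.1

# Mirror descent step with Phi(x) = 0.5 ||x||^2, solved numerically:
#   argmin_x  <grad f(x_t), x> + (1 / (2 * eta)) * ||x - x_t||^2
proximal = lambda x: grad_f(x_t) @ x + np.sum((x - x_t) ** 2) / (2 * eta)
md_step = minimize(proximal, x_t).x

# Plain gradient descent step.
gd_step = x_t - eta * grad_f(x_t)

print(md_step, gd_step)  # both are approximately [0.2, -0.4]
```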

Finding Nash Equilibria in Games

Consider a two-player game where player 1 chooses a strategy $x \in \mathcal{X}$ and player 2 chooses $y \in \mathcal{Y}$. Player 1 wants to minimize their loss $f(x, y)$ while player 2 wants to minimize $g(x, y)$. A Nash equilibrium $(x^*, y^*)$ satisfies:

$$f(x^*, y^*) \leq f(x, y^*) \quad \text{for all } x \in \mathcal{X}$$
$$g(x^*, y^*) \leq g(x^*, y) \quad \text{for all } y \in \mathcal{Y}$$

Finding Nash equilibria is generally difficult—in fact, it's PPAD-complete. However, for certain games (like zero-sum games where $g(x, y) = -f(x, y)$), mirror descent offers an elegant solution.
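Because the loss in a matrix game is linear in each player's mixed strategy, the best response to a fixed opponent is always a pure action, so checking the equilibrium condition reduces to comparing each player's current loss against their best single action. Here is a minimal exploitability check for zero-sum matrix games (Python/NumPy; the function name and the matching-pennies example are my own choices).

```python
import numpy as np

def nash_gap(A, x, y):
    """Exploitability of (x, y) in the zero-sum game where player 1's loss
    is x^T A y and player 2's loss is -x^T A y.  Both gaps are nonnegative
    and equal zero exactly at a Nash equilibrium."""
    loss1 = x @ A @ y
    gap1 = loss1 - np.min(A @ y)   # player 1's best deviation is a pure action
    gap2 = np.max(x @ A) - loss1   # player 2 benefits by maximizing player 1's loss
    return gap1, gap2

# Matching pennies (loss matrix for player 1); uniform play is the equilibrium.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(nash_gap(A, np.array([0.5, 0.5]), np.array([0.5, 0.5])))  # (0.0, 0.0)
print(nash_gap(A, np.array([0.9, 0.1]), np.array([0.5, 0.5])))  # player 2 can exploit
```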

Simultaneous Mirror Descent

The key insight is to have both players run mirror descent simultaneously:

$$x_{t+1} = \arg\min_{x \in \mathcal{X}} \left\{ \langle \nabla_x f(x_t, y_t), x \rangle + \frac{1}{\eta} D_{\Phi_1}(x, x_t) \right\}$$
$$y_{t+1} = \arg\min_{y \in \mathcal{Y}} \left\{ \langle \nabla_y g(x_t, y_t), y \rangle + \frac{1}{\eta} D_{\Phi_2}(y, y_t) \right\}$$

For zero-sum games, this converges to a Nash equilibrium under appropriate conditions. The time-averaged iterates $\bar{x}_T = \frac{1}{T}\sum_{t=1}^T x_t$ and $\bar{y}_T = \frac{1}{T}\sum_{t=1}^T y_t$ achieve low regret, meaning:

$$\frac{1}{T}\sum_{t=1}^T f(x_t, y_t) - \min_x f(x, \bar{y}_T) = O(1/\sqrt{T})$$

As $T \to \infty$, the average play converges to Nash equilibrium strategies.
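A minimal self-play sketch of this scheme for zero-sum matrix games (Python/NumPy; the helper names are my own) is below. It uses the entropic mirror map for both players, so each update is a multiplicative-weights step, and it returns the time-averaged strategies, which are the quantities the guarantee above refers to.

```python
import numpy as np

def mw_step(p, loss_vec, eta):
    """Entropic mirror descent on the simplex, i.e. multiplicative weights."""
    p_new = p * np.exp(-eta * loss_vec)
    return p_new / p_new.sum()

def simultaneous_mirror_descent(A, x0, y0, T=10_000, eta=0.05):
    """Both players update at once on the zero-sum game where player 1's loss
    is x^T A y and player 2's loss is its negative.  The individual iterates
    tend to cycle, so we return the time averages."""
    x, y = x0.copy(), y0.copy()
    x_avg, y_avg = np.zeros_like(x0), np.zeros_like(y0)
    for _ in range(T):
        x_avg, y_avg = x_avg + x / T, y_avg + y / T
        gx = A @ y         # gradient of player 1's loss in x
        gy = -(x @ A)      # gradient of player 2's loss in y
        x, y = mw_step(x, gx, eta), mw_step(y, gy, eta)
    return x_avg, y_avg

# Example: matching pennies (loss matrix for player 1), biased starting strategies.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x_bar, y_bar = simultaneous_mirror_descent(A, np.array([0.9, 0.1]), np.array([0.2, 0.8]))
print(x_bar, y_bar)  # both averages land close to the equilibrium (0.5, 0.5)
```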

Why Does Mirror Descent Help?

The power of mirror descent for game theory comes from choosing $\Phi$ to match the constraint geometry. For games over probability distributions (like rock-paper-scissors), the entropic mirror map $\Phi(p) = \sum_i p_i \log p_i$ yields the famous multiplicative weights update:

$$x_{t+1}^{(i)} \propto x_t^{(i)} \cdot e^{-\eta \nabla_i f(x_t, y_t)}$$

This keeps strategies in the probability simplex automatically and has better convergence properties than projected gradient descent in this geometry.

Example: Rock-Paper-Scissors

Let's see multiplicative weights in action on the classic rock-paper-scissors game. This is a zero-sum game where both players choose from the actions $\{R, P, S\}$. To match the minimization framework above, we write the game in terms of losses: the loss matrix for player 1 is

$$A = \begin{pmatrix} 0 & 1 & -1 \\ -1 & 0 & 1 \\ 1 & -1 & 0 \end{pmatrix}$$

where rows represent player 1's actions and columns represent player 2's actions, so for example $A_{RP} = 1$ because rock loses to paper. Player 1's expected loss when playing mixed strategy $x$ against player 2's strategy $y$ is $f(x, y) = x^\top A y$, and player 2's loss is the negative of this.

The Nash equilibrium is for both players to play uniformly at random: $x^* = y^* = (\frac{1}{3}, \frac{1}{3}, \frac{1}{3})$. Let's see how multiplicative weights finds this.
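Before running any learning dynamics, we can confirm this directly: against the uniform strategy, every pure action has the same expected loss, so neither player can gain by deviating. A quick numeric check (Python/NumPy) with the loss matrix above:

```python
import numpy as np

# Loss matrix for player 1 (rows: player 1's action R, P, S; columns: player 2's).
A = np.array([[ 0.0,  1.0, -1.0],
              [-1.0,  0.0,  1.0],
              [ 1.0, -1.0,  0.0]])

x_star = y_star = np.full(3, 1.0 / 3.0)

print(A @ y_star)           # [0. 0. 0.] -> no pure action improves player 1's loss
print(x_star @ A)           # [0. 0. 0.] -> no pure action improves player 2's loss
print(x_star @ A @ y_star)  # 0.0, the value of the game
```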

Setting Up Multiplicative Weights

Using the entropic mirror map $\Phi(x) = \sum_i x_i \log x_i$ on the probability simplex, the mirror descent update becomes:

$$x_{t+1}^{(i)} = \frac{x_t^{(i)} \cdot e^{-\eta (Ay_t)_i}}{\sum_j x_t^{(j)} \cdot e^{-\eta (Ay_t)_j}}$$

where $(Ay_t)_i$ is the expected loss for playing action $i$ against opponent strategy $y_t$. The denominator normalizes to ensure $x_{t+1}$ is a valid probability distribution.

The Dynamics

Suppose player 2 plays $y_t = (0.5, 0.3, 0.2)$ (heavily favoring rock). Player 1 computes:

$$Ay_t = \begin{pmatrix} 0 & 1 & -1 \\ -1 & 0 & 1 \\ 1 & -1 & 0 \end{pmatrix} \begin{pmatrix} 0.5 \\ 0.3 \\ 0.2 \end{pmatrix} = \begin{pmatrix} 0.1 \\ -0.3 \\ 0.2 \end{pmatrix}$$

This shows that paper has expected loss $-0.3$ (a clear advantage against a rock-heavy opponent), while rock and scissors have expected losses $0.1$ and $0.2$ (disadvantages). With learning rate $\eta = 0.1$, player 1's update on paper is:

$$x_{t+1}^{(P)} \propto x_t^{(P)} \cdot e^{-0.1 \cdot (-0.3)} = x_t^{(P)} \cdot e^{0.03}$$

The multiplicative form naturally increases the probability of paper (low loss) while decreasing it on rock and scissors (higher loss). Over many iterations with both players adapting, strategies oscillate, but their time averages converge to the uniform Nash equilibrium $(\frac{1}{3}, \frac{1}{3}, \frac{1}{3})$. The exponential update keeps strategies valid probability distributions without explicit projection, while the logarithmic geometry naturally handles the simplex boundary.
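Tying the example together, the sketch below (Python/NumPy; the helper names are my own) reproduces the single update just described and then lets both players adapt to each other, printing time-averaged strategies that come out close to the uniform equilibrium.

```python
import numpy as np

# Loss matrix for player 1 (rows: player 1's action R, P, S; columns: player 2's).
A = np.array([[ 0.0,  1.0, -1.0],
              [-1.0,  0.0,  1.0],
              [ 1.0, -1.0,  0.0]])

def mw_step(p, loss_vec, eta):
    """Multiplicative weights: exponential reweighting followed by normalization."""
    p_new = p * np.exp(-eta * loss_vec)
    return p_new / p_new.sum()

# Reproduce the single step from the text: opponent heavily favoring rock.
x_t = np.full(3, 1.0 / 3.0)
y_t = np.array([0.5, 0.3, 0.2])
print(A @ y_t)                         # [ 0.1 -0.3  0.2]
print(mw_step(x_t, A @ y_t, eta=0.1))  # probability mass shifts toward paper

# Self-play: both players run multiplicative weights simultaneously.
eta, T = 0.05, 50_000                  # smaller step size for the long run
x, y = np.full(3, 1.0 / 3.0), np.array([0.5, 0.3, 0.2])
x_avg, y_avg = np.zeros(3), np.zeros(3)
for _ in range(T):
    x_avg, y_avg = x_avg + x / T, y_avg + y / T
    x, y = mw_step(x, A @ y, eta), mw_step(y, -(x @ A), eta)
print(x_avg, y_avg)                    # both close to [1/3, 1/3, 1/3]
```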

Final Thoughts

Mirror descent elegantly connects optimization geometry with game-theoretic equilibria. By choosing an appropriate mirror map, we can design algorithms that naturally respect the structure of strategy spaces while providing convergence guarantees. The rock-paper-scissors example illustrates how multiplicative weights—a special case of mirror descent—adaptively learns optimal play through simple exponential updates. This has made mirror descent and its variants foundational tools in online learning, multi-agent reinforcement learning, and computational game theory—proving that sometimes the best way forward is to look through a curved mirror.