Mirror Descent: From Curved Geometry to Game Theory

Ziyi Zhu / October 17, 2025

Mirror descent is an elegant generalization of gradient descent that has found surprising applications in game theory and multi-agent learning. While gradient descent takes steps in Euclidean space, mirror descent allows us to work in curved geometries that better match the structure of our problem. This seemingly abstract idea has powerful practical consequences, particularly for finding Nash equilibria in games.

The Mirror Descent Algorithm

At its core, mirror descent replaces the Euclidean distance with a more general notion of "distance" called a Bregman divergence. Let's say we want to minimize a function $f(x)$ over some convex set $\mathcal{X}$. Standard gradient descent updates via:

$$x_{t+1} = \Pi_{\mathcal{X}}(x_t - \eta \nabla f(x_t))$$

where $\Pi_{\mathcal{X}}$ projects back onto the feasible set and $\eta$ is the step size.

Mirror descent instead uses a mirror map $\Phi: \mathcal{X} \to \mathbb{R}$, which is a strongly convex function that acts as a "lens" through which we view the space. The algorithm proceeds in three steps:

  1. Gradient step in dual space: Compute $y_{t+1} = \nabla \Phi(x_t) - \eta \nabla f(x_t)$
  2. Mirror step: Set $x_{t+1} = (\nabla \Phi)^{-1}(y_{t+1})$
  3. Project if necessary: Ensure $x_{t+1} \in \mathcal{X}$

This can be written more compactly using the Bregman divergence $D_\Phi(x, y) = \Phi(x) - \Phi(y) - \langle \nabla \Phi(y), x - y \rangle$:

$$x_{t+1} = \arg\min_{x \in \mathcal{X}} \left\{ \langle \nabla f(x_t), x \rangle + \frac{1}{\eta} D_\Phi(x, x_t) \right\}$$

The choice of $\Phi$ determines the geometry. For example, when $\Phi(x) = \frac{1}{2}\|x\|^2$, we recover standard gradient descent. For probability simplices, the negative entropy $\Phi(x) = \sum_i x_i \log x_i$ leads to multiplicative updates.
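To make this concrete, here is a minimal sketch of one entropic mirror descent step on the probability simplex (Python/NumPy; the function and variable names are my own, not from any particular library). Note how the dual-space gradient step, the inverse mirror map, and the renormalization together collapse into a multiplicative update.

```python
import numpy as np

def entropic_md_step(x, grad, eta):
    """One mirror descent step with the negative-entropy mirror map
    Phi(x) = sum_i x_i log x_i on the probability simplex, following
    the three steps in the text."""
    # 1. Gradient step in dual space: grad Phi(x) = 1 + log(x)
    y = (1.0 + np.log(x)) - eta * grad
    # 2. Mirror step: (grad Phi)^{-1}(y) = exp(y - 1)
    x_new = np.exp(y - 1.0)
    # 3. Project back onto the simplex (here just a renormalization)
    return x_new / x_new.sum()

# Example: one step from the uniform distribution with an illustrative gradient.
x = np.array([1/3, 1/3, 1/3])
grad = np.array([0.1, -0.3, 0.2])
print(entropic_md_step(x, grad, eta=0.1))  # mass shifts toward the low-loss action
```

Written out, this step is exactly $x_{t+1}^{(i)} \propto x_t^{(i)} e^{-\eta \, \mathrm{grad}_i}$, the multiplicative update mentioned above.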

Recovering Standard Gradient Descent

When $\Phi(x) = \frac{1}{2}\|x\|^2$, we have $\nabla \Phi(x) = x$ and the Bregman divergence becomes:

$$D_\Phi(x, y) = \frac{1}{2}\|x\|^2 - \frac{1}{2}\|y\|^2 - \langle y, x - y \rangle = \frac{1}{2}\|x - y\|^2$$

This is just the squared Euclidean distance. Plugging into the mirror descent update:

$$x_{t+1} = \arg\min_{x} \left\{ \langle \nabla f(x_t), x \rangle + \frac{1}{2\eta} \|x - x_t\|^2 \right\}$$

Taking the gradient and setting it to zero gives $\nabla f(x_t) + \frac{1}{\eta}(x_{t+1} - x_t) = 0$, which rearranges to:

$$x_{t+1} = x_t - \eta \nabla f(x_t)$$

This is exactly standard gradient descent—the Euclidean mirror map gives us Euclidean geometry.
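As a quick numerical sanity check, the sketch below (Python, assuming NumPy and SciPy are available; the quadratic objective is purely illustrative) solves the proximal form of the update with $\Phi(x) = \frac{1}{2}\|x\|^2$ using a generic solver and compares it with a plain gradient step. The two coincide up to solver tolerance.

```python
import numpy as np
from scipy.optimize import minimize

# Gradient of an illustrative quadratic objective f(x) = ||x - c||^2.
c = np.array([1.0, -2.0])
grad_f = lambda x: 2.0 * (x - c)

x_t, eta = np.array([0.0, 0.0]), 0.1

# Mirror descent step with Phi(x) = 0.5 ||x||^2, solved numerically:
#   argmin_x  <grad f(x_t), x> + (1 / (2 * eta)) * ||x - x_t||^2
proximal = lambda x: grad_f(x_t) @ x + np.sum((x - x_t) ** 2) / (2 * eta)
md_step = minimize(proximal, x_t).x

# Plain gradient descent step.
gd_step = x_t - eta * grad_f(x_t)

print(md_step, gd_step)  # both are approximately [0.2, -0.4]
```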

Finding Nash Equilibria in Games

Consider a two-player game where player 1 chooses a strategy $x \in \mathcal{X}$ and player 2 chooses $y \in \mathcal{Y}$. Player 1 wants to minimize their loss $f(x, y)$ while player 2 wants to minimize $g(x, y)$. A Nash equilibrium $(x^*, y^*)$ satisfies:

$$f(x^*, y^*) \leq f(x, y^*) \quad \text{for all } x \in \mathcal{X}$$
$$g(x^*, y^*) \leq g(x^*, y) \quad \text{for all } y \in \mathcal{Y}$$

Finding Nash equilibria is generally difficult—in fact, it's PPAD-complete. However, for certain games (like zero-sum games where $g(x, y) = -f(x, y)$), mirror descent offers an elegant solution.
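Because the loss in a matrix game is linear in each player's mixed strategy, the best response to a fixed opponent is always a pure action, so checking the equilibrium condition reduces to comparing each player's current loss against their best single action. Here is a minimal exploitability check for zero-sum matrix games (Python/NumPy; the function name and the matching-pennies example are my own choices).

```python
import numpy as np

def nash_gap(A, x, y):
    """Exploitability of (x, y) in the zero-sum game where player 1's loss
    is x^T A y and player 2's loss is -x^T A y.  Both gaps are nonnegative
    and equal zero exactly at a Nash equilibrium."""
    loss1 = x @ A @ y
    gap1 = loss1 - np.min(A @ y)   # player 1's best deviation is a pure action
    gap2 = np.max(x @ A) - loss1   # player 2 benefits by maximizing player 1's loss
    return gap1, gap2

# Matching pennies (loss matrix for player 1); uniform play is the equilibrium.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(nash_gap(A, np.array([0.5, 0.5]), np.array([0.5, 0.5])))  # (0.0, 0.0)
print(nash_gap(A, np.array([0.9, 0.1]), np.array([0.5, 0.5])))  # player 2 can exploit
```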

Simultaneous Mirror Descent

The key insight is to have both players run mirror descent simultaneously:

$$x_{t+1} = \arg\min_{x \in \mathcal{X}} \left\{ \langle \nabla_x f(x_t, y_t), x \rangle + \frac{1}{\eta} D_{\Phi_1}(x, x_t) \right\}$$
$$y_{t+1} = \arg\min_{y \in \mathcal{Y}} \left\{ \langle \nabla_y g(x_t, y_t), y \rangle + \frac{1}{\eta} D_{\Phi_2}(y, y_t) \right\}$$

For zero-sum games, this converges to a Nash equilibrium under appropriate conditions. The time-averaged iterates $\bar{x}_T = \frac{1}{T}\sum_{t=1}^T x_t$ and $\bar{y}_T = \frac{1}{T}\sum_{t=1}^T y_t$ achieve low regret, meaning:

$$\frac{1}{T}\sum_{t=1}^T f(x_t, y_t) - \min_x f(x, \bar{y}_T) = O(1/\sqrt{T})$$

As $T \to \infty$, the average play converges to Nash equilibrium strategies.
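A minimal self-play sketch of this scheme for zero-sum matrix games (Python/NumPy; the helper names are my own) is below. It uses the entropic mirror map for both players, so each update is a multiplicative-weights step, and it returns the time-averaged strategies, which are the quantities the guarantee above refers to.

```python
import numpy as np

def mw_step(p, loss_vec, eta):
    """Entropic mirror descent on the simplex, i.e. multiplicative weights."""
    p_new = p * np.exp(-eta * loss_vec)
    return p_new / p_new.sum()

def simultaneous_mirror_descent(A, x0, y0, T=10_000, eta=0.05):
    """Both players update at once on the zero-sum game where player 1's loss
    is x^T A y and player 2's loss is its negative.  The individual iterates
    tend to cycle, so we return the time averages."""
    x, y = x0.copy(), y0.copy()
    x_avg, y_avg = np.zeros_like(x0), np.zeros_like(y0)
    for _ in range(T):
        x_avg, y_avg = x_avg + x / T, y_avg + y / T
        gx = A @ y         # gradient of player 1's loss in x
        gy = -(x @ A)      # gradient of player 2's loss in y
        x, y = mw_step(x, gx, eta), mw_step(y, gy, eta)
    return x_avg, y_avg

# Example: matching pennies (loss matrix for player 1), biased starting strategies.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x_bar, y_bar = simultaneous_mirror_descent(A, np.array([0.9, 0.1]), np.array([0.2, 0.8]))
print(x_bar, y_bar)  # both averages land close to the equilibrium (0.5, 0.5)
```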

Why Does Mirror Descent Help?

The power of mirror descent for game theory comes from choosing $\Phi$ to match the constraint geometry. For games over probability distributions (like rock-paper-scissors), the entropic mirror map $\Phi(p) = \sum_i p_i \log p_i$ yields the famous multiplicative weights update:

$$x_{t+1}^{(i)} \propto x_t^{(i)} \cdot e^{-\eta \nabla_i f(x_t, y_t)}$$

This keeps strategies in the probability simplex automatically and has better convergence properties than projected gradient descent in this geometry.

Example: Rock-Paper-Scissors

Let's see multiplicative weights in action on the classic rock-paper-scissors game. This is a zero-sum game where both players choose from the actions $\{R, P, S\}$. To match the minimization framework above, we write the game in terms of losses: the loss matrix for player 1 is

$$A = \begin{pmatrix} 0 & 1 & -1 \\ -1 & 0 & 1 \\ 1 & -1 & 0 \end{pmatrix}$$

where rows represent player 1's actions and columns represent player 2's actions, so for example $A_{RP} = 1$ because rock loses to paper. Player 1's expected loss when playing mixed strategy $x$ against player 2's strategy $y$ is $f(x, y) = x^\top A y$, and player 2's loss is the negative of this.

The Nash equilibrium is for both players to play uniformly at random: $x^* = y^* = (\frac{1}{3}, \frac{1}{3}, \frac{1}{3})$. Let's see how multiplicative weights finds this.
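Before running any learning dynamics, we can confirm this directly: against the uniform strategy, every pure action has the same expected loss, so neither player can gain by deviating. A quick numeric check (Python/NumPy) with the loss matrix above:

```python
import numpy as np

# Loss matrix for player 1 (rows: player 1's action R, P, S; columns: player 2's).
A = np.array([[ 0.0,  1.0, -1.0],
              [-1.0,  0.0,  1.0],
              [ 1.0, -1.0,  0.0]])

x_star = y_star = np.full(3, 1.0 / 3.0)

print(A @ y_star)           # [0. 0. 0.] -> no pure action improves player 1's loss
print(x_star @ A)           # [0. 0. 0.] -> no pure action improves player 2's loss
print(x_star @ A @ y_star)  # 0.0, the value of the game
```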

Setting Up Multiplicative Weights

Using the entropic mirror map $\Phi(x) = \sum_i x_i \log x_i$ on the probability simplex, the mirror descent update becomes:

$$x_{t+1}^{(i)} = \frac{x_t^{(i)} \cdot e^{-\eta (Ay_t)_i}}{\sum_j x_t^{(j)} \cdot e^{-\eta (Ay_t)_j}}$$

where $(Ay_t)_i$ is the expected loss for playing action $i$ against opponent strategy $y_t$. The denominator normalizes to ensure $x_{t+1}$ is a valid probability distribution.

The Dynamics

Suppose player 2 plays $y_t = (0.5, 0.3, 0.2)$ (heavily favoring rock). Player 1 computes:

$$Ay_t = \begin{pmatrix} 0 & 1 & -1 \\ -1 & 0 & 1 \\ 1 & -1 & 0 \end{pmatrix} \begin{pmatrix} 0.5 \\ 0.3 \\ 0.2 \end{pmatrix} = \begin{pmatrix} 0.1 \\ -0.3 \\ 0.2 \end{pmatrix}$$

This shows that paper has expected loss $-0.3$ (a clear advantage against a rock-heavy opponent), while rock and scissors have expected losses $0.1$ and $0.2$ (disadvantages). With learning rate $\eta = 0.1$, player 1's update on paper is:

$$x_{t+1}^{(P)} \propto x_t^{(P)} \cdot e^{-0.1 \cdot (-0.3)} = x_t^{(P)} \cdot e^{0.03}$$

The multiplicative form naturally increases the probability of paper (low loss) while decreasing it on rock and scissors (higher loss). Over many iterations with both players adapting, strategies oscillate, but their time averages converge to the uniform Nash equilibrium $(\frac{1}{3}, \frac{1}{3}, \frac{1}{3})$. The exponential update keeps strategies valid probability distributions without explicit projection, while the logarithmic geometry naturally handles the simplex boundary.
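Tying the example together, the sketch below (Python/NumPy; the helper names are my own) reproduces the single update just described and then lets both players adapt to each other, printing time-averaged strategies that come out close to the uniform equilibrium.

```python
import numpy as np

# Loss matrix for player 1 (rows: player 1's action R, P, S; columns: player 2's).
A = np.array([[ 0.0,  1.0, -1.0],
              [-1.0,  0.0,  1.0],
              [ 1.0, -1.0,  0.0]])

def mw_step(p, loss_vec, eta):
    """Multiplicative weights: exponential reweighting followed by normalization."""
    p_new = p * np.exp(-eta * loss_vec)
    return p_new / p_new.sum()

# Reproduce the single step from the text: opponent heavily favoring rock.
x_t = np.full(3, 1.0 / 3.0)
y_t = np.array([0.5, 0.3, 0.2])
print(A @ y_t)                         # [ 0.1 -0.3  0.2]
print(mw_step(x_t, A @ y_t, eta=0.1))  # probability mass shifts toward paper

# Self-play: both players run multiplicative weights simultaneously.
eta, T = 0.05, 50_000                  # smaller step size for the long run
x, y = np.full(3, 1.0 / 3.0), np.array([0.5, 0.3, 0.2])
x_avg, y_avg = np.zeros(3), np.zeros(3)
for _ in range(T):
    x_avg, y_avg = x_avg + x / T, y_avg + y / T
    x, y = mw_step(x, A @ y, eta), mw_step(y, -(x @ A), eta)
print(x_avg, y_avg)                    # both close to [1/3, 1/3, 1/3]
```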

Final Thoughts

Mirror descent elegantly connects optimization geometry with game-theoretic equilibria. By choosing an appropriate mirror map, we can design algorithms that naturally respect the structure of strategy spaces while providing convergence guarantees. The rock-paper-scissors example illustrates how multiplicative weights—a special case of mirror descent—adaptively learns optimal play through simple exponential updates. This has made mirror descent and its variants foundational tools in online learning, multi-agent reinforcement learning, and computational game theory—proving that sometimes the best way forward is to look through a curved mirror.