Mirror Descent: From Curved Geometry to Game Theory
Ziyi Zhu / October 17, 2025
6 min read
Mirror descent is an elegant generalization of gradient descent that has found surprising applications in game theory and multi-agent learning. While gradient descent takes steps in Euclidean space, mirror descent allows us to work in curved geometries that better match the structure of our problem. This seemingly abstract idea has powerful practical consequences, particularly for finding Nash equilibria in games.
The Mirror Descent Algorithm
At its core, mirror descent replaces the Euclidean distance with a more general notion of "distance" called a Bregman divergence. Let's say we want to minimize a function $f$ over some convex set $\mathcal{X} \subseteq \mathbb{R}^d$. Standard gradient descent updates via:

$$x_{t+1} = \Pi_{\mathcal{X}}\left(x_t - \eta \nabla f(x_t)\right)$$

where $\Pi_{\mathcal{X}}$ projects back onto the feasible set and $\eta > 0$ is the step size.
Mirror descent instead uses a mirror map $\Phi: \mathcal{D} \to \mathbb{R}$, which is a strongly convex function that acts as a "lens" through which we view the space. The algorithm proceeds in three steps:
- Gradient step in dual space: Compute $z_{t+1} = \nabla \Phi(x_t) - \eta \nabla f(x_t)$
- Mirror step: Set $y_{t+1} = (\nabla \Phi)^{-1}(z_{t+1})$
- Project if necessary: Ensure $x_{t+1} = \arg\min_{x \in \mathcal{X}} D_\Phi(x, y_{t+1})$

This can be written more compactly using the Bregman divergence $D_\Phi(x, y) = \Phi(x) - \Phi(y) - \langle \nabla \Phi(y), x - y \rangle$:

$$x_{t+1} = \arg\min_{x \in \mathcal{X}} \left\{ \eta \langle \nabla f(x_t), x \rangle + D_\Phi(x, x_t) \right\}$$

The choice of $\Phi$ determines the geometry. For example, when $\Phi(x) = \frac{1}{2}\|x\|_2^2$, we recover standard gradient descent. For probability simplices, the negative entropy $\Phi(x) = \sum_i x_i \log x_i$ leads to multiplicative updates.
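To make these steps concrete, here is a minimal Python sketch of a single update; the helper names `grad_phi`, `grad_phi_inv`, and `project` are placeholders for whichever mirror map and Bregman projection you choose, not functions from any particular library.

```python
import numpy as np

def mirror_descent_step(x, grad, eta, grad_phi, grad_phi_inv, project=None):
    """One mirror descent update: dual gradient step, mirror back, then project."""
    z = grad_phi(x) - eta * grad           # gradient step in the dual space
    y = grad_phi_inv(z)                    # mirror step back to the primal space
    return project(y) if project else y    # Bregman projection onto the feasible set

# With the Euclidean mirror map Phi(x) = ||x||^2 / 2, grad_phi is the identity
# and the update reduces to a plain gradient descent step.
x_next = mirror_descent_step(np.array([1.0, -2.0]), np.array([0.5, 0.5]),
                             eta=0.1, grad_phi=lambda v: v, grad_phi_inv=lambda v: v)
```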
Recovering Standard Gradient Descent
When $\Phi(x) = \frac{1}{2}\|x\|_2^2$, we have $\nabla \Phi(x) = x$ and the Bregman divergence becomes:

$$D_\Phi(x, y) = \frac{1}{2}\|x - y\|_2^2$$

This is just the squared Euclidean distance. Plugging into the mirror descent update:

$$x_{t+1} = \arg\min_{x \in \mathcal{X}} \left\{ \eta \langle \nabla f(x_t), x \rangle + \frac{1}{2}\|x - x_t\|_2^2 \right\}$$

Taking the gradient with respect to $x$ and setting it to zero (ignoring the constraint) gives $\eta \nabla f(x_t) + x - x_t = 0$, which rearranges to:

$$x_{t+1} = x_t - \eta \nabla f(x_t)$$
This is exactly standard gradient descent—the Euclidean mirror map gives us Euclidean geometry.
Finding Nash Equilibria in Games
Consider a two-player game where player 1 chooses a strategy $x \in \mathcal{X}$ and player 2 chooses $y \in \mathcal{Y}$. Player 1 wants to minimize their loss $f_1(x, y)$ while player 2 wants to minimize $f_2(x, y)$. A Nash equilibrium $(x^*, y^*)$ satisfies:

$$f_1(x^*, y^*) \le f_1(x, y^*) \quad \text{for all } x \in \mathcal{X}, \qquad f_2(x^*, y^*) \le f_2(x^*, y) \quad \text{for all } y \in \mathcal{Y}$$

Finding Nash equilibria is generally difficult—in fact, it's PPAD-complete. However, for certain games (like zero-sum games, where $f_1(x, y) = -f_2(x, y) = f(x, y)$), mirror descent offers an elegant solution.
Simultaneous Mirror Descent
The key insight is to have both players run mirror descent simultaneously:

$$x_{t+1} = \arg\min_{x \in \mathcal{X}} \left\{ \eta \langle \nabla_x f(x_t, y_t), x \rangle + D_\Phi(x, x_t) \right\}, \qquad y_{t+1} = \arg\min_{y \in \mathcal{Y}} \left\{ -\eta \langle \nabla_y f(x_t, y_t), y \rangle + D_\Phi(y, y_t) \right\}$$

For zero-sum games, this converges to a Nash equilibrium under appropriate conditions. Because each player's iterates accumulate low regret, the time-averaged iterates $\bar{x}_T = \frac{1}{T}\sum_{t=1}^T x_t$ and $\bar{y}_T = \frac{1}{T}\sum_{t=1}^T y_t$ satisfy:

$$\max_{y \in \mathcal{Y}} f(\bar{x}_T, y) - \min_{x \in \mathcal{X}} f(x, \bar{y}_T) \le O\!\left(\frac{1}{\sqrt{T}}\right)$$

As $T \to \infty$, this duality gap vanishes and the average play converges to Nash equilibrium strategies.
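Here is a minimal sketch of these simultaneous dynamics for a zero-sum matrix game with loss $x^\top A y$, using the entropic mirror map mentioned earlier (its explicit multiplicative form is derived in the next section); the step size and horizon below are illustrative choices, and the returned duality gap is the left-hand side of the bound above.

```python
import numpy as np

def averaged_play(A, eta=0.05, T=10_000):
    """Simultaneous entropic mirror descent (multiplicative weights) for the
    zero-sum game with loss x^T A y; returns time-averaged strategies and the
    duality gap of those averages. A sketch, not a tuned implementation."""
    n, m = A.shape
    x, y = np.ones(n) / n, np.ones(m) / m        # both players start uniform
    x_bar, y_bar = np.zeros(n), np.zeros(m)
    for _ in range(T):
        gx, gy = A @ y, A.T @ x                  # player 1's losses, player 2's payoffs
        x = x * np.exp(-eta * gx); x /= x.sum()  # player 1 descends on its loss
        y = y * np.exp(+eta * gy); y /= y.sum()  # player 2 ascends on its payoff
        x_bar += x / T
        y_bar += y / T
    gap = (A.T @ x_bar).max() - (A @ y_bar).min()  # max_y f(x_bar, y) - min_x f(x, y_bar)
    return x_bar, y_bar, gap
```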
Why Does Mirror Descent Help?
The power of mirror descent for game theory comes from choosing $\Phi$ to match the constraint geometry. For games over probability distributions (like rock-paper-scissors), the entropic mirror map $\Phi(x) = \sum_i x_i \log x_i$ yields the famous multiplicative weights update:

$$x_{t+1, i} = \frac{x_{t, i} \exp\left(-\eta \, [\nabla f(x_t)]_i\right)}{\sum_j x_{t, j} \exp\left(-\eta \, [\nabla f(x_t)]_j\right)}$$
This keeps strategies in the probability simplex automatically and has better convergence properties than projected gradient descent in this geometry.
Example: Rock-Paper-Scissors
Let's see multiplicative weights in action on the classic rock-paper-scissors game. This is a zero-sum game where both players choose from the actions $\{\text{Rock}, \text{Paper}, \text{Scissors}\}$. The loss matrix for player 1 is:

$$A = \begin{pmatrix} 0 & 1 & -1 \\ -1 & 0 & 1 \\ 1 & -1 & 0 \end{pmatrix}$$

where rows represent player 1's actions and columns represent player 2's actions (both in the order Rock, Paper, Scissors); an entry of $1$ means player 1 loses that matchup and $-1$ means player 1 wins. Player 1's expected loss when playing mixed strategy $x$ against player 2's strategy $y$ is $x^\top A y$.
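As a quick sanity check of the matrix and the loss formula in NumPy (the mixed strategies below are illustrative):

```python
import numpy as np

# Loss matrix for player 1; rows and columns are ordered Rock, Paper, Scissors.
A = np.array([[ 0,  1, -1],
              [-1,  0,  1],
              [ 1, -1,  0]])

x = np.array([0.5, 0.3, 0.2])    # an arbitrary mixed strategy for player 1
y = np.array([1/3, 1/3, 1/3])    # player 2 plays uniformly at random
print(x @ A @ y)                 # expected loss x^T A y -> 0.0 against uniform play
```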
The Nash equilibrium is for both players to play uniformly at random: $x^* = y^* = \left(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}\right)$. Let's see how multiplicative weights finds this.
Setting Up Multiplicative Weights
Using the entropic mirror map $\Phi(x) = \sum_i x_i \log x_i$ on the probability simplex, the mirror descent update becomes:

$$x_{t+1, i} = \frac{x_{t, i} \exp(-\eta \ell_{t, i})}{\sum_j x_{t, j} \exp(-\eta \ell_{t, j})}$$

where $\ell_{t, i} = (A y_t)_i$ is the expected loss for playing action $i$ against opponent strategy $y_t$. The denominator normalizes to ensure $x_{t+1}$ is a valid probability distribution.
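In code, the update is a couple of lines; this is a sketch with a function name of my choosing rather than a library routine:

```python
import numpy as np

def multiplicative_weights(x, losses, eta):
    """One entropic mirror descent step on the probability simplex (a sketch)."""
    w = x * np.exp(-eta * losses)   # reweight each action by its expected loss
    return w / w.sum()              # normalize so the new strategy sums to one
```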
The Dynamics
Suppose player 2 plays, say, $y = (0.6, 0.2, 0.2)$ (heavily favoring rock). Player 1 computes the loss vector:

$$\ell = A y = \begin{pmatrix} 0(0.6) + 1(0.2) - 1(0.2) \\ -1(0.6) + 0(0.2) + 1(0.2) \\ 1(0.6) - 1(0.2) + 0(0.2) \end{pmatrix} = \begin{pmatrix} 0 \\ -0.4 \\ 0.4 \end{pmatrix}$$

This shows that paper has expected loss $-0.4$ (an advantage) and scissors has loss $0.4$ (a disadvantage), while rock is neutral against this opponent. With a learning rate of, say, $\eta = 0.1$, player 1 updates:

$$x_{1, i} \propto x_{0, i} \exp(-0.1 \, \ell_i)$$

The multiplicative form naturally increases the probability of paper (low loss) while decreasing the probability of scissors (high loss). Over many iterations with both players adapting, strategies oscillate but their time-averages converge to the uniform Nash equilibrium $\left(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}\right)$. The exponential update ensures strategies remain valid probabilities without explicit projection, while the logarithmic geometry naturally handles the simplex boundary.
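Putting it together, a short self-play run (with illustrative starting strategies, step size, and horizon) shows the time averages settling near the uniform equilibrium even as the iterates keep cycling:

```python
import numpy as np

A = np.array([[ 0,  1, -1],
              [-1,  0,  1],
              [ 1, -1,  0]], dtype=float)       # RPS loss matrix for player 1

eta, T = 0.01, 50_000
x = np.array([0.6, 0.2, 0.2])                   # player 1 starts rock-heavy
y = np.array([0.2, 0.3, 0.5])                   # player 2 starts scissors-heavy
x_bar, y_bar = np.zeros(3), np.zeros(3)

for _ in range(T):
    lx, ly = A @ y, -(A.T @ x)                  # each player's current loss vector
    x = x * np.exp(-eta * lx); x /= x.sum()     # multiplicative weights, player 1
    y = y * np.exp(-eta * ly); y /= y.sum()     # multiplicative weights, player 2
    x_bar += x / T
    y_bar += y / T

print(np.round(x_bar, 2), np.round(y_bar, 2))   # both end up close to [0.33 0.33 0.33]
```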
Final Thoughts
Mirror descent elegantly connects optimization geometry with game-theoretic equilibria. By choosing an appropriate mirror map, we can design algorithms that naturally respect the structure of strategy spaces while providing convergence guarantees. The rock-paper-scissors example illustrates how multiplicative weights—a special case of mirror descent—adaptively learns optimal play through simple exponential updates. This has made mirror descent and its variants foundational tools in online learning, multi-agent reinforcement learning, and computational game theory—proving that sometimes the best way forward is to look through a curved mirror.