Infinite horizon
What Is Infinite Horizon?
Infinite horizon, in control theory and decision mathematics, refers to an optimization framework in which the decision-making process continues without a fixed terminal time. Rather than minimizing or maximizing an objective function over a specified number of steps, an infinite-horizon formulation accumulates costs or rewards indefinitely into the future. This formulation arises naturally whenever a system must be operated continuously, wherever it is not meaningful to specify when planning should end, or wherever the analysis focuses on steady-state behavior rather than transient dynamics. Infinite-horizon problems are standard in operations research, control engineering, economics, and reinforcement learning.
The mathematical treatment of infinite-horizon problems relies on tools from optimization and the theory of Markov processes. In a Markov decision process (MDP), the agent selects actions over time, and the distribution of the next state depends only on the current state and the chosen action, not on the earlier history. The infinite-horizon formulation asks for a policy that achieves the best possible long-run performance, given this Markovian structure.
Bellman Equations and Dynamic Programming
The key tool for solving infinite-horizon problems is the Bellman equation, named after Richard Bellman, who introduced dynamic programming in the 1950s. In the discounted setting, the Bellman optimality equation expresses the value of a state as the maximum, over available actions, of the expected immediate reward plus the discounted expected value of the successor state; the average-reward setting has an analogous equation stated in terms of the long-run gain and relative values. This recursive characterization enables iterative algorithms: value iteration applies the Bellman operator repeatedly until the estimates converge to the optimal value function, and policy iteration alternates between evaluating a fixed policy and improving it by acting greedily with respect to the current value estimate. For finite MDPs with a discount factor below one, value iteration converges to the optimal value function in the limit, and policy iteration reaches an optimal policy after finitely many improvement steps. The MIT introduction to machine learning notes on MDPs cover the derivation and properties of these algorithms in detail.
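As an illustration of the value iteration procedure described above, the following is a minimal sketch for the discounted case. The array layout (`P` with shape actions x states x states, `R` with shape states x actions), the function name, the tolerance, and the two-state toy MDP are all assumptions made for this example and do not come from any particular library or from the notes cited above.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Approximate the optimal value function of a finite discounted MDP.

    P[a, s, t] is the probability of moving from state s to state t under
    action a; R[s, a] is the expected immediate reward (assumed layout).
    """
    V = np.zeros(R.shape[0])
    for _ in range(max_iter):
        # Bellman backup: Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Extract a greedy policy with respect to the final value estimate.
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return V, Q.argmax(axis=1)


# Hypothetical toy MDP: action 0 stays in place, action 1 switches state.
# Staying in state 1 pays reward 1; every other state-action pair pays 0.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
])
R = np.array([
    [0.0, 0.0],  # state 0: stay, switch
    [1.0, 0.0],  # state 1: stay, switch
])

V, policy = value_iteration(P, R, gamma=0.9)
print(V, policy)
```

In this toy problem the best behavior is to move to state 1 and remain there, so the value of state 1 approaches 1 / (1 - 0.9) = 10 and the value of state 0 approaches 0.9 * 10 = 9. The stopping rule based on successive differences is one common choice; other stopping criteria are possible.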
Discounted and Average-Reward Criteria
Two objective functions dominate the infinite-horizon literature. The discounted-reward criterion sums future rewards weighted by powers of a discount factor gamma in (0,1), which keeps the infinite sum finite whenever per-step rewards are bounded and places greater weight on near-term rewards. The discount factor can be interpreted economically as a per-period probability of continuation or as a time preference rate. The average-reward criterion, by contrast, maximizes the long-run average reward per time step, weighting all time steps equally and focusing the optimization on steady-state performance. Discounted formulations are more tractable analytically and are used in most reinforcement learning systems; average-reward formulations are natural in scheduling, queueing, and process control, where the horizon is genuinely unbounded and transients are negligible. A detailed treatment of discounted infinite-horizon MDP control appears in the arXiv paper on optimal control of discounted infinite-horizon MDPs.
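The two criteria can be written out as follows. The notation is an assumption introduced here for illustration: pi denotes a policy, s_t and a_t the state and action at step t, r the per-step reward, and R_max a bound on its absolute value.

```latex
% Discounted-reward criterion for policy \pi started in state s
V_\gamma^{\pi}(s) = \mathbb{E}^{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \,\middle|\, s_0 = s \right],
\qquad 0 < \gamma < 1

% Finiteness under bounded rewards, |r(s, a)| \le R_{\max}
\left| V_\gamma^{\pi}(s) \right| \le \sum_{t=0}^{\infty} \gamma^{t} R_{\max} = \frac{R_{\max}}{1 - \gamma}

% Average-reward criterion (long-run reward per time step)
\rho^{\pi}(s) = \liminf_{T \to \infty} \frac{1}{T}\, \mathbb{E}^{\pi}\!\left[ \sum_{t=0}^{T-1} r(s_t, a_t) \,\middle|\, s_0 = s \right]
```

The geometric-series bound in the second line is what makes the discounted sum well defined. The average-reward criterion has no such weighting, so any finite prefix of the trajectory contributes nothing to the limit.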
Optimal Policies and Stationarity
A key structural result is that, under standard conditions (for example, finite state and action spaces with discounted rewards), an infinite-horizon MDP has a stationary optimal policy: a policy that selects the same action in the same state regardless of the time step. This stands in contrast to finite-horizon problems, where the optimal policy generally depends on the time remaining. Stationarity simplifies both theoretical analysis and practical implementation, since for a finite state space the policy can be stored as a lookup table over states. Linear programming provides an alternative solution method, expressing the Bellman optimality conditions as linear constraints on the value function. Continuous-time analogues, governed by the Hamilton-Jacobi-Bellman partial differential equation, extend these ideas to dynamical systems described by differential equations, with applications in optimal control of physical systems and economic growth models. The Stanford EE266 lecture notes on infinite-horizon MDPs present the formal development for both discrete and continuous cases.
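To make the idea of a stationary policy stored as a lookup table concrete, here is a minimal policy iteration sketch for a finite discounted MDP. The array layout, the function name, and the two-state toy problem are assumptions chosen for this example, not taken from the lecture notes mentioned above. The policy is an integer array indexed by state, with no dependence on the time step.

```python
import numpy as np


def policy_iteration(P, R, gamma=0.9):
    """Policy iteration for a finite discounted MDP (assumed array layout).

    P[a, s, t]: probability of moving from s to t under action a.
    R[s, a]: expected immediate reward for taking action a in state s.
    """
    n_states = R.shape[0]
    states = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)  # stationary policy: one action per state
    while True:
        # Policy evaluation: solve the linear system (I - gamma * P_pi) V = R_pi.
        P_pi = P[policy, states, :]  # row s is the transition row of action policy[s]
        R_pi = R[states, policy]
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to V.
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return V, policy
        policy = new_policy


# Hypothetical toy MDP: action 0 stays in place, action 1 switches state.
# Staying in state 1 pays reward 1; every other state-action pair pays 0.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
])
R = np.array([
    [0.0, 0.0],  # state 0: stay, switch
    [1.0, 0.0],  # state 1: stay, switch
])

V_pi, lookup_table = policy_iteration(P, R, gamma=0.9)
print(lookup_table)  # action to take in each state, independent of time
```

Acting with the result amounts to indexing the table, for example `lookup_table[current_state]`. Exact policy evaluation by a linear solve is practical only for small state spaces; larger problems typically use iterative or approximate evaluation instead.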
Applications
Infinite-horizon optimization has applications in a range of fields, including:
- Reinforcement learning for robotic control and autonomous navigation
- Power systems operation and energy resource scheduling
- Inventory management and supply-chain optimization
- Macroeconomic policy modeling and capital allocation theory
- Telecommunications network routing and resource management