Preface to the Second Edition
Preface to the First Edition
Acknowledgments

1 The Challenges of Dynamic Programming
1.1 A Dynamic Programming Example: A Shortest Path Problem
1.2 The Three Curses of Dimensionality
1.3 Some Real Applications
1.4 Problem Classes
1.5 The Many Dialects of Dynamic Programming
1.6 What Is New in This Book?
1.7 Pedagogy
1.8 Bibliographic Notes

2 Some Illustrative Models
2.1 Deterministic Problems
2.2 Stochastic Problems
2.3 Information Acquisition Problems
2.4 A Simple Modeling Framework for Dynamic Programs
2.5 Problems

3 Introduction to Markov Decision Processes
3.1 The Optimality Equations
3.2 Finite Horizon Problems
3.3 Infinite Horizon Problems
3.4 Value Iteration
3.5 Policy Iteration
3.6 Hybrid Value-Policy Iteration
3.7 Average Reward Dynamic Programming
3.8 The Linear Programming Method for Dynamic Programs
3.9 Monotone Policies*
3.10 Why Does It Work?**

4 Introduction to Approximate Dynamic Programming
4.1 The Three Curses of Dimensionality (Revisited)
4.2 The Basic Idea
4.3 Q-Learning and SARSA
4.4 Real-Time Dynamic Programming
4.5 Approximate Value Iteration
4.6 The Post-Decision State Variable
4.7 Low-Dimensional Representations of Value Functions
4.8 So Just What Is Approximate Dynamic Programming?
4.9 Experimental Issues
4.10 But Does It Work?

5 Modeling Dynamic Programs
5.1 Notational Style
5.2 Modeling Time
5.3 Modeling Resources
5.4 The States of Our System
5.5 Modeling Decisions
5.6 The Exogenous Information Process
5.7 The Transition Function
5.8 The Objective Function
5.9 A Measure-Theoretic View of Information**

6 Policies
6.1 Myopic Policies
6.2 Lookahead Policies
6.3 Policy Function Approximations
6.4 Value Function Approximations
6.5 Hybrid Strategies
6.6 Randomized Policies
6.7 How to Choose a Policy?

7 Policy Search
7.1 Background
7.2 Gradient Search
7.3 Direct Policy Search for Finite Alternatives
7.4 The Knowledge Gradient Algorithm for Discrete Alternatives
7.5 Simulation Optimization

8 Approximating Value Functions
8.1 Lookup Tables and Aggregation
8.2 Parametric Models
8.3 Regression Variations
8.4 Nonparametric Models
8.5 Approximations and the Curse of Dimensionality

9 Learning Value Function Approximations
9.1 Sampling the Value of a Policy
9.2 Stochastic Approximation Methods
9.3 Recursive Least Squares for Linear Models
9.4 Temporal Difference Learning with a Linear Model
9.5 Bellman's Equation Using a Linear Model
9.6 Analysis of TD(0), LSTD, and LSPE Using a Single State
9.7 Gradient-Based Methods for Approximate Value Iteration*
9.8 Least Squares Temporal Differencing with Kernel Regression*
9.9 Value Function Approximations Based on Bayesian Learning*
9.10 Why Does It Work?**

10 Optimizing While Learning
10.1 Overview of Algorithmic Strategies
10.2 Approximate Value Iteration and Q-Learning Using Lookup Tables
10.3 Statistical Bias in the Max Operator
10.4 Approximate Value Iteration and Q-Learning Using Linear Models
10.5 Approximate Policy Iteration
10.6 The Actor-Critic Paradigm
10.7 Policy Gradient Methods
10.8 The Linear Programming Method Using Basis Functions
10.9 Approximate Policy Iteration Using Kernel Regression*
10.10 Finite Horizon Approximations for Steady-State Applications

11 Adaptive Estimation and Stepsizes
11.1 Learning Algorithms and Stepsizes
11.2 Deterministic Stepsize Recipes
11.3 Stochastic Stepsizes
11.4 Optimal Stepsizes for Nonstationary Time Series
11.5 Optimal Stepsizes for Approximate Value Iteration
11.6 Convergence
11.7 Guidelines for Choosing Stepsize Formulas

12 Exploration Versus Exploitation
12.1 A Learning Exercise: The Nomadic Trucker
12.2 An Introduction to Learning
12.3 Heuristic Learning Policies
12.4 Gittins Indexes for Online Learning
12.5 The Knowledge Gradient Policy
12.6 Learning with a Physical State

13 Value Function Approximations for Resource Allocation Problems
13.1 Value Functions versus Gradients
13.2 Linear Approximations
13.3 Piecewise-Linear Approximations
13.4 Solving a Resource Allocation Problem Using Piecewise-Linear Functions
13.5 The SHAPE Algorithm
13.6 Regression Methods
13.7 Cutting Planes*

14 Dynamic Resource Allocation Problems
14.1 An Asset Acquisition Problem
14.2 The Blood Management Problem
14.3 A Portfolio Optimization Problem
14.4 A General Resource Allocation Problem
14.5 A Fleet Management Problem
14.6 A Driver Management Problem

15 Implementation Challenges
15.1 Will ADP Work for Your Problem?
15.2 Designing an ADP Algorithm for Complex Problems
15.3 Debugging an ADP Algorithm
15.4 Practical Issues
15.5 Modeling Your Problem
15.6 Online versus Offline Models
15.7 If It Works, Patent It!

Bibliography
Index