1.

Book

Approximate Dynamic Programming : Solving the Curses of Dimensionality
Warren B. Powell
Publication info: Hoboken, N.J. : Wiley, c2011. xviii, 627 p. ; 25 cm
Series: Wiley series in probability and mathematical statistics
Table of contents:
Preface to the Second Edition
Preface to the First Edition
Acknowledgments
The Challenges of Dynamic Programming / 1:
A Dynamic Programming Example: A Shortest Path Problem / 1.1:
The Three Curses of Dimensionality / 1.2:
Some Real Applications / 1.3:
Problem Classes / 1.4:
The Many Dialects of Dynamic Programming / 1.5:
What Is New in This Book? / 1.6:
Pedagogy / 1.7:
Bibliographic Notes / 1.8:
Some Illustrative Models / 2:
Deterministic Problems / 2.1:
Stochastic Problems / 2.2:
Information Acquisition Problems / 2.3:
A Simple Modeling Framework for Dynamic Programs / 2.4:
Problems / 2.5:
Introduction to Markov Decision Processes / 3:
The Optimality Equations / 3.1:
Finite Horizon Problems / 3.2:
Infinite Horizon Problems / 3.3:
Value Iteration / 3.4:
Policy Iteration / 3.5:
Hybrid Value-Policy Iteration / 3.6:
Average Reward Dynamic Programming / 3.7:
The Linear Programming Method for Dynamic Programs / 3.8:
Monotone Policies* / 3.9:
Why Does It Work?** / 3.10:
Introduction to Approximate Dynamic Programming / 4:
The Three Curses of Dimensionality (Revisited) / 4.1:
The Basic Idea / 4.2:
Q-Learning and SARSA / 4.3:
Real-Time Dynamic Programming / 4.4:
Approximate Value Iteration / 4.5:
The Post-Decision State Variable / 4.6:
Low-Dimensional Representations of Value Functions / 4.7:
So Just What Is Approximate Dynamic Programming? / 4.8:
Experimental Issues / 4.9:
But Does It Work? / 4.10:
Modeling Dynamic Programs / 5:
Notational Style / 5.1:
Modeling Time / 5.2:
Modeling Resources / 5.3:
The States of Our System / 5.4:
Modeling Decisions / 5.5:
The Exogenous Information Process / 5.6:
The Transition Function / 5.7:
The Objective Function / 5.8:
A Measure-Theoretic View of Information** / 5.9:
Policies / 6:
Myopic Policies / 6.1:
Lookahead Policies / 6.2:
Policy Function Approximations / 6.3:
Value Function Approximations / 6.4:
Hybrid Strategies / 6.5:
Randomized Policies / 6.6:
How to Choose a Policy? / 6.7:
Policy Search / 7:
Background / 7.1:
Gradient Search / 7.2:
Direct Policy Search for Finite Alternatives / 7.3:
The Knowledge Gradient Algorithm for Discrete Alternatives / 7.4:
Simulation Optimization / 7.5:
Approximating Value Functions / 8:
Lookup Tables and Aggregation / 8.1:
Parametric Models / 8.2:
Regression Variations / 8.3:
Nonparametric Models / 8.4:
Approximations and the Curse of Dimensionality / 8.5:
Learning Value Function Approximations / 9:
Sampling the Value of a Policy / 9.1:
Stochastic Approximation Methods / 9.2:
Recursive Least Squares for Linear Models / 9.3:
Temporal Difference Learning with a Linear Model / 9.4:
Bellman's Equation Using a Linear Model / 9.5:
Analysis of TD(0), LSTD, and LSPE Using a Single State / 9.6:
Gradient-Based Methods for Approximate Value Iteration* / 9.7:
Least Squares Temporal Differencing with Kernel Regression* / 9.8:
Value Function Approximations Based on Bayesian Learning* / 9.9:
Why Does It Work?* / 9.10:
Optimizing While Learning / 10:
Overview of Algorithmic Strategies / 10.1:
Approximate Value Iteration and Q-Learning Using Lookup Tables / 10.2:
Statistical Bias in the Max Operator / 10.3:
Approximate Value Iteration and Q-Learning Using Linear Models / 10.4:
Approximate Policy Iteration / 10.5:
The Actor-Critic Paradigm / 10.6:
Policy Gradient Methods / 10.7:
The Linear Programming Method Using Basis Functions / 10.8:
Approximate Policy Iteration Using Kernel Regression* / 10.9:
Finite Horizon Approximations for Steady-State Applications / 10.10:
Adaptive Estimation and Stepsizes / 11:
Learning Algorithms and Stepsizes / 11.1:
Deterministic Stepsize Recipes / 11.2:
Stochastic Stepsizes / 11.3:
Optimal Stepsizes for Nonstationary Time Series / 11.4:
Optimal Stepsizes for Approximate Value Iteration / 11.5:
Convergence / 11.6:
Guidelines for Choosing Stepsize Formulas / 11.7:
Exploration Versus Exploitation / 12:
A Learning Exercise: The Nomadic Trucker / 12.1:
An Introduction to Learning / 12.2:
Heuristic Learning Policies / 12.3:
Gittins Indexes for Online Learning / 12.4:
The Knowledge Gradient Policy / 12.5:
Learning with a Physical State / 12.6:
Value Function Approximations for Resource Allocation Problems / 13:
Value Functions versus Gradients / 13.1:
Linear Approximations / 13.2:
Piecewise-Linear Approximations / 13.3:
Solving a Resource Allocation Problem Using Piecewise-Linear Functions / 13.4:
The SHAPE Algorithm / 13.5:
Regression Methods / 13.6:
Cutting Planes* / 13.7:
Dynamic Resource Allocation Problems / 14:
An Asset Acquisition Problem / 14.1:
The Blood Management Problem / 14.2:
A Portfolio Optimization Problem / 14.3:
A General Resource Allocation Problem / 14.4:
A Fleet Management Problem / 14.5:
A Driver Management Problem / 14.6:
Implementation Challenges / 15:
Will ADP Work for Your Problem? / 15.1:
Designing an ADP Algorithm for Complex Problems / 15.2:
Debugging an ADP Algorithm / 15.3:
Practical Issues / 15.4:
Modeling Your Problem / 15.5:
Online versus Offline Models / 15.6:
If It Works, Patent It! / 15.7:
Bibliography
Index
2.

Book
Dimitri P. Bertsekas
Publication info: Belmont, Mass. : Athena Scientific, c2017. xix, 555 p. ; 24 cm
Series: Athena Scientific optimization and computation series