Preface
Introduction / 1:
Applications and problems / 1.1:
Definitions and terminology / 1.2:
Cross-validation / 1.3:
Learning scenarios / 1.4:
Outline / 1.5:
The PAC Learning Framework / 2:
The PAC learning model / 2.1:
Guarantees for finite hypothesis sets - consistent case / 2.2:
Guarantees for finite hypothesis sets - inconsistent case / 2.3:
Generalities / 2.4:
Deterministic versus stochastic scenarios / 2.4.1:
Bayes error and noise / 2.4.2:
Estimation and approximation errors / 2.4.3:
Model selection / 2.4.4:
Chapter notes / 2.5:
Exercises / 2.6:
Rademacher Complexity and VC-Dimension / 3:
Rademacher complexity / 3.1:
Growth function / 3.2:
VC-dimension / 3.3:
Lower bounds / 3.4:
Support Vector Machines / 4:
Linear classification / 4.1:
SVMs - separable case / 4.2:
Primal optimization problem / 4.2.1:
Support vectors / 4.2.2:
Dual optimization problem / 4.2.3:
Leave-one-out analysis / 4.2.4:
SVMs - non-separable case / 4.3:
Margin theory / 4.4:
Kernel Methods / 5:
Positive definite symmetric kernels / 5.2:
Definitions / 5.2.1:
Reproducing kernel Hilbert space / 5.2.2:
Properties / 5.2.3:
Kernel-based algorithms / 5.3:
SVMs with PDS kernels / 5.3.1:
Representer theorem / 5.3.2:
Learning guarantees / 5.3.3:
Negative definite symmetric kernels / 5.4:
Sequence kernels / 5.5:
Weighted transducers / 5.5.1:
Rational kernels / 5.5.2:
Boosting / 6:
AdaBoost / 6.2:
Bound on the empirical error / 6.2.1:
Relationship with coordinate descent / 6.2.2:
Relationship with logistic regression / 6.2.3:
Standard use in practice / 6.2.4:
Theoretical results / 6.3:
VC-dimension-based analysis / 6.3.1:
Margin-based analysis / 6.3.2:
Margin maximization / 6.3.3:
Game-theoretic interpretation / 6.3.4:
Discussion / 6.4:
On-Line Learning / 7:
Prediction with expert advice / 7.2:
Mistake bounds and Halving algorithm / 7.2.1:
Weighted majority algorithm / 7.2.2:
Randomized weighted majority algorithm / 7.2.3:
Exponential weighted average algorithm / 7.2.4:
Perceptron algorithm / 7.3.1:
Winnow algorithm / 7.3.2:
On-line to batch conversion / 7.4:
Game-theoretic connection / 7.5:
Multi-Class Classification / 8:
Multi-class classification problem / 8.1:
Generalization bounds / 8.2:
Uncombined multi-class algorithms / 8.3:
Multi-class SVMs / 8.3.1:
Multi-class boosting algorithms / 8.3.2:
Decision trees / 8.3.3:
Aggregated multi-class algorithms / 8.4:
One-versus-all / 8.4.1:
One-versus-one / 8.4.2:
Error-correction codes / 8.4.3:
Structured prediction algorithms / 8.5:
Ranking / 9:
The problem of ranking / 9.1:
Generalization bound / 9.2:
Ranking with SVMs / 9.3:
RankBoost / 9.4:
Margin bound for ensemble methods in ranking / 9.4.1:
Bipartite ranking / 9.5:
Boosting in bipartite ranking / 9.5.1:
Area under the ROC curve / 9.5.2:
Preference-based setting / 9.6:
Second-stage ranking problem / 9.6.1:
Deterministic algorithm / 9.6.2:
Randomized algorithm / 9.6.3:
Extension to other loss functions / 9.6.4:
Regression / 10:
The problem of regression / 10.1:
Finite hypothesis sets / 10.2.1:
Rademacher complexity bounds / 10.2.2:
Pseudo-dimension bounds / 10.2.3:
Regression algorithms / 10.3:
Linear regression / 10.3.1:
Kernel ridge regression / 10.3.2:
Support vector regression / 10.3.3:
Lasso / 10.3.4:
Group norm regression algorithms / 10.3.5:
On-line regression algorithms / 10.3.6:
Algorithmic Stability / 11:
Stability-based generalization guarantee / 11.2:
Stability of kernel-based regularization algorithms / 11.3:
Application to regression algorithms: SVR and KRR / 11.3.1:
Application to classification algorithms: SVMs / 11.3.2:
Dimensionality Reduction / 12:
Principal Component Analysis / 12.1:
Kernel Principal Component Analysis (KPCA) / 12.2:
KPCA and manifold learning / 12.3:
Isomap / 12.3.1:
Laplacian eigenmaps / 12.3.2:
Locally linear embedding (LLE) / 12.3.3:
Johnson-Lindenstrauss lemma / 12.4:
Learning Automata and Languages / 13:
Finite automata / 13.2:
Efficient exact learning / 13.3:
Passive learning / 13.3.1:
Learning with queries / 13.3.2:
Learning automata with queries / 13.3.3:
Identification in the limit / 13.4:
Learning reversible automata / 13.4.1:
Reinforcement Learning / 14:
Learning scenario / 14.1:
Markov decision process model / 14.2:
Policy / 14.3:
Definition / 14.3.1:
Policy value / 14.3.2:
Policy evaluation / 14.3.3:
Optimal policy / 14.3.4:
Planning algorithms / 14.4:
Value iteration / 14.4.1:
Policy iteration / 14.4.2:
Linear programming / 14.4.3:
Learning algorithms / 14.5:
Stochastic approximation / 14.5.1:
TD(0) algorithm / 14.5.2:
Q-learning algorithm / 14.5.3:
SARSA / 14.5.4:
TD(λ) algorithm / 14.5.5:
Large state space / 14.5.6:
Conclusion / 14.6:
Linear Algebra Review / A:
Vectors and norms / A.1:
Norms / A.1.1:
Dual norms / A.1.2:
Matrices / A.2:
Matrix norms / A.2.1:
Singular value decomposition / A.2.2:
Symmetric positive semidefinite (SPSD) matrices / A.2.3:
Convex Optimization / B:
Differentiation and unconstrained optimization / B.1:
Convexity / B.2:
Constrained optimization / B.3:
Probability Review / C:
Probability / C.1:
Random variables / C.2:
Conditional probability and independence / C.3:
Expectation, Markov's inequality, and moment-generating function / C.4:
Variance and Chebyshev's inequality / C.5:
Concentration Inequalities / D:
Hoeffding's inequality / D.1:
McDiarmid's inequality / D.2:
Other inequalities / D.3:
Binomial distribution: Slud's inequality / D.3.1:
Normal distribution: tail bound / D.3.2:
Khintchine-Kahane inequality / D.3.3:
Notation
References
Index