Foreword |
Basics / I: |
Introduction and Important Definitions / 1: |
Why Connectionist Models? / 1.1: |
The Grand Goals of AI and Its Current Impasse / 1.1.1: |
The Computational Appeal of Neural Networks / 1.1.2: |
The Structure of Connectionist Models / 1.2: |
Network Properties / 1.2.1: |
Cell Properties / 1.2.2: |
Dynamic Properties / 1.2.3: |
Learning Properties / 1.2.4: |
Two Fundamental Models: Multilayer Perceptrons (MLP's) and Backpropagation Networks (BPN's) / 1.3: |
Multilayer Perceptrons (MLP's) / 1.3.1: |
Backpropagation Networks (BPN's) / 1.3.2: |
Gradient Descent / 1.4: |
The Algorithm / 1.4.1: |
Practical Problems / 1.4.2: |
Comments / 1.4.3: |
Historic and Bibliographic Notes / 1.5: |
Early Work / 1.5.1: |
The Decline of the Perceptron / 1.5.2: |
The Rise of Connectionist Research / 1.5.3: |
Other Bibliographic Notes / 1.5.4: |
Exercises / 1.6: |
Programming Project / 1.7: |
Representation Issues / 2: |
Representing Boolean Functions / 2.1: |
Equivalence of {+1, -1, 0} and {1, 0} Forms / 2.1.1: |
Single-Cell Models / 2.1.2: |
Nonseparable Functions / 2.1.3: |
Representing Arbitrary Boolean Functions / 2.1.4: |
Representing Boolean Functions Using Continuous Connectionist Models / 2.1.5: |
Distributed Representations / 2.2: |
Definition / 2.2.1: |
Storage Efficiency and Resistance to Error / 2.2.2: |
Superposition / 2.2.3: |
Learning / 2.2.4: |
Feature Spaces and ISA Relations / 2.3: |
Feature Spaces / 2.3.1: |
Concept-Function Unification / 2.3.2: |
ISA Relations / 2.3.3: |
Binding / 2.3.4: |
Representing Real-Valued Functions / 2.4: |
Approximating Real Numbers by Collections of Discrete Cells / 2.4.1: |
Precision / 2.4.2: |
Approximating Real Numbers by Collections of Continuous Cells / 2.4.3: |
Example: Taxtime! / 2.5: |
Programming Projects / 2.6: |
Learning in Single-Layer Models / II: |
Perceptron Learning and the Pocket Algorithm / 3: |
Perceptron Learning for Separable Sets of Training Examples / 3.1: |
Statement of the Problem / 3.1.1: |
Computing the Bias / 3.1.2: |
The Perceptron Learning Algorithm / 3.1.3: |
Perceptron Convergence Theorem / 3.1.4: |
The Perceptron Cycling Theorem / 3.1.5: |
The Pocket Algorithm for Nonseparable Sets of Training Examples / 3.2: |
Problem Statement / 3.2.1: |
Perceptron Learning Is Poorly Behaved / 3.2.2: |
The Pocket Algorithm / 3.2.3: |
Ratchets / 3.2.4: |
Examples / 3.2.5: |
Noisy and Contradictory Sets of Training Examples / 3.2.6: |
Rules / 3.2.7: |
Implementation Considerations / 3.2.8: |
Proof of the Pocket Convergence Theorem / 3.2.9: |
Khachiyan's Linear Programming Algorithm / 3.3: |
Winner-Take-All Groups or Linear Machines / 4: |
Generalizes Single-Cell Models / 4.1: |
Perceptron Learning for Winner-Take-All Groups / 4.2: |
The Pocket Algorithm for Winner-Take-All Groups / 4.3: |
Kessler's Construction, Perceptron Cycling, and the Pocket Algorithm Proof / 4.4: |
Independent Training / 4.5: |
Autoassociators and One-Shot Learning / 5: |
Linear Autoassociators and the Outer-Product Training Rule / 5.1: |
Anderson's BSB Model / 5.2: |
Hopfield's Model / 5.3: |
Energy / 5.3.1: |
The Traveling Salesman Problem / 5.4: |
The Cohen-Grossberg Theorem / 5.5: |
Kanerva's Model / 5.6: |
Autoassociative Filtering for Feedforward Networks / 5.7: |
Concluding Remarks / 5.8: |
Mean Squared Error (MSE) Algorithms / 6: |
Motivation / 6.1: |
MSE Approximations / 6.2: |
The Widrow-Hoff Rule or LMS Algorithm / 6.3: |
Number of Training Examples Required / 6.3.1: |
Adaline / 6.4: |
Adaptive Noise Cancellation / 6.5: |
Decision-Directed Learning / 6.6: |
Unsupervised Learning / 7: |
Introduction / 7.1: |
No Teacher / 7.1.1: |
Clustering Algorithms / 7.1.2: |
k-Means Clustering / 7.2: |
Topology-Preserving Maps / 7.3: |
Example / 7.3.1: |
Demonstrations / 7.3.4: |
Dimensionality, Neighborhood Size, and Final Comments / 7.3.5: |
ART1 / 7.4: |
Important Aspects of the Algorithm / 7.4.1: |
ART2 / 7.5: |
Using Clustering Algorithms for Supervised Learning / 7.6: |
Labeling Clusters / 7.6.1: |
ARTMAP or Supervised ART / 7.6.2: |
Learning in Multilayer Models / III: |
The Distributed Method and Radial Basis Functions / 8: |
Rosenblatt's Approach / 8.1: |
The Distributed Method / 8.2: |
Cover's Formula / 8.2.1: |
Robustness-Preserving Functions / 8.2.2: |
Hepatobiliary Data / 8.3.1: |
Artificial Data / 8.3.2: |
How Many Cells? / 8.4: |
Pruning Data / 8.4.1: |
Leave-One-Out / 8.4.2: |
Radial Basis Functions / 8.5: |
A Variant: The Anchor Algorithm / 8.6: |
Scaling, Multiple Outputs, and Parallelism / 8.7: |
Scaling Properties / 8.7.1: |
Multiple Outputs and Parallelism / 8.7.2: |
A Computational Speedup for Learning / 8.7.3: |
Computational Learning Theory and the BRD Algorithm / 9: |
Introduction to Computational Learning Theory / 9.1: |
PAC-Learning / 9.1.1: |
Bounded Distributed Connectionist Networks / 9.1.2: |
Probabilistic Bounded Distributed Concepts / 9.1.3: |
A Learning Algorithm for Probabilistic Bounded Distributed Concepts / 9.2: |
The BRD Theorem / 9.3: |
Polynomial Learning / 9.3.1: |
Noisy Data and Fallback Estimates / 9.4: |
Vapnik-Chervonenkis Bounds / 9.4.1: |
Hoeffding and Chernoff Bounds / 9.4.2: |
Pocket Algorithm / 9.4.3: |
Additional Training Examples / 9.4.4: |
Bounds for Single-Layer Algorithms / 9.5: |
Fitting Data by Limiting the Number of Iterations / 9.6: |
Discussion / 9.7: |
Exercise / 9.8: |
Constructive Algorithms / 10: |
The Tower and Pyramid Algorithms / 10.1: |
The Tower Algorithm / 10.1.1: |
Proof of Convergence / 10.1.2: |
A Computational Speedup / 10.1.4: |
The Pyramid Algorithm / 10.1.5: |
The Cascade-Correlation Algorithm / 10.2: |
The Tiling Algorithm / 10.3: |
The Upstart Algorithm / 10.4: |
Other Constructive Algorithms and Pruning / 10.5: |
Easy Learning Problems / 10.6: |
Decomposition / 10.6.1: |
Expandable Network Problems / 10.6.2: |
Limits of Easy Learning / 10.6.3: |
Backpropagation / 11: |
The Backpropagation Algorithm / 11.1: |
Statement of the Algorithm / 11.1.1: |
A Numerical Example / 11.1.2: |
Derivation / 11.2: |
Practical Considerations / 11.3: |
Determination of Correct Outputs / 11.3.1: |
Initial Weights / 11.3.2: |
Choice of r / 11.3.3: |
Momentum / 11.3.4: |
Network Topology / 11.3.5: |
Local Minima / 11.3.6: |
Activations in [0, 1] versus [-1, 1] / 11.3.7: |
Update after Every Training Example / 11.3.8: |
Other Squashing Functions / 11.3.9: |
NP-Completeness / 11.4: |
Overuse / 11.5: |
Interesting Intermediate Cells / 11.5.2: |
Continuous Outputs / 11.5.3: |
Probability Outputs / 11.5.4: |
Using Backpropagation to Train Multilayer Perceptrons / 11.5.5: |
Backpropagation: Variations and Applications / 12: |
NETtalk / 12.1: |
Input and Output Representations / 12.1.1: |
Experiments / 12.1.2: |
Backpropagation through Time / 12.2: |
Handwritten Character Recognition / 12.3: |
Neocognitron Architecture / 12.3.1: |
The Network / 12.3.2: |
Robot Manipulator with Excess Degrees of Freedom / 12.4: |
The Problem / 12.4.1: |
Training the Inverse Network / 12.4.2: |
Plan Units / 12.4.3: |
Simulated Annealing and Boltzmann Machines / 13: |
Simulated Annealing / 13.1: |
Boltzmann Machines / 13.2: |
The Boltzmann Model / 13.2.1: |
Boltzmann Learning / 13.2.2: |
The Boltzmann Algorithm and Noise Clamping / 13.2.3: |
Example: The 4-2-4 Encoder Problem / 13.2.4: |
Remarks / 13.3: |
Neural Network Expert Systems / IV: |
Expert Systems and Neural Networks / 14: |
Expert Systems / 14.1: |
What Is an Expert System? / 14.1.1: |
Why Expert Systems? / 14.1.2: |
Historically Important Expert Systems / 14.1.3: |
Critique of Conventional Expert Systems / 14.1.4: |
Neural Network Decision Systems / 14.2: |
Example: Diagnosis of Acute Coronary Occlusion / 14.2.1: |
Example: Autonomous Navigation / 14.2.2: |
Other Examples / 14.2.3: |
Decision Systems versus Expert Systems / 14.2.4: |
MACIE, and an Example Problem / 14.3: |
Diagnosis and Treatment of Acute Sarcophagal Disease / 14.3.1: |
Network Generation / 14.3.2: |
Sample Run of MACIE / 14.3.3: |
Real-Valued Variables and Winner-Take-All Groups / 14.3.4: |
Not-Yet-Known versus Unavailable Variables / 14.3.5: |
Applicability of Neural Network Expert Systems / 14.4: |
Details of the MACIE System / 15: |
Inferencing and Forward Chaining / 15.1: |
Discrete Multilayer Perceptron Models / 15.1.1: |
Continuous Variables / 15.1.2: |
Winner-Take-All Groups / 15.1.3: |
Using Prior Probabilities for More Aggressive Inferencing / 15.1.4: |
Confidence Estimation / 15.2: |
A Confidence Heuristic Prior to Inference / 15.2.1: |
Confidence in Inferences / 15.2.2: |
Information Acquisition and Backward Chaining / 15.3: |
Concluding Comment / 15.4: |
Noise, Redundancy, Fault Detection, and Bayesian Decision Theory / 16: |
The High Tech Lemonade Corporation's Problem / 16.1: |
The Deep Model and the Noise Model / 16.2: |
Generating the Expert System / 16.3: |
Probabilistic Analysis / 16.4: |
Noisy Single-Pattern Boolean Fault Detection Problems / 16.5: |
Convergence Theorem / 16.6: |
Extracting Rules from Networks / 17: |
Why Rules? / 17.1: |
What Kind of Rules? / 17.2: |
Criteria / 17.2.1: |
Inference Justifications versus Rule Sets / 17.2.2: |
Which Variables in Conditions / 17.2.3: |
Inference Justifications / 17.3: |
MACIE's Algorithm / 17.3.1: |
The Removal Algorithm / 17.3.2: |
Key Factor Justifications / 17.3.3: |
Justifications for Continuous Models / 17.3.4: |
Rule Sets / 17.4: |
Limiting the Number of Conditions / 17.4.1: |
Approximating Rules / 17.4.2: |
Conventional + Neural Network Expert Systems / 17.5: |
Debugging an Expert System Knowledge Base / 17.5.1: |
The Short-Rule Debugging Cycle / 17.5.2: |
Appendix: Representation Comparisons / A: |
DNF Expressions and Polynomial Representability / A.1: |
DNF Expressions / A.1.1: |
Polynomial Representability / A.1.2: |
Space Comparison of MLP and DNF Representations / A.1.3: |
Speed Comparison of MLP and DNF Representations / A.1.4: |
MLP versus DNF Representations / A.1.5: |
Decision Trees / A.2: |
Representing Decision Trees by MLP's / A.2.1: |
Speed Comparison / A.2.2: |
Decision Trees versus MLP's / A.2.3: |
p-l Diagrams / A.3: |
Symmetric Functions and Depth Complexity / A.4: |
Bibliography |
Index |