Preface
Part I: Fundamentals
1 Introduction
1.1 Speech Processing
1.2 Neural Networks
1.2.1 Taxonomy of Neural Networks
1.2.2.1 Overview
1.2.2.2 Structure
1.2.2.3 Measurement
1.2.2.4 Objective Function
1.2.2.5 Optimization
1.3 Neural Networks for Speech Processing
1.4 Handbook Overview
1.4.1 Part I: Fundamentals
1.4.2 Part II: Current Issues in Speech Recognition
1.4.3 Part III: Current Issues in Speech Signal Processing
References
2 The Speech Signal and Its Production Model
2.1 Information Conveyed by Speech
2.2.1 Linguistic Information
2.2.1.1 Segmental Features
2.2.1.2 Suprasegmental Features
2.2.2 Paralinguistic Information
2.2.3 Nonlinguistic Information
2.2.3.1 Idiosyncratic Factors
2.2.3.2 Emotional Factors
2.2.4 Hierarchical Speech Production Processes
2.2.5 Summary
2.3 Physical and Physiological Processes in Speech Production
2.3.1 Respiration System
2.3.1.1 Normal Breathing Without Speech Production
2.3.1.2 Expiration in Speech Production
2.3.2 Phonatory System
2.3.2.1 Framework of the Larynx
2.3.2.2 Abduction versus Adduction
2.3.2.3 F0 Control During Speech
2.3.3 Articulatory System
2.3.3.1 Morphology of Articulators
2.3.3.2 Vowel Production
2.3.3.3 Consonant Production
2.3.4 Models and Theories of Speech Production
2.4.1 Laryngeal System
2.4.1.1 Vocal Fold Vibration
2.4.1.2 F0 Control in Running Speech Production
2.4.1.3 Vertical Movements of the Larynx
2.4.2 Dynamic Characteristics of Articulators
2.4.2.1 Models of Individual Articulators
2.4.2.2 Articulatory Models of Speech Production
2.4.3 Conclusion
Acknowledgment
3 Speech Recognition
3.1 Hearing and Machine Recognition
3.2.2 Recognition-Oriented Speech Feature Representation
3.2.2.1 Sound Spectrogram: Time-Frequency-Energy Representation
3.2.2.2 Acoustic Feature Vector
3.2.2.3 Static Versus Dynamic Nature
3.2.3 Variety of Recognition Tasks
3.2.4 Recognition Mechanism
3.2.4.1 Example Task Setting
3.2.4.2 Distance-Based Recognition
3.2.4.3 Distance Computation Based on Dynamic Time Warping
3.2.4.4 Remarks
3.3 Bayes Decision Theory
3.3.1 Maximum Likelihood Estimation Approach
3.3.3 Bayesian Approach
3.3.4 Discriminant Function Approach
3.3.4.1 Example Task and Decision Rule
3.3.4.2 Loss
3.3.4.3 Design of Linear Discriminant Function Classifier
3.4 Acoustic Feature Extraction
3.4.1 Filter-Bank
3.4.1.1 Artificial Cochlea Filter
3.4.1.2 Fourier-Transform-Based Filter
3.4.2 Autoregressive Modeling
3.4.3 Cepstrum Modeling
3.4.4 Dynamic Feature Modeling
3.5 Probabilistic Acoustic Modeling Based on Hidden Markov Model
3.5.1 Principles of Hidden Markov Model
3.5.2 Selection of Output Probability Function
3.5.2.1 Discrete Model
3.5.2.2 Continuous Model
3.5.3 MLE-Based Design Method
3.5.3.1 Forward-Backward Method
3.5.4 Trellis Algorithm and Viterbi Algorithm
3.5.5 Discriminative Design Methods
3.6 Language Modeling
3.6.1 Role of Language Modeling
3.6.2 N-Gram Language Modeling
3.7 Concluding Remarks
3.7.1 Selection of Model Units
3.7.3 Open-Vocabulary Recognition
3.7.4 Bibliographical Remarks
4 Speech Coding
4.1 Attributes of Speech Coders
4.3 Basic Principles of Speech Coders
4.4 Quantization
4.4.1 Scalar Quantization
4.4.2 Vector Quantization
4.5 Linear Prediction
4.5.1 Linear Prediction Principles
4.5.2 Speech Coding Based on Linear Prediction
4.5.3 The Analysis-by-Synthesis Principle
4.5.4 Perceptual Filtering
4.5.5 Quantization of the Linear Prediction Coefficients
4.6 Sinusoidal Coding
4.7 Waveform Interpolation Methods
4.8 Subband Coding
4.9 Variable-Rate Coding
4.9.1 Basics
4.9.2 Phonetic Segmentation
4.9.3 Variable-Rate Coders for ATM Networks
4.9.4 Voice over IP
4.10 Wideband Coders
4.11 Measuring Speech Coder Performance
4.12 Speech Coding over Noisy Channels
4.13 Speech Coding Standards
4.14 Conclusions
Part II: Current Issues in Speech Recognition
5 Discriminative Prototype-Based Methods for Speech Recognition
5.1 The Bayes Decision Rule
5.2.2 Discriminant Functions
5.2.3 Discriminant Functions for Prototype-Based Methods
5.3 Example-Based Methods
5.3.1 Density Estimation
5.3.2 Estimation of Posterior Probabilities
5.3.3 The k-Nearest-Neighbor Method
5.3.3.1 The Nearest-Neighbor Rule
5.3.3.2 Error Bounds for k-Nearest-Neighbor Classification
5.3.4 Parzen Windows
5.3.4.1 An Example
5.3.5 Advantages and Limitations of Example-Based Methods
5.3.6 Smoothing
5.3.7 Applications to Speech Recognition
5.4 Prototype-Based Methods for Speech Recognition
5.5 Prototype-Based Classifier Design Using Minimum Classification Error
5.5.1 Definition of Discriminant Function
5.5.2 Definition of Misclassification Measure
5.5.3 Definition of Local Loss Function
5.5.4 Overall Loss Function and Optimization
5.5.5 Modified Newton's Method: The Quickprop Algorithm
5.5.6 Relation of MCE Loss to the Bayes Error
5.5.7 Choice of Smoothing Parameters for MCE-Based Optimization
5.6 Learning Vector Quantization
5.6.1 Shift-Tolerant LVQ for Speech Recognition
5.6.1.1 HMM Interpretation of STLVQ
5.6.1.2 Limitations and Strengths of STLVQ Architecture and Training
5.6.2 Expanding the Scope of LVQ for Speech Recognition: Incorporation into Hidden Markov Modeling
5.6.2.1 LVQ-HMM
5.6.2.2 HMM-LVQ
5.6.3 Minimum Classification Error Interpretation of LVQ
5.6.4 Smoothness of MCE Loss
5.6.5 LVQ Summary
5.7 Prototype-Based Methods Using Dynamic Programming
5.7.1 MCE-Trained Prototypes for DTW-Based Speech Recognition
5.7.1.1 Practical Implementation of MCE/GPD
5.7.1.2 MCE-DTW Results
5.7.2 Prototype-Based Minimum Error Classifier
5.7.2.1 PBMEC State Distance and Discriminant Function
5.7.2.2 MCE/GPD in the Context of Speech Recognition Using Phoneme Models
5.7.2.3 PBMEC Results
5.7.3 Summary of Prototype-Based Methods Using DP
5.8 Hidden Markov Model Design Based on MCE
5.8.1 HMM State Likelihood and Discriminant Function
5.8.2 MCE Misclassification Measure and Loss
5.8.3 Calculation of MCE Gradient for HMMs
5.8.3.1 Derivative of Loss with Respect to Misclassification Measure
5.8.3.2 Derivative of Misclassification Measure with Respect to Discriminant Functions
5.8.3.3 Derivative of Discriminant Function with Respect to Observation Probability Density Function
5.8.3.4 Derivative of Observation Probability with Respect to Mixing Weights
5.8.3.5 Derivative of Observation Probability with Respect to Mean Vectors
5.8.3.6 Derivative of Observation Probability with Respect to Covariances
5.8.3.7 Application of the Chain Rule
5.8.4 MCE-HMM Results
6 Recurrent Neural Networks for Speech Recognition
6.1 Background and Motivation
6.1.2 Chapter Overview
6.2 Speech Recognition Theory
6.3 Basics of Neural Networks
6.3.1 Parameter Estimation by Maximum Likelihood
6.3.2 Problem Classification
6.3.2.1 Regression
6.3.2.2 Classification
6.3.3 Neural Network Training
6.3.3.1 Gradient Descent Training
6.3.3.2 RPROP Training
6.3.3.3 ARPROP Training
6.3.4 Neural Network Architectures
6.3.4.1 Multilayer Perceptrons
6.3.4.2 Time-Delay Neural Networks
6.4 Recurrent Neural Networks
6.4.1 Unidirectional Recurrent Neural Network
6.4.1.1 RNN Architecture
6.4.1.2 RNN Training
6.4.2 Bidirectional Recurrent Neural Network
6.4.2.1 BRNN Architecture
6.4.2.2 BRNN Training
6.5 Modeling Phonetic Context
6.6 System Training and Usage
6.6.1 Training
6.6.2 Usage
6.7 Discussion
6.7.1.1 Training Criterion
6.7.1.2 Discriminative Training
6.7.1.3 Distribution of Model Complexity
7 Time-Delay Neural Networks and NN/HMM Hybrids: A Family of Connectionist Continuous-Speech Recognition Systems
7.1 MS-TDNNs and NN/HMM Hybrid Approaches
7.2.1 The Time-Delay Neural Network (TDNN)
7.2.2 Multistate TDNN
7.2.3 MS-TDNN Variants
7.2.4 Hybrid NN/HMM Variants
7.3 Alphabet Recognition with the MS-TDNN
7.3.1 Training Procedures
7.3.2 Duration Modeling
7.3.3 Experiments
7.3.3.1 Speaker-Dependent Data
7.3.3.2 Speaker-Independent Data
7.3.3.3 Telephone Data
7.3.4 Searching in Large Name Lists
7.4 Multimodal Input: Lipreading
7.4.1 Motivation
7.4.2 The Recognizer
7.4.3 Results
7.5 Modular Neural Networks
7.5.1 Architecture
7.5.2 Application to NN/HMM Models
7.5.3 Experiments with a Hybrid HME/HMM System
7.6 Context Modeling
7.6.1 Clustering Context Classes
7.6.2 Factoring Context-Dependent Posteriors
7.6.3 Hierarchies of Neural Networks
7.6.3.1 Manually Structured Hierarchies
7.6.3.2 Clustering Hierarchies of Neural Networks
7.6.4 Experiments and Results
8 Probability-Oriented Neural Networks and Hybrid Connectionist/Stochastic Networks
8.1 Fundamentals of Probability-Oriented Neural Networks
8.2.1 The Bayes Decision Framework
8.2.2 Types of PONNs
8.2.3.1 Radial Basis Function Networks
8.2.3.2 Probabilistic Neural Networks
8.2.4 Learning Methods for PNNs
8.3.1 Position of the Problem
8.3.2 MLE and EM Algorithms for PNNs
8.3.3 MMIE for PNNs
8.4 Applications to Automatic Speech Recognition
8.4.1 Speaker Recognition
8.5 Hybrid Connectionist/Stochastic Models
8.5.1 Proposed Solutions
8.5.2.1 ANNs as Front-Ends for HMMs
8.5.2.2 ANNs as Postprocessors of HMMs
8.5.2.3 Unified Models
8.5.3 Minimum Classification Error Networks
9 Speech Pattern Recognition Using Modular Systems
9.1.2 Classifier Design
9.1.3 What Is an Artificial Neural Network?
9.1.4 Minimum Recognition Error Network
9.1.5 Chapter Organization
9.2 Discriminative Pattern Classification
9.2.1 Minimum Error Rate Classification
9.2.3 Generalized Probabilistic Descent Method
9.3.1 Formalization Fundamentals
9.3.2.1 Distance Classifier for Classifying Dynamic Patterns: Preparation
9.3.2.2 Emulation of Decision Process
9.3.2.3 Selection of Loss Functions
9.3.2.4 Design Optimality in Practical Situations
9.3.3 GPD-Based Classifier Design
9.3.3.1 E-Set Task
9.3.3.2 P-Set Task
9.4 Derivatives of GPD
9.4.1 Segmental GPD for Continuous Speech Recognition
9.4.3 Minimum Error Training for Open-Vocabulary Speech Recognition
9.4.3.1 Open-Vocabulary Speech Recognition
9.4.3.2 Minimum Spotting Error Learning
9.4.3.3 Discriminative Utterance Verification
9.4.4 Discriminative Feature Extraction
9.4.4.1 An Example Implementation for Cepstrum-Based Speech Recognition
9.4.4.3 Discriminative Metric Design
9.4.4.4 Minimum Error Learning Subspace Method
9.4.5 Speaker Recognition Using GPD
9.5 Acknowledgments
Appendix 1: Probabilistic Descent Theorem for Probability-Based Discriminant Functions
Appendix 2: Relationships Between MCE/GPD and Others
Part III: Current Issues in Speech Signal Processing
10 Networks for Speaker Recognition
10.1 Speaker Recognition Overview
10.3 Discriminative Information
10.3.1 Supervised Training
10.3.2 Cohort Normalization
10.4 Speaker Recognition Networks
10.4.1 Multilayer Perceptron
10.4.2 Radial Basis Functions
10.4.3 Decision Trees
10.4.7 Neural Tree Network
10.4.8 Performance Summary
10.5 Model Combination
10.5.1 Model Combination Approaches
10.5.1.1 Linear Opinion Pool
10.5.1.2 Log Opinion Pool
10.5.1.3 Voting Methods
10.5.2 Error Correlation Analysis
10.5.3 Two-Model Combination
10.5.4 Three-Model Combination
11 Neural Networks for Voice Conversion
11.1 Introduction: Speech and Speaker Characteristics
11.2 Studies in Voice Conversion
11.3 Neural Networks for Transformation of Vocal Tract Shapes
11.3.1 Linear Approximation of Formant Transformation
11.3.2 Neural Network Models
11.3.3 Generalization
11.4 Implementation of Voice Conversion
11.4.1 Voice Transformation System
11.4.2 Normalization of Intonational Features
11.4.3 Evaluation of Voice Transformation
12 Neural Networks for Speech Coding
12.1 Source Coding and Neural Networks
12.2.1 Source Coding
12.2.2 Source Coding with Neural Networks
12.2.3.1 Vector Quantization with Kohonen Self-Organizing Feature Maps
12.2.3.2 Multilayer Neural Network as Front-End of a Coder
12.2.3.3 Codebook-Excited Neural Networks
12.3 Quantization Performance of Neural Networks
12.3.1 Kohonen Self-Organizing Feature Maps
12.3.1.1 Architecture and Training Process
12.3.1.2 Conditional Histogram Neural Network FSVQ
12.3.1.3 Nearest-Neighbor Neural Network FSVQ
12.3.1.4 Simulations
12.3.2 Coders with Neural Network Front-Ends
12.3.3 Speech Coding with Neural Networks
12.4.1 Coding Speech Spectrum with Neural Networks
12.4.2 Nonlinear Prediction Speech Coding
12.4.2.1 A Neural Model of Nonlinear Prediction
12.4.2.2 Nonlinear Predictive Vector Quantization
12.4.2.3 Nonlinear Predictive Quantization Performance
12.4.3 Code-Excited Nonlinear Predictive Speech Coding
12.4.3.1 Nonlinear Predictive Filter Tolerance for an Excitation Disturbance
12.4.3.2 Gain-Adaptive Nonlinear Predictive Coding
12.4.3.3 Coding Performance
13 Networks for Speech Enhancement
13.1 Background
13.1.2 Model Structure
13.1.3 Neural Time-Domain Filtering Methods
13.2.1 Direct Time-Domain Mapping
13.2.2 Extended Kalman Filtering with Predictive Models
13.3 Neural Transform-Domain Methods
13.3.1 Spectral Subtraction
13.3.2 Neural Transform-Domain Mappings
13.4 State-Dependent Model Switching Methods
13.4.1 Classification Switched Models
13.4.2 Hybrid HMM and EKF
13.5 Online Iterative Methods
13.5.1 Online Predictive Enhancement
13.5.2 Maximum-Likelihood Estimation and Dual Kalman Filtering
13.5.3 Noise-Regularized Adaptive Filtering
13.6 Summary and Conclusions
Index