Preface
Part I: Fundamentals
1 Introduction
1.1 Speech Processing
1.2 Neural Networks
1.2.1 Taxonomy of Neural Networks
1.2.2.1 Overview
1.2.2.2 Structure
1.2.2.3 Measurement
1.2.2.4 Objective Function
1.2.2.5 Optimization
1.3 Neural Networks for Speech Processing
1.4 Handbook Overview
1.4.1 Part I: Fundamentals
1.4.2 Part II: Current Issues in Speech Recognition
1.4.3 Part III: Current Issues in Speech Signal Processing
References
2 The Speech Signal and Its Production Model
2.1 Information Conveyed by Speech
2.2.1 Linguistic Information
2.2.1.1 Segmental Features
2.2.1.2 Suprasegmental Features
2.2.2 Paralinguistic Information
2.2.3 Nonlinguistic Information
2.2.3.1 Idiosyncratic Factors
2.2.3.2 Emotional Factors
2.2.4 Hierarchical Speech Production Processes
2.2.5 Summary
2.3 Physical and Physiological Processes in Speech Production
2.3.1 Respiration System
2.3.1.1 Normal Breathing Without Speech Production
2.3.1.2 Expiration in Speech Production
2.3.2 Phonatory System
2.3.2.1 Framework of the Larynx
2.3.2.2 Abduction versus Adduction
2.3.2.3 F0 Control During Speech
2.3.3 Articulatory System
2.3.3.1 Morphology of Articulators
2.3.3.2 Vowel Production
2.3.3.3 Consonant Production
2.3.4 Models and Theories of Speech Production
2.4.1 Laryngeal System
2.4.1.1 Vocal Fold Vibration
2.4.1.2 F0 Control in Running Speech Production
2.4.1.3 Vertical Movements of the Larynx
2.4.2 Dynamic Characteristics of Articulators
2.4.2.1 Models of Individual Articulators
2.4.2.2 Articulatory Models of Speech Production
2.4.3 Conclusion
Acknowledgment
3 Speech Recognition
3.1 Hearing and Machine Recognition
3.2.2 Recognition-Oriented Speech Feature Representation
3.2.2.1 Sound Spectrogram: Time-Frequency-Energy Representation
3.2.2.2 Acoustic Feature Vector
3.2.2.3 Static Versus Dynamic Nature
3.2.3 Variety of Recognition Tasks
3.2.4 Recognition Mechanism
3.2.4.1 Example Task Setting
3.2.4.2 Distance-Based Recognition
3.2.4.3 Distance Computation Based on Dynamic Time Warping
3.2.4.4 Remarks
3.3 Bayes Decision Theory
3.3.1 Maximum Likelihood Estimation Approach
3.3.3 Bayesian Approach
3.3.4 Discriminant Function Approach
3.3.4.1 Example Task and Decision Rule
3.3.4.2 Loss
3.3.4.3 Design of Linear Discriminant Function Classifier
3.4 Acoustic Feature Extraction
3.4.1 Filter-Bank
3.4.1.1 Artificial Cochlea Filter
3.4.1.2 Fourier-Transform-Based Filter
3.4.2 Autoregressive Modeling
3.4.3 Cepstrum Modeling
3.4.4 Dynamic Feature Modeling
3.5 Probabilistic Acoustic Modeling Based on Hidden Markov Model
3.5.1 Principles of Hidden Markov Model
3.5.2 Selection of Output Probability Function
3.5.2.1 Discrete Model
3.5.2.2 Continuous Model
3.5.3 MLE-Based Design Method
3.5.3.1 Forward-Backward Method
3.5.4 Trellis Algorithm and Viterbi Algorithm
3.5.5 Discriminative Design Methods
3.6 Language Modeling
3.6.1 Role of Language Modeling
3.6.2 N-Gram Language Modeling
3.7 Concluding Remarks
3.7.1 Selection of Model Units
3.7.3 Open-Vocabulary Recognition
3.7.4 Bibliographical Remarks
4 Speech Coding
4.1 Attributes of Speech Coders
4.3 Basic Principles of Speech Coders
4.4 Quantization
4.4.1 Scalar Quantization
4.4.2 Vector Quantization
4.5 Linear Prediction
4.5.1 Linear Prediction Principles
4.5.2 Speech Coding Based on Linear Prediction
4.5.3 The Analysis-by-Synthesis Principle
4.5.4 Perceptual Filtering
4.5.5 Quantization of the Linear Prediction Coefficients
4.6 Sinusoidal Coding
4.7 Waveform Interpolation Methods
4.8 Subband Coding
4.9 Variable-Rate Coding
4.9.1 Basics
4.9.2 Phonetic Segmentation
4.9.3 Variable-Rate Coders for ATM Networks
4.9.4 Voice over IP
4.10 Wideband Coders
4.11 Measuring Speech Coder Performance
4.12 Speech Coding over Noisy Channels
4.13 Speech Coding Standards
4.14 Conclusions
Part II: Current Issues in Speech Recognition
5 Discriminative Prototype-Based Methods for Speech Recognition
5.1 The Bayes Decision Rule
5.2.2 Discriminant Functions
5.2.3 Discriminant Functions for Prototype-Based Methods
5.3 Example-Based Methods
5.3.1 Density Estimation
5.3.2 Estimation of Posterior Probabilities
5.3.3 The k-Nearest-Neighbor Method
5.3.3.1 The Nearest-Neighbor Rule
5.3.3.2 Error Bounds for k-Nearest-Neighbor Classification
5.3.4 Parzen Windows
5.3.4.1 An Example
5.3.5 Advantages and Limitations of Example-Based Methods
5.3.6 Smoothing
5.3.7 Applications to Speech Recognition
5.4 Prototype-Based Methods for Speech Recognition
5.5 Prototype-Based Classifier Design Using Minimum Classification Error
5.5.1 Definition of Discriminant Function
5.5.2 Definition of Misclassification Measure
5.5.3 Definition of Local Loss Function
5.5.4 Overall Loss Function and Optimization
5.5.5 Modified Newton's Method: The Quickprop Algorithm
5.5.6 Relation of MCE Loss to the Bayes Error
5.5.7 Choice of Smoothing Parameters for MCE-Based Optimization
5.6 Learning Vector Quantization
5.6.1 Shift-Tolerant LVQ for Speech Recognition
5.6.1.1 HMM Interpretation of STLVQ
5.6.1.2 Limitations and Strengths of STLVQ Architecture and Training
5.6.2 Expanding the Scope of LVQ for Speech Recognition: Incorporation into Hidden Markov Modeling
5.6.2.1 LVQ-HMM
5.6.2.2 HMM-LVQ
5.6.3 Minimum Classification Error Interpretation of LVQ
5.6.4 Smoothness of MCE Loss
5.6.5 LVQ Summary
5.7 Prototype-Based Methods Using Dynamic Programming
5.7.1 MCE-Trained Prototypes for DTW-Based Speech Recognition
5.7.1.1 Practical Implementation of MCE/GPD
5.7.1.2 MCE-DTW Results
5.7.2 Prototype-Based Minimum Error Classifier
5.7.2.1 PBMEC State Distance and Discriminant Function
5.7.2.2 MCE/GPD in the Context of Speech Recognition Using Phoneme Models
5.7.2.3 PBMEC Results
5.7.3 Summary of Prototype-Based Methods Using DP
5.8 Hidden Markov Model Design Based on MCE
5.8.1 HMM State Likelihood and Discriminant Function
5.8.2 MCE Misclassification Measure and Loss
5.8.3 Calculation of MCE Gradient for HMMs
5.8.3.1 Derivative of Loss with Respect to Misclassification Measure
5.8.3.2 Derivative of Misclassification Measure with Respect to Discriminant Functions
5.8.3.3 Derivative of Discriminant Function with Respect to Observation Probability Density Function
5.8.3.4 Derivative of Observation Probability with Respect to Mixing Weights
5.8.3.5 Derivative of Observation Probability with Respect to Mean Vectors
5.8.3.6 Derivative of Observation Probability with Respect to Covariances
5.8.3.7 Application of the Chain Rule
5.8.4 MCE-HMM Results
6 Recurrent Neural Networks for Speech Recognition
6.1 Background and Motivation
6.1.2 Chapter Overview
6.2 Speech Recognition Theory
6.3 Basics of Neural Networks
6.3.1 Parameter Estimation by Maximum Likelihood
6.3.2 Problem Classification
6.3.2.1 Regression
6.3.2.2 Classification
6.3.3 Neural Network Training
6.3.3.1 Gradient Descent Training
6.3.3.2 RPROP Training
6.3.3.3 ARPROP Training
6.3.4 Neural Network Architectures
6.3.4.1 Multilayer Perceptrons
6.3.4.2 Time-Delay Neural Networks
6.4 Recurrent Neural Networks
6.4.1 Unidirectional Recurrent Neural Network
6.4.1.1 RNN Architecture
6.4.1.2 RNN Training
6.4.2 Bidirectional Recurrent Neural Network
6.4.2.1 BRNN Architecture
6.4.2.2 BRNN Training
6.5 Modeling Phonetic Context
6.6 System Training and Usage
6.6.1 Training
6.6.2 Usage
6.7 Discussion
6.7.1.1 Training Criterion
6.7.1.2 Discriminative Training
6.7.1.3 Distribution of Model Complexity
7 Time-Delay Neural Networks and NN/HMM Hybrids: A Family of Connectionist Continuous-Speech Recognition Systems
7.1 MS-TDNNs and NN/HMM Hybrid Approaches
7.2.1 The Time-Delay Neural Network (TDNN)
7.2.2 Multistate TDNN
7.2.3 MS-TDNN Variants
7.2.4 Hybrid NN/HMM Variants
7.3 Alphabet Recognition with the MS-TDNN
7.3.1 Training Procedures
7.3.2 Duration Modeling
7.3.3 Experiments
7.3.3.1 Speaker-Dependent Data
7.3.3.2 Speaker-Independent Data
7.3.3.3 Telephone Data
7.3.4 Searching in Large Name Lists
7.4 Multimodal Input: Lipreading
7.4.1 Motivation
7.4.2 The Recognizer
7.4.3 Results
7.5 Modular Neural Networks
7.5.1 Architecture
7.5.2 Application to NN/HMM Models
7.5.3 Experiments with a Hybrid HME/HMM System
7.6 Context Modeling
7.6.1 Clustering Context Classes
7.6.2 Factoring Context-Dependent Posteriors
7.6.3 Hierarchies of Neural Networks
7.6.3.1 Manually Structured Hierarchies
7.6.3.2 Clustering Hierarchies of Neural Networks
7.6.4 Experiments and Results
8 Probability-Oriented Neural Networks and Hybrid Connectionist/Stochastic Networks
8.1 Fundamentals of Probability-Oriented Neural Networks
8.2.1 The Bayes Decision Framework
8.2.2 Types of PONNs
8.2.3.1 Radial Basis Function Networks
8.2.3.2 Probabilistic Neural Networks
8.2.4 Learning Methods for PNNs
8.3.1 Position of the Problem
8.3.2 MLE and EM Algorithms for PNNs
8.3.3 MMIE for PNNs
8.4 Applications to Automatic Speech Recognition
8.4.1 Speaker Recognition
8.5 Hybrid Connectionist/Stochastic Models
8.5.1 Proposed Solutions
8.5.2.1 ANNs as Front-Ends for HMMs
8.5.2.2 ANNs as Postprocessors of HMMs
8.5.2.3 Unified Models
8.5.3 Minimum Classification Error Networks
9 Speech Pattern Recognition Using Modular Systems
9.1.2 Classifier Design
9.1.3 What Is an Artificial Neural Network?
9.1.4 Minimum Recognition Error Network
9.1.5 Chapter Organization
9.2 Discriminative Pattern Classification
9.2.1 Minimum Error Rate Classification
9.2.3 Generalized Probabilistic Descent Method
9.3.1 Formalization Fundamentals
9.3.2.1 Distance Classifier for Classifying Dynamic Patterns: Preparation
9.3.2.2 Emulation of Decision Process
9.3.2.3 Selection of Loss Functions
9.3.2.4 Design Optimality in Practical Situations
9.3.3 GPD-Based Classifier Design
9.3.3.1 E-Set Task
9.3.3.2 P-Set Task
9.4 Derivatives of GPD
9.4.1 Segmental GPD for Continuous Speech Recognition
9.4.3 Minimum Error Training for Open-Vocabulary Speech Recognition
9.4.3.1 Open-Vocabulary Speech Recognition
9.4.3.2 Minimum Spotting Error Learning
9.4.3.3 Discriminative Utterance Verification
9.4.4 Discriminative Feature Extraction
9.4.4.1 An Example Implementation for Cepstrum-Based Speech Recognition
9.4.4.3 Discriminative Metric Design
9.4.4.4 Minimum Error Learning Subspace Method
9.4.5 Speaker Recognition Using GPD
9.5 Acknowledgments
Appendix 1: Probabilistic Descent Theorem for Probability-Based Discriminant Functions
Appendix 2: Relationships Between MCE/GPD and Others
Part III: Current Issues in Speech Signal Processing
10 Networks for Speaker Recognition
10.1 Speaker Recognition Overview
10.3 Discriminative Information
10.3.1 Supervised Training
10.3.2 Cohort Normalization
10.4 Speaker Recognition Networks
10.4.1 Multilayer Perceptron
10.4.2 Radial Basis Functions
10.4.3 Decision Trees
10.4.7 Neural Tree Network
10.4.8 Performance Summary
10.5 Model Combination
10.5.1 Model Combination Approaches
10.5.1.1 Linear Opinion Pool
10.5.1.2 Log Opinion Pool
10.5.1.3 Voting Methods
10.5.2 Error Correlation Analysis
10.5.3 Two-Model Combination
10.5.4 Three-Model Combination
11 Neural Networks for Voice Conversion
11.1 Introduction: Speech and Speaker Characteristics
11.2 Studies in Voice Conversion
11.3 Neural Networks for Transformation of Vocal Tract Shapes
11.3.1 Linear Approximation of Formant Transformation
11.3.2 Neural Network Models
11.3.3 Generalization
11.4 Implementation of Voice Conversion
11.4.1 Voice Transformation System
11.4.2 Normalization of Intonational Features
11.4.3 Evaluation of Voice Transformation
12 Neural Networks for Speech Coding
12.1 Source Coding and Neural Networks
12.2.1 Source Coding
12.2.2 Source Coding with Neural Networks
12.2.3.1 Vector Quantization with Kohonen Self-Organizing Feature Maps
12.2.3.2 Multilayer Neural Network as Front-End of a Coder
12.2.3.3 Codebook-Excited Neural Networks
12.3 Quantization Performance of Neural Networks
12.3.1 Kohonen Self-Organizing Feature Maps
12.3.1.1 Architecture and Training Process
12.3.1.2 Conditional Histogram Neural Network FSVQ
12.3.1.3 Nearest-Neighbor Neural Network FSVQ
12.3.1.4 Simulations
12.3.2 Coders with Neural Network Front-Ends
12.3.3 Speech Coding with Neural Networks
12.4.1 Coding Speech Spectrum with Neural Networks
12.4.2 Nonlinear Prediction Speech Coding
12.4.2.1 A Neural Model of Nonlinear Prediction
12.4.2.2 Nonlinear Predictive Vector Quantization
12.4.2.3 Nonlinear Predictive Quantization Performance
12.4.3 Code-Excited Nonlinear Predictive Speech Coding
12.4.3.1 Nonlinear Predictive Filter Tolerance for an Excitation Disturbance
12.4.3.2 Gain-Adaptive Nonlinear Predictive Coding
12.4.3.3 Coding Performance
13 Networks for Speech Enhancement
13.1 Background
13.1.2 Model Structure
13.1.3 Neural Time-Domain Filtering Methods
13.2.1 Direct Time-Domain Mapping
13.2.2 Extended Kalman Filtering with Predictive Models
13.3 Neural Transform-Domain Methods
13.3.1 Spectral Subtraction
13.3.2 Neural Transform-Domain Mappings
13.4 State-Dependent Model Switching Methods
13.4.1 Classification Switched Models
13.4.2 Hybrid HMM and EKF
13.5 Online Iterative Methods
13.5.1 Online Predictive Enhancement
13.5.2 Maximum-Likelihood Estimation and Dual Kalman Filtering
13.5.3 Noise-Regularized Adaptive Filtering
13.6 Summary and Conclusions
Index