Series Introduction (K.J. Ray Liu) |
Preface to the Second Edition |
Acknowledgments |
Preface to the First Edition |
1. INTRODUCTION 1 |
2. PRINCIPAL CHARACTERISTICS OF SPEECH 5 |
2.1 Linguistic Information 5 |
2.2 Speech and Hearing 7 |
2.3 Speech Production Mechanism 9 |
2.4 Acoustic Characteristics of Speech 14 |
2.5 Statistical Characteristics of Speech 20 |
2.5.1 Distribution of amplitude level 20 |
2.5.2 Long-time averaged spectrum 23 |
2.5.3 Variation in fundamental frequency 24 |
2.5.4 Speech ratio 26 |
3. SPEECH PRODUCTION MODELS 27 |
3.1 Acoustical Theory of Speech Production 27 |
3.2 Linear Separable Equivalent Circuit Model 30 |
3.3 Vocal Tract Transmission Model 32 |
3.3.1 Progressing wave model 32 |
3.3.2 Resonance model 38 |
3.4 Vocal Cord Model 40 |
4. SPEECH ANALYSIS AND ANALYSIS-SYNTHESIS SYSTEMS 45 |
4.1 Digitization 45 |
4.1.1 Sampling 46 |
4.1.2 Quantization and coding 47 |
4.1.3 A/D and D/A conversion 51 |
4.2 Spectral Analysis 52 |
4.2.1 Spectral structure of speech 52 |
4.2.2 Autocorrelation and Fourier transform 53 |
4.2.3 Window function 57 |
4.2.4 Sound spectrogram 60 |
4.3 Cepstrum 62 |
4.3.1 Cepstrum and its application 62 |
4.3.2 Homomorphic analysis and LPC cepstrum 66 |
4.4 Filter Bank and Zero-Crossing Analysis 70 |
4.4.1 Digital filter bank 70 |
4.4.2 Zero-crossing analysis 70 |
4.5 Analysis-by-Synthesis 71 |
4.6 Analysis-Synthesis Systems 73 |
4.6.1 Analysis-synthesis system structure 73 |
4.6.2 Example of analysis-synthesis systems 73 |
4.7 Pitch Extraction 78 |
5. LINEAR PREDICTIVE CODING (LPC) ANALYSIS 83 |
5.1 Principles of LPC Analysis 83 |
5.2 LPC Analysis Procedure 86 |
5.3 Maximum Likelihood Spectral Estimation 89 |
5.3.1 Formulation of maximum likelihood spectral estimation 89 |
5.3.2 Physical meaning of maximum likelihood spectral estimation 93 |
5.4 Source Parameter Estimation from Residual Signals 98 |
5.5 Speech Analysis-Synthesis System by LPC 99 |
5.6 PARCOR Analysis 102 |
5.6.1 Formulation of PARCOR analysis 102 |
5.6.2 Relationship between PARCOR and LPC coefficients 108 |
5.6.3 PARCOR synthesis filter 109 |
5.6.4 Vocal tract area estimation based on PARCOR analysis 110 |
5.7 Line Spectrum Pair (LSP) Analysis 116 |
5.7.1 Principle of LSP analysis 116 |
5.7.2 Solution of LSP analysis 119 |
5.7.3 LSP synthesis filter 122 |
5.7.4 Coding of LSP parameters 126 |
5.7.5 Composite sinusoidal model 126 |
5.7.6 Mutual relationships between LPC parameters 127 |
5.8 Pole-Zero Analysis 129 |
6. SPEECH CODING 133 |
6.1 Principal Techniques for Speech Coding 133 |
6.1.1 Reversible coding 133 |
6.1.2 Irreversible coding and information rate distortion theory 134 |
6.1.3 Waveform coding and analysis-synthesis systems 135 |
6.1.4 Basic techniques for waveform coding methods 138 |
6.2 Coding in Time Domain 141 |
6.2.1 Pulse code modulation (PCM) 141 |
6.2.2 Adaptive quantization 143 |
6.2.3 Predictive coding 143 |
6.2.4 Delta modulation 149 |
6.2.5 Adaptive differential PCM (ADPCM) 151 |
6.2.6 Adaptive predictive coding (APC) 153 |
6.2.7 Noise shaping 156 |
6.3 Coding in Frequency Domain 159 |
6.3.1 Subband coding (SBC) 159 |
6.3.2 Adaptive transform coding (ATC) 163 |
6.3.3 APC with adaptive bit allocation (APC-AB) 166 |
6.3.4 Time-domain harmonic scaling (TDHS) algorithm 168 |
6.4 Vector Quantization 173 |
6.4.1 Multipath search coding 173 |
6.4.2 Principles of vector quantization 175 |
6.4.3 Tree search and multistage processing 178 |
6.4.4 Vector quantization for linear predictor parameters 180 |
6.4.5 Matrix quantization and finite-state vector quantization 182 |
6.5 Hybrid Coding 187 |
6.5.1 Residual- or speech-excited linear predictive coding 187 |
6.5.2 Multipulse-excited linear predictive coding (MPC) 189 |
6.5.3 Code-excited linear predictive coding (CELP) 193 |
6.5.4 Coding by phase equalization and variable-rate tree coding 196 |
6.6 Evaluation and Standardization of Coding Methods 199 |
6.6.1 Evaluation factors of speech coding systems 199 |
6.6.2 Speech coding standards 203 |
6.7 Robust and Flexible Speech Coding 211 |
7. SPEECH SYNTHESIS 213 |
7.1 Principles of Speech Synthesis 213 |
7.2 Synthesis Based on Waveform Coding 217 |
7.3 Synthesis Based on Analysis-Synthesis Method 221 |
7.4 Synthesis Based on Speech Production Mechanism 222 |
7.4.1 Vocal tract analog method 223 |
7.4.2 Terminal analog method 224 |
7.5 Synthesis by Rule 226 |
7.5.1 Principles of synthesis by rule 226 |
7.5.2 Control of prosodic features 230 |
7.6 Text-to-Speech Conversion 234 |
7.7 Corpus-Based Speech Synthesis 237 |
8. SPEECH RECOGNITION |
8.1 Principles of Speech Recognition 243 |
8.1.1 Advantages of speech recognition 243 |
8.1.2 Difficulties in speech recognition 245 |
8.1.3 Classification of speech recognition 246 |
8.2 Speech Period Detection 248 |
8.3 Spectral Distance Measures 249 |
8.3.1 Distance measures used in speech recognition 249 |
8.3.2 Distances based on nonparametric spectral analysis 251 |
8.3.3 Distances based on LPC 252 |
8.3.4 Peak-weighted distances based on LPC analysis 258 |
8.3.5 Weighted cepstral distance 260 |
8.3.6 Transitional cepstral distance 262 |
8.3.7 Prosody 264 |
8.4 Structure of Word Recognition Systems 264 |
8.5 Dynamic Time Warping (DTW) 266 |
8.5.1 DP matching 266 |
8.5.2 Variations in DP matching 270 |
8.5.3 Staggered array DP matching 272 |
8.6 Word Recognition Using Phoneme Units 275 |
8.6.1 Principal structure 275 |
8.6.2 SPLIT method 277 |
8.7 Theory and Implementation of HMM 278 |
8.7.1 Fundamentals of HMM 278 |
8.7.2 Three basic problems for HMMs 282 |
8.7.3 Solution to Problem 1: probability evaluation 283 |
8.7.4 Solution to Problem 2: optimal state sequence 286 |
8.7.5 Solution to Problem 3: parameter estimation 288 |
8.7.6 Continuous observation densities in HMMs 290 |
8.7.7 Tied-mixture HMM 292 |
8.7.8 MMI and MCE/GPD training of HMM 292 |
8.7.9 HMM system for word recognition 293 |
8.8 Connected Word Recognition 295 |
8.8.1 Two-level DP matching and its modifications 295 |
8.8.2 Word spotting 303 |
8.9 Large-Vocabulary Continuous-Speech Recognition 306 |
8.9.1 Three principal structural models 306 |
8.9.2 Other system constructing factors 308 |
8.9.3 Statistical theory of continuous-speech recognition 311 |
8.9.4 Statistical language modeling 312 |
8.9.5 Typical structure of large-vocabulary continuous-speech recognition systems 318 |
8.9.6 Methods for evaluating recognition systems 320 |
8.10 Examples of Large-Vocabulary Continuous-Speech Recognition Systems 323 |
8.10.1 DARPA speech recognition projects 323 |
8.10.2 English speech recognition system at LIMSI Laboratory 324 |
8.10.3 English speech recognition system at IBM Laboratory 325 |
8.10.4 A Japanese speech recognition system 328 |
8.11 Speaker-Independent and Adaptive Recognition 330 |
8.11.1 Multi-template method 332 |
8.11.2 Statistical method 333 |
8.11.3 Speaker normalization method 334 |
8.11.4 Speaker adaptation methods 335 |
8.11.5 Unsupervised speaker adaptation method 336 |
8.12 Robust Algorithms Against Noise and Channel Variations 339 |
8.12.1 HMM composition/PMC 344 |
8.12.2 Detection-based approach for spontaneous speech recognition 344 |
9. SPEAKER RECOGNITION 349 |
9.1 Principles of Speaker Recognition 349 |
9.1.1 Human and computer speaker recognition 349 |
9.1.2 Individual characteristics 351 |
9.2 Speaker Recognition Methods 352 |
9.2.1 Classification of speaker recognition methods 352 |
9.2.2 Structure of speaker recognition systems 354 |
9.2.3 Relationship between error rate and number of speakers 358 |
9.2.4 Intra-speaker variation and evaluation of feature parameters 360 |
9.2.5 Likelihood (distance) normalization 364 |
9.3 Examples of Speaker Recognition Systems 366 |
9.3.1 Text-dependent speaker recognition systems 366 |
9.3.2 Text-independent speaker recognition systems 368 |
9.3.3 Text-prompted speaker recognition systems 373 |
10. FUTURE DIRECTIONS OF SPEECH INFORMATION PROCESSING 375 |
10.1 Overview 375 |
10.2 Analysis and Description of Dynamic Features 378 |
10.3 Extraction and Normalization of Voice Individuality 379 |
10.4 Adaptation to Environmental Variation 380 |
10.5 Basic Units for Speech Processing 381 |
10.6 Advanced Knowledge Processing 382 |
10.7 Clarification of Speech Production Mechanism 383 |
10.8 Clarification of Speech Perception Mechanism 384 |
10.9 Evaluation Methods for Speech Processing Technologies 385 |
10.10 LSI for Speech Processing Use 386 |
APPENDICES |
A Convolution and z-Transform 387 |
A.1 Convolution 387 |
A.2 z-Transform 388 |
A.3 Stability 391 |
B Vector Quantization Algorithm 393 |
B.1 VQ (Vector Quantization) Technique Formulation 393 |
B.2 Lloyd's Algorithm (K-Means Algorithm) 394 |
B.3 LBG Algorithm 395 |
C Neural Nets 399 |
Bibliography 405 |
Index 437 |