Preface
About Speech / 1
Introduction / 1.1
How Speech Is Produced / 1.2
The Vocal Tract / 1.2.1
Articulatory Phonetics / 1.2.2
Phonetic Alphabets / 1.2.3
Prosody and Suprasegmentals / 1.2.4
Syllables / 1.2.5
Dialects / 1.2.6
Languages (Other Than English) / 1.2.7
Acoustic Phonetics / 1.3
Phonemics / 1.4
Articulatory Processes / 1.5
References
Representing Speech in the Computer / 2
Microphones / 2.1
Sampling / 2.3
Sampling Rate / 2.3.1
Quantization / 2.3.2
Speech Digitization / 2.4
Waveform Coders / 2.4.1
Voice Coders (Vocoders) / 2.4.2
The Frequency Domain / 2.5
The Game of Jumble: Spectrum-Cepstrum, Frequency-Quefrency, Filtering-Liftering / 2.5.1
Spectrograms: A Hybrid Representation of Speech / 2.5.2
Speech Recognition / 3
Speech Recognition: What It Is; What It Isn't / 3.1
Why Is Speech Recognition Easy for Us and Difficult for Our Computers? / 3.3
A Brief History of Speech Recognition / 3.4
The Era of ARPA / 3.4.1
After ARPA / 3.4.2
Three Dimensions of Speech Recognition / 3.5
Continuous Versus Noncontinuous / 3.5.1
Speaker-Independent Versus Speaker-Dependent / 3.5.2
Vocabulary Size / 3.5.3
Tradeoffs and Interactions / 3.5.4
Units of Speech Recognition / 3.6
Words and Phrases / 3.6.1
Phonemes / 3.6.2
Diphones and Triphones / 3.6.4
Representing the Units / 3.7
Acoustic Features / 3.7.1
Comparing the Units / 3.8
Dynamic Time Warping (DTW) / 3.8.1
Hidden Markov Models (HMMs) / 3.8.2
Future Challenges I / 3.9
Artificial Neural Networks (ANNs) / 3.9.1
Errors / 3.10
Types of Errors / 3.10.1
Error Tolerances / 3.10.2
Performance Evaluation of Speech Recognizers / 3.11
Error Rates / 3.11.1
Other Factors / 3.11.2
Error Reduction / 3.12
Environmental Effects / 3.12.1
Human Factors / 3.12.2
Subsetting / 3.12.3
Vocabulary Selection / 3.12.4
Error Detection and Correction / 3.13
Feedback Systems / 3.13.1
Higher Levels of Linguistic Knowledge / 3.13.2
Automatic Error Correction / 3.13.3
Future Challenges II / 3.14
Speech Synthesis / 4
Introduction and History / 4.1
Parametric Coding (Electronic Synthesis) / 4.2
Parameters of Parametric Speech Synthesis / 4.2.1
Input Units of Parametric Speech Synthesis / 4.2.2
Concatenative Synthesis / 4.3
Allophone Concatenation / 4.3.1
Diphone Concatenation / 4.3.2
Demisyllable Concatenation / 4.3.3
Waveform of Concatenative Units / 4.3.4
Text-to-Speech Processing / 4.4
Rules and Exceptions / 4.4.1
Morphological Analysis / 4.4.2
Articulation Effects / 4.4.3
Prosody / 4.4.4
Special Problems / 4.4.5
Concept-to-Speech / 4.5
Languages of the World / 4.6
Performance Evaluation / 4.7
Intelligibility / 4.7.1
Comprehensibility / 4.7.2
Pleasantness/Naturalness / 4.7.3
Future Challenges / 4.8
Speaker Recognition, Language Identification, and Lip Synchronization / 5
Speaker Recognition / 5.1
Speaker Recognition Versus Speech Recognition / 5.1.1
Types of Speaker Recognition / 5.1.2
Text-Dependent, Text-Independent, and Text-Prompted Speaker Recognition / 5.1.3
"Voiceprints" / 5.1.4
Methods of Speaker Recognition / 5.1.5
Noise / 5.1.6
Performance Evaluation of Speaker Recognition Systems / 5.1.7
Co-channel Speaker Separation / 5.2
Language Identification / 5.3
Four Computational Approaches to Language Identification / 5.3.1
Performance Evaluation of Language Identification Systems / 5.3.2
Lip Synchronization / 5.4
Visemes / 5.4.1
Mapping Directly From the Speech Signal to Mouth Shapes / 5.4.2
Applications in Speech Recognition / 6
Criteria for a Viable Speech Recognition Application / 6.1
Hands Busy, Eyes Busy / 6.1.1
Remoteness / 6.1.2
Miniaturization / 6.1.3
2001 Won't Be 2001 / 6.2
The Role of Human Factors in Speech Recognition Applications / 6.3
Application Areas / 6.4
Assistive Technology / 6.4.1
Telecommunications / 6.4.2
Command and Control / 6.4.3
Data Entry and Retrieval / 6.4.4
Education / 6.4.5
Applications in Speech Synthesis / 7
"At the Tone, the Time Will Be..." / 7.1
When To Use Text-to-Speech; When To Use Digitally Recorded Speech / 7.2
Interactive Voice Response Systems (IVRs) / 7.3
Human Factors Revisited / 7.4
Aid for Persons With Disabilities / 7.5
Emergency Scenarios / 7.5.2
En Masse Advisories / 7.5.4
Information Retrieval / 7.5.5
Information Reporting / 7.5.6
Electronic Mail and Fax Readers / 7.5.7
In the Dark / 7.5.8
Toys and Games / 7.5.9
Transportation / 7.5.10
Government Services / 7.5.11
Disguise / 7.5.12
Applications in Speaker Recognition, Language Identification, and Lip Synchronization / 8
Applications in Speaker Recognition / 8.1
Access / 8.1.1
Authentication / 8.1.2
Monitoring / 8.1.3
Fraud Prevention / 8.1.4
Forensics / 8.1.5
Personal Services / 8.1.6
Applications in Language Identification / 8.2
Communications Monitoring / 8.2.1
Public Information Systems / 8.2.3
Applications in Automatic Lip Synchronization / 8.3
Animation / 8.3.1
Glossary
About the Author
Index