Preface to the First Edition
Preface to the Second Edition
List of Abbreviations

1 Human Speech Communication
  1.1 Value of speech for human-machine communication
  1.2 Ideas and language
  1.3 Relationship between written and spoken language
  1.4 Phonetics and phonology
  1.5 The acoustic signal
  1.6 Phonemes, phones and allophones
  1.7 Vowels, consonants and syllables
  1.8 Phonemes and spelling
  1.9 Prosodic features
  1.10 Language, accent and dialect
  1.11 Supplementing the acoustic signal
  1.12 The complexity of speech processing
  Chapter 1 summary
  Chapter 1 exercises

2 Mechanisms and Models of Human Speech Production
  2.1 Introduction
  2.2 Sound sources
  2.3 The resonant system
  2.4 Interaction of laryngeal and vocal tract functions
  2.5 Radiation
  2.6 Waveforms and spectrograms
  2.7 Speech production models
    2.7.1 Excitation models
    2.7.2 Vocal tract models
  Chapter 2 summary
  Chapter 2 exercises

3 Mechanisms and Models of the Human Auditory System
  3.1 Physiology of the outer and middle ears
  3.3 Structure of the cochlea
  3.4 Neural response
  3.5 Psychophysical measurements
  3.6 Analysis of simple and complex signals
  3.7 Models of the auditory system
    3.7.1 Mechanical filtering
    3.7.2 Models of neural transduction
    3.7.3 Higher-level neural processing
  Chapter 3 summary
  Chapter 3 exercises

4 Digital Coding of Speech
  4.1 Simple waveform coders
    4.2.1 Pulse code modulation
    4.2.2 Delta modulation
  4.3 Analysis/synthesis systems (vocoders)
    4.3.1 Channel vocoders
    4.3.2 Sinusoidal coders
    4.3.3 LPC vocoders
    4.3.4 Formant vocoders
    4.3.5 Efficient parameter coding
    4.3.6 Vocoders based on segmental/phonetic structure
  4.4 Intermediate systems
    4.4.1 Sub-band coding
    4.4.2 Linear prediction with simple coding of the residual
    4.4.3 Adaptive predictive coding
    4.4.4 Multipulse LPC
    4.4.5 Code-excited linear prediction
  4.5 Evaluating speech coding algorithms
    4.5.1 Subjective speech intelligibility measures
    4.5.2 Subjective speech quality measures
    4.5.3 Objective speech quality measures
  4.6 Choosing a coder
  Chapter 4 summary
  Chapter 4 exercises

5 Message Synthesis from Stored Human Speech Components
  5.1 Concatenation of whole words
    5.2.1 Simple waveform concatenation
    5.2.2 Concatenation of vocoded words
    5.2.3 Limitations of concatenating word-size units
  5.3 Concatenation of sub-word units: general principles
    5.3.1 Choice of sub-word unit
    5.3.2 Recording and selecting data for the units
    5.3.3 Varying durations of concatenative units
  5.4 Synthesis by concatenating vocoded sub-word units
  5.5 Synthesis by concatenating waveform segments
    5.5.1 Pitch modification
    5.5.2 Timing modification
    5.5.3 Performance of waveform concatenation
  5.6 Variants of concatenative waveform synthesis
  5.7 Hardware requirements
  Chapter 5 summary
  Chapter 5 exercises

6 Phonetic Synthesis by Rule
  6.1 Acoustic-phonetic rules
  6.3 Rules for formant synthesizers
  6.4 Table-driven phonetic rules
    6.4.1 Simple transition calculation
    6.4.2 Overlapping transitions
    6.4.3 Using the tables to generate utterances
  6.5 Optimizing phonetic rules
    6.5.1 Automatic adjustment of phonetic rules
    6.5.2 Rules for different speaker types
    6.5.3 Incorporating intensity rules
  6.6 Current capabilities of phonetic synthesis by rule
  Chapter 6 summary
  Chapter 6 exercises

7 Speech Synthesis from Textual or Conceptual Input
  7.1 Emulating the human speaking process
  7.3 Converting from text to speech
    7.3.1 TTS system architecture
    7.3.2 Overview of tasks required for TTS conversion
  7.4 Text analysis
    7.4.1 Text pre-processing
    7.4.2 Morphological analysis
    7.4.3 Phonetic transcription
    7.4.4 Syntactic analysis and prosodic phrasing
    7.4.5 Assignment of lexical stress and pattern of word accents
  7.5 Prosody generation
    7.5.1 Timing pattern
    7.5.2 Fundamental frequency contour
  7.6 Implementation issues
  7.7 Current TTS synthesis capabilities
  7.8 Speech synthesis from concept
  Chapter 7 summary
  Chapter 7 exercises

8 Introduction to Automatic Speech Recognition: Template Matching
  8.1 General principles of pattern matching
  8.3 Distance metrics
    8.3.1 Filter-bank analysis
    8.3.2 Level normalization
  8.4 End-point detection for isolated words
  8.5 Allowing for timescale variations
  8.6 Dynamic programming for time alignment
  8.7 Refinements to isolated-word DP matching
  8.8 Score pruning
  8.9 Allowing for end-point errors
  8.10 Dynamic programming for connected words
  8.11 Continuous speech recognition
  8.12 Syntactic constraints
  8.13 Training a whole-word recognizer
  Chapter 8 summary
  Chapter 8 exercises

9 Introduction to Stochastic Modelling
  9.1 Feature variability in pattern matching
  9.2 Introduction to hidden Markov models
  9.3 Probability calculations in hidden Markov models
  9.4 The Viterbi algorithm
  9.5 Parameter estimation for hidden Markov models
    9.5.1 Forward and backward probabilities
    9.5.2 Parameter re-estimation with forward and backward probabilities
    9.5.3 Viterbi training
  9.6 Vector quantization
  9.7 Multi-variate continuous distributions
  9.8 Use of normal distributions with HMMs
    9.8.1 Probability calculations
    9.8.2 Estimating the parameters of a normal distribution
    9.8.3 Baum-Welch re-estimation
    9.8.4 Model initialization
  9.10 Gaussian mixtures
    9.10.1 Calculating emission probabilities
    9.10.2 Re-estimation using the most likely state sequence
    9.10.4 Initialization of Gaussian mixture distributions
    9.10.5 Tied mixture distributions
  9.11 Extension of stochastic models to word sequences
  9.12 Implementing probability calculations
    9.12.1 Using the Viterbi algorithm with probabilities in logarithmic form
    9.12.2 Adding probabilities when they are in logarithmic form
  9.13 Relationship between DTW and a simple HMM
  9.14 State durational characteristics of HMMs
  Chapter 9 summary
  Chapter 9 exercises

10 Introduction to Front-End Analysis for Automatic Speech Recognition
  10.1 Pre-emphasis
  10.3 Frames and windowing
  10.4 Filter banks, Fourier analysis and the mel scale
  10.5 Cepstral analysis
  10.6 Analysis based on linear prediction
  10.7 Dynamic features
  10.8 Capturing the perceptually relevant information
  10.9 General feature transformations
  10.10 Variable-frame-rate analysis
  Chapter 10 summary
  Chapter 10 exercises

11 Practical Techniques for Improving Speech Recognition Performance
  11.1 Robustness to environment and channel effects
    11.2.1 Feature-based techniques
    11.2.2 Model-based techniques
    11.2.3 Dealing with unknown or unpredictable noise corruption
  11.3 Speaker-independent recognition
    11.3.1 Speaker normalization
  11.4 Model adaptation
    11.4.1 Bayesian methods for training and adaptation of HMMs
    11.4.2 Adaptation methods based on linear transforms
  11.5 Discriminative training methods
    11.5.1 Maximum mutual information training
    11.5.2 Training criteria based on reducing recognition errors
  11.6 Robustness of recognizers to vocabulary variation
  Chapter 11 summary
  Chapter 11 exercises

12 Automatic Speech Recognition for Large Vocabularies
  12.1 Historical perspective
  12.3 Speech transcription and speech understanding
  12.4 Speech transcription
  12.5 Challenges posed by large vocabularies
  12.6 Acoustic modelling
    12.6.1 Context-dependent phone modelling
    12.6.2 Training issues for context-dependent models
    12.6.3 Parameter tying
    12.6.4 Training procedure
    12.6.5 Methods for clustering model parameters
    12.6.6 Constructing phonetic decision trees
    12.6.7 Extensions beyond triphone modelling
  12.7 Language modelling
    12.7.1 N-grams
    12.7.2 Perplexity and evaluating language models
    12.7.3 Data sparsity in language modelling
    12.7.4 Discounting
    12.7.5 Backing off in language modelling
    12.7.6 Interpolation of language models
    12.7.7 Choice of more general distribution for smoothing
    12.7.8 Improving on simple N-grams
  12.8 Decoding
    12.8.1 Efficient one-pass Viterbi decoding for large vocabularies
    12.8.2 Multiple-pass Viterbi decoding
    12.8.3 Depth-first decoding
  12.9 Evaluating LVCSR performance
    12.9.1 Measuring errors
    12.9.2 Controlling word insertion errors
    12.9.3 Performance evaluations
  12.10 Speech understanding
    12.10.1 Measuring and evaluating speech understanding performance
  Chapter 12 summary
  Chapter 12 exercises

13 Neural Networks for Speech Recognition
  13.1 The human brain
  13.3 Connectionist models
  13.4 Properties of ANNs
  13.5 ANNs for speech recognition
    13.5.1 Hybrid HMM/ANN methods
  Chapter 13 summary
  Chapter 13 exercises

14 Recognition of Speaker Characteristics
  14.1 Characteristics of speakers
  14.2 Verification versus identification
    14.2.1 Assessing performance
    14.2.2 Measures of verification performance
  14.3 Speaker recognition
    14.3.1 Text dependence
    14.3.2 Methods for text-dependent/text-prompted speaker recognition
    14.3.3 Methods for text-independent speaker recognition
    14.3.4 Acoustic features for speaker recognition
    14.3.5 Evaluations of speaker recognition performance
  14.4 Language recognition
    14.4.1 Techniques for language recognition
    14.4.2 Acoustic features for language recognition
  Chapter 14 summary
  Chapter 14 exercises

15 Applications and Performance of Current Technology
  15.1 Why use speech technology?
  15.3 Speech synthesis technology
  15.4 Examples of speech synthesis applications
    15.4.1 Aids for the disabled
    15.4.2 Spoken warning signals, instructions and user feedback
    15.4.3 Education, toys and games
    15.4.4 Telecommunications
  15.5 Speech recognition technology
    15.5.1 Characterizing speech recognizers and recognition tasks
    15.5.2 Typical recognition performance for different tasks
    15.5.3 Achieving success with ASR in an application
  15.6 Examples of ASR applications
    15.6.1 Command and control
    15.6.2 Dictation
    15.6.4 Data entry and retrieval
    15.6.5 Applications of speaker and language recognition
  15.8 The future of speech technology applications
  Chapter 15 summary
  Chapter 15 exercises

16 Future Research Directions in Speech Synthesis and Recognition
  16.1 Speech synthesis
    16.2.1 Speech sound generation
    16.2.2 Prosody generation and higher-level linguistic processing
  16.3 Automatic speech recognition
    16.3.1 Advantages of statistical pattern-matching methods
    16.3.2 Limitations of HMMs for speech recognition
    16.3.3 Developing improved recognition models
  16.4 Relationship between synthesis and recognition
  16.5 Automatic speech understanding
  Chapter 16 summary
  Chapter 16 exercises

17 Further Reading
  17.1 Books
  17.2 Journals
  17.3 Conferences and workshops
  17.4 The Internet
  17.5 Reading for individual chapters

References
Solutions to Exercises
Glossary
Index