Preface
CHAPTER 1 Introduction to Digital Speech Processing 1
1.1 The Speech Signal 3
1.2 The Speech Stack 8
1.3 Applications of Digital Speech Processing 10
1.4 Comment on the References 15
1.5 Summary 17
CHAPTER 2 Review of Fundamentals of Digital Signal Processing 18
2.1 Introduction 18
2.2 Discrete-Time Signals and Systems 18
2.3 Transform Representation of Signals and Systems 22
2.4 Fundamentals of Digital Filters 33
2.5 Sampling 44
2.6 Summary 56
Problems 56
CHAPTER 3 Fundamentals of Human Speech Production 67
3.1 Introduction 67
3.2 The Process of Speech Production 68
3.3 Short-Time Fourier Representation of Speech 81
3.4 Acoustic Phonetics 86
3.5 Distinctive Features of the Phonemes of American English 108
3.6 Summary 110
Problems 110
CHAPTER 4 Hearing, Auditory Models, and Speech Perception 124
4.1 Introduction 124
4.2 The Speech Chain 125
4.3 Anatomy and Function of the Ear 127
4.4 The Perception of Sound 133
4.5 Auditory Models 150
4.6 Human Speech Perception Experiments 158
4.7 Measurement of Speech Quality and Intelligibility 162
4.8 Summary 166
Problems 167
CHAPTER 5 Sound Propagation in the Human Vocal Tract 170
5.1 The Acoustic Theory of Speech Production 170
5.2 Lossless Tube Models 200
5.3 Digital Models for Sampled Speech Signals 219
5.4 Summary 228
Problems 228
CHAPTER 6 Time-Domain Methods for Speech Processing 239
6.1 Introduction 239
6.2 Short-Time Analysis of Speech 242
6.3 Short-Time Energy and Short-Time Magnitude 248
6.4 Short-Time Zero-Crossing Rate 257
6.5 The Short-Time Autocorrelation Function 265
6.6 The Modified Short-Time Autocorrelation Function 273
6.7 The Short-Time Average Magnitude Difference Function 275
6.8 Summary 277
Problems 278
CHAPTER 7 Frequency-Domain Representations 287
7.1 Introduction 287
7.2 Discrete-Time Fourier Analysis 289
7.3 Short-Time Fourier Analysis 292
7.4 Spectrographic Displays 312
7.5 Overlap Addition Method of Synthesis 319
7.6 Filter Bank Summation Method of Synthesis 331
7.7 Time-Decimated Filter Banks 340
7.8 Two-Channel Filter Banks 348
7.9 Implementation of the FBS Method Using the FFT 358
7.10 OLA Revisited 365
7.11 Modifications of the STFT 367
7.12 Summary 379
Problems 380
CHAPTER 8 The Cepstrum and Homomorphic Speech Processing 399
8.1 Introduction 399
8.2 Homomorphic Systems for Convolution 401
8.3 Homomorphic Analysis of the Speech Model 417
8.4 Computing the Short-Time Cepstrum and Complex Cepstrum of Speech 429
8.5 Homomorphic Filtering of Natural Speech 440
8.6 Cepstrum Analysis of All-Pole Models 456
8.7 Cepstrum Distance Measures 459
8.8 Summary 466
Problems 466
CHAPTER 9 Linear Predictive Analysis of Speech Signals 473
9.1 Introduction 473
9.2 Basic Principles of Linear Predictive Analysis 474
9.3 Computation of the Gain for the Model 486
9.4 Frequency Domain Interpretations of Linear Predictive Analysis 490
9.5 Solution of the LPC Equations 505
9.6 The Prediction Error Signal 527
9.7 Some Properties of the LPC Polynomial A(z) 538
9.8 Relation of Linear Predictive Analysis to Lossless Tube Models 546
9.9 Alternative Representations of the LP Parameters 551
9.10 Summary 560
Problems 560
CHAPTER 10 Algorithms for Estimating Speech Parameters 578
10.1 Introduction 578
10.2 Median Smoothing and Speech Processing 580
10.3 Speech-Background/Silence Discrimination 586
10.4 A Bayesian Approach to Voiced/Unvoiced/Silence Detection 595
10.5 Pitch Period Estimation (Pitch Detection) 603
10.6 Formant Estimation 635
10.7 Summary 645
Problems 645
CHAPTER 11 Digital Coding of Speech Signals 663
11.1 Introduction 663
11.2 Sampling Speech Signals 667
11.3 A Statistical Model for Speech 669
11.4 Instantaneous Quantization 676
11.5 Adaptive Quantization 706
11.6 Quantizing of Speech Model Parameters 718
11.7 General Theory of Differential Quantization 732
11.8 Delta Modulation 743
11.9 Differential PCM (DPCM) 759
11.10 Enhancements for ADPCM Coders 768
11.11 Analysis-by-Synthesis Speech Coders 783
11.12 Open-Loop Speech Coders 806
11.13 Applications of Speech Coders 814
11.14 Summary 819
Problems 820
CHAPTER 12 Frequency-Domain Coding of Speech and Audio 842
12.1 Introduction 842
12.2 Historical Perspective 844
12.3 Subband Coding 850
12.4 Adaptive Transform Coding 861
12.5 A Perception Model for Audio Coding 866
12.6 MPEG-1 Audio Coding Standard 881
12.7 Other Audio Coding Standards 894
12.8 Summary 894
Problems 895
CHAPTER 13 Text-to-Speech Synthesis Methods 907
13.1 Introduction 907
13.2 Text Analysis 908
13.3 Evolution of Speech Synthesis Methods 914
13.4 Early Speech Synthesis Approaches 916
13.5 Unit Selection Methods 926
13.6 TTS Future Needs 942
13.7 Visual TTS 943
13.8 Summary 947
Problems 947
CHAPTER 14 Automatic Speech Recognition and Natural Language Understanding 950
14.1 Introduction 950
14.2 Basic ASR Formulation 952
14.3 Overall Speech Recognition Process 953
14.4 Building a Speech Recognition System 954
14.5 The Decision Processes in ASR 957
14.6 Step 3: The Search Problem 971
14.7 Simple ASR System: Isolated Digit Recognition 972
14.8 Performance Evaluation of Speech Recognizers 974
14.9 Spoken Language Understanding 977
14.10 Dialog Management and Spoken Language Generation 980
14.11 User Interfaces 983
14.12 Multimodal User Interfaces 984
14.13 Summary 984
Problems 985
Appendices
A Speech and Audio Processing Demonstrations 993
B Solution of Frequency-Domain Differential Equations 1005
Bibliography 1008
Index 1031