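Title | 实时语音处理实践指南 (A Practical Guide to Real-Time Speech Processing) |
Category | Education and Exams - Exams - Computing |
Authors | Ge Shichao (葛世超) et al. |
Publisher | 电子工业出版社 (Publishing House of Electronics Industry) |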
Introduction | About the authors

Ge Shichao (葛世超) holds a master's degree from the Radar National Defense Key Laboratory at Xidian University and has worked at Alibaba and Rokid on speech algorithms. Lü Qiang (吕强) holds a bachelor's degree in communication engineering from Jilin University and was formerly an audio expert in system software at Whaley TV (微鲸电视). Qian Sichong (钱思冲) holds a PhD from Wuhan University of Technology, researched microphone array signals at Rokid from 2016 to 2018, and currently focuses on blind source separation of speech signals. Zhang Bolun (张博伦) holds a master's degree from the Key Laboratory of Submarine Geosciences and Prospecting Techniques (Ministry of Education) at Ocean University of China and has since worked on underwater acoustics and audio signal processing. Zhang Shuo (张硕) graduated from Xidian University and Supélec (法国高等电力学院) in France and has worked at Nokia and Rokid on speech algorithms.

Contents

Introduction
Chapter 1 Signal Processing
1.1 Digital and analog frequency
1.2 Discrete Fourier transform
1.2.1 Real DFT
1.2.2 Complex DFT
1.2.3 Negative-frequency components
1.2.4 Properties of the DFT
1.3 FFT
1.3.1 Example FFT results
1.3.2 FFT of real signals
1.3.3 Short-time Fourier transform
1.3.4 Choosing a speech window function for the STFT
1.4 Overlap-add and overlap-save methods
1.4.1 OLA
1.4.2 OLS
1.5 Weighted overlap-add method
1.5.1 WOLA computation procedure
1.5.2 Choosing a WOLA window function
1.6 Filter banks
1.7 Speech pre-emphasis
1.8 Gaussian distribution
1.8.1 Single Gaussian distribution
1.8.2 Multivariate Gaussian distribution
1.9 HMM
1.10 Kalman filtering
Chapter summary
References
Chapter 2 Speech Production Mechanisms and Devices
2.1 Speech production and reception
2.1.1 Speech production mechanism
2.1.2 Vocal model
2.1.3 Pronunciation units
2.1.4 Classification of speech sounds
2.1.5 Sound reception
2.1.6 Sound propagation
2.2 Loudspeakers
2.2.1 Electrical performance
2.2.2 Acoustic performance
2.2.3 Noise floor
2.2.4 Frequency response
2.2.5 THD+N POUT
2.2.6 Voltage (power) and distortion
2.3 Microphones
2.3.1 Microphone performance metrics
2.3.2 Choosing a microphone
2.4 Structural design
2.4.1 Loudspeaker acoustic cavity design
2.4.2 Microphone and loudspeaker
2.5 Audio equipment
2.5.1 Listening equipment
2.5.2 Sound field expressiveness
2.5.3 Sound-producing equipment
2.5.4 Anechoic chamber testing
2.6 Acoustic testing
2.6.1 Acoustic volume
2.6.2 Total harmonic distortion (THD)
2.6.3 Frequency response aliasing
2.6.4 Microphone array consistency
2.6.5 AEC reference path
2.6.6 Loudspeaker image frequency
2.6.7 THD at the loudspeaker's preferred amplitude
Chapter summary
References
Chapter 3 Voice Activity Detection
3.1 Feature selection
3.2 Decision criteria
3.2.1 Thresholds
3.2.2 Statistical model methods
3.2.3 Machine learning methods
3.3 VAD example
3.3.1 Gaussian distribution
3.3.2 Algorithm flow
3.3.3 Computation flow
3.4 Initial parameters for speech and non-speech frames
3.4.1 Model parameter computation
3.4.2 Gaussian mixture model
3.4.3 EM algorithm
Chapter summary
References
Chapter 4 Single-Channel Noise Reduction
4.1 Spectral subtraction
4.1.1 Principle of spectral subtraction
4.1.2 Implementing spectral subtraction
4.1.3 Musical noise control
4.1.4 Filtering method
4.2 Wiener filtering
4.3 Subspace noise reduction
4.4 WebRTC single-channel noise reduction implementation
4.4.1 Algorithm principle
4.4.2 Algorithm initialization
4.4.3 SNR computation: ComputeSnr
4.4.4 Speech/noise probability computation
4.4.5 Feature selection
4.4.6 Flatness computation
4.4.7 Noise estimate update function: UpdateNoiseEstimate
4.4.8 Noise removal
4.4.9 Signal synthesis
4.4.10 Simulation results
4.5 Deep-learning noise reduction
Chapter summary
References
Chapter 5 Acoustic Echo Cancellation
5.1 Principles of echo cancellation
5.2 Adaptive filters
5.2.1 Wiener filter
5.2.2 LMS algorithm
5.2.3 NLMS algorithm
5.2.4 PBFDAF algorithm
5.3 WebRTC echo cancellation algorithm
5.3.1 Delay estimation
5.3.2 Adaptive filtering
5.3.3 Nonlinear processing (NLP)
5.3.4 MATLAB code walkthrough
5.3.5 Simulation experiments
5.4 Speex echo cancellation algorithm
5.4.1 Variable step size computation
5.4.2 Bilinear filter and preprocessing
5.4.3 MATLAB code walkthrough
5.4.4 Algorithm flow diagram
5.4.5 Simulation experiments
Chapter summary
References
Chapter 6 Sound Source Localization
6.1 GCC algorithm
6.2 SRP-PHAT algorithm
6.3 MUSIC algorithm
6.4 TOPS algorithm
6.5 FRIDA algorithm
6.6 Noise-robust post-processing
6.6.1 Statistical methods
6.6.2 Kalman method
6.6.3 Sound source localization modeling
6.6.4 Particle filter method
Chapter summary
References
Chapter 7 Beamforming
7.1 Microphone arrays
7.1.1 Number and spacing of microphones
7.1.2 Spatial aliasing
7.1.3 Beamforming metrics
7.1.4 Noise fields
7.1.5 Sound radiation
7.2 Common beamforming methods
7.2.1 Delay-and-sum beamforming
7.2.2 Filter-and-sum beamforming
7.2.3 Constant-beamwidth beamforming
7.2.4 Super-resolution beamforming
7.2.5 Generalized sidelobe canceller beamforming
7.2.6 Minimum variance distortionless response beamforming
7.3 WebRTC beamforming example
7.3.1 Building the test files
7.3.2 Test file processing flow
7.3.3 Test commands
7.3.4 Basic idea of the algorithm
7.3.5 Test source code
7.3.6 Algorithm processing flow
7.3.7 Weight computation function
7.3.8 Weight multiplication
7.4 Post-filtering
7.4.1 MMSE post-filter
7.4.2 Zelinski post-filter
7.4.3 McCowan post-filter
7.4.4 STSA post-filter
Chapter summary
References
Chapter 8 Blind Source Separation
8.1 Basic concepts and mathematical preliminaries
8.1.1 Basic concepts of ICA
8.1.2 Gradients and optimization methods
8.2 Preprocessing for blind speech separation: PCA
8.3 Frequency-domain independent component analysis: FDICA
8.3.1 Frequency-domain ICA
8.3.2 Decorrelation estimation methods
8.3.3 Indeterminacy problems
8.4 Post-filtering
8.4.1 Noise estimation
8.4.2 Attenuation factor computation
8.5 Joint GSC and ICA estimation
8.5.1 Kurtosis
8.5.2 Classical GSC
8.5.3 Dynamic weight vector estimation
Chapter summary
References
Chapter 9 Audio Effects Processing
9.1 Classification of audio channels
9.1.1 Mono
9.1.2 Two-channel
9.1.3 Stereo
9.1.4 Multichannel
9.1.5 Immersive audio
9.2 Back-end audio effects processing
Chapter summary
References
Chapter 10 Speech Encoding and Decoding
10.1 LPC coding
10.2 SILK encoding and decoding
10.2.1 Encoding parameters
10.2.2 Encoder
10.2.3 Decoder
10.3 Overview of Opus encoding and decoding
10.3.1 Opus decoding
10.3.2 Opus encoding
10.3.3 Opus speech/music detection
10.4 Speech quality assessment
10.4.1 Subjective testing
10.4.2 Objective testing
10.4.3 No-reference quality assessment
Chapter summary
References
Chapter 11 Network Transmission of Speech
11.1 Congestion control
11.1.1 GoogleCC congestion control
11.1.2 PCC-based congestion control
11.1.3 BBR-based congestion control
11.2 NetEQ
11.2.1 Principles of NetEQ
11.2.2 Jitter and packet reception
11.2.3 NetEQ code framework
11.2.4 Delay computation
11.2.5 DSP processing
11.2.6 Changing speed without changing pitch
Chapter summary
References
Chapter 12 Voice Wake-Up
12.1 Introduction to voice wake-up technology
12.2 Feature extraction
12.2.1 FBank
12.2.2 MFCC
12.2.3 PCEN
12.3 Model architectures
12.3.1 DNN
12.3.2 CNN
12.3.3 CRNN
12.3.4 DSCNN
12.3.5 Sub-band CNN
12.3.6 Attention
12.4 Computation acceleration
12.4.1 Hardware resource assessment
12.4.2 Directions for acceleration
Chapter summary
References
Chapter 13 Speech Recognition
13.1 Speech feature extraction
13.1.1 MFCC features
13.1.2 PLP features
13.1.3 Normalization
13.2 Acoustic models
13.2.1 Gaussian mixture model
13.2.2 Parameter estimation
13.2.3 Hidden Markov model
13.2.4 Baum-Welch method
13.2.5 HMM recognizer
13.3 Language models
13.3.1 N-gram language model
13.3.2 Weighted finite-state transducers
13.4 YES/NO recognition example
13.4.1 Data preparation
13.4.2 Data preprocessing
13.4.3 Vocabulary and pronunciation dictionary
13.4.4 Language model
13.4.5 Feature extraction
13.4.6 Acoustic model training
13.4.7 Decoding and testing
13.5 Chinese speech recognition with Kaldi
13.5.1 Dataset preparation
13.5.2 Acoustic model training
13.5.3 Installing portaudio
13.5.4 Online recognition
13.6 Speech recognition with DeepSpeech
13.6.1 Recognition modeling
13.6.2 Network composition
13.6.3 Model training and deployment
Chapter summary
References
Appendix A Technical terms used in this book

Recommendation

This book covers the interactive real-time speech processing pipeline for internet-based scenarios such as intelligent voice assistants, smart speakers, and audio/video conferencing. It has four parts: real-time speech signal processing, digital audio effects, encoding/decoding for network transmission, and voice wake-up and recognition. Each part starts from basic concepts and principles, combines theory with practice, and analyzes commercially relevant examples in detail to show how the algorithms are implemented in engineering practice. To help interested readers quickly verify the algorithms, improve them, and apply them to real projects, the authors have also open-sourced the code for the algorithms in the book. |