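About the Book
This book takes an in-depth look at large language models, among the most closely watched technologies in the field today. Its content centers on building, evaluating, and applying large language models and is organized into four parts. Part One covers the development history of large language models and pre-training, including basic language model architectures, efficient fine-tuning techniques for large language models, reinforcement learning from human feedback, and distributed model training. Part Two covers inference optimization techniques, inference acceleration frameworks, and model evaluation. Part Three covers extensions and applications of large language models, including the integration of large language models with knowledge, multimodal large language models, and applications in vertical domains. Part Four covers the difficulties and challenges of large language model research and potential future research directions.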
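Highlights of the book include:
In-depth analysis of technical principles: in accessible language, the book analyzes the techniques behind large language models in depth, so that readers understand how the models work, how they are trained and tuned, and how they are evaluated with metrics, and can apply that understanding in real projects.
Extensions and case studies: the book introduces two common extensions of large language models, namely integration with knowledge and multimodal large language models. Through a wide range of real-world cases, it shows how large language models have been applied successfully across industries, so that readers can see how they change traditional business processes and improve efficiency.
Attention to social impact and ethics: beyond the technology itself, the book considers the far-reaching social impact of large language models, covering ethical issues such as authorship rights and privacy, and invites readers to reflect on how technological development affects society.
Future trends and open problems: the book looks past existing results to future trends in the field and poses a series of open problems, encouraging readers to take part in exploring them.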
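Intended readers include:
Technology enthusiasts: a relatively comprehensive introduction to large language models for technical readers interested in artificial intelligence and natural language processing.
Practitioners: provides people working in related fields with …

Contents
Chapter 1 Background on Large Language Models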
1.1 Stages in the Development of Language Modeling
1.2 Opportunities Brought by Large Language Models
Chapter 2 From Statistical Language Models to Pre-trained Language Models
2.1 Statistical Language Models
2.2 Neural Network Language Models
2.2.1 Feedforward Neural Network Language Models
2.2.2 Recurrent Neural Network Language Models
2.2.3 Long Short-Term Memory Neural Network Language Models
2.2.4 Word2Vec Word Vector Representation Model
2.3 Pre-trained Language Models
2.3.1 ELMo
2.3.2 Transformer
2.3.3 BERT
2.3.4 ELECTRA
2.3.5 GPT 1-3
2.3.6 BART
2.3.7 T5
Chapter 3 Architectures of Large Language Models
3.1 Encoder Architecture
3.2 Encoder-Decoder Architecture
3.2.1 GLM
3.2.2 UL2
3.3 Decoder Architecture
3.3.1 PaLM
3.3.2 BLOOM
3.3.3 InstructGPT
3.4 The LLaMA Family
3.4.1 Pre-training Data
3.4.2 Model Architecture
3.4.3 Chinese LLaMA
3.4.4 Chinese Alpaca
Chapter 4 Training Methods for Large Language Models
4.1 Model Training Cost
4.1.1 Compute Estimation
4.1.2 Cost and Energy Consumption
4.2 Supervised Fine-Tuning
4.2.1 Prompt Learning
4.2.2 In-Context Learning
4.2.3 Instruction Tuning
4.3 Parameter-Efficient Fine-Tuning
4.3.1 Efficient Fine-Tuning of a Subset of Parameters
4.3.2 Efficient Fine-Tuning with Added Parameters
4.3.3 Efficient Fine-Tuning by Reparameterization
4.3.4 Hybrid Efficient Fine-Tuning Methods
4.4 Reinforcement Learning from Human Feedback
4.4.1 Reinforcement Learning
4.4.2 Proximal Policy Optimization
4.4.3 Alignment with Human Feedback
4.5 Catastrophic Forgetting in Large Models
Chapter 5 Distributed Parallel Techniques for Large Models
5.1 Distributed Systems
5.2 Data Parallelism
5.2.1 Input Data Partitioning
5.2.2 Model Parameter Synchronization
5.2.3 Data Parallelism Optimizations
5.3 Model Parallelism
5.3.1 Tensor Parallelism
5.3.2 Pipeline Parallelism
5.3.3 Optimizer-Related Parallelism
5.4 Other Forms of Parallelism
5.4.1 Heterogeneous System Parallelism
5.4.2 Expert Parallelism
5.4.3 Multi-Dimensional Hybrid Parallelism
5.4.4 Automatic Parallelism
5.5 Parallel Training Frameworks
5.5.1 Megatron-LM
5.5.2 DeepSpeed
5.5.3 Colossal-AI
Chapter 6 Decoding and Inference Optimization Techniques for Large Language Models
6.1 Decoding Methods
6.1.1 Search-Based Decoding Methods
6.1.2 Sampling-Based Decoding Methods
6.2 Inference Optimization Methods
6.2.1 Inference Principles
6.2.2 Inference Acceleration
6.3 Model Compression Techniques
6.3.1 Quantization
6.3.2 Pruning
6.3.3 Distillation
6.4 GPU Memory Optimization Techniques
6.4.1 Key-Value Cache
6.4.2 Attention Optimization
6.5 Operator Optimization Techniques
6.5.1 Operator Fusion
6.5.2 High-Performance Operators
6.6 Inference Acceleration Frameworks
6.6.1 HuggingFace TGI
6.6.2 vLLM
6.6.3 LightLLM
Chapter 7 Evaluation of Large Language Models
7.1 Evaluation Overview
7.2 Evaluation Framework
7.2.1 Knowledge and Capabilities
7.2.2 Ethics and Safety
7.3 Evaluation Methods
7.3.1 Automatic Evaluation
7.3.2 Human Evaluation
7.3.3 Other Evaluation Methods
7.4 Evaluation Domains
7.4.1 General Domains
7.4.2 Specific Domains
7.4.3 Comprehensive Benchmarks
7.5 Evaluation Challenges
Chapter 8 Combining Large Language Models with Knowledge
8.1 Knowledge and Knowledge Representation
8.2 Introduction to Knowledge Graphs
8.3 Combining Large Language Models and Knowledge Graphs
8.4 Knowledge Graph-Enhanced Large Language Models
8.4.1 LLM Pre-training Stage
8.4.2 LLM Evaluation Stage
8.4.3 LLM Inference Stage
8.5 Large Language Model-Enhanced Knowledge Graphs
8.5.1 Knowledge Graph Embedding
8.5.2 Knowledge Graph Completion
8.5.3 Knowledge Graph Construction
8.5.4 Knowledge Graph-to-Text Generation
8.5.5 Knowledge Graph Question Answering
8.6 Synergy Between Large Language Models and Knowledge Graphs
8.6.1 Knowledge Representation
8.6.2 Knowledge Reasoning
8.7 Engineering Applications of Knowledge Retrieval-Augmented Large Language Models
8.7.1 Structured Data
8.7.2 Structured and Unstructured Data
8.7.3 Vector Databases
8.7.4 Knowledge Base Question Answering with LangChain
8.8 Future Directions
Chapter 9 Techniques and Applications of Multimodal Large Language Models
9.1 Multimodal Instruction Tuning
9.1.1 Modality Alignment
9.1.2 Data Collection
9.1.3 Modality Bridging
9.1.4 Model Evaluation
9.2 Multimodal In-Context Learning
9.3 Multimodal Chain of Thought
9.3.1 Modality Connection
9.3.2 Learning Paradigms
9.3.3 Chain Configuration and Form
9.4 LLM-Aided Visual Reasoning
9.4.1 Training Paradigms
9.4.2 Functional Roles
9.4.3 Model Evaluation
9.5 Extending LLMs to Agents
9.5.1 Agents
9.5.2 Memory Module
9.5.3 Task Planning
9.5.4 Action Module
9.5.5 Evaluation Strategies
9.6 Challenges for Multimodal Language Models
9.6.1 Technical Issues
9.6.2 Cost Issues
9.6.3 Social Issues
Chapter 10 Applications of Large Language Models
10.1 Law
10.1.1 Legal Prompt Research
10.1.2 Comprehensive Legal Evaluation
10.2 Education
10.2.1 Capability Evaluation
10.2.2 Ethical Issues
10.2.3 Question-Answering Applications
10.3 Finance
10.3.1 Intelligent Application Scenarios
10.3.2 Difficulties and Challenges
10.4 Biomedicine
10.4.1 Potential and Value
10.4.2 Application Scenarios
10.4.3 Difficulties and Challenges
10.5 Code Generation
10.5.1 The Code Generation Problem
10.5.2 Code Large Language Models
10.5.3 Development Trends
Chapter 11 Outlook and Conclusions
11.1 Limitations and Challenges
11.1.1 Limitations
11.1.2 Challenges
11.2 Directions and Recommendations
11.2.1 Data
11.2.2 Technology
11.2.3 Applications
11.2.4 Recommended Directions
11.3 Research Worth Exploring
11.3.1 Basic Theory Research
11.3.2 Efficient Computing Research
11.3.3 Safety and Ethics Research
11.3.4 Data and Evaluation Research
11.3.5 Cognitive Learning Problems
11.3.6 Efficient Adaptation Research
References