About the Book

This book is a volume in the Big Data Application Talent Training textbook series. It covers the main stages of operating and maintaining a big data system and the tasks in each: configuration management, basic operations management, fault management, performance management, security management, high availability management, change and upgrade management, operations scenario applications, and service resource management. The coverage is comprehensive and detailed, combining foundational theory with hands-on operations experience. It places particular emphasis on the characteristics of big data system operations and the skills they require, so that big data systems run stably and reliably and better support the commercial value of big data.

While building on the foundation of the first edition, this edition incorporates ** operations methods and experience to form a more complete and in-depth body of knowledge. The second edition expands on approaches to log-based troubleshooting, proven practices for system changes and upgrades, and operations in cloud-native environments, giving operations engineers rich, practical guidance. Readers will gain insight into the essentials of modern big data system operations and improve their practical skills.

The book is systematic and practice-oriented. It can serve as a course textbook for training application-oriented professionals, and as a reference for practitioners and enthusiasts working in IT system operations.

Contents

Chapter 1 Configuration Management
1.1 Configuration Management Content
1.1.1 Configuration Management Terminology
1.1.2 Application Software Configuration
1.1.3 Hardware Configuration
1.2 Configuration Management Methods
1.2.1 Configuration Process
1.2.2 Automatic Configuration Discovery
1.3 Configuration Management Tools
1.3.1 CMDB Database: Introduction and Practice
1.3.2 Automatic Configuration Tools
1.3.3 CMDB in the Cloud Era
1.4 Other Operations Tools
1.4.1 Ambari
1.4.2 CLI Tools
1.4.3 Ganglia
1.4.4 Cloudera Manager
1.4.5 Other Tools
1.5 Exercises
References

Chapter 2 Basic Operations Management
2.1 System Construction
2.1.1 Technical Solution
2.1.2 Deployment and Implementation
2.1.3 Testing and Acceptance
2.2 System Management Objects
2.2.1 System Management Objects
2.2.2 System Software
2.2.3 System Hardware
2.2.4 System Data
2.2.5 IT Vendors
2.3 System Management Content
2.3.1 Incident Management
2.3.2 Problem Management
2.3.3 Configuration Management
2.3.4 Change Management
2.3.5 Release Management
2.3.6 Knowledge Management
2.3.7 Log Management
2.3.8 Backup Management
2.4 System Management Tools
2.4.1 Asset Management
2.4.2 Monitoring Management
2.4.3 Process Management
2.4.4 Outsourcing Management
2.5 System Management Policies and Standards
2.5.1 System Management Standards
2.5.2 System Management Policies
2.5.3 System Management Specifications
2.6 Routine Inspections
2.6.1 Categories of Inspection Content
2.6.2 Categories of Inspection Methods
2.6.3 Inspection Process
2.7 Log Management
2.7.1 Platform and Component Commands
2.7.2 Log and Alert Monitoring
2.8 Exercises
References

Chapter 3 Fault Management
3.1 Cluster Structure
3.2 Fault Reporting
3.2.1 Fault Detection
3.2.2 Impact Analysis
3.3 Fault Handling
3.3.1 Fault Diagnosis
3.3.2 Fault Resolution
3.4 Post-Fault Management
3.4.1 Building and Updating the Knowledge Base
3.4.2 Fault Prevention
3.5 Exercises
References

Chapter 4 Performance Management
4.1 Performance Analysis
4.1.1 Performance Factors
4.1.2 Performance Metrics
4.2 Performance Monitoring Tools
4.2.1 GUI
4.2.2 Cluster CLI
4.2.3 Built-in Operating System Tools
4.2.4 Ganglia
4.2.5 Other Monitoring Tools
4.3 Performance Optimization
4.3.1 Hadoop Cluster Configuration Planning and Optimization
4.3.2 Hadoop Performance Optimization
4.3.3 Job Optimization
4.4 Exercises
References

Chapter 5 Security Management
5.1 Security Overview
5.2 Asset Security Management
5.2.1 Environment and Facility Security
5.2.2 Equipment Security
5.3 Application Security
5.3.1 Technical Security
5.3.2 Data Security
5.4 Security Threats
5.4.1 Human Error
5.4.2 External Attacks
5.4.3 Information Leaks
5.4.4 Disasters
5.5 Security Measures
5.5.1 Security Policies and Standards
5.5.2 Security Precautions
5.6 Exercises
References

Chapter 6 High Availability Management
6.1 High Availability Overview
6.2 High Availability Technologies
6.2.1 System Architecture
6.2.2 Disaster Recovery
6.2.3 Monitoring
6.2.4 Failover
6.3 Business Continuity Management
6.3.1 Disaster Recovery Backup Systems
6.3.2 Emergency Response Plans
6.3.3 Routine Drills
6.4 Exercises
References

Chapter 7 Change and Upgrade Management
7.1 Change Management Overview
7.1.1 Change Management Goals
7.1.2 Change Management Scope
7.1.3 Types of Change Management
7.1.4 Principles of Change Management
7.2 Change Management Process
7.2.1 Organizational Structure for Changes
7.2.2 Change Management Strategy
7.2.3 Change Process Control
7.2.4 Change Management Process
7.3 Change Configuration Management
7.4 General System Upgrade Process
7.4.1 Business Dataset and Environment Backup
7.4.2 Common Strategies for System Upgrade Deployment (Blue-Green/Rolling/Canary)
7.4.3 Business Service Verification
7.4.4 Data Cutover and User Cutover
7.4.5 Rollback Strategy
7.5 Exercises
References

Chapter 8 Operations Scenario Applications
8.1 Description of Operations Scenarios
8.2 Version Upgrades for Operations Applications
8.2.1 Hadoop Upgrade Management
8.2.2 Spark Upgrade Management
8.2.3 Hive SQL Upgrade Management
8.2.4 ZooKeeper Upgrade Management
8.3 Microservices and Container Virtualization
8.3.1 Containerizing Business Applications: Docker
8.3.2 Container Cluster Management and Orchestration: k8s
8.3.3 Microservice Monitoring and Service Tracing
8.4 Cloud-Native Operations
8.4.1 Continuous Integration and Continuous Delivery
8.4.2 Jenkins Pipelines
8.4.3 Automated Continuous Deployment
8.4.4 Service Registration and Discovery
8.4.5 Circuit Breaking and Rate Limiting for Services
8.5 Exercises
References

Chapter 9 Service Resource Management
9.1 Business Capacity Management
9.1.1 Business Demand Assessment
9.1.2 Business Demand Trend Forecasting
9.2 Service Capacity Management
9.2.1 Dynamic Management of Staff Capabilities
9.2.2 Dynamic Management of Service Costs
9.2.3 Technology and Tool Management
9.3 Service Resource Integration
9.3.1 Division of Responsibilities and Authority Among Roles
9.3.2 Typical Collaboration Models Among Users, Suppliers, and Vendors
9.4 Exercises
References

Appendix A Big Data and Artificial Intelligence Lab Environment
Appendix B Hadoop Environment Requirements
Appendix C Glossary