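Rollout, Policy Iteration, and Distributed Reinforcement Learning (Original Textbooks from Renowned International Universities) (English Edition) / Information Technology and Electrical Engineering Series, Dimitri P. Bertsekas (USA), Tsinghua University Press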

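This book covers the following: Chapter 1, principles of dynamic programming; Chapter 2, rollout and policy improvement; Chapter 3, specialized rollout algorithms; Chapter 4, learning values and policies; Chapter 5, infinite horizon distributed and multiagent algorithms.
AlphaZero, the Go-playing program that burst onto the scene, had a major influence on this book. Like AlphaZero, the book is built on a core framework of policy iteration, neural network approximations of value and policy networks, parallel and distributed computation, and techniques for simplifying the lookahead minimization, and it extends the range of problems to which these algorithms apply. A distinctive feature of the book is its treatment of techniques that make policy improvement computations in reinforcement learning more efficient within distributed computing and multiagent frameworks. The book also establishes the connection between one-step policy improvement by rollout and model predictive control (MPC), a design method widely used in control systems, and describes applications of rollout to complex discrete and combinatorial optimization problems.
Readers of this book will learn about policy iteration in reinforcement learning, and in particular about recent developments and applications of rollout in distributed and multiagent settings. The book can serve as a textbook for senior undergraduate and graduate students in artificial intelligence, systems and control science, and related fields, and as a reference for professionals doing research in these areas.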
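The one-step policy improvement by rollout mentioned in the synopsis can be illustrated with a small sketch: at each stage, every control is scored by its stage cost plus the cost of running a base heuristic from the resulting state, and the best-scoring control is applied. This is a minimal illustration under assumed ingredients, not code from the book: the toy dynamics `f`, stage cost `g`, terminal cost `G`, greedy base heuristic `greedy_base`, and helper names are all hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

# Hypothetical toy problem (not taken from the book): integer state, controls
# {0, 1}, dynamics x_{k+1} = f(x_k, u_k), stage cost g, terminal cost G.
State = int
Control = int
CONTROLS: Sequence[Control] = (0, 1)


def f(x: State, u: Control) -> State:
    """Dynamics: the control moves the state up by u."""
    return x + u


def g(x: State, u: Control) -> float:
    """Stage cost: moving costs 1, staying put costs 0."""
    return float(u)


def G(x: State) -> float:
    """Terminal cost: a large penalty for ending in state 0."""
    return 10.0 if x == 0 else 0.0


def greedy_base(x: State) -> Control:
    """Base heuristic (hypothetical): pick the control with the smallest immediate cost."""
    return min(CONTROLS, key=lambda v: g(x, v))


def heuristic_cost(k: int, x: State, N: int) -> float:
    """Cost of running the base heuristic from state x at stage k up to stage N."""
    total = 0.0
    for _ in range(k, N):
        u = greedy_base(x)
        total += g(x, u)
        x = f(x, u)
    return total + G(x)


def rollout(x0: State, N: int) -> Tuple[List[Control], float]:
    """One-step lookahead that uses the base heuristic's cost as the cost-to-go."""
    x, total, controls = x0, 0.0, []
    for k in range(N):
        # Score each control: stage cost plus heuristic cost from the next state.
        u = min(CONTROLS, key=lambda v: g(x, v) + heuristic_cost(k + 1, f(x, v), N))
        controls.append(u)
        total += g(x, u)
        x = f(x, u)
    return controls, total + G(x)


def optimal_cost(x0: State, N: int) -> float:
    """Brute-force optimum over all control sequences, for comparison only."""
    best = float("inf")
    for seq in product(CONTROLS, repeat=N):
        x, cost = x0, 0.0
        for u in seq:
            cost += g(x, u)
            x = f(x, u)
        best = min(best, cost + G(x))
    return best


if __name__ == "__main__":
    N = 3
    print(heuristic_cost(0, 0, N))  # 10.0: greedy never moves and pays the penalty
    print(rollout(0, N))            # ([1, 0, 0], 1.0)
    print(optimal_cost(0, N))       # 1.0
```

Because the base heuristic in this sketch is a fixed policy, the rollout policy's cost can be no worse than the heuristic's own cost (the cost improvement property). In this toy instance the greedy heuristic never moves and pays the terminal penalty of 10, while rollout looks one step ahead, moves once, and reaches the optimal cost of 1.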

1 Exact and Approximate Dynamic Programming Principles
1.1 AlphaZero, Off-Line Training, and On-Line Play
1.2 Deterministic Dynamic Programming
1.2.1 Finite Horizon Problem Formulation
1.2.2 The Dynamic Programming Algorithm
1.2.3 Approximation in Value Space
1.3 Stochastic Dynamic Programming
1.3.1 Finite Horizon Problems
1.3.2 Approximation in Value Space for Stochastic DP
1.3.3 Infinite Horizon Problems - An Overview
1.3.4 Infinite Horizon - Approximation in Value Space
1.3.5 Infinite Horizon - Policy Iteration, Rollout, and Newton's Method
1.4 Examples, Variations, and Simplifications
1.4.1 A Few Words About Modeling
1.4.2 Problems with a Termination State
1.4.3 State Augmentation, Time Delays, Forecasts, and Uncontrollable State Components
1.4.4 Partial State Information and Belief States
1.4.5 Multiagent Problems and Multiagent Rollout
1.4.6 Problems with Unknown Parameters - Adaptive Control
1.4.7 Adaptive Control by Rollout and On-Line Replanning
1.5 Reinforcement Learning and Optimal Control - Some Terminology
1.6 Notes and Sources
2 General Principles of Approximation in Value Space
2.1 Approximation in Value and Policy Space
2.1.1 Approximation in Value Space - One-Step and Multistep Lookahead
2.1.2 Approximation in Policy Space
2.1.3 Combined Approximation in Value and Policy Space
2.2 Approaches for Value Space Approximation
2.2.1 Off-Line and On-Line Implementations
2.2.2 Model-Based and Model-Free Implementations
2.2.3 Methods for Cost-to-Go Approximation
2.2.4 Methods for Expediting the Lookahead Minimization
2.3 Deterministic Rollout and the Policy Improvement Principle
2.3.1 On-Line Rollout for Deterministic Discrete Optimization
2.3.2 Using Multiple Base Heuristics - Parallel Rollout
2.3.3 The Simplified Rollout Algorithm
2.3.4 The Fortified Rollout Algorithm
2.3.5 Rollout with Multistep Lookahead
2.3.6 Rollout with an Expert
2.3.7 Rollout with Small Stage Costs and Long Horizon - Continuous-Time Rollout
2.4 Stochastic Rollout and Monte Carlo Tree Search
2.4.1 Simulation-Based Implementation of the Rollout Algorithm
2.4.2 Monte Carlo Tree Search
2.4.3 Randomized Policy Improvement by Monte Carlo Tree Search
2.4.4 The Effect of Errors in Rollout - Variance Reduction
2.4.5 Rollout Parallelization
2.5 Rollout for Infinite-Spaces Problems - Optimization Heuristics
2.5.1 Rollout for Infinite-Spaces Deterministic Problems
2.5.2 Rollout Based on Stochastic Programming
2.6 Notes and Sources
3 Specialized Rollout Algorithms
3.1 Model Predictive Control
3.1.1 Target Tubes and Constrained Controllability
3.1.2 Model Predictive Control with Terminal Cost
3.1.3 Variants of Model Predictive Control
3.1.4 Target Tubes and State-Constrained Rollout
3.2 Multiagent Rollout
3.2.1 Asynchronous and Autonomous Multiagent Rollout
3.2.2 Multiagent Coupling Through Constraints
3.2.3 Multiagent Model Predictive Control
3.2.4 Separable and Multiarmed Bandit Problems
3.3 Constrained Rollout - Deterministic Optimal Control
3.3.1 Sequential Consistency, Sequential Improvement, and the Cost Improvement Property
3.3.2 The Fortified Rollout Algorithm and Other Variations
3.4 Constrained Rollout - Discrete Optimization
3.4.1 General Discrete Optimization Problems
3.4.2 Multidimensional Assignment
3.5 Rollout for Surrogate Dynamic Programming and Bayesian Optimization
3.6 Rollout for Minimax Control
3.7 Notes and Sources
4 Learning Values and Policies
4.1 Parametric Approximation Architectures
4.1.1 Cost Function Approximation
4.1.2 Feature-Based Architectures
4.1.3 Training of Linear and Nonlinear Architectures
4.2 Neural Networks
4.2.1 Training of Neural Networks
4.2
