Overview
About the Book

Recent developments in reinforcement learning (RL), combined with deep learning (DL), have produced unprecedented progress in training agents to solve complex problems in a human-like way. Google's use of algorithms to win at the famous Atari arcade games propelled the field into the spotlight, and researchers keep producing a steady stream of new ideas. This book is a comprehensive guide to the latest DL tools and their limitations. You will evaluate methods, including cross-entropy and policy gradients, before applying them to real-world environments. Take on the trials of Atari's virtual games and family favorites such as Connect 4. The book introduces the fundamentals of RL, giving you the know-how to code intelligent learning agents that can take on a formidable array of practical tasks. Discover how to implement Q-learning in "Grid World" environments (a minimal sketch follows the table of contents), teach your agent to buy and trade stocks, and find out how natural language models are driving the boom in chatbots.

About the Author

Maxim Lapan is a deep learning enthusiast and independent researcher. His background and 15 years' work expertise as a software developer and a systems architect span from low-level Linux kernel driver development to performance optimization and the design of distributed applications running on thousands of servers. With vast experience in big data, machine learning, and large parallel distributed HPC and non-HPC systems, he has a talent for explaining the gist of complicated things in simple words and vivid examples. His current areas of interest are practical applications of deep learning, such as deep natural language processing and deep reinforcement learning. Maxim lives in Moscow, Russian Federation, with his family, and he works for an Israeli start-up as a senior NLP developer.

Table of Contents

- Preface
- Chapter 1: What is Reinforcement Learning?
  - Learning - supervised, unsupervised, and reinforcement
  - RL formalisms and relations
  - Reward
  - The agent
  - The environment
  - Actions
  - Observations
  - Markov decision processes
  - Markov process
  - Markov reward process
  - Markov decision process
  - Summary
- Chapter 2: OpenAI Gym
  - The anatomy of the agent
  - Hardware and software requirements
  - OpenAI Gym API
  - Action space
  - Observation space
  - The environment
  - Creation of the environment
  - The CartPole session
  - The random CartPole agent (see the sketch below)
  - The extra Gym functionality - wrappers and monitors
  - Wrappers
  - Monitor
  - Summary
- Chapter 3: Deep Learning with PyTorch
  - Tensors
  - Creation of tensors
  - Scalar tensors
  - Tensor operations
  - GPU tensors
  - Gradients
  - Tensors and gradients
  - NN building blocks
  - Custom layers
  - Final glue - loss functions and optimizers
  - Loss functions
  - Optimizers
  - Monitoring with TensorBoard
  - TensorBoard 101
  - Plotting stuff
  - Example - GAN on Atari images
  - Summary
- Chapter 4: The Cross-Entropy Method
  - Taxonomy of RL methods
  - Practical cross-entropy
  - Cross-entropy on CartPole
  - Cross-entropy on FrozenLake
  - Theoretical background of the cross-entropy method
  - Summary
- Chapter 5: Tabular Learning and the Bellman Equation
  - Value, state, and optimality
  - The Bellman equation of optimality
  - Value of action
  - The value iteration method
  - Value iteration in practice
  - Q-learning for FrozenLake
  - Summary
- Chapter 6: Deep Q-Networks
- Chapter 7: DQN Extensions
- Chapter 8: Stocks Trading Using RL
- Chapter 9: Policy Gradients - An Alternative
- Chapter 10: The Actor-Critic Method
- Chapter 11: Asynchronous Advantage Actor-Critic
- Chapter 12: Chatbots Training with RL
- Chapter 13: Web Navigation
- Chapter 14: Continuous Action Space
- Chapter 15: Trust Regions - TRPO, PPO, and ACKTR
- Chapter 16: Black-Box Optimization in RL
- Chapter 17: Beyond Model-Free - Imagination
- Chapter 18: AlphaGo Zero
- Other Books You May Enjoy
- Index
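As a taste of the "Grid World" Q-learning the description mentions, here is a minimal sketch of tabular Q-learning on Gym's FrozenLake-v0, a 4x4 grid world. It assumes the classic pre-0.26 Gym API (env.step() returns four values), which is what the book targets; the hyperparameter values are illustrative, not taken from the book.

```python
import collections
import random

import gym

# Tabular Q-learning on FrozenLake-v0, a 4x4 "grid world".
# Assumes the classic Gym API where step() returns (obs, reward, done, info).
ALPHA = 0.1    # learning rate (illustrative value)
GAMMA = 0.9    # discount factor
EPSILON = 0.1  # exploration rate for epsilon-greedy action selection

env = gym.make("FrozenLake-v0")
q_table = collections.defaultdict(float)  # maps (state, action) -> Q-value

def best_action(state):
    # Return the action with the highest Q-value in this state
    return max(range(env.action_space.n),
               key=lambda a: q_table[(state, a)])

for episode in range(5000):
    state = env.reset()
    done = False
    while not done:
        # Epsilon-greedy: explore with probability EPSILON, else exploit
        if random.random() < EPSILON:
            action = env.action_space.sample()
        else:
            action = best_action(state)
        new_state, reward, done, _ = env.step(action)
        # Bellman update: nudge Q(s, a) toward r + gamma * max_a' Q(s', a')
        target = reward + GAMMA * q_table[(new_state, best_action(new_state))]
        q_table[(state, action)] += ALPHA * (target - q_table[(state, action)])
        state = new_state
```

Chapter 5 derives the Bellman update used in the inner loop and applies Q-learning to FrozenLake, as the table of contents indicates.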
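The "random CartPole agent" listed under Chapter 2 is the simplest possible Gym program, and it illustrates the interaction loop every later agent in the book reuses. A sketch, under the same classic Gym API assumption:

```python
import gym

# A random agent on CartPole: take arbitrary actions until the pole falls.
env = gym.make("CartPole-v0")
obs = env.reset()
total_reward = 0.0

while True:
    action = env.action_space.sample()       # sample a random action
    obs, reward, done, _ = env.step(action)  # classic 4-value step API
    total_reward += reward
    if done:
        break

print("Episode done, total reward: %.2f" % total_reward)
```

Random actions typically keep the pole balanced for only a handful of steps, which is exactly the baseline the book's learning methods go on to beat.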