Table of Contents

Introduction

  • 1.1 Why use reinforcement learning?
  • 1.2 Course requirements

    Q-learning

  • 2.1 A simple example
  • 2.2 Q-learning algorithm update
  • 2.3 Q-learning decision-making

    Sarsa

  • 3.1 Sarsa algorithm update
  • 3.2 Sarsa decision-making
  • 3.3 Sarsa-lambda

    Deep Q Network

  • 4.1 DQN algorithm update (Tensorflow)
  • 4.2 DQN neural network (Tensorflow)
  • 4.3 DQN decision-making (Tensorflow)
  • 4.4 The OpenAI gym environment library
  • 4.5 Double DQN (Tensorflow)
  • 4.6 Prioritized Experience Replay (DQN) (Tensorflow)
  • 4.7 Dueling DQN (Tensorflow)

    Policy Gradient

  • 5.1 Policy Gradients algorithm update (Tensorflow)
  • 5.2 Policy Gradients decision-making (Tensorflow)

    Actor Critic

  • 6.1 Actor Critic (Tensorflow)
  • 6.2 Deep Deterministic Policy Gradient (DDPG) (Tensorflow)
  • 6.3 Asynchronous Advantage Actor-Critic (A3C) (Tensorflow)
  • 6.4 Distributed Proximal Policy Optimization (DPPO) (Tensorflow)
