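Preface
Hands-on Reinforcement Learning (Shanghai Jiao Tong University): a Chinese-language reinforcement learning tutorial. It is concise and easy to follow, but the online version is still incomplete; see the published book for the full text.
Reinforcement Learning: An Introduction: the original English textbook. It is more complete and more theoretical, and its companion code is more general than that of the previous tutorial. For example, it covers the non-stationary multi-armed bandit problem, in which the reward of each arm is not fixed, and treats other topics in more detail.
Deep Reinforcement Learning: to be studied and followed up later.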
Multi-Armed Bandit (MAB)
Code
import numpy as np
Results
Randomly generated a 10-armed Bernoulli bandit
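The script for this section is truncated above, so the following is only a minimal sketch of what a 10-armed Bernoulli bandit with an epsilon-greedy learner might look like. The names `BernoulliBandit` and `epsilon_greedy`, the seed, the step count, and `epsilon=0.1` are all assumptions made for illustration and are not taken from the original code.

```python
import numpy as np


class BernoulliBandit:
    """K-armed bandit (hypothetical helper): arm i pays 1 with probability probs[i], else 0."""

    def __init__(self, k, rng):
        self.k = k
        self.rng = rng
        self.probs = rng.uniform(size=k)  # hidden success probability of each arm
        self.best_idx = int(np.argmax(self.probs))

    def step(self, arm):
        # Draw a Bernoulli reward for the chosen arm.
        return 1 if self.rng.random() < self.probs[arm] else 0


def epsilon_greedy(bandit, steps, epsilon, rng):
    """Explore a random arm with probability epsilon, otherwise exploit the best estimate."""
    counts = np.zeros(bandit.k)
    estimates = np.zeros(bandit.k)
    total = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = int(rng.integers(bandit.k))
        else:
            arm = int(np.argmax(estimates))
        reward = bandit.step(arm)
        counts[arm] += 1
        # Incremental sample-average update of the estimated reward.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return estimates, counts, total


rng = np.random.default_rng(0)  # assumed seed, for reproducibility only
bandit = BernoulliBandit(10, rng)
estimates, counts, total = epsilon_greedy(bandit, steps=5000, epsilon=0.1, rng=rng)
print(f"Generated a {bandit.k}-armed Bernoulli bandit")
print(f"Best arm: {bandit.best_idx}, most-pulled arm: {int(np.argmax(counts))}")
```

The sample-average update weights all past rewards equally, which suits fixed reward probabilities. For the non-stationary case, where the reward of each arm is not fixed, the usual change is to replace `1 / counts[arm]` with a constant step size so that recent rewards count more.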
Markov Decision Process (MDP)
Code
import numpy as np
Results
The return computed for this sequence is: -2.5
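As a rough illustration of how a return like the one above is obtained, the sketch below accumulates the discounted return G = r_0 + γ r_1 + γ² r_2 + … backwards over a reward sequence. The reward list and the discount factor `gamma = 0.5` are hypothetical values chosen so that the result matches the printed figure. They are not recovered from the truncated script, and `discounted_return` is a name introduced here.

```python
def discounted_return(rewards, gamma):
    """Compute G = r_0 + gamma * r_1 + gamma^2 * r_2 + ... by scanning backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


rewards = [-1, -2, -2, 0]  # hypothetical rewards collected along one sampled sequence
gamma = 0.5                # hypothetical discount factor
print(discounted_return(rewards, gamma))  # prints -2.5
```

Scanning from the last reward to the first avoids computing explicit powers of the discount factor.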
Dynamic Programming Algorithms
Code
import copy
Results
D:\Anaconda3\envs\torch\python.exe D:/PythonProject/reinforcement_learning/03_Dynamic_Programming.py
Temporal Difference (TD) Algorithms
Code
import matplotlib.pyplot as plt
Results
D:\Anaconda3\envs\torch\python.exe D:/PythonProject/reinforcement_learning/04_Temporal_Difference.py
Dyna-Q Algorithm
Code
import matplotlib.pyplot as plt
Results
D:\Anaconda3\envs\torch\python.exe D:/PythonProject/reinforcement_learning/05_Dyna-Q.py
DQN Algorithm
Code
import random
rl_utils.py
from tqdm import tqdm
Results
D:\Anaconda3\envs\torch\python.exe D:/PythonProject/reinforcement_learning/06_Deep_Q_Network.py
Improved DQN Algorithms
Code
import random
Results
C:\Users\98383\Anaconda3\envs\torch\python.exe D:/reinforcement_learning/07_Double_DQN.py
Policy Gradient Algorithms
Code
import gym
Results
C:\Users\98383\Anaconda3\envs\torch\python.exe D:/reinforcement_learning/08_Reinforce.py
Actor-Critic Algorithm
Code
import gym
Results
C:\Users\98383\Anaconda3\envs\torch\python.exe D:/reinforcement_learning/09_Actor_Critic.py
TRPO Algorithm
Code
import torch
Results
C:\Users\98383\Anaconda3\envs\torch\python.exe D:/reinforcement_learning/10_True_Region_Policy_Optimization.py