
# Reinforcement Learning

## reinforcement-learning

* [DRN-A-Deep-Reinforcement-Learning-Framework-for-News-Recommendation](/machine-learning-notes/advanced-knowledge/reinforcement-learning/drn-a-deep-reinforcement-learning-framework-for-news-recommendation.md)

## Why Apply Reinforcement Learning to Recommender Systems

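As a practitioner working with data on the scale of **hundreds of billions of records**, here are what I consider **the most important points** in recommender systems. They may differ somewhat from the other answers.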

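1. **Engineering architecture at different scales:** The number of features can range from **hundreds** to **millions** to **tens of billions**, and the engineering architecture differs enormously from one tier to the next.
2. **Choosing the objective:** The objective you choose determines how you build user profiles and features. Changing an objective later is extremely disruptive, and it is hard to say whether the way the objective was set is scientifically sound.
3. **Learning long-term objectives:** A short-term objective can be single-step (what the user spends in one interaction, such as a single payment or a single consumption), but the long-term objective must be what the user invests over time (long-term consumption, user stickiness). How to learn this is very difficult. Many companies and universities are researching it (1, 2, 3), and their work is worth consulting.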
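The gap between a single-step objective and a long-term one in point 3 can be illustrated with a small sketch. It is not taken from the original answer: the function names, the discount factor `gamma`, and the two reward sequences are hypothetical, and the discounted return used here is only one common way to formalize a "long-term objective" in reinforcement learning.

```python
from typing import Sequence


def one_step_objective(rewards: Sequence[float]) -> float:
    """Short-term view: only the reward of the first interaction counts."""
    return rewards[0] if rewards else 0.0


def discounted_return(rewards: Sequence[float], gamma: float = 0.9) -> float:
    """Long-term view: sum of gamma**t * r_t over the whole sequence."""
    total = 0.0
    # Accumulate from the last step backwards: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical per-step rewards for two recommendation strategies.
# "clickbait" wins the first interaction, then the user disengages.
clickbait = [1.0, 0.0, 0.0, 0.0]
# "engaging" earns less at first but keeps the user coming back.
engaging = [0.5, 0.5, 0.5, 0.5]

for name, rewards in [("clickbait", clickbait), ("engaging", engaging)]:
    print(name, one_step_objective(rewards), round(discounted_return(rewards), 4))
```

Under these assumed numbers the single-step objective prefers the first strategy, while the discounted return prefers the second. That is the kind of trade-off a long-term objective is meant to capture.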

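These points are hard to get around, and over the next few years they will become the differentiators between companies' recommender systems. Frankly, everyone already knows the core techniques well, and Wide & Deep is very widely deployed. What remains is who can solve these core problems fast enough and get far enough ahead.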

## References

* [What are the pitfalls of recommender systems? (Zhihu answer by Geek An)](https://www.zhihu.com/question/28247353/answer/399162539)

The section "Why Apply Reinforcement Learning to Recommender Systems" draws on this answer.

\===

[What are the latest advances in reinforcement learning for recommender systems? (Zhihu answer)](https://www.zhihu.com/question/57388498/answer/570874226)

\[1] Dulac-Arnold G, Evans R, van Hasselt H, et al. Deep reinforcement learning in large discrete action spaces\[J]. arXiv preprint arXiv:1512.07679, 2015.

\[2] Liebman E, Saar-Tsechansky M, Stone P. Dj-mc: A reinforcement-learning agent for music playlist recommendation\[C]//Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 2015: 591-599.

\[3] Zheng G, Zhang F, Zheng Z, et al. DRN: A Deep Reinforcement Learning Framework for News Recommendation\[C]//Proceedings of the 2018 World Wide Web Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2018: 167-176.

\[4] Zou L, Xia L, Ding Z, Song J, Liu W, Yin D. [Reinforcement Learning to Optimize Long-term User Engagement in Recommender Systems](http://export.arxiv.org/abs/1902.05570)\[C]. KDD, 2019.

FeedRec, a new reinforcement learning framework published at KDD 2019 by Tsinghua University and JD.com.

\[5] YouTube RL recommendation: Top-K Off-Policy Correction for a REINFORCE Recommender System. Google, WSDM, 2019.