Content Optimization with Multi-Armed Bandits & Python Video


Reinforcement learning is about trial and error: searching for the best action to take. It is a form of online learning, in contrast with offline (batch) learning. In online learning, you start with no knowledge and learn as you go, making decisions sequentially and improving them as feedback arrives. Multi-armed bandits, one of the simplest reinforcement learning settings, are therefore a natural fit for recommendation engines when you know nothing about your users (for example, the first day your app is up and running, a situation known as a cold start). Bandits balance exploring what you don’t know with exploiting what you know, a trade-off commonly referred to as the exploration-exploitation dilemma.
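
To make the exploration-exploitation trade-off concrete, here is a minimal sketch of epsilon-greedy, one common bandit strategy (not necessarily the one taught in the video). The function name, the click-through rates, and the parameter values are all hypothetical and chosen only for illustration: each "arm" stands for a piece of content with an unknown click rate, and the algorithm starts from a cold start with no knowledge of any of them.

```python
import random


def epsilon_greedy(true_rates, n_rounds=5000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy on Bernoulli arms (hypothetical helper).

    true_rates: assumed click probability of each arm, hidden from the learner.
    Returns (estimated click rate per arm, number of pulls per arm).
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    counts = [0] * n_arms        # how many times each arm was shown
    estimates = [0.0] * n_arms   # running mean reward per arm (cold start: all zero)

    for _ in range(n_rounds):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Simulated feedback: 1.0 for a click, 0.0 otherwise.
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0

        # Incremental update of the running mean for the chosen arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts


# Three pieces of content with made-up click rates.
estimates, counts = epsilon_greedy([0.05, 0.10, 0.20])
print("estimated rates:", [round(e, 3) for e in estimates])
print("pull counts:", counts)
```

A larger epsilon spends more rounds exploring and a smaller one commits sooner to the current best guess; choosing that value is an example of the kind of tuning such algorithms need.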

In this course, you will learn different strategies for balancing exploration and exploitation in order to learn the best action to take when you initially know nothing about the payoffs of the different actions. You will learn how to implement these algorithms, tune them, and incorporate them into various apps. In short, this course will give you the tools to make good decisions in the face of uncertainty.

This download includes the video and a text file with links to the GitHub repository and to the parameter-tuning blog post that the instructor mentions in the video.
