The Exploration/Exploitation Framework

We all have a variety of mental models that we use to interpret the world around us. Many of these models are very specific: I know that turning the key in the ignition starts my car, but that knowledge does little to help me understand the rest of the world. The idea that different keys fit different locks, by contrast, is a metaphor that we apply to many other areas of life.

One of my favorite models comes from reinforcement learning, and it is particularly applicable to how our brains function. In the most general case, assume that you have many different options, each of which gives you a payoff drawn at random from an unknown distribution. Your goal is to maximize the total payoff you receive over a fixed time horizon. This type of game is epitomized by the multi-armed bandit problem, or by the Wisconsin Card Sorting Test when that distribution changes over time.
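To make the setup concrete, here is a minimal sketch of such a game in Python. Everything in it is an assumption chosen for illustration: the names `Bandit`, `play`, and `make_random_player` are hypothetical, and the payoffs are taken to be Gaussian with fixed hidden means, which is only one of many possible unknown distributions. The random player is a baseline for comparison, not a recommended strategy.

```python
import random


class Bandit:
    """A set of options ("arms") with hidden payoff distributions.

    Assumption for illustration: each arm pays out a Gaussian sample
    around a fixed mean that the player never sees directly.
    """

    def __init__(self, means, stddev=1.0, seed=0):
        self._means = list(means)  # hidden from the player
        self._stddev = stddev
        self._rng = random.Random(seed)

    @property
    def n_arms(self):
        return len(self._means)

    def pull(self, arm):
        """Return one random payoff from the chosen arm's distribution."""
        return self._rng.gauss(self._means[arm], self._stddev)


def play(bandit, choose_arm, horizon):
    """Play for a fixed number of rounds and return the total payoff.

    choose_arm(history) receives the list of (arm, payoff) pairs seen
    so far and returns the index of the next arm to pull.
    """
    history = []
    total = 0.0
    for _ in range(horizon):
        arm = choose_arm(history)
        payoff = bandit.pull(arm)
        history.append((arm, payoff))
        total += payoff
    return total


def make_random_player(n_arms, seed=1):
    """Baseline player that ignores history and picks an arm uniformly."""
    rng = random.Random(seed)
    return lambda history: rng.randrange(n_arms)


if __name__ == "__main__":
    bandit = Bandit(means=[0.1, 0.5, 0.9])
    total = play(bandit, make_random_player(bandit.n_arms), horizon=1000)
    print(f"total payoff from random play: {total:.1f}")
```

The player only ever sees the payoffs of the arms it has pulled, so any strategy passed in as `choose_arm` has to decide from that partial history alone.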

So how do you optimize these types of tasks?