In machine learning, we study how to make machines learn. First, we come up with fancy ideas based on our intuition about human cognition; then we formalize them so that we can put them into equations and make them precise; finally, we use bits and bytes to implement them. The intuition is to take what we know about how people think and learn, and extend it to machines. Interestingly, at times I find myself learning about life from machines, or, more precisely, noticing interesting behavioral patterns in the algorithms.
Reinforcement learning deals with a unique problem space. It has no labeled examples to learn from, nor a test set to verify against. It has to simulate candidate theories and policies, and myopically follow the best one. All it has is a space to act in and the feedback on its actions to learn from. Based on the reward (or punishment), it adapts its behavior. Sounds simple, right?
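To make that loop concrete, here is a minimal sketch of an agent that acts, receives a reward or punishment, and adapts. It is only an illustration under my own assumptions: the names `run_greedy_agent` and `toy_feedback`, the three actions, their reward values, and the small exploration rate `epsilon` are all hypothetical and are not taken from any particular library or algorithm described above.

```python
import random


def run_greedy_agent(feedback, n_actions, steps, epsilon=0.1, seed=0):
    """Act, observe feedback, adapt; mostly follow the best-looking action."""
    rng = random.Random(seed)
    values = [0.0] * n_actions  # running estimate of each action's reward
    counts = [0] * n_actions    # how often each action has been tried

    for _ in range(steps):
        if rng.random() < epsilon:
            # Assumption: occasionally try a random action to gather feedback.
            action = rng.randrange(n_actions)
        else:
            # Myopic choice: pick whatever currently looks best.
            action = max(range(n_actions), key=lambda a: values[a])

        reward = feedback(action, rng)

        # Adapt: move the estimate toward the observed reward (running mean).
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]

    return values, counts


def toy_feedback(action, rng):
    """Hypothetical environment: action 0 is punished, 1 is neutral, 2 is rewarded."""
    base = [-1.0, 0.0, 1.0][action]
    return base + rng.gauss(0.0, 0.1)  # small noise on top of the base reward


values, counts = run_greedy_agent(toy_feedback, n_actions=3, steps=500)
best_action = max(range(3), key=lambda a: values[a])
print("estimated values:", values)
print("times each action was chosen:", counts)
print("preferred action:", best_action)
```

There are no examples and no test set here; the agent's only teacher is the number it gets back after each action, and that alone is enough to shift its habits toward the rewarded behavior.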
We humans are, by nature (or possibly by nurture), trained (or programmed) to behave the same way. The most interesting point is that positive reinforcement, as it is called, is used very effectively by those in charge, from parents with their children to bosses with their workers, to maximize the yield. For example, when a family is happy as a family, they go on outings and enjoy watching TV together, and the children (and the other members of the family) do all they can to keep enjoying those perks. When the situation is messy, possibly because of the children's unacceptable behavior, the children are berated, punished, ignored, or simply made unhappy, and they forfeit those perks. The punishment is commensurate with how unacceptable the behavior was. Whether it is noticeable or not, this cleverly tunes the children’s habits.
Now what if one side is made disproportionate? For example, what if the children are only ever rewarded and never punished? Knowing that they will be rewarded for so-called acceptable behavior, they will try to maximize the rewards. As time goes on, however, you may have to increase the benefits to get the same yield; that is, the cost simply inflates. At the same time, unacceptable behavior becomes the norm. This upsets the balance: the disproportion is amplified, and a vicious cycle forms.
Nevertheless, most of us conveniently overlook these two phenomena, at the cost of deteriorating relationships. For instance, in a relationship, if “good” deeds are not welcomed, thanked, and adequately encouraged with kind words, and “less desirable” acts are not properly discouraged, the emotional bank account is drained: the other person feels less appreciated, or comes to expect more.
Ironically, even those of us who see and understand this pattern wrongly assume that people are rational and can see that such carrot-and-stick approaches are in fact conditioning them; we wrongly believe that people will strive to be independent. In practice, however, very few reflect on what they do, let alone on what others are doing to them. Most only react, sometimes act, but seldom act proactively.
Perhaps the lost wisdom can be regained from an unexpected place: machines, the bits and bytes of the new world.
This entry was posted on Thursday, July 1st, 2010 at 8:19 pm