Positive Reinforcement
Posted 6 years, 10 months ago in: Blog, Thoughts

In machine learning, we study how to make machines learn. First, we come up with fancy ideas based on our intuition; then we theorize them so that we can express them in equations and make them definite; finally, we use bits and bytes to implement them. The intuition is to take what we know about how people think and learn, and extend it to machines. Interestingly, at times I find myself learning from machines. Well, not exactly from machines, but from the algorithms.

Reinforcement learning deals with a unique problem space. It has no labeled examples to learn from, nor a test set to verify against. It has to try out strategies (policies) and follow whichever currently looks best. All it has is a space to act upon and the feedback on its actions. Based on the reward (or punishment), it adapts its behavior. Sounds simple, right?
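To make the act-and-get-feedback loop concrete, here is a minimal sketch assuming the simplest reinforcement-learning setting: a multi-armed bandit played by an epsilon-greedy agent. The payout probabilities, `epsilon`, and the function name `run_bandit` are hypothetical choices for illustration, not anything from this post. The agent has no labeled examples; it only acts, observes a reward of 1 or 0, and updates its running estimate of how good each action is.

```python
import random


def run_bandit(payout_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a Bernoulli multi-armed bandit (illustrative sketch).

    payout_probs: hypothetical probability that each action pays a reward of 1.
    Returns (value_estimates, pull_counts).
    """
    rng = random.Random(seed)
    n_arms = len(payout_probs)
    values = [0.0] * n_arms  # running mean reward observed for each action
    counts = [0] * n_arms  # how many times each action has been tried

    for _ in range(steps):
        # Explore with probability epsilon; otherwise exploit the best current estimate.
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: values[a])

        # The environment gives feedback: reward 1 (a carrot) or 0 (no carrot).
        reward = 1.0 if rng.random() < payout_probs[arm] else 0.0

        # Incremental mean update: nudge the estimate toward the observed reward.
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]

    return values, counts


if __name__ == "__main__":
    estimates, pulls = run_bandit([0.2, 0.5, 0.8])
    for arm, (v, n) in enumerate(zip(estimates, pulls)):
        print(f"action {arm}: estimated reward {v:.2f}, chosen {n} times")
```

Most of the time the agent repeats whichever action has earned the most reward so far; the small `epsilon` keeps it occasionally trying the others, which is how it discovers that a better reward exists. Nothing here tells the agent which action is "right"; its behavior is shaped entirely by the feedback.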

We humans are, by nature (or possibly by nurture), trained (or programmed) to behave the same way. But the most interesting point is that reinforcement, as it is called, is used very effectively by patrons, from parents with their children to bosses with their workers, to maximize the yield. For example, when a family is happy as a family, they go on outings and enjoy watching TV together, and the children (and the other members of the family) do all they can to enjoy those perks. When the situation is messy, possibly because of the children's unacceptable behavior, the children are berated, punished, ignored, or simply made unhappy, and they forfeit those perks. The more unacceptable their behavior, the more severe the punishment. Whether or not it is noticed, this cleverly tunes the children's habits.

Now, what if one side is made disproportionate? For example, what if the children are only ever rewarded? Knowing that they will be rewarded, they will try to maximize their rewards, but they are never punished. Over time, you may have to increase the benefits to get the same yield; that is, the cost simply inflates. At the same time, unacceptable behavior becomes the norm. This upsets the balance: the disproportion is amplified, and a vicious cycle forms.

Nevertheless, most of us conveniently overlook these two phenomena, at the cost of our relationships. For instance, in a relationship, if "good" deeds are not welcomed, thanked, and adequately encouraged with kind words, and "less desirable" acts are not properly discouraged, it empties the emotional bank account: it makes the other person feel less appreciated, or makes them expect more.

Ironically, even those of us who see and understand this pattern wrongly assume that people are rational: that they can see such carrot-and-stick approaches are in fact conditioning them, and that they will strive to be independent. In practice, however, very few reflect on what they do, let alone on what others are doing to them. Most only react, sometimes act, but seldom act proactively.

Perhaps the lost wisdom can be regained from an unexpected place: machines, the bits and bytes of the new world.
