WHAT IS REINFORCEMENT LEARNING?
Reinforcement learning is one of the most active research areas in artificial intelligence. It is training by rewards and punishments: we train a computer much as we would train a dog. If the dog obeys and acts according to our instructions, we encourage it by giving it biscuits; otherwise we punish it by beating or scolding it. Similarly, if the system works well, the teacher gives it a positive value (i.e. a reward); otherwise the teacher gives it a negative value (i.e. a punishment). A learning system that receives a punishment has to improve itself. Thus it is a trial-and-error process.
Reinforcement learning algorithms selectively retain the outputs that maximize the reward received over time. To accumulate a lot of reward, the learning system must prefer the actions that have worked best in its experience so far (exploitation); however, it also has to try new actions (exploration) in order to discover better action selections for the future.
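The trade-off between preferring the best known action and trying new ones can be illustrated with a small sketch. The example below is not taken from any particular system; it assumes a hypothetical three-action problem in which each action pays a reward of 1 with a hidden probability, and it uses an epsilon-greedy rule (one common choice among several) to decide when to try a new action. The function name, the probabilities and the parameter values are all illustrative assumptions.

```python
import random

def epsilon_greedy_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Hypothetical helper: run an epsilon-greedy learner on a simple bandit.

    reward_probs[a] is the (hidden) chance that action a pays a reward of 1.
    Returns the learned value estimates and how often each action was tried.
    """
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    estimates = [0.0] * n_actions  # running average reward per action
    counts = [0] * n_actions       # number of times each action was tried

    for _ in range(steps):
        if rng.random() < epsilon:
            # Try a new action at random; it may turn out to be better.
            action = rng.randrange(n_actions)
        else:
            # Prefer the best action experienced so far.
            action = max(range(n_actions), key=lambda a: estimates[a])

        # The environment pays 1 (reward) or 0 (no reward).
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0

        # Incremental update of the average reward for the chosen action.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts

# Assumed example: the third action is the best one, but the learner is not told so.
estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
```

With epsilon set to 0 the learner would never try anything beyond its first lucky action; with epsilon set to 1 it would never make use of what it has learned. A small value in between is a simple compromise.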
TEMPORAL DIFFERENCE LEARNING
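Temporal difference learning is an idea central to reinforcement learning. It combines ideas from Monte Carlo methods and dynamic programming. Like Monte Carlo methods, temporal difference methods can learn directly from raw experience without a model of the environment’s dynamics; like dynamic programming, they update their estimates partly from other learned estimates, without waiting for the final outcome. The technique needs no labeled examples of correct behavior, only a reward signal. Examples of its use are learning to play games, robot control, elevator control, network routing and modeling animal learning.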
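As a rough illustration of how a temporal difference method learns from raw experience, the sketch below applies the simplest such method, usually called TD(0), to a small random-walk task. The task itself is an assumption chosen for illustration: five states in a row, a walker that steps left or right at random, and a reward of 1 only when it leaves by the right end. The function name and the values of `alpha` (step size) and `gamma` (discount) are likewise illustrative.

```python
import random

def td0_random_walk(episodes=2000, alpha=0.05, gamma=1.0, n_states=5, seed=0):
    """Hypothetical helper: estimate state values of a random walk with TD(0)."""
    rng = random.Random(seed)
    # Index 0 and index n_states + 1 are terminal states with value 0.
    values = [0.0] + [0.5] * n_states + [0.0]

    for _ in range(episodes):
        state = (n_states + 1) // 2  # each episode starts in the middle state
        while 0 < state < n_states + 1:
            next_state = state + rng.choice((-1, 1))
            # Reward 1 only for stepping off the right end, otherwise 0.
            reward = 1.0 if next_state == n_states + 1 else 0.0

            # TD(0) update: move V(state) toward reward + gamma * V(next_state).
            td_error = reward + gamma * values[next_state] - values[state]
            values[state] += alpha * td_error
            state = next_state

    return values[1:-1]  # learned values of the non-terminal states

learned = td0_random_walk()
# For this task the true values are 1/6, 2/6, ..., 5/6.
true_values = [i / 6 for i in range(1, 6)]
```

Note that the update uses only the observed step (state, reward, next state); no table of transition probabilities is ever consulted, and each estimate is adjusted immediately rather than at the end of the episode.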
APPLICATIONS OF REINFORCEMENT LEARNING
Personalization Travel Support System is a software system designed to provide travel information according to the user’s interests. It applies reinforcement learning to analyze and learn customer behavior and to list the products that customers wish to buy. If the system selects an item that the customer wishes to buy, it is rewarded by assigning a particular value to the state that the user selects; if it selects an item that the user does not wish to buy, it is given a penalty. In this way the system learns the user’s personal interests. In the process it acquires knowledge of the user’s behavior and interests, which lets it decide what information should be given to a particular user. This results in greater customer satisfaction and a higher success rate for product promotion.
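The reward-and-penalty idea described above can be sketched in a few lines. This is not the actual implementation of the Personalization Travel Support System, whose internals are not described here; it is a minimal illustration that assumes one stored score per item, a reward of +1 when the user accepts a suggestion and a penalty of -1 when the user rejects it. The function names, item names and learning rate are all hypothetical.

```python
def update_preference(scores, item, accepted, learning_rate=0.1,
                      reward=1.0, penalty=-1.0):
    """Hypothetical helper: nudge the score of `item` toward a reward or a penalty."""
    target = reward if accepted else penalty
    current = scores.get(item, 0.0)  # unseen items start at a neutral score
    scores[item] = current + learning_rate * (target - current)
    return scores

def recommend(scores, candidates):
    """Hypothetical helper: pick the candidate with the highest learned score."""
    return max(candidates, key=lambda item: scores.get(item, 0.0))

# Assumed feedback log: (suggested item, did the user want it?)
feedback = [
    ("beach resort", True),
    ("ski pass", False),
    ("beach resort", True),
    ("city tour", False),
]

scores = {}
for item, accepted in feedback:
    update_preference(scores, item, accepted)

best = recommend(scores, ["beach resort", "ski pass", "city tour"])
```

After this feedback the item the user accepted twice holds the highest score, so it is the one suggested next. A real system would also need to keep suggesting less certain items from time to time in order to discover changes in the user's interests.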