So this is the story: Flappy Bird was so popular that my friend suggested that we should develop a LabVIEW kit with a motor to play it. Two days later, we found Sarvagya Vaish managed to score 1000 by applying Q-learning algorithm. A couple of days later, a studio used arduino to play the game. Hmm…I will finish my work anyway.
That’s where I learned about the Q-learning, one of the reinforcement learning algorithm. Here is a brief tutorial helped me to have a better understanding of it. So if a goal is achieved by multiple steps, this algorithm grades each step by assigning a reward to it. Each step, or action, is not graded right away, but one step later. In this way the “right” action can be determined by the reward it received.
The equation can be described as
Q'(s, a) = (1 – alpha)*Q(s, a) + alpha*(R(s, a) + Gamma * Max[Q(s’, all a’)])
Where Q (accumulative experience) is a table of s (state) and a (action), s’ is the next state and a’ is the next action. alpha is the step size and Gamma is the discount reward. I tried to google a Q-learning example in LabVIEW but failed. So I created this vi myself and hope it can be useful to someone.
This is a single loop vi and the shift register stores the value for Q. The reset button is to initialize Q’s value and can be replaced by “first call?” node. The user shall build their own “Reward” vi according to their applications. In this vi the next action is determined by the Q value that rewards the most but it can also be a random action (or other methods).