Monday, April 7th, 2014
It seems last post was found by @labview and brought much visiting here. So this project is still undergoing and I was kind of busy this month. But I will try finish this ASAP 🙂
Here are some updates since last post.
An Arduino controlled stylus now can be controlled by clicking the mouse.
There is a tip that the stylus needs to be earthed (via the red wire) to take affect.
And the Flappy Bird game is (poorly) simulated on LabVIEW. So I simulated FB (not facebook!) on LabVIEW to test the Q-learning algorithm in a purely software environment. Now it can pass 5 gaps 🙂 The algorithm needs to be altered a bit and then I will integrate the whole things.
Friday, March 21st, 2014
So this is the story: Flappy Bird was so popular that my friend suggested that we should develop a LabVIEW kit with a motor to play it. Two days later, we found Sarvagya Vaish managed to score 1000 by applying Q-learning algorithm. A couple of days later, a studio used arduino to play the game. Hmm…I will finish my work anyway.
That’s where I learned about the Q-learning, one of the reinforcement learning algorithm. Here is a brief tutorial helped me to have a better understanding of it. So if a goal is achieved by multiple steps, this algorithm grades each step by assigning a reward to it. Each step, or action, is not graded right away, but one step later. In this way the “right” action can be determined by the reward it received.
The equation can be described as
Q'(s, a) = (1 – alpha)*Q(s, a) + alpha*(R(s, a) + Gamma * Max[Q(s’, all a’)])
Where Q (accumulative experience) is a table of s (state) and a (action), s’ is the next state and a’ is the next action. alpha is the step size and Gamma is the discount reward. I tried to google a Q-learning example in LabVIEW but failed. So I created this vi myself and hope it can be useful to someone.
This is a single loop vi and the shift register stores the value for Q. The reset button is to initialize Q’s value and can be replaced by “first call?” node. The user shall build their own “Reward” vi according to their applications. In this vi the next action is determined by the Q value that rewards the most but it can also be a random action (or other methods).
Please find the tutorial links 1 2 for more information about Q-learning algorithm.