Reinforcement Learning Basketball

During the spring of my sophomore year at Rutgers I took a class entitled Robot Learning Seminars. It was a graduate class in the computer science department (at the time I was an undergraduate in mechanical engineering). I thought the class looked cool, so I emailed the professor, and for some unknown reason he let me take it. I can easily say it is the most interesting and fascinating class I have taken so far.

Essentially it was a class on reinforcement learning, focused specifically on its applications in robotics. We discussed many RL methods, and for one of our assignments we were told to teach a robot arm to shoot basketballs into a hoop using a couple of assigned methods. We used Q-tables (not recommended for real use, but a good exercise for understanding the concepts and how the different tuning parameters work). We also used NFQ, a deep-ish learning variant, which was also super cool. There wasn’t a huge amount of time to work on this project, so it’s not my best work, but I wanted to put it here so anyone can check it out if they want. Here’s the GitHub:

basketball

Methods

Q-Learning (Q Table)

Q-learning is fairly rudimentary, but it produced quite successful results. It keeps a table of Q-values for every possible state and action and uses the standard Q-learning update to explore and discover an optimal policy.
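For reference, the standard tabular update is Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)], where α is the learning rate and γ the discount factor. Here is a minimal sketch of that update with epsilon-greedy exploration. It is not the code from the repo: the function names, the single-state table, and the `toy_shot` reward (only throw strength 3 scores) are all made up for illustration.

```python
import random

def epsilon_greedy(q_table, state, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    n_actions = len(q_table[state])
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    row = q_table[state]
    return max(range(n_actions), key=lambda a: row[a])

def q_update(q_table, state, action, reward, next_state, alpha, gamma, done=False):
    """Apply one tabular Q-learning update in place and return the new value."""
    # A terminal transition has no future value to bootstrap from.
    best_next = 0.0 if done else max(q_table[next_state])
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]

def toy_shot(action):
    """Hypothetical stand-in for the simulator: only throw strength 3 scores."""
    return 1.0 if action == 3 else 0.0

rng = random.Random(0)
q_table = [[0.0] * 5]  # one state, five discretized throw strengths (assumed)
for _ in range(500):
    a = epsilon_greedy(q_table, 0, epsilon=0.2, rng=rng)
    # Each shot ends the episode, so done=True and there is no bootstrapping.
    q_update(q_table, 0, a, toy_shot(a), 0, alpha=0.1, gamma=0.9, done=True)

best = max(range(5), key=lambda a: q_table[0][a])
print("greedy throw strength:", best)
```

With a real arm the state would be something like a discretized joint angle and velocity, and the table grows quickly with every extra dimension, which is a big part of why Q-tables are not recommended for real use.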

NFQ (Neural Fitted Q-Iteration)

NFQ uses a neural network to learn the Q values.

First, a batch of data is collected using a random policy. Then a 2-layer neural net is created in PyTorch, together with an Rprop optimizer. The network is trained against the NFQ target: the observed reward plus the discounted maximum Q-value of the next state.
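The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the repo: the dimensions, the hidden size, the random toy transitions, and the toy reward are invented, and the network here outputs one Q-value per action, which is just one common way to set it up.

```python
import torch
from torch import nn

torch.manual_seed(0)

STATE_DIM, N_ACTIONS, GAMMA = 2, 3, 0.9  # hypothetical sizes, not the author's

# Hypothetical batch of random-policy transitions (s, a, r, s', done).
n = 256
states = torch.rand(n, STATE_DIM)
actions = torch.randint(0, N_ACTIONS, (n,))
rewards = (actions == 1).float()  # toy reward: action 1 "scores"
next_states = torch.rand(n, STATE_DIM)
dones = torch.ones(n)             # every shot ends the episode in this toy

# Two-layer network mapping a state to one Q-value per action.
q_net = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.Tanh(), nn.Linear(32, N_ACTIONS))

def nfq_targets(net, rewards, next_states, dones, gamma):
    """Fixed regression targets r + gamma * max_a' Q(s', a'), zeroed at terminals."""
    with torch.no_grad():
        best_next = net(next_states).max(dim=1).values
    return rewards + gamma * (1.0 - dones) * best_next

def nfq_iteration(net, epochs=50):
    """One fitted-Q iteration: freeze the targets, then regress Q(s, a) onto them."""
    targets = nfq_targets(net, rewards, next_states, dones, GAMMA)
    optimizer = torch.optim.Rprop(net.parameters())  # full-batch, as Rprop expects
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        # Select the Q-value of the action that was actually taken.
        q_sa = net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
        loss = loss_fn(q_sa, targets)
        loss.backward()
        optimizer.step()
    return loss.item()

for _ in range(5):
    final_loss = nfq_iteration(q_net)
print("final fitting loss:", final_loss)
```

The key difference from the Q-table version is that the targets are recomputed once per outer iteration over the whole stored batch, and the network is then fit to them as an ordinary supervised regression problem.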

Conclusions and Future

There were a lot of issues, and there still are. The first and most easily fixed is to switch to a dynamic learning rate (alpha) and exploration value (epsilon) in the Q-table variant. Much more work can be done on the NFQ side to create a better-structured neural net, as well as to tune the various other parameters.
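As a rough idea of what a dynamic alpha and epsilon could look like, here is a small sketch of an exponential decay with a floor. The function name and all of the numbers are placeholders I made up, not tuned values from the project.

```python
def decayed(start, end, decay, episode):
    """Exponentially decay a value from start toward a floor of end."""
    return max(end, start * decay ** episode)

# Hypothetical schedules: explore a lot early on, settle down later.
for episode in (0, 100, 1000):
    epsilon = decayed(1.0, 0.05, 0.995, episode)
    alpha = decayed(0.5, 0.01, 0.999, episode)
    print(episode, round(epsilon, 3), round(alpha, 3))
```

Each episode would then pass the current `epsilon` to the action selection and the current `alpha` to the Q-table update instead of fixed constants.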