Boosting my GA with Neural Networks and/or Reinforcement Learning

Posted by AlexT on Stack Overflow
Published on 2010-03-17T14:44:56Z Indexed on 2010/04/10 0:23 UTC

As I have mentioned in previous questions, I am writing a maze-solving application to help me learn about more theoretical CS subjects. After some trouble, I've got a Genetic Algorithm (GA) working that can evolve a set of rules (represented as boolean values) in order to find a good solution to a maze.

That being said, the GA alone works okay, but I'd like to beef it up with a Neural Network, even though I have no real working knowledge of Neural Networks (and no formal theoretical CS education). After doing a bit of reading on the subject, I found that a Neural Network could be used to train a genome in order to improve results. Let's say I have a genome (a group of genes) such as:

1 0 0 1 0 1 0 1 0 1 1 1 0 0...
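For reference, a genome like the one above could be held in Java roughly as follows. This is only a minimal sketch: the class name `GenomeSketch` and the `parse`, `crossover` and `mutate` helpers are hypothetical names made up for illustration, and single-point crossover with per-gene bit flips is just one common GA setup, not necessarily the one used in this application.

```java
import java.util.Arrays;
import java.util.Random;

final class GenomeSketch {

    /** Parses a space-separated bit string such as "1 0 0 1" into boolean genes. */
    static boolean[] parse(String bits) {
        String[] tokens = bits.trim().split("\\s+");
        boolean[] genes = new boolean[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            genes[i] = tokens[i].equals("1");
        }
        return genes;
    }

    /** Single-point crossover: genes before 'point' come from a, the rest from b.
     *  Assumes both parents have the same length. */
    static boolean[] crossover(boolean[] a, boolean[] b, int point) {
        boolean[] child = new boolean[a.length];
        for (int i = 0; i < a.length; i++) {
            child[i] = i < point ? a[i] : b[i];
        }
        return child;
    }

    /** Returns a copy in which each gene is flipped independently with probability 'rate'. */
    static boolean[] mutate(boolean[] genes, double rate, Random rng) {
        boolean[] out = genes.clone();
        for (int i = 0; i < out.length; i++) {
            if (rng.nextDouble() < rate) {
                out[i] = !out[i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Only the visible prefix of the genome is used; the rest is elided in the post.
        boolean[] parent = parse("1 0 0 1 0 1 0 1 0 1 1 1 0 0");
        boolean[] mutated = mutate(parent, 0.05, new Random(1L));
        System.out.println(Arrays.toString(mutated));
    }
}
```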

How could I use a Neural Network (I'm assuming MLP?) to train and improve my genome?

In addition to this, since I know nothing about Neural Networks, I've been looking into implementing some form of Reinforcement Learning using my maze matrix (a 2-dimensional array), although I'm a bit stuck on what the following algorithm wants from me:

(from http://people.revoledu.com/kardi/tutorial/ReinforcementLearning/Q-Learning-Algorithm.htm)

   1. Set parameter γ, and environment reward matrix R
   2. Initialize matrix Q as zero matrix
   3. For each episode:
          * Select random initial state
          * Do while not reach goal state
                o Select one among all possible actions for the current state
                o Using this possible action, consider going to the next state
                o Get maximum Q value of this next state based on all possible actions
                o Compute Q(state, action) = R(state, action) + γ * Max[Q(next state, all actions)]
                o Set the next state as the current state
            End Do
      End For
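The steps above could be sketched in Java roughly like this. It is a minimal sketch under several assumptions that are not in the original post: an action is identified by the state it leads to (so R and Q are both square, state-by-state matrices), R holds -1 for moves that are not allowed, 0 for ordinary moves and 100 for a move into the goal, and the example is a made-up four-state corridor (0 - 1 - 2 - 3) with state 3 as the goal. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

final class QLearningSketch {

    /** Largest Q value among the allowed actions of a state (0 if it has none). */
    static double maxQ(int[][] r, double[][] q, int state) {
        double best = 0.0;
        boolean found = false;
        for (int a = 0; a < r[state].length; a++) {
            if (r[state][a] >= 0 && (!found || q[state][a] > best)) {
                best = q[state][a];
                found = true;
            }
        }
        return best;
    }

    /** Runs the update Q(s, a) = R(s, a) + gamma * max Q(next, all actions). */
    static double[][] train(int[][] r, int goal, double gamma, int episodes, long seed) {
        int n = r.length;
        double[][] q = new double[n][n];          // step 2: Q starts as a zero matrix
        Random rng = new Random(seed);
        for (int episode = 0; episode < episodes; episode++) {
            int state = rng.nextInt(n);           // select a random initial state
            while (state != goal) {               // do while the goal is not reached
                // Collect the actions allowed in the current state (R >= 0).
                List<Integer> actions = new ArrayList<>();
                for (int a = 0; a < n; a++) {
                    if (r[state][a] >= 0) {
                        actions.add(a);
                    }
                }
                if (actions.isEmpty()) {
                    break;                        // dead end: give up on this episode
                }
                // Select one of the possible actions; here the action is the next state.
                int next = actions.get(rng.nextInt(actions.size()));
                q[state][next] = r[state][next] + gamma * maxQ(r, q, next);
                state = next;                     // the next state becomes the current one
            }
        }
        return q;
    }

    public static void main(String[] args) {
        // Hypothetical corridor 0 - 1 - 2 - 3, goal = 3 (assumption, not from the post).
        int[][] r = {
            { -1,  0, -1,  -1 },
            {  0, -1,  0,  -1 },
            { -1,  0, -1, 100 },
            { -1, -1,  0,  -1 },
        };
        double[][] q = train(r, 3, 0.8, 500, 42L);
        for (double[] row : q) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Once Q has settled, a path can be read off by starting in any state and repeatedly taking the allowed action with the highest Q value until the goal is reached.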

The big problem for me is understanding how to implement the reward matrix R, what exactly the Q matrix is, and how to get the Q value. I use a multi-dimensional array for my maze and an enum for the possible moves. How would these be used in a Q-Learning algorithm?
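One possible way to connect a 2D maze array and a move enum to R and Q is to number every cell as a state (row * columns + column) and use the enum's ordinal as the action index, so that R and Q are both indexed as [state][action]. The sketch below is only an illustration under assumptions not stated in the post: the maze stores 0 for an open cell and 1 for a wall, the `Move` enum has the four values shown, and the rewards are -1 (move not allowed), 0 (ordinary move) and 100 (move into the goal). All names are hypothetical.

```java
import java.util.Arrays;

final class MazeRewardSketch {

    /** Hypothetical move enum; ordinal() doubles as the action (column) index. */
    enum Move {
        UP(-1, 0), DOWN(1, 0), LEFT(0, -1), RIGHT(0, 1);

        final int dRow;
        final int dCol;

        Move(int dRow, int dCol) {
            this.dRow = dRow;
            this.dCol = dCol;
        }
    }

    /** Maps a maze cell to a single state index. */
    static int stateOf(int row, int col, int cols) {
        return row * cols + col;
    }

    /** Returns the state reached by a move, or -1 if it leaves the maze or hits a wall. */
    static int nextState(int[][] maze, int state, Move move) {
        int cols = maze[0].length;
        int row = state / cols + move.dRow;
        int col = state % cols + move.dCol;
        boolean inside = row >= 0 && row < maze.length && col >= 0 && col < cols;
        return (inside && maze[row][col] == 0) ? stateOf(row, col, cols) : -1;
    }

    /** Builds R[state][action]: -1 = not allowed, 0 = ordinary move, 100 = enters the goal. */
    static int[][] buildRewards(int[][] maze, int goalState) {
        Move[] moves = Move.values();
        int[][] r = new int[maze.length * maze[0].length][moves.length];
        for (int state = 0; state < r.length; state++) {
            // Wall cells get a row too; it is simply never visited during learning.
            for (Move move : moves) {
                int next = nextState(maze, state, move);
                r[state][move.ordinal()] = next < 0 ? -1 : (next == goalState ? 100 : 0);
            }
        }
        return r;
    }

    public static void main(String[] args) {
        int[][] maze = {
            { 0, 0 },
            { 1, 0 },
        };
        int goal = stateOf(1, 1, 2);
        System.out.println(Arrays.deepToString(buildRewards(maze, goal)));
    }
}
```

With this layout the Q matrix would be a `double[states][Move.values().length]` array of the same shape as R, and `nextState` supplies the "next state" needed when updating a Q value.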

If someone could help out by explaining what I would need to do to implement the above, preferably in Java (although C# would be fine too), possibly with some source code examples, it'd be appreciated.

