[mlpack] GSOC 2017: Reinforcement Learning

Marcus Edel marcus.edel at fu-berlin.de
Mon Mar 6 10:14:21 EST 2017


Hello Sahith,

> Since most of the issues right now on mlpack are being taken up very quickly,
> could you guide me on how to work on a simple reinforcement learning project
> like you suggested earlier? You proposed implementing a simple agent using
> Policy Gradients, but I'm unsure what exactly I need to be testing it on.

So if you just need an easy environment to test your algorithm against, I would
recommend taking a look at the tasks Bang Liu wrote last year:
https://github.com/BangLiu/mlpack/blob/ne/src/mlpack/methods/ne/tasks.hpp
He implemented the Xor, CartPole, DoublePole, and MountainCar tasks. Each task
can easily be used in a unit test.
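
To give you an idea of what such a unit test could look like, here is a small
self-contained sketch in the spirit of the CartPole task. Note that this does
not use Bang Liu's actual classes (check tasks.hpp for the real interface); it
just simulates the classic cart-pole dynamics and checks that the pole falls
over when no corrective force is applied:

    #include <cassert>
    #include <cmath>

    // Classic cart-pole dynamics (Barto, Sutton & Anderson); one Euler step.
    void Step(const double force, double& x, double& xDot,
              double& theta, double& thetaDot)
    {
      const double g = 9.8, massCart = 1.0, massPole = 0.1, length = 0.5;
      const double totalMass = massCart + massPole, tau = 0.02;

      const double cosTheta = std::cos(theta), sinTheta = std::sin(theta);
      const double temp = (force + massPole * length * thetaDot * thetaDot *
          sinTheta) / totalMass;
      const double thetaAcc = (g * sinTheta - cosTheta * temp) /
          (length * (4.0 / 3.0 - massPole * cosTheta * cosTheta / totalMass));
      const double xAcc = temp - massPole * length * thetaAcc * cosTheta /
          totalMass;

      x += tau * xDot;
      xDot += tau * xAcc;
      theta += tau * thetaDot;
      thetaDot += tau * thetaAcc;
    }

    int main()
    {
      // Start slightly off balance and apply no force: the pole has to tip
      // past the usual 12 degree (~0.21 rad) failure threshold quickly.
      double x = 0, xDot = 0, theta = 0.05, thetaDot = 0;
      bool fell = false;
      for (int i = 0; i < 200; ++i)
      {
        Step(0.0, x, xDot, theta, thetaDot);
        if (std::abs(theta) > 0.2095) { fell = true; break; }
      }
      assert(fell);
      return 0;
    }

A trained agent would be dropped in where the zero force is, and the test would
assert the opposite, i.e. that the pole stays up.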

Another option is OpenAI's Gym toolkit, which provides a variety of different
tasks. The main Gym interface is meant to be used from within Python, but no
worries, we have written some code that allows you to use it from C++:
https://github.com/zoq/gym_tcp_api
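
Usage from C++ looks roughly like the example agent that ships with the
repository; I'm writing the member names down from memory here, so please
double-check them against the code in the repo:

    #include <iostream>
    #include <armadillo>

    #include "environment.hpp"

    int main()
    {
      // Connect to a running gym_tcp_api server and create a CartPole task.
      gym::Environment env("127.0.0.1", "4040", "CartPole-v0");
      env.reset();

      double totalReward = 0;
      size_t totalSteps = 0;
      while (true)
      {
        // Sample a random action; a real agent would pick one via its policy.
        arma::mat action = env.action_space.sample();
        env.step(action);

        totalReward += env.reward;
        ++totalSteps;
        if (env.done)
          break;
      }

      std::cout << "steps: " << totalSteps << ", reward: " << totalReward
          << std::endl;
      return 0;
    }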

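As for the Policy Gradients part of your question: you don't need a fancy
environment to get started. Here is a rough, self-contained REINFORCE sketch on
a toy bandit problem (all names and constants are just for illustration); once
this works, the same update carries over to CartPole and friends, with the
reward replaced by the return of an episode:

    #include <array>
    #include <cmath>
    #include <iostream>
    #include <random>

    int main()
    {
      std::mt19937 rng(42);
      std::normal_distribution<double> noise(0.0, 1.0);

      // Three-armed bandit: arm 2 pays the most on average.
      const std::array<double, 3> trueMeans = {{ 0.2, 0.5, 0.8 }};
      std::array<double, 3> theta = {{ 0.0, 0.0, 0.0 }};  // action preferences
      const double alpha = 0.1;  // step size
      double baseline = 0.0;     // running average reward as baseline

      for (int episode = 0; episode < 2000; ++episode)
      {
        // Softmax policy over the preferences.
        std::array<double, 3> pi;
        double sum = 0.0;
        for (int a = 0; a < 3; ++a) sum += std::exp(theta[a]);
        for (int a = 0; a < 3; ++a) pi[a] = std::exp(theta[a]) / sum;

        // Sample an action and observe a noisy reward (a one-step episode).
        std::discrete_distribution<int> dist(pi.begin(), pi.end());
        const int action = dist(rng);
        const double reward = trueMeans[action] + 0.1 * noise(rng);

        // REINFORCE: theta += alpha * (R - baseline) * grad log pi(action),
        // where grad log pi(action) w.r.t. theta[a] is (1{a == action} - pi[a]).
        for (int a = 0; a < 3; ++a)
        {
          const double indicator = (a == action) ? 1.0 : 0.0;
          theta[a] += alpha * (reward - baseline) * (indicator - pi[a]);
        }
        baseline += 0.01 * (reward - baseline);
      }

      // The preference for the best arm (index 2) should dominate by now.
      std::cout << "preferences: " << theta[0] << " " << theta[1] << " "
          << theta[2] << std::endl;
      return 0;
    }
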
Is that helpful? If you have any questions, feel free to ask.

Thanks,
Marcus

> On 5 Mar 2017, at 11:48, Sahith D <sahithdn at gmail.com> wrote:
> 
> Hi Marcus,
>                  Since most of the issues right now on mlpack are being taken up very quickly, could you guide me on how to work on a simple reinforcement learning project like you suggested earlier? You proposed implementing a simple agent using Policy Gradients, but I'm unsure what exactly I need to be testing it on.
> 
> Thank You,
> Sahith
> 
> On Sat, Mar 4, 2017 at 12:53 AM Sahith D <sahithdn at gmail.com <mailto:sahithdn at gmail.com>> wrote:
> Hi,
>  I've been working on multiple different algorithms to use with the agent and then training it to see how they stack up against each other, so you could say the last 3 months have been spent both building and, to a smaller extent, training the agent. I've been using different algorithms because I'm still learning a lot about reinforcement learning as I go, so I don't really have a preference right now. I've done the most work with Q-learning and Monte Carlo tree search, though.
> 
> Thank You,
> Sahith 
> 
> On Fri, Mar 3, 2017 at 8:36 PM Marcus Edel <marcus.edel at fu-berlin.de <mailto:marcus.edel at fu-berlin.de>> wrote:
> Hello Sahith,
> 
> When you say training for 3 months, I guess you mean that you have been working on your agent for 3 months rather than training it for 3 months; if you really have been training it for 3 months, that's passion. Also, since you have worked on a bunch of algorithms, do you have a preference?
> 
> Best,
> Marcus
> 
> 
>> On 2 Mar 2017, at 18:59, Sahith D <sahithdn at gmail.com <mailto:sahithdn at gmail.com>> wrote:
>> 
>> That sounds really interesting; do you use your own model to solve the tasks?
>>  
>> I am using TensorFlow as part of the backend, but other than that I've built the model from scratch.
>> 
>> A basic understanding of OpenAI's Gym toolkit is definitely helpful; note that
>> we have written a wrapper so that the toolkit can be used from within C++. Also,
>> to be successful at this project, you should have a good knowledge of
>> reinforcement learning; i.e., you should be familiar with the way agents are
>> typically built and trained, and certainly, you should be familiar with the
>> individual components that you plan to implement.
>>  
>> In the project I'm currently working on I've tried multiple reinforcement learning approaches, like temporal-difference learning, deep Q-networks, and Monte Carlo tree search. I've also been building and training my agent for more than 3 months now. I built it from scratch, so I have a decent understanding of the individual components.
>>  
>> There are some easy issues on GitHub that you might find interesting; we will
>> see if we can add more in the next few days. Besides that, since you'd like to
>> work on the reinforcement learning project, maybe you'd like to implement a
>> simple agent that is capable of solving some simple tasks; Policy Gradients is
>> a simple method that is really powerful and also quite intuitive. Don't feel
>> obligated, you don't have to solve issues or implement anything to be
>> considered for the project, but it's an easy way to dive into the codebase.
>>  
>> I'll go through the codebase and try solving some issues or implementing a simple agent as you suggested.
>> Thanks for the help!
>> 
>> Yours Sincerely,
>> Sahith N D
> 
