Pledging to do 100 days of Machine Learning and progress log!

It is a shame but I kind of dropped this when I was $41\%$ done...

I hope I man up and finish this in the near future.

After watching this video from Siraj Raval, I decided to jump right on board with the #100DaysofMLCode initiative (even though I am something like 73 days late...)! The goal here is to devote at least 1h every day, for the next 100 days, to studying ML or writing code.

According to the rules posted by Siraj, I must:

Make a public pledge to see this through (and look foolish if I drop out midway), which this post is;
Keep a public log of my daily progress, which this post will also be;
Be supportive whenever I see something tagged with #100DaysofMLCode!

Progress log

For day $0$ I wrote this post and spent quite some time thinking about what I will do throughout. I am thinking of studying several ML topics and then writing educational posts about them here on the blog.

Today I wrote this Twitter proof, tackling a mathematical property of neural networks with linear activation functions.
Started reading about Reinforcement Learning and Markov Decision Processes; already imagined a nice example I will be using when writing about this... it will involve cake, lemonade and eventually a stomach ache.
I kept reading about MDPs, and I found this answer on Cross Validated especially enlightening... or maybe it was just that everything finally clicked!
Read more about how MDPs work on this blog (which seems to have taken inspiration from these slides for this particular post).
Kept reading the resources above; started sketching an example MDP to be used in my next blog post.
Finished sketching the example MDP and did some calculations related to it; started writing the first blog post about this and created a representation of the MDP for the post.
Finished writing and published this post on the basics of Markov Decision Processes.
Started writing the second post on MDPs, where we will explore how changes in $\gamma$ and in the transition probabilities affect the policies.
I wrote another post on MDPs, this time about the discount factor $\gamma$.
Following up on the last post, I wrote here about how the optimal policy can change when the discount factor changes.
Decided to use the Tower of Hanoi as a model problem and started encoding it as an MDP; wrote some helper code for that, which you can find here.
Kept writing the code from the previous day and applied the value iteration algorithm to the Tower of Hanoi, solving it. The code is in this GitHub repo.
Wrote this post on how to encode the problem of the Tower of Hanoi into an MDP.
Wrote the code for the policy iteration algorithm to solve the Tower of Hanoi; the code is in my GitHub.
Today I tried implementing Q-learning to solve the Tower of Hanoi. It isn't learning properly yet, so I decided not to upload the code to GitHub.
Today I looked all over the internet trying to find out why my Q-learning algorithm isn't working... Still haven't succeeded.
Because Q-learning wasn't working, I tried to implement double Q-learning, just to be sure my problem wasn't one of bias. At first double Q-learning wasn't working either, but now both algorithms are working just fine! They have been uploaded to GitHub.
Started writing a Python notebook where I will explain the algorithms and show how to use them to solve the Tower of Hanoi. Those explanations will also be posted here on the blog as they are written. The notebook can be found on my GitHub as well.
Kept on working on the notebook! Practically finished the section on value iteration.
I had to stop for a couple of days; sorry for that! I am back, and I have been reading about ML in financial markets and continuing an online course about data science that I started in the summer. Data science isn't exactly the same as ML, but knowing how to handle data ought to make me better at ML, right?
Kept taking the course Data Science: Visualization on edX.
Today I read about using machine learning for text translation.
Advanced a lot in one of the modules of the Data Science course...
And today I finished it!
I started the module on probability...
And today I kept going...
And finished it today!
Today I resumed my work on the Hanoi RL notebook.
Wrote about Q-learning and double Q-learning in my notebook.
Started the new module of the Data Science course.
Kept learning!
Today I kept working through the course I am taking.
...
...
...
...
...
...
...
...
For the past few days I have been studying for my online course... I think I have done around 2/3 of the course by now.
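
To give a flavour of what the value iteration entries in the log are about, here is a minimal sketch of the Tower of Hanoi treated as a deterministic MDP. This is not the code from the GitHub repo: the state encoding (a tuple giving the peg of each disk), the reward of 1 for the move that reaches the goal, the discount factor of 0.9 and all function names are assumptions made just for this illustration.

```python
from itertools import product

N_DISKS = 3          # assumed puzzle size
PEGS = range(3)
GAMMA = 0.9          # assumed discount factor

# A state is a tuple where state[d] is the peg holding disk d (disk 0 is the smallest).
STATES = list(product(PEGS, repeat=N_DISKS))
GOAL = (2,) * N_DISKS  # every disk on the last peg


def top_disk(state, peg):
    """Return the smallest disk on the given peg, or None if the peg is empty."""
    disks = [d for d, p in enumerate(state) if p == peg]
    return min(disks) if disks else None


def legal_moves(state):
    """Yield (action, next_state) pairs, where action is a (source, destination) pair."""
    if state == GOAL:
        return  # the goal is terminal: no moves out of it
    for src, dst in product(PEGS, repeat=2):
        if src == dst:
            continue
        disk = top_disk(state, src)
        if disk is None:
            continue  # nothing to move from an empty peg
        target = top_disk(state, dst)
        if target is not None and target < disk:
            continue  # cannot place a disk on top of a smaller one
        nxt = list(state)
        nxt[disk] = dst
        yield (src, dst), tuple(nxt)


def move_value(values, nxt):
    """Reward for the move (1 if it reaches the goal) plus the discounted value of the next state."""
    return (1.0 if nxt == GOAL else 0.0) + GAMMA * values[nxt]


def value_iteration(tol=1e-9):
    """Sweep over all states, updating values in place, until they stop changing."""
    values = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            if s == GOAL:
                continue
            best = max(move_value(values, nxt) for _, nxt in legal_moves(s))
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def greedy_path(values, start):
    """Follow the best-valued move from start until the goal is reached."""
    path = [start]
    state = start
    while state != GOAL:
        _, state = max(legal_moves(state), key=lambda m: move_value(values, m[1]))
        path.append(state)
    return path


if __name__ == "__main__":
    values = value_iteration()
    path = greedy_path(values, (0,) * N_DISKS)
    print(f"Solved in {len(path) - 1} moves")
```

Because every move is deterministic, the value of a state here is just $\gamma$ raised to the number of moves still needed minus one, so following the highest-valued move always shortens the distance to the goal.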

Are you into #100DaysofMLCode as well? Let me know in the comments what you have been doing!

- RGS

Mathspp: mathematics and programming